Version 0.8.0¶
Version 0.8.0 | |
---|---|
Release date: | 2019-02-05 |
Next release: | Version 0.9.0 |
Previous release: | Version 0.7.0 |
Wheels (MacOS) | python-3.5, python-3.6, python-3.7 |
SDist | sources |
Frame¶
Method .to_tuples() converts a Frame into a list of tuples, each tuple representing a single row. #1439
Method .to_dict() converts the Frame into a dictionary where the keys are column names and values are lists of elements in each column. #1439
Methods .head(n) and .tail(n) added, returning the first/last n rows of the Frame respectively. #1307
Frame objects can now be pickled using the standard Python pickle interface. #1444 This also has an added benefit of reducing the potential for a deadlock when using the multiprocessing module.
Added function dt.repeat(frame, n) that creates a new Frame by row-binding n copies of the frame. #1459
Added functions log and log10 for computing the natural and base-10 logarithms of a column. #1558
Sorting functionality is now integrated into the DT[i, j, ...] call via the function sort(). If sorting is specified alongside a groupby, the values will be sorted within each group. #1531
The primary datatable expression DT[i, j, ...] is now evaluated entirely in C++, improving performance and reliability.
The column selector j in DT[i, j] can now be a list/iterator of booleans. This list should have length DT.ncols, and the entries in this list will indicate whether to select the corresponding column of the Frame or not. #1503 This can be used to implement a simple column filter, for example: del DT[:, (name.endswith("_tmp") for name in DT.names)]
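The boolean column selector can be pictured with a small plain-Python model. This is only a sketch of the semantics described in the entry above, not datatable code: the dictionary `columns` stands in for a Frame, and the helper `select_columns` and the sample data are hypothetical names introduced for illustration.

```python
# Plain-Python model of a boolean column selector; this is not datatable code.
# `columns` stands in for a Frame: column name -> list of values (made-up data).
columns = {
    "id": [1, 2, 3],
    "score": [0.5, 0.7, 0.9],
    "score_tmp": [5, 7, 9],
    "flag_tmp": [True, False, True],
}


def select_columns(columns, mask):
    """Keep the columns whose mask entry is true; the mask must cover every column."""
    mask = list(mask)  # accept generators as well as lists
    if len(mask) != len(columns):
        raise ValueError(f"expected {len(columns)} booleans, got {len(mask)}")
    return {name: values for (name, values), keep in zip(columns.items(), mask) if keep}


# Counterpart of `del DT[:, (name.endswith("_tmp") for name in DT.names)]`:
# deleting the flagged columns is the same as keeping the unflagged ones.
is_tmp = [name.endswith("_tmp") for name in columns]
remaining = select_columns(columns, (not flag for flag in is_tmp))
selected_tmp = select_columns(columns, is_tmp)
```

The length check mirrors the requirement that the list has one entry per column; how datatable itself reports a mismatch is not stated in these notes.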
A slice-valued i expression can now be combined with a dt.by() operator in DT[i, j, by]. The result is that the slice i is applied to each group produced by by(), before the j is evaluated. #1585
Implemented sorting in reverse direction, via sort(-col), where col is any regular column selector such as f.A or f[column]. The - sign is symbolic, no actual negation occurs. As such, this works even for string columns. #792
The equality operators == / != can now be applied to string columns too. #1491
Partial column update (i.e. expression of the form DT[i, j] = R) now works for string columns as well. #1523
Improved the performance of setting .nrows. Now if the frame has multiple columns, a view will be created.
Fixed rendering of “view” Frames in a Jupyter notebook. This bug caused the frame to display wrong data when viewed in a notebook. #1448
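The per-group slice and the reverse sort described above can be modelled in plain Python. The sketch below is an illustration of the semantics only, not datatable's implementation: the rows, the helper `slice_per_group`, and the choice to emit groups in sorted key order are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows of (group key G, value V); this models the semantics only.
rows = [("b", 2), ("a", 1), ("b", 1), ("a", 3), ("b", 5)]


def slice_per_group(rows, key, group_slice):
    """Apply `group_slice` to each group separately, like a slice-valued i with by()."""
    ordered = sorted(rows, key=key)  # stable sort: rows keep their order inside a group
    result = []
    for _, group in groupby(ordered, key=key):
        result.extend(list(group)[group_slice])
    return result


# Model of DT[:2, :, by(f.G)]: the first two rows of every group.
first_two = slice_per_group(rows, itemgetter(0), slice(None, 2))

# Model of sort(-f.G) on a string column: reverse the ordering, negate nothing.
descending = sorted({g for g, _ in rows}, reverse=True)
```

Because the direction is expressed as an ordering rather than as arithmetic on the values, the same idea applies to strings, which cannot be negated.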
Fixed crash when an int-column i selector is applied to a Frame which already had another row filter applied. #1437
When a g.-column is used but there is no join frame, an appropriate error message is now emitted. #1481
.replace() now works correctly when the replacement list contains +inf or 1.7976931348623157e+308. #1510
.replace() now throws an error if called with 0 or 1 argument. #1525
Fixed crash when viewing a frame obtained by resizing a 0-row frame. #1527
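The two values named in the .replace() fix sit at the edge of the 64-bit float range. The notes do not say what went wrong internally, so the following is only background on why they are boundary cases, using the Python standard library rather than datatable.

```python
import math
import sys

# The literal from the entry above is the largest finite 64-bit float.
largest_finite = 1.7976931348623157e+308
is_float_max = largest_finite == sys.float_info.max

# It is finite, +inf is not, and stepping past the finite range overflows to +inf.
positive_inf = float("inf")
overflowed = largest_finite * 2
```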
Function count() now returns correct result within the DT[i, j] expression with non-trivial row filter i. #1316
Fixed groupby when it is applied to a Frame with view columns. #1542
When replacing an empty set of columns, the replacement frame can now be also empty (i.e. have shape [0 x 0]). #1544
Fixed join results when join is applied to a view frame. #1540
Fixed .replace() in view string columns. #1549
A 0-row integer column can now be used as i in DT[i, j]. #1551
A string column produced from a partial join now materializes correctly. #1556
Fixed incorrect result during “true division” of integer columns, when one of the values was negative and the other positive. #1562
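Mixed-sign operands are exactly the case where the common definitions of integer division disagree. The notes do not describe the wrong result that was produced, so this plain-Python sketch (with made-up operands) only shows the value true division is expected to give, next to the two integer quotients it is easily confused with.

```python
import math

a, b = -7, 2  # hypothetical values: one negative, one positive

true_quotient = a / b                   # true division keeps the fraction
floor_quotient = a // b                 # floor division rounds toward -infinity
truncated_quotient = math.trunc(a / b)  # C-style integer division rounds toward zero
```

For operands of the same sign the floor and truncated quotients coincide, which is why errors of this kind tend to show up only when the signs differ.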
.to_csv() no longer crashes on Unix when writing an empty frame. #1565
Fixed crash when the RHS of assignment DT[i, j] = R was a list of expressions. #1539
Fixed crash when an empty dt.by() condition was used in DT[i, j, by]. #1572
Expression DT[:, :, by(...)] no longer produces duplicates of columns used in the by-clause. #1576
In certain circumstances mixing computed and plain columns under groupby caused incorrect result. #1578
Fixed an internal error which was occurring when multiple row filters were applied to a Frame in sequence. #1592
Fixed rbinding of frames if one of them was a slice with a negative step. #1594
Fixed invalid result when cbinding several 0-row frames. #1604
Setting .nrows now always pads the frame with NAs, even if the frame has only 1 row. Previously changing .nrows on a 1-row frame caused its value to be repeated. Use .repeat() in order to expand the frame by copying its values.
When no columns are selected in DT[i, j], the returned frame will now have the same number of rows as if at least 1 column was selected. Previously an empty [0 x 0] frame was returned.
Assigning a value to a column DT[:, 'A'] = x will attempt to preserve the column’s stype; if that is not possible, the column will be upcast within its logical type.
It is no longer possible to assign a value of an incompatible logical type to an existing column. For example, an assignment DT[:, 'A'] = 3 is now legal only if column A is of integer or real type, but will raise an exception if A is a boolean or string.
.rbind() method no longer has a return value. The method always updated the frame in-place, so it was confusing to both update in-place and return the original frame. #1610
dt.min() / dt.max() over an empty or all-NA column now return None instead of +Inf / -Inf respectively. #1624
Frame methods .topython(), .topandas() and .tonumpy() are now deprecated, to be removed in version 0.9.0. Please use .to_list(), .to_pandas() and .to_numpy() instead.
Calling a frame object DT(rows=i, select=j, groupby=g, join=z, sort=s) is now deprecated. Use the expression DT[i, j, by(g), join(z), sort(s)] instead, where symbols dt.by(), dt.join() and dt.sort() can all be imported from the datatable namespace. #1579
Single-item Frame selectors are now prohibited: DT[col] is an error. In the future this expression will be interpreted as a row selector instead. Update: in version 0.9.0 this “single-selector” syntax was reinstated, but only for integer and string selectors.
Internally, we now allow each Column in a Frame to have its own separate RowIndex. This will improve the performance, especially in join/cbind operations. Applications that use datatable’s C API may need to be updated to account for this. #1188
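Two of the behaviour changes above, padding on resize and None from min/max over an empty or all-NA column, can be summarised with a plain-Python model. This is a sketch of the described semantics, not datatable code: a column is modelled as a list, NA as None, and the helpers `resize`, `repeat`, `na_min` and `na_max` are hypothetical names. That NAs are skipped when computing the minimum or maximum is an assumption of the model.

```python
def resize(column, nrows):
    """Model of setting .nrows: truncate, or pad with NA (None here), never repeat."""
    return column[:nrows] + [None] * max(0, nrows - len(column))


def repeat(column, n):
    """Model of repeating a frame: n copies of the column bound row-wise."""
    return column * n


def na_min(column):
    """Model of dt.min(): skip NAs and return None when no values remain."""
    return min((v for v in column if v is not None), default=None)


def na_max(column):
    """Model of dt.max(): skip NAs and return None when no values remain."""
    return max((v for v in column if v is not None), default=None)


one_row = [7]
padded = resize(one_row, 3)  # grows by padding with NA
copied = repeat(one_row, 3)  # grows by copying the existing value
```

Code that relied on the old 1-row behaviour of .nrows corresponds to `repeat` in this model, which is why the notes point to repeating as the replacement.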
General¶
Module datatable now exposes a C API, to allow other C/C++ libraries to interact with datatable Frames natively. #1469 See “datatable/include/datatable.h” for the description of the API functions.
Installation from source distribution now works as expected. #1451
Function dt.split_into_nhot() now works correctly with view Frames. #1507
The build process on MacOS now ensures that the libomp.dylib is properly referenced via @rpath. This prevents installation problems caused by the dynamic dependencies referenced by their absolute paths which are not valid outside of the build machine. #1559
Fixed a crash that occurred with the latest pandas 0.24.0. #1600
datatable now uses integration with Codacy to keep track of code quality and potential errors.
Models¶
Contributors¶
This release was created with the help of 6 people who contributed code and documentation, and 12 more people who submitted bug reports and feature requests.
Code & documentation contributors:
Issues contributors: