Version 0.8.0¶

Version 0.8.0
Release date:	2019-02-05
Next release:	Version 0.9.0
Previous release:	Version 0.7.0
Wheels
MacOS	python-3.5
	python-3.6
	python-3.7
SDist	sources

Frame¶

Method .to_tuples() converts a Frame into a list of tuples, each tuple representing a single row. #1439
Method .to_dict() converts the Frame into a dictionary where the keys are column names and values are lists of elements in each column. #1439
Methods .head(n) and .tail(n) added, returning the first/last n rows of the Frame respectively. #1307
Frame objects can now be pickled using the standard Python pickle interface. #1444 This also has an added benefit of reducing the potential for a deadlock when using the multiprocessing module.
Added function dt.repeat(frame, n) that creates a new Frame by row-binding n copies of the frame. #1459
Added functions log and log10 for computing the natural and base-10 logarithms of a column. #1558
Sorting functionality is now integrated into the DT[i, j, ...] call via the function sort(). If sorting is specified alongside a groupby, the values will be sorted within each group. #1531
The primary datatable expression DT[i, j, ...] is now evaluated entirely in C++, improving performance and reliability.
The column selector j in DT[i, j] can now be a list/iterator of booleans. This list should have length DT.ncols, and the entries in this list will indicate whether to select the corresponding column of the Frame or not. #1503 This can be used to implement a simple column filter, for example:

del DT[:, (name.endswith("_tmp") for name in DT.names)]
A slice-valued i expression can now be combined with a dt.by() operator in DT[i, j, by]. The result is that the slice i is applied to each group produced by by(), before the j is evaluated. #1585
Implemented sorting in reverse direction, via sort(-col), where col is any regular column selector such as f.A or f[column]. The - sign is symbolic, no actual negation occurs. As such, this works even for string columns. #792
.copy() now retains the frame’s key, if any. #1443
The equality operators == / != can now be applied to string columns too. #1491
Partial column update (i.e. expression of the form DT[i, j] = R) now works for string columns as well. #1523
Improved the performance of setting .nrows. Now if the frame has multiple columns, a view will be created.
Fixed rendering of “view” Frames in a Jupyter notebook. This bug caused the frame to display wrong data when viewed in a notebook. #1448
Fixed crash when an int-column i selector is applied to a Frame which already had another row filter applied. #1437
When a g.-column is used but there is no join frame, an appropriate error message is now emitted. #1481
.replace() now works correctly when the replacement list contains +inf or 1.7976931348623157e+308. #1510
.replace() now throws an error if called with 0 or 1 argument. #1525
Fixed crash when viewing a frame obtained by resizing a 0-row frame. #1527
Function count() now returns correct result within the DT[i, j] expression with non-trivial row filter i. #1316
Fixed groupby when it is applied to a Frame with view columns. #1542
When replacing an empty set of columns, the replacement frame can now be also empty (i.e. have shape [0 x 0]). #1544
Fixed join results when join is applied to a view frame. #1540
Fixed .replace() in view string columns. #1549
A 0-row integer column can now be used as i in DT[i, j]. #1551
A string column produced from a partial join now materializes correctly. #1556
Fixed incorrect result during “true division” of integer columns, when one of the values was negative and the other positive. #1562
.to_csv() no longer crashes on Unix when writing an empty frame. #1565
Fixed crash when the RHS of assignment DT[i, j] = R was a list of expressions. #1539
Fixed crash when an empty dt.by() condition was used in DT[i, j, by]. #1572
Expression DT[:, :, by(...)] no longer produces duplicates of columns used in the by-clause. #1576
In certain circumstances mixing computed and plain columns under groupby caused incorrect result. #1578
Fixed an internal error which was occurring when multiple row filters were applied to a Frame in sequence. #1592
Fixed rbinding of frames if one of them was a slice with a negative step. #1594
Fixed invalid result when cbinding several 0-row frames. #1604
Setting .nrows now always pads the frame with NAs, even if the frame has only 1 row. Previously changing .nrows on a 1-row frame caused its value to be repeated. Use .repeat() in order to expand the frame by copying its values.
When no columns are selected in DT[i, j], the returned frame will now have the same number of rows as if at least 1 column was selected. Previously an empty [0 x 0] frame was returned.
Assigning a value to a column DT[:, 'A'] = x will attempt to preserve the column’s stype; or if not possible, the column will be upcasted within its logical type.
It is no longer possible to assign a value of an incompatible logical type to an existing column. For example, an assignment DT[:, 'A'] = 3 is now legal only if column A is of integer or real type, but will raise an exception if A is a boolean or string.
.rbind() method no longer has a return value. The method always updated the frame in-place, so it was confusing to both update in-place and return the original frame. #1610
dt.min() / dt.max() over an empty or all-NA column now returns None instead of +Inf / -Inf respectively. #1624
Frame methods .topython(), .topandas() and .tonumpy() are now deprecated, to be removed in version 0.9.0. Please use .to_list(), .to_pandas() and .to_numpy() instead.
Calling a frame object DT(rows=i, select=j, groupby=g, join=z, sort=s) is now deprecated. Use the expression DT[i, j, by(g), join(z), sort(s)] instead, where symbols dt.by(), dt.join() and dt.sort() can all be imported from the datatable namespace. #1579
Single-item Frame selectors are now prohibited: DT[col] is an error. In the future this expression will be interpreted as a row selector instead. Update: in version 0.9.0 this “single-selector” syntax was reinstated, but only for integer and string selectors.
Internally, we now allow each Column in a Frame to have its own separate RowIndex. This will improve the performance, especially in join/cbind operations. Applications that use the datatable’s C API may need to be updated to account for this. #1188

General¶

Module datatable now exposes C API, to allow other C/C++ libraries interact with datatable Frames natively. #1469 See “datatable/include/datatable.h” for the description of the API functions.
Installation from source distribution now works as expected. #1451
Function dt.split_into_nhot() now works correctly with view Frames. #1507
The build process on MacOS now ensures that the libomp.dylib is properly referenced via @rpath. This prevents installation problems caused by the dynamic dependencies referenced by their absolute paths which are not valid outside of the build machine. #1559
Fixed a crash that occurred with the latest pandas 0.24.0. #1600
datatable now uses integration with Codacy to keep track of code quality and potential errors.

Models¶

Added ability to train and fit an FTRL-Proximal (Follow The Regularized Leader) online learning algorithm on a data frame. #1389 The implementation is multi-threaded and has high performance.
FTRL algorithm now works correctly with view frames. #1502

Contributors¶

This release was created with the help of 6 people who contributed code and documentation, and 12 more people who submitted bug reports and feature requests.

Code & documentation contributors:

Issues contributors: