Version 0.8.0

Release date: 2019-02-05
Next release: Version 0.9.0
Previous release: Version 0.7.0
Wheels: MacOS (python-3.5, python-3.6, python-3.7)
SDist: sources

Frame

  • Method .to_tuples() converts a Frame into a list of tuples, each tuple representing a single row. #1439

  • Method .to_dict() converts the Frame into a dictionary where the keys are column names and values are lists of elements in each column. #1439

  • Methods .head(n) and .tail(n) added, returning the first/last n rows of the Frame respectively. #1307

  • Frame objects can now be pickled using the standard Python pickle interface. #1444 This also reduces the potential for a deadlock when using the multiprocessing module.

  • Added function dt.repeat(frame, n) that creates a new Frame by row-binding n copies of the frame. #1459

  • Added functions log and log10 for computing the natural and base-10 logarithms of a column. #1558

  • Sorting functionality is now integrated into the DT[i, j, ...] call via the function sort(). If sorting is specified alongside a groupby, the values will be sorted within each group. #1531

  • The primary datatable expression DT[i, j, ...] is now evaluated entirely in C++, improving performance and reliability.

  • The column selector j in DT[i, j] can now be a list/iterator of booleans. This list should have length DT.ncols, and the entries in this list will indicate whether to select the corresponding column of the Frame or not. #1503 This can be used to implement a simple column filter, for example:

    del DT[:, (name.endswith("_tmp") for name in DT.names)]

  • A slice-valued i expression can now be combined with a dt.by() operator in DT[i, j, by]. The result is that the slice i is applied to each group produced by by(), before the j is evaluated. #1585

  • Implemented sorting in reverse direction, via sort(-col), where col is any regular column selector such as f.A or f[column]. The - sign is symbolic, no actual negation occurs. As such, this works even for string columns. #792

  • .copy() now retains the frame’s key, if any. #1443

  • The equality operators == / != can now be applied to string columns too. #1491

  • Partial column update (i.e. expression of the form DT[i, j] = R) now works for string columns as well. #1523

  • Improved the performance of setting .nrows. Now if the frame has multiple columns, a view will be created.

  • Fixed rendering of “view” Frames in a Jupyter notebook. This bug caused the frame to display wrong data when viewed in a notebook. #1448

  • Fixed crash when an int-column i selector is applied to a Frame which already had another row filter applied. #1437

  • When a g.-column is used but there is no join frame, an appropriate error message is now emitted. #1481

  • .replace() now works correctly when the replacement list contains +inf or 1.7976931348623157e+308. #1510

  • .replace() now throws an error if called with 0 or 1 argument. #1525

  • Fixed crash when viewing a frame obtained by resizing a 0-row frame. #1527

  • Function count() now returns the correct result within a DT[i, j] expression that has a non-trivial row filter i. #1316

  • Fixed groupby when it is applied to a Frame with view columns. #1542

  • When replacing an empty set of columns, the replacement frame can now also be empty (i.e. have shape [0 x 0]). #1544

  • Fixed join results when join is applied to a view frame. #1540

  • Fixed .replace() in view string columns. #1549

  • A 0-row integer column can now be used as i in DT[i, j]. #1551

  • A string column produced from a partial join now materializes correctly. #1556

  • Fixed incorrect result during “true division” of integer columns, when one of the values was negative and the other positive. #1562

  • .to_csv() no longer crashes on Unix when writing an empty frame. #1565

  • Fixed crash when the RHS of assignment DT[i, j] = R was a list of expressions. #1539

  • Fixed crash when an empty dt.by() condition was used in DT[i, j, by]. #1572

  • Expression DT[:, :, by(...)] no longer produces duplicates of columns used in the by-clause. #1576

  • Fixed incorrect results that occurred in certain circumstances when computed and plain columns were mixed under groupby. #1578

  • Fixed an internal error which was occurring when multiple row filters were applied to a Frame in sequence. #1592

  • Fixed rbinding of frames if one of them was a slice with a negative step. #1594

  • Fixed invalid result when cbinding several 0-row frames. #1604

  • Setting .nrows now always pads the frame with NAs, even if the frame has only 1 row. Previously, changing .nrows on a 1-row frame caused its value to be repeated. Use dt.repeat() to expand the frame by copying its values.

  • When no columns are selected in DT[i, j], the returned frame will now have the same number of rows as if at least 1 column was selected. Previously an empty [0 x 0] frame was returned.

  • Assigning a value to a column, DT[:, 'A'] = x, will attempt to preserve the column’s stype; if that is not possible, the column will be upcast within its logical type.

  • It is no longer possible to assign a value of an incompatible logical type to an existing column. For example, the assignment DT[:, 'A'] = 3 is now legal only if column A is of integer or real type, and raises an exception if A is a boolean or string column.

  • .rbind() method no longer has a return value. The method always updated the frame in-place, so it was confusing to both update in-place and return the original frame. #1610

  • dt.min() / dt.max() over an empty or all-NA column now returns None instead of +Inf / -Inf respectively. #1624

  • Frame methods .topython(), .topandas() and .tonumpy() are now deprecated, to be removed in version 0.9.0. Please use .to_list(), .to_pandas() and .to_numpy() instead.

  • Calling a frame object DT(rows=i, select=j, groupby=g, join=z, sort=s) is now deprecated. Use the expression DT[i, j, by(g), join(z), sort(s)] instead, where symbols dt.by(), dt.join() and dt.sort() can all be imported from the datatable namespace. #1579

  • Single-item Frame selectors are now prohibited: DT[col] is an error. In the future this expression will be interpreted as a row selector instead. Update: in version 0.9.0 this “single-selector” syntax was reinstated, but only for integer and string selectors.

  • Internally, each Column in a Frame can now have its own separate RowIndex. This improves performance, especially in join/cbind operations. Applications that use datatable’s C API may need to be updated to account for this. #1188
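
The row-wise and column-wise conversions and the per-group slicing described in the list above can be modeled in plain Python. The sketch below is not datatable code: `columns`, `to_dict`, `to_tuples` and `slice_within_groups` are hypothetical stand-ins that only mirror the behaviour as worded in these notes. It shows which rows survive a per-group slice, and makes no attempt to reproduce datatable's column ordering or types in grouped results.

```python
from itertools import groupby

# Hypothetical stand-in for a Frame: column name -> list of values.
columns = {
    "id":    [1, 2, 3, 4, 5],
    "group": ["a", "b", "a", "a", "b"],
}

def to_dict(cols):
    """Column-wise shape: keys are column names, values are lists of elements."""
    return {name: list(values) for name, values in cols.items()}

def to_tuples(cols):
    """Row-wise shape: a list of tuples, each tuple representing a single row."""
    return list(zip(*cols.values()))

def slice_within_groups(cols, key, row_slice):
    """Apply row_slice to each group of rows separately (assumed semantics)."""
    rows = to_tuples(cols)
    k = list(cols).index(key)
    # sorted() is stable, so rows keep their original order inside each group.
    ordered = sorted(rows, key=lambda r: r[k])
    out = []
    for _, members in groupby(ordered, key=lambda r: r[k]):
        out.extend(list(members)[row_slice])
    return out

rows = to_tuples(columns)
as_dict = to_dict(columns)
first_two_per_group = slice_within_groups(columns, "group", slice(0, 2))
```

Here `rows` holds one `(id, group)` tuple per row, while `first_two_per_group` keeps at most two rows from each group, which is the effect the notes describe for a slice-valued i combined with by().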

General

  • Module datatable now exposes a C API, to allow other C/C++ libraries to interact with datatable Frames natively. #1469 See “datatable/include/datatable.h” for the description of the API functions.

  • Installation from source distribution now works as expected. #1451

  • Function dt.split_into_nhot() now works correctly with view Frames. #1507

  • The build process on MacOS now ensures that libomp.dylib is properly referenced via @rpath. This prevents installation problems caused by dynamic dependencies being referenced by absolute paths that are not valid outside of the build machine. #1559

  • Fixed a crash that occurred with the latest pandas 0.24.0. #1600

  • datatable now uses integration with Codacy to keep track of code quality and potential errors.

Models

  • Added ability to train and fit an FTRL-Proximal (Follow The Regularized Leader) online learning algorithm on a data frame. #1389 The implementation is multi-threaded and has high performance.

  • FTRL algorithm now works correctly with view frames. #1502
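
For readers unfamiliar with FTRL-Proximal, the per-coordinate update rule from the published algorithm (McMahan et al.) can be sketched in a few lines of single-threaded Python. This is an illustrative model for logistic regression only: the class name, parameter names and default values are assumptions made for this sketch and do not reflect datatable's implementation or its API.

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression (illustrative only)."""

    def __init__(self, n_features, alpha=0.1, beta=1.0, lambda1=0.0, lambda2=0.0):
        self.alpha, self.beta = alpha, beta
        self.lambda1, self.lambda2 = lambda1, lambda2
        self.z = [0.0] * n_features  # accumulated adjusted gradients
        self.n = [0.0] * n_features  # accumulated squared gradients

    def _weight(self, i):
        # Weights are derived lazily from z and n; L1 shrinks small z to zero.
        z = self.z[i]
        if abs(z) <= self.lambda1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.lambda2
        return -(z - sign * self.lambda1) / denom

    def predict(self, x):
        s = sum(self._weight(i) * xi for i, xi in enumerate(x))
        s = max(min(s, 35.0), -35.0)  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        p = self.predict(x)
        for i, xi in enumerate(x):
            if xi == 0:
                continue
            g = (p - y) * xi  # gradient of the log-loss for coordinate i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            # The right-hand side uses the weight from before this update.
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g

# Toy data: a constant bias term plus one feature; label is 1 when the feature is positive.
data = [([1.0, 2.0], 1), ([1.0, -2.0], 0), ([1.0, 1.0], 1), ([1.0, -1.0], 0)]
model = FTRLProximal(n_features=2)
for _ in range(20):
    for x, y in data:
        model.update(x, y)
```

After these passes over the toy data, `model.predict([1.0, 2.0])` is above 0.5 and `model.predict([1.0, -2.0])` is below it. Being an online method, the model is updated one row at a time and never needs the whole data set in memory.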

Contributors

This release was created with the help of 6 people who contributed code and documentation, and 12 more people who submitted bug reports and feature requests.

Code & documentation contributors:

Issues contributors: