Version 0.7.0

Release date: 2018-11-16
Next release: Version 0.8.0
Previous release: Version 0.6.0
Wheels: MacOS (python-3.5, python-3.6, python-3.7)
SDist: sources

Frame

  • Added ability to read/write Jay files.

  • A Frame can now have a key column (or columns).

  • Frame can now be created from a list/dict of numpy arrays.

  • Frames can now be constructed via the keyword-args list of columns:

    Frame(A=..., B=...)
    
  • Frame constructor now accepts a list of tuples, which it treats as rows when creating the frame.

  • Frame can now be constructed from a list of named tuples, which will be treated as rows and field names will be used as column names.

  • Frame can now be constructed from a list of dictionaries, where each item in the list represents a single row.

  • Frame can now be created from a datetime64 numpy array. #1274

  • Key column(s) are saved when the frame is saved into a Jay file.

  • The error message when selecting a column that does not exist in the Frame now refers to similarly-named columns in that Frame, if there are any. At most 3 possible columns are reported, and they are ordered from most likely to least likely. #1253

  • Frame.copy() can now be used to create a copy of the Frame.

  • Frame.cbind() now accepts a list of frames as the argument.

  • Frame can now be sorted by multiple columns.

  • Implemented Frame.replace() function. #1319

  • Added HTML rendering of Frames inside a Jupyter notebook.

  • Added support for multi-column keys.

  • In a Jupyter notebook, columns now have visual indicators of their types. The logical types are color-coded, and the size of each element is given by the number of dots. #1428

  • The names argument in Frame constructor can no longer be a string – use a list or tuple of strings instead.

  • Method Frame.rename() removed – the name setter can be used instead.

  • Frame.resize() removed – same functionality is available via assigning to Frame.nrows.

  • The expression Frame([]) now creates a [0 x 0] Frame instead of [0 x 1].

  • Parameter inplace in Frame.cbind() removed (was deprecated). Instead of inplace=False use dt.cbind(...).

  • Frame.cbind() no longer returns anything. Previously it returned self, which made it unclear whether the method modified the target or returned a modified copy.

  • The default format for Frame.save() is now “jay”.

  • names parameter in Frame constructor is now checked for correctness.

  • Fixed saving view frames to csv.

  • If x is a Frame, then y = dt.Frame(x) now creates a shallow copy instead of a copy-by-reference.

  • Fixed rare crash when converting a string column from pandas DataFrame, when that column contains many non-ASCII characters.

  • Fixed crash when saving a frame with many boolean columns into CSV. #1278

  • Fixed incorrect stypes/ltypes property after calling cbind().

  • Fixed calculation of min/max values in internal rowindex upon row resizing.

  • Frame.sort() with no arguments no longer produces an error.

  • Fixed writing to disk of columns > 2GB in size. #1387

  • Fixed crash when sorting by multiple columns and the first column was of string type. #1401

DT[i, j] evaluation

  • Filters can now be used together with groupby expressions.

  • Implemented integer division // and modulo % operators.

  • Implemented logical operators “and” & and “or” | for eager evaluator.

  • A Frame can now be naturally-joined with a keyed Frame.

  • Columns can now be updated within join expressions.

  • Groupby calculations are now parallel.

  • Added ability to join Frames on multiple columns.

  • The performance of explicit element selection improved by a factor of 200.

  • DT[i, j] now returns a python scalar value if i is integer, and j is integer/string. This is referred to as “explicit element selection”. In the unlikely scenario when a single element needs to be returned as a frame, you can always write DT[i:i+1, j] or DT[[i], j].

  • The DT[col] syntax has been deprecated and now emits a warning. It will become an error in version 0.8.0, and will be interpreted as a row selector in 0.9.0.

  • Fixed bug with applying a cast expression to a view column.

  • Fixed memory leak in groupby operations.

  • Fixed crash when sorting string columns containing NA values.

  • Fixed crash when applying a filter to a 0-rows frame.

  • f-column-selectors should no longer throw errors and produce only unique ids when stringified. #1241

  • f-expressions now do not crash when reused with a different Frame.

  • g-columns can be properly selected in a join. #1352

General

  • Added function abs() to find the absolute value of elements in a frame.

  • fread()’s verbose output now includes time spent opening the input file.

  • New function split_into_nhot() to split a string column into fragments and then convert them into a set of indicator variables (“n-hot encode”).

  • Added ability to convert object columns into strings.

  • Improved handling of Excel files by fread():

    • sheet name can now be used as a path component in the file name, causing only that particular sheet to be parsed;

    • further, a cell range can be specified as a path component after the sheet name, forcing fread to consider only the provided cell range;

    • fread can now handle the situation when a spreadsheet has multiple separate tables in the same sheet. They will now be detected automatically and returned to the user as separate Frame objects (the name of each frame will contain the sheet name and cell range from where the data was extracted).

  • Added set-theoretic functions: union(), intersect(), setdiff() and symdiff().

  • Building no longer requires an LLVM distribution.

  • Upgraded dependency version for typesentry; the previous version was not compatible with Python 3.7.

  • import datatable now takes only 0.13s, down from 0.6s.

  • Fixed bug in cbind() where the first Frame in the list was ignored.

  • Fixed occasional memory errors caused by a lack of available mmap handles.

  • Fixed bug in fread() with QR bump occurring out-of-sample.

  • fread() no longer wastes time reading the full input, if max_nrows option is used.

  • Fixed bug where max_nrows parameter was sometimes causing a segfault.

  • Fixed fread() performance bug caused by memory-mapped file being accidentally copied into RAM.

  • Fixed rare crash in fread when resizing the number of rows.

Contributors

This release was created with the help of 7 people who contributed code and documentation, and 12 more people who submitted bug reports and feature requests.

Code & documentation contributors:

Issues contributors: