Next release: Version 0.8.0
Previous release: Version 0.6.0
Added ability to read/write Jay files.
A Frame can now have a key column (or columns).
Frame can now be created from a list/dict of numpy arrays.
Frames can now be constructed from columns passed as keyword arguments.
Frame constructor now accepts a list of tuples, which it treats as rows when creating the frame.
Frame can now be constructed from a list of named tuples, which are treated as rows, with the field names used as column names.
Frame can now be constructed from a list of dictionaries, where each item in the list represents a single row.
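The row-oriented constructors above (lists of tuples, named tuples, or dictionaries) all amount to transposing rows into named columns. The pure-Python sketch below illustrates only that idea. `rows_to_columns` is a hypothetical helper, not part of datatable, and both the positional names `C0`, `C1`, ... and the use of `None` for missing dictionary keys are assumptions made for the illustration.

```python
from collections import namedtuple


def rows_to_columns(rows):
    """Transpose a list of rows into a dict mapping column name -> list of values."""
    if not rows:
        return {}
    first = rows[0]
    if isinstance(first, dict):
        # Dict rows: column names are the keys, in order of first appearance.
        names = []
        for row in rows:
            for key in row:
                if key not in names:
                    names.append(key)
        # Keys absent from a row become None (an assumption of this sketch).
        return {name: [row.get(name) for row in rows] for name in names}
    if hasattr(first, "_fields"):
        # Named tuples carry their own field names.
        names = list(first._fields)
    else:
        # Plain tuples: fall back to positional names (an assumption).
        names = ["C%d" % i for i in range(len(first))]
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


Point = namedtuple("Point", ["x", "y"])

from_tuples = rows_to_columns([(1, "a"), (2, "b")])
from_named = rows_to_columns([Point(1, 2), Point(3, 4)])
from_dicts = rows_to_columns([{"x": 1}, {"x": 2, "y": 5}])
```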
Frame can now be created from a datetime64 numpy array. #1274
Key column(s) are saved when the frame is saved into a Jay file.
The error message when selecting a column that does not exist in the Frame now refers to similarly-named columns in that Frame, if there are any. At most 3 possible columns are reported, and they are ordered from most likely to least likely. #1253
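As a rough illustration of the "similarly-named columns" hint, the sketch below ranks candidate names with the standard library's `difflib`. These notes do not describe the matching algorithm datatable actually uses, so `difflib` is a stand-in, and the helper names and message wording are hypothetical.

```python
import difflib


def suggest_columns(requested, columns, limit=3):
    """Return up to `limit` existing column names that resemble `requested`,
    best match first."""
    return difflib.get_close_matches(requested, columns, n=limit, cutoff=0.6)


def missing_column_message(requested, columns):
    """Build an error message that mentions similar column names, if any."""
    message = "Column `%s` does not exist in the Frame" % requested
    similar = suggest_columns(requested, columns)
    if similar:
        message += "; did you mean " + ", ".join("`%s`" % s for s in similar) + "?"
    return message


columns = ["price", "prize", "quantity", "pride", "prices"]
hint = missing_column_message("pricee", columns)   # mentions close matches
no_hint = missing_column_message("zzz", columns)   # nothing similar to report
```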
.copy() can now be used to create a copy of the Frame.
.cbind() now accepts a list of frames as the argument.
Frame can now be sorted by multiple columns.
Added HTML rendering of Frames inside a Jupyter notebook.
Added support for multi-column keys.
In a Jupyter notebook, columns now have visual indicators of their types. The logical types are color-coded, and the size of each element is given by the number of dots. #1428
names argument in Frame constructor can no longer be a string – use a list or tuple of strings instead.
.rename() removed – the .names setter can be used instead.
.resize() removed – same functionality is available via assigning to
Frame() now creates a [0 x 0] Frame instead of [0 x 1].
.cbind() removed (was deprecated). Instead of
.cbind() no longer returns anything (previously it returned self, which made it unclear whether the method modifies the target or returns a modified copy).
Default format for .save() is now “jay”.
names parameter in Frame constructor is now checked for correctness.
Fixed saving view frames to csv.
If x is a Frame, then y = dt.Frame(x) now creates a shallow copy instead of a copy-by-reference.
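The difference between a copy-by-reference and a shallow copy can be shown by analogy with plain Python containers. This is not datatable code: the "frame" here is just a dict of lists, chosen so that the example runs without the library.

```python
# A toy "frame": a dict mapping column names to lists of values.
x = {"A": [1, 2, 3], "B": ["a", "b", "c"]}

by_reference = x        # same object: changes are visible through both names
shallow_copy = dict(x)  # new container that shares the existing column data

# Adding a column to the shallow copy leaves the original untouched...
shallow_copy["C"] = [True, False, True]

# ...whereas adding one through the reference changes the original too.
by_reference["D"] = [0.1, 0.2, 0.3]

original_columns = list(x)
```

In the analogy, the shallow copy is cheap because the column data is shared rather than duplicated, yet structural changes to one object no longer leak into the other.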
Fixed rare crash when converting a string column from pandas DataFrame, when that column contains many non-ASCII characters.
Fixed crash when saving a frame with many boolean columns into CSV. #1278
Fixed calculation of min/max values in internal rowindex upon row resizing.
.sort() with no arguments no longer produces an error.
Fixed writing to disk of columns > 2GB in size. #1387
Fixed crash when sorting by multiple columns and the first column was of string type. #1401
DT[i, j] evaluation
Filters can now be used together with groupby expressions.
Implemented integer division
Implemented logical operators “and” & and “or” | for eager evaluator.
A Frame can now be naturally-joined with a keyed Frame.
Columns can now be updated within join expressions.
Groupby calculations are now parallel.
Added ability to join Frames on multiple columns.
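Conceptually, joining against a keyed Frame on several columns is a lookup by a tuple of key values. The sketch below models that with plain dictionaries. `keyed_join` is a hypothetical helper, and two behaviours in it are assumptions rather than statements from these notes: key values are required to be unique, and left rows without a match receive `None`.

```python
def keyed_join(left_rows, right_rows, key_columns):
    """Attach the non-key columns of `right_rows` to each row of `left_rows`.

    `right_rows` plays the role of the keyed frame: its key values must be unique.
    Left rows without a match get None in the joined columns (an assumption).
    """
    lookup = {}
    for row in right_rows:
        key = tuple(row[name] for name in key_columns)
        if key in lookup:
            raise ValueError("key values are not unique: %r" % (key,))
        lookup[key] = row
    # Columns contributed by the keyed side, excluding the key columns themselves.
    extra = [n for n in right_rows[0] if n not in key_columns] if right_rows else []
    joined = []
    for row in left_rows:
        match = lookup.get(tuple(row[name] for name in key_columns), {})
        joined.append({**row, **{name: match.get(name) for name in extra}})
    return joined


orders = [
    {"store": "N", "item": "tea", "qty": 2},
    {"store": "S", "item": "tea", "qty": 1},
    {"store": "N", "item": "jam", "qty": 5},
]
prices = [
    {"store": "N", "item": "tea", "price": 3.0},
    {"store": "S", "item": "tea", "price": 3.5},
]
result = keyed_join(orders, prices, ["store", "item"])
```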
The performance of explicit element selection improved by a factor of 200.
DT[i, j] now returns a Python scalar value if i is integer, and j is integer/string. This is referred to as “explicit element selection”. In the unlikely scenario when a single element needs to be returned as a frame, you can always write
DT[col] syntax has been deprecated and now emits a warning. This will be converted to an error in version 0.8.0, and will be interpreted as a row selector in 0.9.0.
Fixed bug with applying a cast expression to a view column.
Fixed memory leak in groupby operations.
Fixed crash when sorting string columns containing NA values.
Fixed crash when applying a filter to a 0-rows frame.
f-column-selectors should no longer throw errors, and should produce only unique ids, when stringified. #1241
f-expressions no longer crash when reused with a different Frame.
g-columns can now be properly selected in a join. #1352
Added dt.math.abs() to find the absolute value of elements in a frame.
fread()’s verbose output now includes time spent opening the input file.
Added split_into_nhot() to split a string column into fragments and then convert them into a set of indicator variables (“n-hot encode”).
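To make "n-hot encoding" concrete, here is a minimal pure-Python sketch of the idea. It is not the library's implementation: the function name, the comma separator, the whitespace stripping, the alphabetical ordering of labels, and the treatment of `None` as an empty row are all assumptions made for the illustration.

```python
def split_into_nhot_sketch(values, sep=","):
    """N-hot encode a list of delimited strings.

    Returns a dict mapping each distinct fragment to a list of 0/1 indicators,
    one per input row. None is treated as a row with no fragments.
    """
    rows = []
    for value in values:
        if value is None:
            rows.append(set())
        else:
            fragments = (part.strip() for part in value.split(sep))
            rows.append({part for part in fragments if part})
    # One indicator column per distinct fragment, in alphabetical order.
    labels = sorted(set().union(*rows))
    return {label: [int(label in row) for row in rows] for label in labels}


encoded = split_into_nhot_sketch(["cat, dog", "dog", None, "bird,cat"])
```

Each input row can set several indicators at once, which is what distinguishes n-hot from one-hot encoding.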
Added ability to convert object columns into strings.
Improved handling of Excel files by fread:
- a sheet name can now be used as a path component in the file name, causing only that particular sheet to be parsed;
- further, a cell range can be specified as a path component after the sheet name, forcing fread to consider only the provided cell range;
- fread can now handle the situation when a spreadsheet has multiple separate tables in the same sheet. They are now detected automatically and returned to the user as separate Frame objects (the name of each frame contains the sheet name and cell range from where the data was extracted).
Building no longer requires an LLVM distribution.
Upgraded dependency version for typesentry; the previous version was not compatible with Python 3.7.
import datatable now takes only 0.13s, down from 0.6s.
Fixed bug in cbind() where the first Frame in the list was ignored.
Fixed occasional memory errors caused by a lack of available mmap handles.
Fixed bug in fread() with QR bump occurring out-of-sample.
fread() no longer wastes time reading the full input if the max_nrows option is used.
Fixed bug where max_nrows parameter was sometimes causing a segfault.
Fixed fread() performance bug caused by memory-mapped file being accidentally copied into RAM.
Fixed rare crash in fread when resizing the number of rows.
This release was created with the help of 7 people who contributed code and documentation, and 12 more people who submitted bug reports and feature requests.
Code & documentation contributors:
- Pasha Stetsenko
- Oleksiy Kononenko
- Michal Raška
- Michal Malohlava
- Nishant Kalonia
- Suman Khanal
- Jan Gamec