Version 0.7.0
Version 0.7.0 | |
---|---|
Release date: | 2018-11-16 |
Next release: | Version 0.8.0 |
Previous release: | Version 0.6.0 |
Wheels | |
MacOS | python-3.5, python-3.6, python-3.7 |
SDist | sources |
Frame
Added ability to read/write Jay files.
A Frame can now have a key column (or columns).
Frame can now be created from a list/dict of numpy arrays.
Frames can now be constructed via the keyword-args list of columns: Frame(A=..., B=...).
Frame constructor now accepts a list of tuples, which it treats as rows when creating the frame.
Frame can now be constructed from a list of named tuples: the tuples are treated as rows, and their field names are used as column names.
Frame can now be constructed from a list of dictionaries, where each item in the list represents a single row.
Frame can now be created from a datetime64 numpy array. #1274
Key column(s) are saved when the frame is saved into a Jay file.
The error message when selecting a column that does not exist in the Frame now refers to similarly-named columns in that Frame, if there are any. At most 3 possible columns are reported, and they are ordered from most likely to least likely. #1253
Frame.copy() can now be used to create a copy of the Frame.
Frame.cbind() now accepts a list of frames as the argument.
Frame can now be sorted by multiple columns.
Implemented Frame.replace() function. #1319
Added HTML rendering of Frames inside a Jupyter notebook.
Added support for multi-column keys.
In a Jupyter notebook, columns now have visual indicators of their types. The logical types are color-coded, and the size of each element is given by the number of dots. #1428
The names argument in Frame constructor can no longer be a string – use a list or tuple of strings instead.
Method Frame.rename() removed – the name setter can be used instead.
Frame.resize() removed – same functionality is available via assigning to Frame.nrows.
The expression Frame([]) now creates a [0 x 0] Frame instead of [0 x 1].
Parameter inplace in Frame.cbind() removed (was deprecated). Instead of inplace=False use dt.cbind(...).
Frame.cbind() no longer returns anything (previously it returned self, but this was confusing w.r.t whether it modifies the target, or returns a modified copy).
Default format for Frame.save() is now “jay”.
names parameter in Frame constructor is now checked for correctness.
Fixed saving view frames to csv.
If x is a Frame, then y = dt.Frame(x) now creates a shallow copy instead of a copy-by-reference.
Fixed rare crash when converting a string column from pandas DataFrame, when that column contains many non-ASCII characters.
Fixed crash when saving a frame with many boolean columns into CSV. #1278
Fixed incorrect stypes/ltypes property after calling cbind().
Fixed calculation of min/max values in internal rowindex upon row resizing.
Frame.sort() with no arguments no longer produces an error.
Fixed writing to disk of columns > 2GB in size. #1387
Fixed crash when sorting by multiple columns and the first column was of string type. #1401
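The improved "column not found" error noted in this section (up to 3 similarly-named columns, ordered from most likely to least likely) can be illustrated without datatable itself. The following is only a sketch: these notes do not say which similarity measure datatable uses, so Python's standard difflib ranking is swapped in as an assumption, and the helper name suggest_columns is hypothetical.

```python
import difflib


def suggest_columns(requested, column_names, limit=3):
    """Return up to `limit` existing column names that resemble `requested`.

    Results are ordered from most to least similar. The similarity score
    comes from difflib (an assumption, not datatable's own algorithm).
    """
    return difflib.get_close_matches(requested, column_names, n=limit, cutoff=0.6)


# Hypothetical column names, used only for illustration.
names = ["price", "prices", "priced", "quantity", "prize"]
suggestions = suggest_columns("pric", names)
no_match = suggest_columns("zzz", names)

print(suggestions)  # at most 3 names, best match first
print(no_match)     # empty list when nothing is similar enough
```

Capping the list at three candidates keeps the error message short while still covering the most common typos.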
DT[i, j] evaluation
Filters can now be used together with groupby expressions.
Implemented integer division // and modulo % operators.
Implemented logical operators “and” & and “or” | for eager evaluator.
A Frame can now be naturally-joined with a keyed Frame.
Columns can now be updated within join expressions.
Groupby calculations are now parallel.
Added ability to join Frames on multiple columns.
The performance of explicit element selection improved by a factor of 200.
DT[i, j] now returns a python scalar value if i is integer, and j is integer/string. This is referred to as “explicit element selection”. In the unlikely scenario when a single element needs to be returned as a frame, you can always write DT[i:i+1, j] or DT[[i], j].
DT[col] syntax has been deprecated and now emits a warning. This will be converted to an error in version 0.8.0, and will be interpreted as row selector in 0.9.0.
Fixed bug with applying a cast expression to a view column.
Fixed memory leak in groupby operations.
Fixed crash when sorting string columns containing NA values.
Fixed crash when applying a filter to a 0-rows frame.
f-column-selectors should no longer throw errors and produce only unique ids when stringified. #1241
f-expressions now do not crash when reused with a different Frame.
g-columns can be properly selected in a join. #1352
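To make the idea of naturally joining a Frame with a keyed Frame more concrete, here is a pure-Python sketch of the concept. It is not datatable code. It assumes left-join semantics (every row of the left table is kept), unique key values in the keyed table, and None as a stand-in for NA; the function name natural_join and the sample data are hypothetical.

```python
def natural_join(rows, keyed_rows, key):
    """Attach columns from `keyed_rows` to `rows`, matching on column `key`.

    Both tables are lists of dicts. Rows without a match receive None
    in the joined columns (standing in for NA values).
    """
    # Index the keyed table by its key column; key values must be unique.
    lookup = {}
    for r in keyed_rows:
        if r[key] in lookup:
            raise ValueError(f"duplicate key value: {r[key]!r}")
        lookup[r[key]] = r

    # Columns contributed by the keyed table (everything except the key).
    extra_cols = [c for c in keyed_rows[0] if c != key] if keyed_rows else []

    joined = []
    for row in rows:
        match = lookup.get(row[key])
        out = dict(row)
        for c in extra_cols:
            out[c] = match[c] if match is not None else None
        joined.append(out)
    return joined


orders = [{"id": 1, "qty": 5}, {"id": 3, "qty": 2}, {"id": 1, "qty": 7}]
products = [{"id": 1, "name": "bolt"}, {"id": 2, "name": "nut"}]
result = natural_join(orders, products, key="id")
for row in result:
    print(row)
```

Because the keyed table is indexed once up front, each lookup is a constant-time dictionary access, which is the main benefit of requiring a key on the right-hand table in this sketch.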
General
Added function abs() to find the absolute value of elements in a frame.
fread()’s verbose output now includes time spent opening the input file.
New function split_into_nhot() to split a string column into fragments and then convert them into a set of indicator variables (“n-hot encode”).
Added ability to convert object columns into strings.
Improved handling of Excel files by fread():
- sheet name can now be used as a path component in the file name, causing only that particular sheet to be parsed;
- further, a cell range can be specified as a path component after the sheet name, forcing fread to consider only the provided cell range;
- fread can now handle the situation when a spreadsheet has multiple separate tables in the same sheet. They will now be detected automatically and returned to the user as separate Frame objects (the name of each frame will contain the sheet name and cell range from where the data was extracted).
Added set-theoretic functions: union(), intersect(), setdiff() and symdiff().
Building no longer requires an LLVM distribution.
Upgraded dependency version for typesentry; the previous version was not compatible with Python 3.7.
import datatable now takes only 0.13s, down from 0.6s.
Fixed bug in cbind() where the first Frame in the list was ignored.
Fixed occasional memory errors caused by a lack of available mmap handles.
Fixed bug in fread() with QR bump occurring out-of-sample.
fread() no longer wastes time reading the full input if the max_nrows option is used.
Fixed bug where max_nrows parameter was sometimes causing a segfault.
Fixed fread() performance bug caused by memory-mapped file being accidentally copied into RAM.
Fixed rare crash in fread when resizing the number of rows.
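The "n-hot encoding" performed by split_into_nhot(), mentioned in this section, can be sketched in plain Python. This is an illustration of the idea only, not the library's implementation: the comma separator, the whitespace stripping, the alphabetical ordering of the output columns, and the helper name nhot_encode are all assumptions.

```python
def nhot_encode(values, sep=","):
    """Split each string into fragments and build 0/1 indicator columns.

    `values` is a list of strings (or None for missing entries).
    Returns a dict mapping each distinct fragment to a list of 0/1 flags,
    one flag per input row.
    """
    # Step 1: split every string into cleaned, non-empty fragments.
    fragments = [
        [] if v is None else [p.strip() for p in v.split(sep) if p.strip()]
        for v in values
    ]
    # Step 2: collect the distinct fragments; each becomes one column.
    labels = sorted({p for parts in fragments for p in parts})
    # Step 3: mark, for every row, whether it contains each fragment.
    return {label: [int(label in parts) for parts in fragments] for label in labels}


encoded = nhot_encode(["cat, dog", "dog", None, "mouse,cat"])
for label, flags in encoded.items():
    print(label, flags)
```

Unlike one-hot encoding, a single row may have several indicators set at once, which is why the result is described as "n-hot".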
Contributors
This release was created with the help of 7 people who contributed code and documentation, and 12 more people who submitted bug reports and feature requests.
Code & documentation contributors:
- Pasha Stetsenko
- Oleksiy Kononenko
- Michal Raška
- Michal Malohlava
- Nishant Kalonia
- Suman Khanal
- Jan Gamec
Issues contributors: