Version 0.9.0
Version 0.9.0 | |
---|---|
Release date: | 2019-07-25 |
Next release: | Version 0.10.0 |
Previous release: | Version 0.8.0 |
Wheels (MacOS): | python-3.5, python-3.6, python-3.7 |
Wheels (Linux x86-64): | python-3.5, python-3.6, python-3.7 |
SDist: | sources |
Frame
Method Frame.len() can be applied to a string column to obtain the length of the string in each row.
Method re_match() applies to a string column and produces a boolean indicator of whether each value matches the regular expression re. The method matches the entire string, not just the beginning; it therefore most closely resembles the Python function re.fullmatch().
Frame.__str__() now returns a string containing a preview of the frame’s data. This allows datatable frames to be used with print().
New function median() can be used to compute the median of a column or expression, either per group or for the entire Frame. #1530
The Frame class is now defined fully in C++, improving code robustness and performance. The property Frame.internal was removed, as it no longer represents anything. Certain internal properties of a Frame can be accessed via functions declared in the dt.internal module.
Frame.rbind() can now also accept a list or tuple of frames (previously only a vararg sequence was allowed).
A Frame can now be treated as an iterable over its columns. Thus, a Frame object can be used in a for-loop, producing its individual columns.
A Frame can now be treated as a mapping; in particular, both dict(frame) and **frame are now valid.
Single-column frames can be used as sources for Frame construction.
Added parameter quoting= to method Frame.to_csv(). The accepted values are the 4 constants from the standard csv module: csv.QUOTE_MINIMAL (default), csv.QUOTE_ALL, csv.QUOTE_NONNUMERIC and csv.QUOTE_NONE.
Frame.to_csv() now quotes fields containing a single-quote mark (').
Added parameter compression= to method Frame.to_csv(), making it possible to request gzip compression for the output file. By default, the compression method is inferred from the file name.
.to_numpy() now returns a numpy masked_array if the frame has any NA values. #1619
A Frame will no longer be shown in “interactive” mode in the console by default. The previous behavior can be restored with dt.options.display.interactive = True. Alternatively, you can explore a Frame interactively using frame.view(True).
Improved performance of type-casting a view column: the code now avoids materializing the column before performing the cast.
Fixed crash in certain circumstances when a key was applied after a groupby. #1639
A keyed frame will now be rendered correctly when viewing it in the python console via Frame.view(). #1672
A str32 column can no longer overflow during the .replace() operation, or when converting from python, numpy or pandas, etc. In all these cases we will now transparently create a str64 column instead. #1694
The reported frame size (sys.getsizeof(DT)) is now more accurate; in particular, the content of string columns is no longer ignored. #1697
Type casting into str32 no longer produces an error if the resulting column is larger than 2GB. A str64 column will now be returned instead. #1695
Fixed a memory leak during computation of a generic DT[i, j] expression. Another memory leak, which occurred during generation of string columns, is now also fixed. #1705
A pandas “boolean column with NAs” (of dtype object) now converts into a datatable bool8 column when a pandas DataFrame is converted into a datatable Frame. #1730
Fixed conversion to numpy of a view Frame which contains NAs. #1738
Fixed issue with mis-aligned frame headers in IPython, caused by IPython inserting Out[X]: in front of the rendered Frame display. #1793
Improved rendering of Frames in terminals with white background: we no longer use 'bright_white' color for emphasis, only 'bold'. #1793
Fixed crash when a new column was created via partial assignment, i.e. DT[i, "new_col"] = expr. #1800
Fixed memory leaks/crashes when materializing an object column. #1805
Fixed creating a Frame from a pandas DataFrame that has duplicate column names. #1816
Fixed a UnicodeDecodeError that could be thrown when viewing a Frame with unicode characters in Jupyter notebook. The error only manifested for strings that were longer than 50 bytes in length. #1825
Fixed crash when Frame.colindex() was used without any arguments; this now properly raises an exception. #1834
Fixed possible crash when writing to a disk that doesn’t have enough free space on it. #1837
Fixed invalid Frame being created when reading a large string column (str64) with fread, and the column contains NA values.
Fixed crash that occurred when sorting by multiple columns, and the first column is of low cardinality. #1857
Fixed display of NA values produced during a join, when a Frame was displayed in Jupyter Lab. #1872
Fixed a crash when replacing values in a str64 column. #1890
cbind() no longer throws an error when passed a generator producing temporary frames. #1905
Fixed comparison of string columns vs. value None. #1912
Fixed a crash when trying to select individual cells from a joined Frame, for the cells that were un-matched during the join. #1917
Fixed a crash when writing a joined frame into CSV. #1919
Fixed a crash when writing into CSV string view columns, especially of str64 type. #1921
Removed deprecated Frame methods .topython(), .topandas(), .tonumpy(), and Frame.__call__().
Syntax DT[col] has been restored (it was previously deprecated in 0.7.0); however, it works only when col is an integer or a string. Support for slices may or may not be added in the future: there is a potential to confuse DT[a:b] with a row selection. A column slice may still be selected via the i-j selector DT[:, a:b].
The nthreads= parameter in Frame.to_csv() was removed. If needed, please set the global option dt.options.nthreads.
Frame method .scalar() is now deprecated and will be removed in release 0.10.0. Please use frame[0, 0] instead.
Frame method .append() is now deprecated and will be removed in release 0.10.0. Please use .rbind() instead.
Frame method .save() was renamed to .to_jay() (for consistency with other .to_*() methods). The old name is still usable, but is marked as deprecated and will be removed in 0.10.0.
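Two of the Frame notes above lean on standard-library semantics: re_match() is described as resembling re.fullmatch(), and the quoting= parameter takes the csv.QUOTE_* constants. The sketch below uses only the standard re and csv modules to show what those semantics are. It does not call datatable, so the exact output of Frame.to_csv() under each constant may differ in detail; the helper render(), the sample row, and the escapechar setting are assumptions made for illustration.

```python
import csv
import io
import re

# re.match() only anchors at the start of the string;
# re.fullmatch() requires the pattern to consume the whole string.
pattern = r"\d+"
print(bool(re.match(pattern, "123abc")))      # True
print(bool(re.fullmatch(pattern, "123abc")))  # False
print(bool(re.fullmatch(pattern, "123")))     # True


def render(row, quoting):
    """Write one row with the stdlib csv writer and return it as text.

    Hypothetical helper for this example only. The escapechar is needed
    so that QUOTE_NONE can escape the delimiter instead of raising.
    """
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, escapechar="\\").writerow(row)
    return buf.getvalue().rstrip("\r\n")


row = ["a,b", 3, "plain"]
print(render(row, csv.QUOTE_MINIMAL))     # "a,b",3,plain
print(render(row, csv.QUOTE_ALL))         # "a,b","3","plain"
print(render(row, csv.QUOTE_NONNUMERIC))  # "a,b",3,"plain"
print(render(row, csv.QUOTE_NONE))        # a\,b,3,plain
```

QUOTE_MINIMAL quotes only the fields that need it, which is why it is a sensible default for a CSV writer.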
General
Added method dt.options.describe(), which will print the available options together with their values and descriptions.
Added dt.options.context(option=value), which can be used in a with-statement to temporarily change the value of one or more options, and then restore their original values at the end of the with-block.
Added options fread.log.escape_unicode (controls treatment of unicode characters in fread’s verbose log) and display.use_colors (allows turning colored output in the console on or off).
Some long-running operations in datatable will now show a progress bar. Its behavior can be controlled via the dt.options.progress set of options.
Added an internal function dt.internal.compiler_version(), which reports the compiler version used for compiling the core _datatable library.
New datatable.math module is a library of various mathematical functions that can be applied to datatable Frames. The set of functions is close to what is available in the standard python math module. See documentation for more details.
New module datatable.sphinxext.dtframe_directive, which can be used as a plugin for Sphinx. This module adds the directive .. dtframe that makes it easy to include a Frame display in an .rst document.
datatable no longer uses OpenMP for parallelism. Instead, we use our own thread pool to perform multi-threaded computations. #1736
dt.options now helps the user when they make a typo: if an option with a certain name does not exist, the error message will suggest the correct spelling.
Fixed crash upon exiting from a python terminal, if the user ever called function frame_column_rowindex().type. #1703
datatable can now be safely used with multiprocessing, or other modules that perform fork-without-exec. The child process will spawn its own thread pool that will have the same number of threads as the parent. Adjust dt.options.nthreads in the child process(es) if a different number of threads is required. #1758
The interactive mode is no longer improperly turned on in IPython. #1789
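The temporary-override behavior described for dt.options.context() above can be illustrated with a small stand-alone sketch. This is not datatable's implementation: OptionsSketch, its get() method, and the option names used here are hypothetical stand-ins, written with the standard contextlib module, and serve only to show the "change inside the with-block, restore on exit" pattern.

```python
from contextlib import contextmanager


class OptionsSketch:
    """Hypothetical stand-in for an options registry (not datatable's code)."""

    def __init__(self, **defaults):
        self._values = dict(defaults)

    def get(self, name):
        return self._values[name]

    @contextmanager
    def context(self, **overrides):
        unknown = set(overrides) - set(self._values)
        if unknown:
            raise KeyError(f"unknown option(s): {sorted(unknown)}")
        # Remember the current values of only the options being overridden.
        saved = {name: self._values[name] for name in overrides}
        self._values.update(overrides)
        try:
            yield self
        finally:
            # Restore the originals even if the with-block raised.
            self._values.update(saved)


options = OptionsSketch(nthreads=4, use_colors=True)
with options.context(nthreads=1):
    inside = options.get("nthreads")
after = options.get("nthreads")
print(inside, after)  # 1 4
```

Restoring in a finally clause is what makes the override safe to use around code that may raise.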
Models
Added function dt.models.kfold() to prepare indices for k-fold splitting. This function will return nsplits pairs of row selectors such that when these selectors are applied to an nrows-rows frame, that frame will be split into train and test parts according to the K-fold splitting scheme.
Added function dt.models.kfold_random(), which is similar to dt.models.kfold(), except that the assignment of rows into folds is randomized instead of being deterministic.
Parameter progress_fn in function dt.models.aggregate() was removed. In its place you can set the global option dt.options.progress.callback.
Added early stopping support to the FTRL algorithm, which can now do binomial and multinomial classification for categorical targets, as well as regression for continuous targets.
Fixed FTRL model not resuming properly after unpickling. #1846
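The K-fold splitting scheme mentioned above for dt.models.kfold() can be sketched in plain Python. The function kfold_sketch() below is a hypothetical illustration and not datatable's implementation: it returns plain lists of row indices, whereas the release note only says that the real function returns pairs of row selectors, and the way fold boundaries are chosen here is an assumption.

```python
def kfold_sketch(nrows, nsplits):
    """Return nsplits (train, test) pairs of row-index lists.

    Hypothetical helper: a deterministic split in which fold k is a
    contiguous block of rows and the remaining rows form the train part.
    """
    if not 2 <= nsplits <= nrows:
        raise ValueError("need 2 <= nsplits <= nrows")
    # Fold k covers rows [bounds[k], bounds[k + 1]); sizes differ by at most one.
    bounds = [k * nrows // nsplits for k in range(nsplits + 1)]
    pairs = []
    for k in range(nsplits):
        lo, hi = bounds[k], bounds[k + 1]
        test = list(range(lo, hi))
        train = list(range(0, lo)) + list(range(hi, nrows))
        pairs.append((train, test))
    return pairs


splits = kfold_sketch(nrows=10, nsplits=3)
for train, test in splits:
    print(len(train), len(test))
```

Every row lands in exactly one test part, so each row is used for testing once and for training in all the other folds. A randomized variant in the spirit of kfold_random() would shuffle the row indices before cutting them into folds.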
Contributors
This release was created with the help of 5 people who contributed code and documentation, and 16 more people who submitted bug reports and feature requests.
Code & documentation contributors:
Issues contributors: