Version 1.0.0

Release date:	2021-07-01
Next release:	Version 1.1.0
Previous release:	Version 0.11.1
Wheels:
MacOS	python-3.6, python-3.7, python-3.8, python-3.9
Linux ppc64le	python-3.6, python-3.7, python-3.8, python-3.9
Linux x86-64	python-3.6, python-3.7, python-3.8, python-3.9
Windows	python-3.6, python-3.7, python-3.8, python-3.9
SDist	sources

Frame

Property .types returns the list of dt.Type objects for each column of the frame. These types are a generalization of previous stypes, and will eventually replace them.

Property .type returns the common dt.Type for all columns of the frame (provided that it exists).
New column type dt.Type.date32 added, which can store a calendar date #2858:

import datetime
import datatable as dt
DT = dt.Frame([datetime.date(2021, 2, 17)])
New column type dt.Type.time64 added, which can store timestamps within a certain time zone (in a single column all times must be in the same time zone) #2911:

import datetime
import datatable as dt
DT = dt.Frame([datetime.datetime(2021, 3, 17, 9, 0, 0)])
A Frame can now be constructed from an Arrow table:

DT = dt.Frame(arrow_table)

This process uses the Arrow C Data Interface, and therefore does not entail data copying.
A Frame can now be converted into an Arrow table, using the .to_arrow() method:

pa_table = DT.to_arrow()
Property .meta now provides access to the frame’s meta information, if any, as set by datatable functions/methods or by the user.
Class dt.FExpr now has method .sum(), which behaves exactly as the base level function dt.sum().
Class dt.FExpr now has methods .max(), .min(), .mean(), and .median(), which behave exactly as the equivalent base level functions dt.max(), dt.min(), dt.mean(), and dt.median() respectively.
Class dt.FExpr now has methods for all the row functions (dt.rowsum(), dt.rowall(), etc).
Class dt.FExpr now has methods .sd(), .count(), .first(), .last(), and .shift(), which behave exactly as the equivalent base level functions dt.sd(), dt.count(), dt.first(), dt.last(), and dt.shift() respectively.
Added stats functions .skew() and .kurt(). #1084
The row selector i in the delete operation del DT[i, :] can now be an unsorted list. The list can also contain duplicate values.
When a keyed Frame is converted into a pandas DataFrame, the key columns will now become the DataFrame’s index, not regular columns. #2883
When a Frame is shown in a python console, it will now display the stype of each column, as a second line under the column names. #2810
Parameter types= in Frame’s constructor can now accept arguments of class dt.Type, and also pyarrow’s types. #2986
A Frame can now be created properly from a list of numpy bool objects. #2762
Frames with 1000000+ columns will now be correctly stored in Jay. #2876
Passing an invalid value to the column= argument of the .to_numpy() method will no longer result in a crash.
Frame terminal display no longer overflows terminal’s width if it contains strings with special characters. #2844
Sorting in reverse order now works correctly in the presence of a groupby. #2838
Creating a Frame from a list of np.str_ objects now works correctly. #3026
Converting a frame with incompatible types into a numpy array will now raise an error (instead of auto-promoting to object type). However, if the user explicitly requests promotion into the object type then there won’t be any error.
Rbinding frames with columns of incompatible types will now raise an error instead of auto-promoting to string type. #2790
When a frame is converted into a numpy array of floating type, we will now produce a regular np.ndarray instead of a masked array.
Properties .stypes and .ltypes are now considered deprecated and will be removed in a future version. Currently they continue to work as before, however.
When a frame is created from a list of python objects of disparate types, we will no longer create a column of type object – instead, a dt.exceptions.TypeError will be thrown. An object column can still be created by an explicit request via the stype= parameter in the constructor.
Parameter stypes= in Frame constructor was renamed into types=, and similarly stype= into type=. The old parameter names are still recognized, but no longer documented.
Internal functions dt.internal.compiler_version() and dt.internal.in_debug_mode() removed and replaced with flags .compiler and .build_mode in dt.build_info. Function dt.internal.regex_supported() removed entirely – datatable will now always have support for regular expressions. #2636
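
The relaxed row selector for del DT[i, :] listed in this section can be pictured with a plain Python list standing in for a frame. The sketch below is only a model of the described behavior, not datatable's implementation: the helper name delete_rows is hypothetical, and it assumes that a duplicated index simply removes that row once and that the order of the indices does not matter.

```python
def delete_rows(rows, selector):
    """Return a copy of `rows` without the positions listed in `selector`.

    Hypothetical helper modelling `del DT[i, :]` on a plain list of rows.
    Assumption: duplicate indices delete the row once; order is irrelevant.
    """
    nrows = len(rows)
    doomed = set()
    for idx in selector:
        if not 0 <= idx < nrows:
            raise IndexError(f"row index {idx} is out of range for {nrows} rows")
        doomed.add(idx)  # duplicates collapse inside the set
    # Keep every row whose position was not marked for deletion.
    return [row for pos, row in enumerate(rows) if pos not in doomed]


rows = ["a", "b", "c", "d", "e"]
# Unsorted selector with a repeated value: positions 3 and 0 are removed.
remaining = delete_rows(rows, [3, 0, 3])
print(remaining)
```

Collecting the indices into a set first is what makes both the ordering and the duplicates irrelevant in this model.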

FExpr

Function ifelse() can now accept more than 3 arguments, implementing a chained-if functionality. This is equivalent to CASE WHEN in SQL. #2656
New function as_type() that allows casting columns into a different stype. This function is an alternative to the already existing functionality of using the stype itself as a cast function.
Function dt.time.ymd() can create date32 columns out of individual year/month/day parts.
New functions dt.time.year(), dt.time.month() and dt.time.day() retrieve the individual components of a date.
New function dt.time.day_of_week() for computing the day of week (Monday to Sunday) for the given date column.
New function dt.str.slice() for applying a slice to a string column. #1667
Function sort() can now accept argument na_position=. It can take three values: "first" (default), "last" and "remove". The values describe the position assigned to NAs after sorting. #793
Function cut() can now accept argument bins=, that is a list or a tuple of frames containing edges of the binning intervals. #2819
When a whole column is updated within a DT[i, j, by()] call, the stype/ltype of that column is now allowed to change. #2685
Fix a crash that occurred when using median() on virtual columns of type ArrayView64. #2802
Fix poor performance when selecting columns from a frame with a large number of columns (10k+). #2873
Numpy scalars can now be used in expressions. #3027
f-expressions now accept a list or tuple of column names, column positions, or column types in the j section. #2797
Method dt.FExpr.len() has been deprecated and replaced with a function dt.str.len(). #3016
Method dt.FExpr.re_match() has been deprecated and replaced with a function dt.re.match(). #3017
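
Two of the changes in this section, the chained form of ifelse() and the na_position= argument of sort(), can be pictured with plain Python values. The sketch below models the described behavior on scalars and lists only; it is not datatable code. The helper names chained_ifelse and sort_with_na are hypothetical, the argument layout (condition, value, condition, value, ..., default) is an assumption about how the chain is written, None stands in for NA, and "remove" is assumed to drop NAs from the result.

```python
def chained_ifelse(*args):
    """Model of a chained if: cond1, value1, cond2, value2, ..., default.

    Hypothetical helper; behaves like SQL's CASE WHEN on scalar inputs.
    """
    if len(args) < 3 or len(args) % 2 == 0:
        raise ValueError("expected cond1, value1, [cond2, value2, ...], default")
    *pairs, default = args
    # Walk the (condition, value) pairs in order; the first true one wins.
    for cond, value in zip(pairs[0::2], pairs[1::2]):
        if cond:
            return value
    return default


def sort_with_na(values, na_position="first"):
    """Model of sorting with NAs (None) placed first, last, or removed."""
    if na_position not in ("first", "last", "remove"):
        raise ValueError(f"unknown na_position: {na_position!r}")
    present = sorted(v for v in values if v is not None)
    missing = [None] * (len(values) - len(present))
    if na_position == "first":
        return missing + present
    if na_position == "last":
        return present + missing
    return present  # "remove": NAs are assumed to be dropped


ages = [5, 17, 40, 70]
labels = [
    chained_ifelse(a < 13, "child", a < 20, "teen", a < 65, "adult", "senior")
    for a in ages
]
print(labels)
print(sort_with_na([3, None, 1, None, 2], na_position="last"))
```

In the chained form, the conditions are checked in order, so later branches only need to state their upper bound, exactly as in a CASE WHEN expression.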

fread

Fix an error when reading a file that has an uneven number of fields and Windows-style newlines. #2681
Fread no longer throws an exception when the list of column types passed to parameter columns= contains str64. #2704
Fread no longer improperly detects separators within quoted strings. #922

Models

Implemented a linear model with stochastic gradient descent learning. It supports binomial and multinomial regressions, as well as regression for continuous targets. #2871
FTRL now supports dt.Type.date32 and dt.Type.time64 feature types. #3007

General

Datatable no longer supports Python 3.5, because Python 3.5 itself reached its end of life on 2020-09-13 and will no longer be supported. If you are still using Python 3.5, please consider upgrading. #2642
Removed function dt.open(), which was deprecated since version 0.10.0. #3018
Fixed a memory leak when creating a large number of datatable objects. #2701
Datatable can now be properly installed from a source distribution. #2846

Contributors

This release was created with the help of 6 people who contributed code and documentation, and 17 more people who submitted bug reports and feature requests.

Code & documentation contributors:

Issues contributors: