Version 1.1.0

Version 1.1.0
Previous release: Version 1.0.0

Frame

Parameter force=True in method .rbind() and function dt.rbind() will now allow combining columns of incompatible types. #3062
Frames with columns of type obj64 can now be saved into CSV. The values in the object column will be stringified upon saving. #3064
.replace() now supports numpy scalars. #3164
.to_numpy() now has an option to control memory layout of the resulting numpy array. #3275
Column types returned by the method .sum() are now consistent with those returned by the function dt.sum(): int64 for void, boolean and integer columns; float32 for float32 columns; float64 for float64 columns. #2904
.to_csv() now has an option sep to control the field separator. #3337
Void columns can now be used with dt.sort() and dt.by(). In addition, datatable will now skip sorting any column that it knows contains constant values. #3088 #3104 #3108 #3109
Saving a frame with a void column into Jay no longer leads to a crash. #3074 #3099 #3246
Joining with void columns now works correctly. #3094
dt.sum() now works correctly when called on a grouped column. #3110
Fixed dt.sum() behavior when called on iterables and frames. #3406
Fixed a crash which could have occurred when sorting very long identical or nearly identical strings. #3134
It is now possible to sort all columns according to boolean flags in the reverse list. #3168
Fixed support for .max_column_width option when rendering frames in Jupyter notebooks. #3160
Fixed a crash which in rare situations happened in .to_csv() due to multithreading. #3176
Fixed a crash in .to_pandas() when called on keyed frames. #3224
Fixed .to_csv() to quote missing values when quoting="all" is specified. #3340
Fixed groupby behavior on columns that contain missing values. #3331
Fixed creating frames from numpy arrays that contain unicode strings. #3420
.to_numpy() will now create a correctly shaped array in the case of zero-column frames. #3427
When a zero-column frame is created from a list of tuples or dictionaries, the number of rows will now be equal to the number of elements in that list. #3428
Converting a column of void type into pandas now produces a pandas object column filled with Nones. Converting such a column back into datatable produces a void column again. #3063
When creating a Frame from a list of values, a floating-point nan value will now be treated as None. In particular, nans can now be safely mixed with values of other types, and a list consisting of only nans will turn into a Column of type void. #3083
Converting string or object columns to numpy no longer produces a masked array. Instead, we create a regular object array, filled with Nones in place of NAs. Similarly, converting a string or object column to pandas creates a Series with None values (instead of nans as before) in place of NAs. #3083

FExpr

Class dt.FExpr now has method .as_type(), which behaves exactly as the equivalent base level function dt.as_type().
Added functions dt.rowargmax() and dt.rowargmin() to find the index of the largest and smallest values among columns of each row. #2998
Added reducer function dt.prod() and the corresponding .prod() method to calculate product of values in columns. #3140
Added function dt.cumsum(), as well as .cumsum() method, to calculate the cumulative sum of values per column. #3279
Added functions dt.cummin() and dt.cummax(), as well as the corresponding .cummin() and .cummax() methods, to calculate the cumulative minimum and maximum of values per column. #3279
Added function dt.cumprod(), as well as .cumprod() method, to calculate the cumulative product of values per column. #3279
Added functions dt.cumcount() and dt.ngroup(), to return the row number and group number respectively. #3279
Added reducer functions dt.countna() and dt.nunique(). #2999
Class dt.FExpr now has method .nunique(), which behaves exactly as the equivalent base level function dt.nunique().
Class dt.FExpr now has method .countna(), which behaves exactly as the equivalent base level function dt.countna().
Added function dt.fillna(), as well as .fillna() method, to impute missing values. #3279
Class dt.FExpr now has method .alias(), to assign new names to the selected columns. #2684
Added function dt.categories(), as well as the corresponding .categories() method, to retrieve categories for categorical columns. #3367
Added function dt.codes(), as well as the corresponding .codes() method, to retrieve codes for categorical columns. #3371
Function dt.re.match() now supports case insensitive matching. #3216
Function dt.qcut() can now be used in a groupby context. #3165
dt.qcut() won’t segfault anymore when used as an i-filter. #3061
Fixed selection of time64 columns by ltype. #3251
Fixed selection of time64 columns by python class name. #3253
Fixed dt.shift() behavior on grouped columns. #3269 #3272
Reducers and row-wise functions now support void columns. #3284
Fixed dt.median() when used in a groupby context with void columns. #3411
Chained reducers can now be used with dt.FExprs. #3417

fread

When reading Excel files, datetime fields will now be converted into time64 columns in the resulting frame.
When reading Excel files, forward slashes, backslashes, and a mix of the two are now supported as separators when specifying a subpath. #3221
fread() now supports reading from public S3 buckets, when the source has a format of s3://bucket-name/key-name. #3302
Header detection heuristics have been improved in the case when some of the column names are missing. #3363
Improved handling of very small and very large float values. #3447
fread() will no longer fail while reading mostly empty files. #3055
fread() will no longer fail when reading Excel files on Windows. #3178
Parameter tempdir is now honored for memory-limited fread() operation. #3244
Parameter sep= in fread() will no longer accept values '-', '+', or '.'. Previously, these values were allowed but they produced errors during parsing. #3065

Models

Fixed a bug in the LinearModel that in some cases resulted in the gradient and model coefficients blowing up. #3234
Fixed undefined behavior when LinearModel predicted on frames with missing values. #3260
Fixed target column type detection in LinearModel. #3466

General

Datatable no longer supports Python 3.6, because it reached its end of life on 2021-12-23. If you are still using Python 3.6, please consider upgrading. #3376
Datatable no longer supports Python 3.7, because it reached its end of life on 2023-06-27. If you are still using Python 3.7, please consider upgrading. #3434
Added properties .is_array, .is_boolean, .is_categorical, .is_compound, .is_float, .is_integer, .is_numeric, .is_object, .is_string, .is_temporal, .is_void to class dt.Type. #3101 #3149
Added support for macOS Big Sur. #3175
Added support for Python 3.10. #3210
Added support for Python 3.11. #3374
When the code is built with the DT_DISABLE variable defined, datatable’s thread pool has no datatable-specific dependencies and can be used to parallelize external C++ applications. #3306
Python built-in functions min() and max() will continue working for list comprehensions even after dt.min() and dt.max() have been imported from datatable. #3409