Version 1.1.0¶
Version 1.1.0 | |
---|---|
Previous release: | Version 1.0.0 |
Frame¶
Parameter force=True in method .rbind() and function dt.rbind() will now allow combining columns of incompatible types. #3062
Frames with columns of type obj64 can now be saved into CSV. The values in the object column will be stringified upon saving. #3064
.replace() now supports numpy scalars. #3164
.to_numpy() now has an option to control memory layout of the resulting numpy array. #3275
Column types returned by the method .sum() are now consistent with the ones returned by the function dt.sum(), i.e. int64 for void, boolean and integer columns; float32 for float32 columns; float64 for float64 columns. #2904
.to_csv() now has an option sep to control the field separator. #3337
Void columns can now be used with dt.sort() and dt.by(). In addition, datatable will now skip sorting any column that it knows contains constant values. #3088 #3104 #3108 #3109
Saving a frame with a void column into Jay no longer leads to a crash. #3074 #3099 #3246
Joining with void columns now works correctly. #3094
dt.sum() now works correctly when called on a grouped column. #3110
Fixed dt.sum() behavior when called on iterables and frames. #3406
Fixed a crash which could have occurred when sorting very long identical or nearly identical strings. #3134
It is now possible to sort all columns according to boolean flags in the reverse list. #3168
Fixed support for .max_column_width option when rendering frames in Jupyter notebooks. #3160
Fixed a crash which in rare situations happened in .to_csv() due to multithreading. #3176
Fixed a crash in .to_pandas() when called on keyed frames. #3224
Fixed .to_csv() to quote missing values when quoting="all" is specified. #3340
Fixed groupby behavior on columns that contain missing values. #3331
Fixed creating frames from numpy arrays that contain unicode strings. #3420
.to_numpy() will now create a correctly shaped array in the case of zero-column frames. #3427
When a zero-column frame is created from a list of tuples or dictionaries, the number of rows will be equal to the number of elements in that list. #3428
Converting a column of void type into pandas now produces a pandas object column filled with Nones. Converting such a column back into datatable produces a void column again. #3063
When creating a Frame from a list of values, a floating-point nan value will now be treated as None. In particular, nans can now be safely mixed with values of other types, and a list consisting of only nans will turn into a Column of type void. #3083
Converting string or object columns to numpy no longer produces a masked array. Instead, we create a regular object array, filled with Nones in place of NAs. Similarly, converting a string or object column to pandas creates a Series with None values (instead of nans as before) in place of NAs. #3083
FExpr¶
Class dt.FExpr now has method .as_type(), which behaves exactly as the equivalent base level function dt.as_type().
Added functions dt.rowargmax() and dt.rowargmin() to find the index of the largest and smallest values among columns of each row. #2998
Added reducer function dt.prod() and the corresponding .prod() method to calculate product of values in columns. #3140
Added function dt.cumsum(), as well as .cumsum() method, to calculate the cumulative sum of values per column. #3279
Added functions dt.cummin() and dt.cummax(), as well as the corresponding .cummin() and .cummax() methods, to calculate the cumulative minimum and maximum of values per column. #3279
Added function dt.cumprod(), as well as .cumprod() method, to calculate the cumulative product of values per column. #3279
Added functions dt.cumcount() and dt.ngroup(), to return the row number and group number respectively. #3279
Added reducer functions dt.countna() and dt.nunique(). #2999
Class dt.FExpr now has method .nunique(), which behaves exactly as the equivalent base level function dt.nunique().
Class dt.FExpr now has method .countna(), which behaves exactly as the equivalent base level function dt.countna().
Added function dt.fillna(), as well as .fillna() method, to impute missing values. #3279
Class dt.FExpr now has method .alias(), to assign new names to the selected columns. #2684
Added function dt.categories(), as well as the corresponding .categories() method, to retrieve categories for categorical columns. #3367
Added function dt.codes(), as well as the corresponding .codes() method, to retrieve codes for categorical columns. #3371
Function dt.re.match() now supports case insensitive matching. #3216
Function dt.qcut() can now be used in a groupby context. #3165
dt.qcut() won’t segfault anymore when used as an i-filter. #3061
Fixed selection of time64 columns by ltype. #3251
Fixed selection of time64 columns by python class name. #3253
Fixed dt.shift() behavior on grouped columns. #3269 #3272
Reducers and row-wise functions now support void columns. #3284
Fixed dt.median() when used in a groupby context with void columns. #3411
fread¶
When reading Excel files, datetime fields will now be converted into time64 columns in the resulting frame.
When reading Excel files, forward slash, backslash, and their mix are supported as separators for specifying subpath. #3221
fread() now supports reading from public S3 buckets, when the source has a format of s3://bucket-name/key-name. #3302
Header detection heuristics have been improved in the case when some of the column names are missing. #3363
Improved handling of very small and very large float values. #3447
fread() will no longer fail while reading mostly empty files. #3055
fread() will no longer fail when reading Excel files on Windows. #3178
Parameter tempdir is now honored for memory limited fread() operation. #3244
Parameter sep= in fread() will no longer accept values '-', '+', or '.'. Previously, these values were allowed but they produced errors during parsing. #3065
Models¶
Fixed a bug in the LinearModel that in some cases caused the gradient and model coefficients to blow up. #3234
Fixed undefined behavior when LinearModel predicted on frames with missing values. #3260
Fixed target column type detection in LinearModel. #3466
General¶
Datatable no longer supports Python 3.6, which reached its end of life on 2021-12-23. If you are still using Python 3.6, please consider upgrading. #3376
Datatable no longer supports Python 3.7, which reached its end of life on 2023-06-27. If you are still using Python 3.7, please consider upgrading. #3434
Added properties .is_array, .is_boolean, .is_categorical, .is_compound, .is_float, .is_integer, .is_numeric, .is_object, .is_string, .is_temporal, .is_void to class dt.Type. #3101 #3149
Added support for macOS Big Sur. #3175
Added support for Python 3.10. #3210
Added support for Python 3.11. #3374
datatable’s thread pool can now be used to parallelize external C++ applications, with no datatable-specific dependencies, when the code is built with the DT_DISABLE variable defined. #3306
Python built-in functions min() and max() will continue working for list comprehensions even after dt.min() and dt.max() have been imported from datatable. #3409