Version 1.0.0
Version 1.0.0 | |
---|---|
Release date: | 2021-07-01 |
Previous release: | Version 0.11.1 |
Frame
Property `.types` returns the list of `dt.Type` objects for each column of the frame. These types are a generalization of the previous stypes, and will eventually replace them.
Property `.type` returns the common `dt.Type` for all columns of the frame (provided that it exists).
New column type `dt.Type.date32` added, which can store a calendar date. #2858
    import datetime
    DT = dt.Frame([datetime.date(2021, 2, 17)])
New column type `dt.Type.time64` added, which can store timestamps within a certain time zone (within a single column, all times must be in the same time zone). #2911
    import datetime
    DT = dt.Frame([datetime.datetime(2021, 3, 17, 9, 0, 0)])
A Frame can now be constructed from an Arrow table:
    DT = dt.Frame(arrow_table)
This process uses the Arrow C Data Interface, and therefore does not entail data copying.
A Frame can now be converted into an Arrow table, using the `.to_arrow()` method:
    pa_table = DT.to_arrow()
Property `.meta` now provides access to the frame’s meta information, if any, as set by datatable functions/methods or by the user.
Class `dt.FExpr` now has method `.sum()`, which behaves exactly as the base-level function `dt.sum()`.
Class `dt.FExpr` now has methods `.max()`, `.min()`, `.mean()`, and `.median()`, which behave exactly as the equivalent base-level functions `dt.max()`, `dt.min()`, `dt.mean()`, and `dt.median()` respectively.
Class `dt.FExpr` now has methods for all the row functions (`dt.rowsum()`, `dt.rowall()`, etc).
Class `dt.FExpr` now has methods `.sd()`, `.count()`, `.first()`, `.last()`, and `.shift()`, which behave exactly as the equivalent base-level functions `dt.sd()`, `dt.count()`, `dt.first()`, `dt.last()`, and `dt.shift()` respectively.
The row selector `i` in the delete operation `del DT[i, :]` can now be an unsorted list. The list can also contain duplicate values.
When a keyed Frame is converted into a pandas DataFrame, the key columns will now become the DataFrame’s index, not regular columns. #2883
When a Frame is shown in a Python console, it will now display the stype of each column, as a second line under the column names. #2810
Parameter `types=` in Frame’s constructor can now accept arguments of class `dt.Type`, and also pyarrow’s types. #2986
A Frame can now be created properly from a list of numpy bool objects. #2762
Frames with 1000000+ columns will now be correctly stored in Jay. #2876
Passing an invalid value to the `column=` argument of the `.to_numpy()` method will no longer result in a crash.
Frame terminal display no longer overflows terminal’s width if it contains strings with special characters. #2844
Sorting in reverse order now works correctly in the presence of a groupby. #2838
Creating a Frame from a list of `np.str_` objects now works correctly. #3026
Converting a frame with incompatible types into a numpy array will now raise an error (instead of auto-promoting to object type). However, if the user explicitly requests promotion into the object type then there won’t be any error.
Rbinding frames with columns of incompatible types will now raise an error instead of auto-promoting to string type. #2790
When a frame is converted into a numpy array of floating type, we will now produce a regular `np.ndarray` instead of a masked array.
Properties `.stypes` and `.ltypes` are now considered deprecated and will be removed in a future version. Currently they continue to work as before, however.
When a frame is created from a list of python objects of disparate types, we will no longer create a column of type `object` – instead, a `dt.exceptions.TypeError` will be thrown. An `object` column can still be created by an explicit request via the `stype=` parameter in the constructor.
Parameter `stypes=` in Frame constructor was renamed into `types=`, and similarly `stype=` into `type=`. The old parameter names are still recognized, but no longer documented.
Internal functions `dt.internal.compiler_version()` and `dt.internal.in_debug_mode()` removed and replaced with flags `.compiler` and `.build_mode` in `dt.build_info`. Function `dt.internal.regex_supported()` removed entirely – datatable will now always have support for regular expressions. #2636
FExpr
Function `ifelse()` can now accept more than 3 arguments, implementing a chained-if functionality. This is equivalent to `CASE WHEN` in SQL. #2656
New function `as_type()` that allows casting columns into a different stype. This function is an alternative to the already existing functionality of using the stype itself as a cast function.
Function `dt.time.ymd()` can create `date32` columns out of individual year/month/day parts.
Functions `dt.time.year()`, `dt.time.month()` and `dt.time.day()` for retrieving individual components of a date.
New function `dt.time.day_of_week()` for computing the day of week (Monday to Sunday) for the given date column.
New function `dt.str.slice()` for applying a slice to a string column. #1667
Function `sort()` can now accept argument `na_position=`. It can take three values: `"first"` (default), `"last"` and `"remove"`. The values describe the position assigned to NAs after sorting. #793
Function `cut()` can now accept argument `bins=`, that is a list or a tuple of frames containing edges of the binning intervals. #2819
When a whole column is updated within a `DT[i, j, by()]` call, the stype/ltype of that column is now allowed to change. #2685
Fix a crash that occurred when using `median()` on virtual columns of type ArrayView64. #2802
Fix poor performance when selecting columns from a frame with a large number of columns (10k+). #2873
Numpy scalars can now be used in expressions. #3027
f-expressions now accept a list/tuple of column names/column positions/column types in the `j` section. #2797
Method `dt.FExpr.len()` has been deprecated and replaced with a function `dt.str.len()`. #3016
Method `dt.FExpr.re_match()` has been deprecated and replaced with a function `dt.re.match()`. #3017
fread
Models
Implemented a linear model with stochastic gradient descent learning. It supports binomial and multinomial regressions, as well as regression for continuous targets. #2871
FTRL now supports `dt.Type.date32` and `dt.Type.time64` feature types. #3007
General
Datatable no longer supports Python 3.5, because Python 3.5 itself has reached its end of life on 2020-09-13 and will no longer be supported. If you are still using Python 3.5, please consider upgrading. #2642
Removed function `dt.open()`, which was deprecated since version 0.10.0. #3018
Fixed a memory leak when creating a large number of datatable objects. #2701
Datatable can now be properly installed from a source distribution. #2846
Contributors
This release was created with the help of 6 people who contributed code and documentation, and 17 more people who submitted bug reports and feature requests.
Code & documentation contributors:
Issues contributors: