Version 1.0.0

Release date: 2021-07-01
Previous release: Version 0.11.1

Frame

  • Property .types returns the list of dt.Type objects for each column of the frame. These types are a generalization of previous stypes, and will eventually replace them.

    Property .type returns the common dt.Type for all columns of the frame (provided that it exists).

  • New column type dt.Type.date32 added, which can store a calendar date #2858:

    import datetime
    import datatable as dt
    DT = dt.Frame([datetime.date(2021, 2, 17)])

  • New column type dt.Type.time64 added, which can store timestamps within a certain time zone (in a single column all times must be in the same time zone) #2911:

    import datetime
    import datatable as dt
    DT = dt.Frame([datetime.datetime(2021, 3, 17, 9, 0, 0)])

  • A Frame can now be constructed from an Arrow table:

    DT = dt.Frame(arrow_table)

    This process uses the Arrow C Data interface, and therefore does not entail data copying.

  • A Frame can now be converted into an Arrow table, using the .to_arrow() method:

    pa_table = DT.to_arrow()

  • Property .meta now provides access to the frame’s meta information, if any, as set by datatable functions/methods or by the user.

  • Class dt.FExpr now has method .sum(), which behaves exactly as the base-level function dt.sum().

  • Class dt.FExpr now has methods .max(), .min(), .mean(), and .median(), which behave exactly as the equivalent base-level functions dt.max(), dt.min(), dt.mean(), and dt.median() respectively.

  • Class dt.FExpr now has methods for all the row functions (dt.rowsum(), dt.rowall(), etc).

  • Class dt.FExpr now has methods .sd(), .count(), .first(), .last(), and .shift(), which behave exactly as the equivalent base-level functions dt.sd(), dt.count(), dt.first(), dt.last(), and dt.shift() respectively.

  • Added stats functions .skew() and .kurt(). #1084

  • The row selector i in the delete operation del DT[i, :] can now be an unsorted list. The list can also contain duplicate values.

  • When a keyed Frame is converted into a pandas DataFrame, the key columns will now become the DataFrame’s index, not regular columns. #2883

  • When a Frame is shown in a Python console, it will now display the stype of each column, as a second line under the column names. #2810

  • Parameter types= in Frame’s constructor can now accept arguments of class dt.Type, and also pyarrow’s types. #2986

  • A Frame can now be created properly from a list of numpy bool objects. #2762

  • Frames with 1000000+ columns will now be correctly stored in Jay. #2876

  • Passing an invalid value to the column= argument of the .to_numpy() method will no longer result in a crash.

  • Frame terminal display no longer overflows terminal’s width if it contains strings with special characters. #2844

  • Sorting in reverse order now works correctly in the presence of a groupby. #2838

  • Creating a Frame from a list of np.str_ objects now works correctly. #3026

  • Converting a frame with incompatible types into a numpy array will now raise an error (instead of auto-promoting to object type). However, if the user explicitly requests promotion into the object type then there won’t be any error.

  • Rbinding frames with columns of incompatible types will now raise an error instead of auto-promoting to string type. #2790

  • When a frame is converted into a numpy array of floating type, the result is now a regular np.ndarray instead of a masked array.

  • Properties .stypes and .ltypes are now considered deprecated and will be removed in a future version. Currently they continue to work as before, however.

  • When a frame is created from a list of Python objects of disparate types, we will no longer create a column of type object; instead, a dt.exceptions.TypeError will be thrown. An object column can still be created by an explicit request via the stype= parameter in the constructor.

  • Parameter stypes= in Frame constructor was renamed into types=, and similarly stype= into type=. The old parameter names are still recognized, but no longer documented.

  • Internal functions dt.internal.compiler_version() and dt.internal.in_debug_mode() were removed and replaced with flags .compiler and .build_mode in dt.build_info. Function dt.internal.regex_supported() was removed entirely; datatable will now always have support for regular expressions. #2636
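
The relaxed rule for the row selector in del DT[i, :] (unsorted lists, duplicates allowed) can be pictured with a small plain-Python model. This is only a sketch of the described behavior on an ordinary list, not datatable's implementation; the helper name delete_rows is hypothetical and only non-negative positions are handled.

```python
def delete_rows(rows, selector):
    """Return a copy of `rows` without the positions listed in `selector`.

    Hypothetical helper: models the described semantics on a plain list.
    """
    # A set makes both the order of the selector and any duplicates irrelevant.
    doomed = set(selector)
    return [row for pos, row in enumerate(rows) if pos not in doomed]


rows = ["a", "b", "c", "d", "e"]
# Unsorted selector with a repeated position: row 3 is still removed only once.
remaining = delete_rows(rows, [3, 0, 3])
print(remaining)  # ['b', 'c', 'e']
```

Under this model, the selector [3, 0, 3] has the same effect as [0, 3].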

FExpr

  • Function ifelse() can now accept more than 3 arguments, implementing a chained-if functionality. This is equivalent to CASE WHEN in SQL. #2656

  • New function as_type() that allows casting columns into a different stype. This function is an alternative to the already existing functionality of using the stype itself as a cast function.

  • Function dt.time.ymd() can create date32 columns out of individual year/month/day parts.

  • New functions dt.time.year(), dt.time.month() and dt.time.day() for retrieving individual components of a date.

  • New function dt.time.day_of_week() for computing the day of week (Monday to Sunday) for the given date column.

  • New function dt.str.slice() for applying a slice to a string column. #1667

  • Function sort() can now accept argument na_position=. It can take three values: "first" (default), "last" and "remove". The values describe the position assigned to NAs after sorting. #793

  • Function cut() can now accept argument bins=, which is a list or a tuple of frames containing the edges of the binning intervals. #2819

  • When a whole column is updated within a DT[i, j, by()] call, the stype/ltype of that column is now allowed to change. #2685

  • Fix a crash that occurred when using median() on virtual columns of type ArrayView64. #2802

  • Fix poor performance when selecting columns from a frame with a large number of columns (10k+). #2873

  • Numpy scalars can now be used in expressions. #3027

  • f-expressions now accept a list/tuple of column names/column positions/column types in the j section. #2797

  • Method dt.FExpr.len() has been deprecated and replaced with a function dt.str.len(). #3016

  • Method dt.FExpr.re_match() has been deprecated and replaced with a function dt.re.match(). #3017
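
The chained form of ifelse() mentioned above works like SQL's CASE WHEN: conditions are checked in order and the first one that holds selects the value. The plain-Python model below illustrates that evaluation order on ordinary lists. It is a sketch only, not datatable's implementation, and it assumes the arguments are laid out as (cond1, value1, cond2, value2, ..., default).

```python
def ifelse(*args):
    """Plain-Python model of a chained if-else over equal-length lists.

    Assumed layout: (cond1, value1, cond2, value2, ..., default).
    """
    if len(args) < 3 or len(args) % 2 == 0:
        raise ValueError("expected cond/value pairs followed by a default")
    *pairs, default = args
    conditions, values = pairs[0::2], pairs[1::2]
    result = []
    for row in range(len(default)):
        chosen = default[row]
        # The first condition that holds wins, as in SQL's CASE WHEN.
        for cond, value in zip(conditions, values):
            if cond[row]:
                chosen = value[row]
                break
        result.append(chosen)
    return result


ages = [5, 15, 40, 70]
labels = ifelse(
    [a < 13 for a in ages], ["child"] * 4,
    [a < 20 for a in ages], ["teen"] * 4,
    [a < 65 for a in ages], ["adult"] * 4,
    ["senior"] * 4,
)
print(labels)  # ['child', 'teen', 'adult', 'senior']
```

Because the conditions are tested in order, the value 5 is labelled "child" even though it also satisfies the later conditions.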

fread

  • Fix an error when reading a file with an uneven number of fields and Windows-style newlines. #2681

  • Fread no longer throws an exception when the list of column types passed to parameter columns= contains str64. #2704

  • Fread no longer improperly detects separators within quoted strings. #922
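
To show what correct handling of separators inside quoted strings looks like, the sketch below parses a small CSV sample with Python's standard csv module. It does not use fread; it only illustrates the expected outcome, where separator-like characters inside quotes stay part of the field. The sample text is made up for illustration.

```python
import csv
import io

# The second column contains characters that often serve as separators
# (comma, semicolon, pipe), but they appear inside quoted strings.
text = 'id,comment\n1,"hello, world"\n2,"a;b|c"\n'

rows = list(csv.reader(io.StringIO(text)))
print(rows)
# [['id', 'comment'], ['1', 'hello, world'], ['2', 'a;b|c']]
```

Every row has exactly two fields, so the quoted comma is not treated as a field boundary.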

Models

  • Implemented a linear model with stochastic gradient descent learning. It supports binomial and multinomial regressions, as well as regression for continuous targets. #2871

  • FTRL now supports dt.Type.date32 and dt.Type.time64 feature types. #3007
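
For readers unfamiliar with the technique, here is a minimal plain-Python sketch of stochastic gradient descent for a linear model with a continuous target. It is not datatable's implementation or API: the function name sgd_linear_fit, the learning rate and the epoch count are all illustrative assumptions, and samples are visited in a fixed order to keep the run reproducible.

```python
def sgd_linear_fit(xs, ys, learning_rate=0.05, epochs=500):
    """Fit y ~ w * x + b with per-sample gradient updates on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = (w * x + b) - y
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = sgd_linear_fit(xs, ys)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

The parameters are updated after every single sample rather than after a full pass over the data, which is what distinguishes this from batch gradient descent.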

General

  • Datatable no longer supports Python 3.5, because Python 3.5 itself reached its end of life on 2020-09-13 and is no longer supported. If you are still using Python 3.5, please consider upgrading. #2642

  • Removed function dt.open(), which had been deprecated since version 0.10.0. #3018

  • Fixed a memory leak when creating a large number of datatable objects. #2701

  • Datatable can now be properly installed from a source distribution. #2846

Contributors

This release was created with the help of 6 people who contributed code and documentation, and 17 more people who submitted bug reports and feature requests.

Code & documentation contributors:

Issues contributors: