Version 0.11.0

Release date:	2020-09-19
Next release:	Version 0.11.1
Previous release:	Version 0.10.1
Wheels:
MacOS	python-3.5, python-3.6, python-3.7, python-3.8
Linux x86-64	python-3.5, python-3.6, python-3.7, python-3.8
Linux ppc64le	python-3.5, python-3.6, python-3.7, python-3.8
Windows	python-3.5, python-3.6, python-3.7, python-3.8
SDist	sources

Frame

Property .source contains the name of the file where the frame was loaded from. If the frame was modified after loading, or if it was created dynamically to begin with, this property will return None.
The expression len(DT) now works, and returns the number of columns in the Frame. This allows the Frame to be used in contexts where an iterable might be expected.
Added ability to cast string columns into numeric types: int, float or boolean. #1313
String columns now support comparison operators <, >, <= and >=. #2274
String columns can now be added together, similarly to how strings can be added in Python. #1839
Added a new function dt.cut() to bin numeric data to equal-width discrete intervals. #2483
Added a new function dt.qcut() to bin data to equal-population discrete intervals. #1680
Added function dt.math.round() which is the equivalent of Python’s built-in round(). #2285
Method .colindex() now accepts a column selector f-expression as an argument.
When creating a Frame from a Python list, it is now possible to explicitly specify the stype of the resulting column by “dividing” the list by the type you need:

import datatable as dt
DT = dt.Frame(A=[1, 5, 10] / dt.int64, B=[0, 0, 0] / dt.float64)  # column A is int64, column B is float64
Added new argument bom=False to the .to_csv() method. If set to True, it will add the Byte-Order Mark (BOM) to the output CSV file. #2379
Casting a column into its own type is now a no-op. #2425
It is now possible to create a Frame from a pandas DataFrame with Categorical columns (which will be converted into strings). #2407
Method .cbind() now throws a dt.exceptions.InvalidOperationError instead of a ValueError if the argument frames have incompatible shapes.
Method .colindex() now throws a dt.exceptions.KeyError when given a column that doesn’t exist in the frame, or a dt.exceptions.IndexError when given a numeric column index outside of the allowed range. Previously it threw a ValueError in both cases.
When creating a Frame from a list containing mixed integers/floats and strings, the resulting Frame will now have stype str32. Previously an obj64 column was created instead. The new behavior is more consistent with fread’s behavior when reading CSV files.
Expression f[:] now excludes groupby columns when used in a groupby context. #2460
Parameter _strategy= in .to_csv() and .to_jay() was renamed to method=. The old parameter name still works, so this change is not breaking.
The behavior of the method .sort() is now consistent with the function dt.sort(): when the list of columns to sort by is empty, neither of them sorts any columns.
Deleting a key from the Frame (del DT.key) no longer causes a segfault. #2357
Casting a 0-row str32 column into str64 stype no longer goes into an infinite loop. #2369
Fixed creation of a str64 column from a python list of strings when the total size of all strings is greater than 2GB. #2368
Rbinding several str32 columns such that their combined string buffers have size over 2GB now properly creates a str64 column as a result. #2367
Fixed a crash when writing a frame with many boolean columns to CSV with the option quoting="all". #2382
It is no longer allowed to combine compression="gzip" and append=True in .to_csv().
Empty strings no longer get confused with NA strings in .replace(). #2502
dt.rbind()-ing an iterator of frames created on-the-fly no longer results in undefined behavior. #2621
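The difference between the equal-width binning of dt.cut() and the equal-population binning of dt.qcut() can be sketched in plain Python. This is only an illustration of the two binning schemes, not datatable's implementation: the helpers equal_width_bins() and equal_population_bins() are hypothetical, and the handling of interval edges and ties shown here is an assumption that may differ from what dt.cut() and dt.qcut() actually do.

```python
def equal_width_bins(values, nbins):
    """Assign each value to one of `nbins` intervals of equal width.

    Hypothetical helper: the range [min, max] is split into `nbins` equal
    parts, and the maximum value is placed into the last bin.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate range: every value falls into the same bin.
        return [0] * len(values)
    width = (hi - lo) / nbins
    return [min(int((v - lo) / width), nbins - 1) for v in values]


def equal_population_bins(values, nbins):
    """Assign each value to one of `nbins` groups with (roughly) equal counts.

    Hypothetical helper: values are ranked, and the rank alone determines
    the bin, so each bin receives about len(values) / nbins elements no
    matter how the values are spaced. Ties are broken by original position
    (sorted() is stable); this is an assumption of the sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * nbins // n
    return bins


data = [1, 2, 3, 4, 100]
print(equal_width_bins(data, 2))       # [0, 0, 0, 0, 1]
print(equal_population_bins(data, 2))  # [0, 0, 0, 1, 1]
```

With an outlier in the data, equal-width intervals leave almost everything in the first bin, while equal-population intervals split the rows as evenly as the bin count allows.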

Fread

Added new function iread(), which is similar to fread(), but suitable for reading multiple sources at once. The function will return an iterator of Frames.

Use this function to read multiple files using a glob, or give it a list of files, or an archive containing multiple files inside, or an Excel file with multiple sheets, etc.

The function iread() has parameter errors= which controls what should happen when some of the sources cannot be read. Possible values are: "warn", "raise", "ignore" and "store". The last of these catches the exceptions that may occur when reading each input, and returns those exception objects within the iterator. #2008
It is now possible to read multi-file .tar.gz files using iread(). #2392
Added parameter encoding which will force fread to decode the input using the specified encoding before attempting to read it. The decoding process uses standard Python codecs, and is therefore single-threaded. The parameter accepts any encoding supported by the standard Python library module codecs. #2395
Added parameter memory_limit which instructs fread to try to limit the amount of memory used when reading the input. This parameter is especially useful when reading files that are larger than the amount of available memory. #1750
Added parameter multiple_sources which controls fread’s behavior when multiple input sources are detected (for example, if you pass a name of an archive, and the archive contains multiple files). Possible values are: "warn" (default), "error", and "ignore".
Fread now displays a progress bar when downloading data from a URL. #2441
Fread now computes NA counts of all data while reading, storing them in per-column stats. For integer and floating point columns we also compute min/max value in each column. #1097
When reading from a URL, fread will now escape url-unsafe characters in that URL, so that the user doesn’t have to.
When reading Excel files, the cells with datetime or boolean types are now handled correctly, in particular a datetime value is converted into its string representation. #1701
Fread now properly detects \r-newlines in the presence of fields with quoted \n-newlines. #1343
Opening a Jay file from a bytes object now produces a Frame that remains valid even after the bytes object is deleted. #2547
Function fread() now always returns a single Frame object; previously it could return a dict of Frames if multiple sources were detected. Use iread() if you need to read multi-source input.
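The four policies accepted by iread()'s errors= parameter can be illustrated with a small plain-Python generator. This is a hypothetical sketch, not datatable's code: read_each() and its reader callback are invented for the example, and the warning category and message that iread() actually emits under "warn" are not reproduced here. The point it shows is that with errors="store" the exception objects take the place of results in the iterator, so the caller has to check each item.

```python
import warnings


def read_each(sources, reader, errors="warn"):
    """Yield reader(src) for every source, handling failures per `errors`.

    Hypothetical helper mirroring the documented policies:
      "raise":  re-raise the exception, stopping the iteration.
      "warn":   emit a warning and skip the failed source.
      "ignore": silently skip the failed source.
      "store":  yield the exception object in place of the result.
    """
    if errors not in ("warn", "raise", "ignore", "store"):
        raise ValueError("invalid value for errors=: %r" % (errors,))
    for src in sources:
        try:
            result = reader(src)
        except Exception as e:
            if errors == "raise":
                raise
            if errors == "warn":
                warnings.warn("failed to read %r: %s" % (src, e))
            elif errors == "store":
                yield e
            continue
        yield result


# Usage: a toy "reader" (int) that fails on non-numeric input.
for item in read_each(["1", "x", "3"], int, errors="store"):
    if isinstance(item, Exception):
        print("could not read:", item)
    else:
        print("value:", item)
```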

General

datatable is now fully supported on Windows.
Added exception dt.exceptions.InvalidOperationError, which can be used to signal when an operation is requested that would be illegal for the given combination of parameters.
New option dt.options.debug.enabled, when set to True, reports all calls to the internal C++ core functions, together with their timings. This may help identify performance bottlenecks, or help troubleshoot user scripts.

Additional options debug.logger, debug.report_args and debug.max_arg_size allow more granular control over the logging process. #2452
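Added function ifelse(cond, expr_if_true, expr_if_false), which selects, for each row, either expr_if_true or expr_if_false depending on whether cond holds. #2411

from datatable import f, ifelse  # DT is an existing Frame with numeric columns x and y
DT["max(x,y)"] = ifelse(f.x >= f.y, f.x, f.y)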
datatable no longer depends on the modules blessed and typesentry. #1677 #1535
Added 2 new fields to the dt.build_info struct: .git_date, the UTC timestamp of the git revision from which that version of datatable was built, and .git_diff, which is non-empty for builds from code that was modified relative to the git revision it is based on.
During a fork the thread pool will now shut down completely, together with the monitor thread. The threads will then restart in both the parent and the child, when needed. #2438
Internal function dt.internal.frame_column_data_r() now works properly with virtual columns. #2269
Avoided a rare deadlock when creating a frame from a pandas DataFrame in a forked process, when datatable was compiled with a gcc version before 7.0. #2272
Fixed a rare crash in the interrupt signal handler. #2282
Fixed possible crash in rbind() and union() when they were called with a string argument, or with an object that caused infinite recursion. #2386
Column names containing backticks now display properly in error messages. #2406
Fixed rare race condition when multiple threads tried to throw an exception at the same time. #2526
All exceptions thrown by datatable are now declared in the dt.exceptions module. These exceptions are now organized to derive from the common base class dt.exceptions.DtException.

The exception messages when stringified no longer contain backticks. The backticks are still emitted internally to help display the error in a color-supporting terminal, but when the exception is converted into a string via str() or repr(), these backticks will now be stripped. This change ensures that the exception message remains the same regardless of how it is rendered.
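A minimal sketch of the idea behind the backtick change: the message is kept internally with backtick markers (which a color-capable terminal renderer could use for highlighting), while str() and repr() return the message with the backticks removed. The class DemoError and its raw_message attribute are hypothetical and do not reflect how dt.exceptions is implemented. In particular, this sketch would also strip backticks that are part of the data itself (for example, a column name containing a backtick), which a real implementation has to escape or otherwise preserve.

```python
class DemoError(Exception):
    """Hypothetical exception that keeps a backtick-annotated message."""

    def __init__(self, raw_message):
        super().__init__(raw_message)
        # Original message, with `...` marking parts to highlight.
        self.raw_message = raw_message

    def __str__(self):
        # Plain-text rendering: strip the highlighting markers.
        return self.raw_message.replace("`", "")

    def __repr__(self):
        # repr() shows the same stripped text as str().
        return "%s(%r)" % (type(self).__name__, str(self))


err = DemoError("Column `price` does not exist in the Frame")
print(str(err))   # Column price does not exist in the Frame
print(repr(err))  # DemoError('Column price does not exist in the Frame')
```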

FTRL model

.nepochs, the number of epochs to train the model, can now be a float rather than an integer.
.fit() now throws dt.exceptions.TypeError when ltypes in the training and validation frames are not consistent.
.interactions now throws a dt.exceptions.ValueError instead of a dt.exceptions.TypeError when assigning interactions with zero features.
Fixed inconsistency in progress reporting. #2520
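To see what a fractional .nepochs can mean in practice, the total amount of training work can be computed as below. This assumes that the integer part of .nepochs gives the number of full passes over the training data and the fractional part gives the share of rows used in one extra partial pass; the helper rows_processed() is hypothetical, and rounding the partial pass down is an assumption of this sketch rather than documented FTRL behavior.

```python
import math


def rows_processed(nepochs, nrows):
    """Number of training rows visited for a possibly fractional nepochs.

    Assumption: floor(nepochs) full passes over all `nrows` rows, followed by
    one partial pass over floor(frac * nrows) rows, where frac is the
    fractional part of nepochs.
    """
    if nepochs < 0:
        raise ValueError("nepochs must be non-negative")
    full_passes = math.floor(nepochs)
    frac = nepochs - full_passes
    return full_passes * nrows + math.floor(frac * nrows)


print(rows_processed(2.5, 1000))  # 2500
print(rows_processed(3, 1000))    # 3000
print(rows_processed(0.25, 400))  # 100
```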

Contributors

This release was created with the help of 9 people who contributed code and documentation, and 18 more people who submitted bug reports and feature requests.

Code & documentation contributors:

Issues contributors: