Version 0.11.0

Release date: 2020-09-19
Next release: Version 0.11.1
Previous release: Version 0.10.1

Wheels
  MacOS: python-3.5, python-3.6, python-3.7, python-3.8
  Linux x86-64: python-3.5, python-3.6, python-3.7, python-3.8
  Linux ppc64le: python-3.5, python-3.6, python-3.7, python-3.8
  Windows: python-3.5, python-3.6, python-3.7, python-3.8
SDist: sources

Frame

  • Property .source contains the name of the file from which the frame was loaded. If the frame was modified after loading, or if it was created dynamically to begin with, this property will return None.

  • The expression len(DT) now works, and returns the number of columns in the Frame. This allows the Frame to be used in contexts where an iterable might be expected.

  • Added ability to cast string columns into numeric types: int, float or boolean. #1313

  • String columns now support comparison operators <, >, <= and >=. #2274

  • String columns can now be added together, similarly to how strings can be added in Python. #1839

  • Added a new function dt.cut() to bin numeric data to equal-width discrete intervals. #2483

  • Added a new function dt.qcut() to bin data to equal-population discrete intervals. #1680

  • Added function dt.math.round() which is the equivalent of Python’s built-in round(). #2285

  • Method .colindex() now accepts a column selector f-expression as an argument.

  • When creating a Frame from a python list, it is now possible to explicitly specify the stype of the resulting column by “dividing” the list by the type you need:

    dt.Frame(A=[1, 5, 10] / dt.int64, B=[0, 0, 0] / dt.float64)

  • Added new argument bom=False to the .to_csv() method. If set to True, it will add the Byte-Order Mark (BOM) to the output CSV file. #2379

  • Casting a column into its own type is now a no-op. #2425

  • It is now possible to create a Frame from a pandas DataFrame with Categorical columns (which will be converted into strings). #2407

  • Method .cbind() now throws a dt.exceptions.InvalidOperationError instead of a ValueError if the argument frames have incompatible shapes.

  • Method .colindex() now throws a dt.exceptions.KeyError when given a column that doesn’t exist in the frame, or a dt.exceptions.IndexError if given a numeric column index outside of the allowed range. Previously it was throwing a ValueError in both cases.

  • When creating a Frame from a list containing mixed integers/floats and strings, the resulting Frame will now have stype str32. Previously an obj64 column was created instead. The new behavior is more consistent with fread’s behavior when reading CSV files.

  • Expression f[:] now excludes groupby columns when used in a groupby context. #2460

  • The parameter _strategy= in .to_csv() and .to_jay() was renamed to method=. The old parameter name still works, so this change is not breaking.

  • The behavior of the .sort() method is now consistent with the function dt.sort(): when the list of columns to sort by is empty, neither of them sorts anything.

  • Deleting a key from the Frame (del DT.key) no longer causes a segmentation fault. #2357

  • Casting a 0-row str32 column into str64 stype no longer goes into an infinite loop. #2369

  • Fixed creation of a str64 column from a python list of strings when the total size of all strings is greater than 2GB. #2368

  • Rbinding several str32 columns such that their combined string buffers have size over 2GB now properly creates a str64 column as a result. #2367

  • Fixed crash when writing to CSV a frame with many boolean columns when the option quoting="all" is used. #2382

  • It is no longer allowed to combine compression="gzip" and append=True in .to_csv().

  • Empty strings no longer get confused with NA strings in .replace(). #2502

  • Calling dt.rbind() on an iterator of frames created on the fly no longer results in undefined behavior. #2621
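
The difference between the equal-width intervals of dt.cut() and the equal-population intervals of dt.qcut() is easiest to see on skewed data. The following is a minimal pure-Python sketch of the two ideas, not datatable's implementation: the helper names equal_width_bins and equal_population_bins are hypothetical, and the handling of interval edges and ties is an assumption made for illustration only.

```python
def equal_width_bins(values, nbins):
    """Assign each value to one of `nbins` intervals of equal width."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values coincide, so a single bin holds everything.
        return [0] * len(values)
    width = (hi - lo) / nbins
    # The maximum would fall just past the last interval; clamp it back in.
    return [min(int((v - lo) / width), nbins - 1) for v in values]


def equal_population_bins(values, nbins):
    """Assign each value to one of `nbins` groups of (nearly) equal size."""
    n = len(values)
    # Rank the values, then split the ranks evenly.
    # Ties are not treated specially in this sketch.
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * nbins // n
    return bins


data = [1, 2, 3, 4, 100, 101]
print(equal_width_bins(data, 2))       # [0, 0, 0, 0, 1, 1]
print(equal_population_bins(data, 2))  # [0, 0, 0, 1, 1, 1]
```

With two bins, the equal-width split is driven by the range of the data (1 to 101), so the four small values share one bin, while the equal-population split puts three values in each bin regardless of their spacing.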

Fread

  • Added new function iread(), which is similar to fread(), but suitable for reading multiple sources at once. The function will return an iterator of Frames.

    Use this function to read multiple files using a glob, or give it a list of files, or an archive containing multiple files inside, or an Excel file with multiple sheets, etc.

    The function iread() has the parameter errors=, which controls what should happen when some of the sources cannot be read. Possible values are: "warn", "raise", "ignore" and "store". The last one will catch the exceptions that may occur when reading each input, and return those exception objects within the iterator. #2008

  • It is now possible to read multi-file .tar.gz files using iread(). #2392

  • Added parameter encoding which will force fread to decode the input using the specified encoding before attempting to read it. The decoding process uses standard python codecs, and is therefore single-threaded. The parameter accepts any value available via the standard python library codecs. #2395

  • Added parameter memory_limit which instructs fread to try to limit the amount of memory used when reading the input. This parameter is especially useful when reading files that are larger than the amount of available memory. #1750

  • Added parameter multiple_sources which controls fread’s behavior when multiple input sources are detected (for example, if you pass the name of an archive, and the archive contains multiple files). Possible values are: "warn" (default), "error", and "ignore".

  • Fread now displays a progress bar when downloading data from a URL. #2441

  • Fread now computes NA counts of all data while reading, storing them in per-column stats. For integer and floating point columns we also compute min/max value in each column. #1097

  • When reading from a URL, fread will now escape url-unsafe characters in that URL, so that the user doesn’t have to.

  • When reading Excel files, cells with datetime or boolean types are now handled correctly; in particular, a datetime value is converted into its string representation. #1701

  • Fread now properly detects \r-newlines in the presence of fields with quoted \n-newlines. #1343

  • Opening a Jay file from a bytes object now produces a Frame that remains valid even after the bytes object is deleted. #2547

  • Function fread() now always returns a single Frame object; previously it could return a dict of Frames if multiple sources were detected. Use iread() if you need to read multi-source input.
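
The four errors= policies described for iread() follow a common pattern for iterating over several sources when some of them may fail. The sketch below illustrates that pattern in plain Python and is not datatable code: read_each and parse_int are hypothetical names, and the exact behavior of "warn" and "ignore" (skip the failed source, with or without a warning) is an assumption, since the notes above only spell out what "store" does.

```python
import warnings


def read_each(sources, reader, errors="warn"):
    """Yield reader(src) for each source, applying the chosen error policy."""
    if errors not in ("warn", "raise", "ignore", "store"):
        raise ValueError(f"Unknown errors policy: {errors!r}")
    for src in sources:
        try:
            yield reader(src)
        except Exception as exc:
            if errors == "raise":
                raise                      # stop at the first failure
            if errors == "store":
                yield exc                  # hand the exception to the caller
            elif errors == "warn":
                warnings.warn(f"Could not read {src!r}: {exc}")
            # "ignore": skip the failed source silently


def parse_int(text):
    """Stand-in for a real reader; fails on non-numeric input."""
    return int(text)


stored = list(read_each(["1", "x", "3"], parse_int, errors="store"))
ignored = list(read_each(["1", "x", "3"], parse_int, errors="ignore"))
print(stored)   # the middle element is a ValueError instance
print(ignored)  # [1, 3]
```

With "store", the caller receives one item per source and can check each item with isinstance(item, Exception) to decide how to proceed.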

General

  • datatable is now fully supported on Windows.

  • Added exception dt.exceptions.InvalidOperationError, which can be used to signal when an operation is requested that would be illegal for the given combination of parameters.

  • The new option dt.options.debug.enabled reports all calls to the internal C++ core functions, together with their timings. This may help identify performance bottlenecks or troubleshoot user scripts.

    Additional options debug.logger, debug.report_args and debug.max_arg_size allow more granular control over the logging process. #2452

  • New function ifelse(cond, expr_if_true, expr_if_false) returns one of the two values depending on the condition. #2411

    DT["max(x,y)"] = ifelse(f.x >= f.y, f.x, f.y)

  • datatable no longer depends on the modules blessed and typesentry. #1677 #1535

  • Added 2 new fields to the dt.build_info struct: .git_date, the UTC timestamp of the git revision from which that version of datatable was built, and .git_diff, which is non-empty for builds from code that was modified relative to the git revision they are based on.

  • During a fork the thread pool will now shut down completely, together with the monitor thread. The threads will then restart in both the parent and the child, when needed. #2438

  • Internal function dt.internal.frame_column_data_r() now works properly with virtual columns. #2269

  • Avoided a rare deadlock when creating a frame from a pandas DataFrame in a forked process, when datatable was compiled with a gcc version earlier than 7.0. #2272

  • Fixed a rare crash in the interrupt signal handler. #2282

  • Fixed possible crash in rbind() and union() when they were called with a string argument, or with an object that caused infinite recursion. #2386

  • Column names containing backticks now display properly in error messages. #2406

  • Fixed rare race condition when multiple threads tried to throw an exception at the same time. #2526

  • All exceptions thrown by datatable are now declared in the dt.exceptions module. These exceptions are now organized to derive from the common base class dt.exceptions.DtException.

    The exception messages when stringified no longer contain backticks. The backticks are still emitted internally to help display the error in a color-supporting terminal, but when the exception is converted into a string via str() or repr(), these backticks will now be stripped. This change ensures that the exception message remains the same regardless of how it is rendered.
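
The last two points (a common base class, and messages whose backticks are stripped on str() and repr()) can be pictured with a small stand-alone sketch. This is not datatable's source code: the classes FrameError, ColumnNotFound and ShapeMismatch and the attribute raw_message are hypothetical names used only to show the pattern.

```python
class FrameError(Exception):
    """Hypothetical common base class for all errors of a library."""

    def __init__(self, message):
        # Keep the marked-up text for terminals that can highlight it,
        # but hand the plain text to Exception so str() and repr() agree.
        self.raw_message = message
        super().__init__(message.replace("`", ""))


class ColumnNotFound(FrameError):
    """Hypothetical error for a missing column."""


class ShapeMismatch(FrameError):
    """Hypothetical error for incompatible frame shapes."""


try:
    raise ColumnNotFound("Column `price` does not exist in the Frame")
except FrameError as exc:       # one clause catches every subclass
    caught = exc

print(str(caught))              # Column price does not exist in the Frame
print(caught.raw_message)       # Column `price` does not exist in the Frame
```

Because every error derives from one base class, calling code can handle all of the library's errors with a single except clause, and the text it logs is the same whether or not the terminal supports color.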

FTRL model

Contributors

This release was created with the help of 9 people who contributed code and documentation, and 18 more people who submitted bug reports and feature requests.

Code & documentation contributors:

Issues contributors: