Version 0.3.0

Version 0.3.0
Release date:2018-03-19
Next release:Version 0.3.1
Previous release:Version 0.2.2
Wheels
MacOSpython-3.5

General

  • Method Frame.tonumpy() now has argument stype which will force conversion into a numpy array of the specific stype.

  • Enums stype and ltype that encapsulate the type-system of the datatable module.

  • It is now possible to fread from a bytes object.

  • Allow columns to be renamed by setting the names property on the frame.

  • Internal “MemoryMapManager” will make datatable more robust when opening a frame with many columns on Linux systems. In particular, error 12 “not enough memory” should become much more rare now.

  • Number of threads used by fread can now be controlled via parameter nthreads.

  • It is now possible to supply string argument to dt.DataTable constructor, which in turn will try to interpret that argument via fread().

  • fread() can now read compressed .xz files.

  • fread() now automatically skips Ctrl+Z / NUL characters at the end of the file.

  • It is now possible to create a datatable from string numpy array.

  • Added parameters skip_blank_lines, strip_white, quotechar and dec to fread().

  • Single-column files with blank lines can now be read successfully.

  • Fread now recognizes \r\r\n as a valid line ending.

  • Added parameters url and cmd to fread(), as well as ability to detect URLs automatically. The url parameter downloads file from HTTP/HTTPS/FTP server into a temporary location and reads it from there. The cmd parameter executesthe provided shell command and then reads the data from the stdout.

  • It is now possible to pass file objects to fread() (or any objects exposing method read()).

  • File path given to fread() can now transparently select files within .zip archives. This doesn’t work with archives-within-archives.

  • GenericReader now supports auto-detecting and reading UTF-16 files.

  • GenericReader now attempts to detect whether the input file is an HTML, and if so raises an exception with the appropriate error message.

  • Datatable can now use either llvm-4.0 or llvm-5.0 depending on what the user has.

  • fread() now allows sep="", causing the file to be read line-by-line.

  • range arguments can now be passed to a DataTable constructor.

  • Datatable will now fall back to eager execution if it cannot detect LLVM runtime.

  • Added simple Excel file reader.

  • It is now possible to select columns from DataTable by type: df[int] selects all integer columns from df.

  • Allow creating DataTable from list, while forcing a specific stype(s).

  • Added ability to delete rows from a DataTable: del df[rows, :]

  • DataTable can now accept pandas/numpy frames with columns of float16 dtype (which will be automatically converted to float32).

  • isna() function now works on strings too.

  • Function save() is now a method of Frame class.

  • Warnings now have custom display hook.

  • Added global option nthreads which control the number of Omp threads used by datatable for parallel execution. Example: dt.options.nthreads = 1.

  • Add method .scalar() to quickly convert a 1x1 Frame into a python scalar.

  • New methods .min1(), .max1(), .mean1(), .sum1(), .sd1(), .countna1() that are similar to .min(), .max(), etc. but return a scalar instead of a Frame (however they only work with a 1-column Frames).

  • Implemented method .nunique() to compute the number of unique values in each column.

  • Added stats functions .mode() and .nmodal().

  • When writing “round” doubles/floats to CSV, they’ll now always have trailing zero. For example, [0.0, 1.0, 1e23] now produces "0.0,1.0,1.0e+23" instead of "0,1,1e+23".

  • df.stypes now returns a tuple of stype elements (previously it was returning a list of strings). Likewise, df.types was renamed into df.ltypes and now it returns a tuple of ltype elements instead of strings.

  • Parameter colnames= in DataTable constructor was renamed to names=. The old parameter may still be used, but it will result in a warning.

  • DataTable can no longer have duplicate column names. If such names are given, they will be mangled to make them unique, and a warning will be issued.

  • Special characters (in the ASCII range \x00 - \x1F) are no longer permitted in the column names. If encountered, they will be replaced with a dot ..

  • Fread now ignores trailing whitespace on each line, even if ' ' separator is used.

  • Fread on an empty file now produces an empty DataTable, instead of an exception.

  • Fread’s parameter skip_lines was replaced with skip_to_line, so that it’s more in sync with the similar argument skip_to_string.

  • When saving datatable containing obj64 columns, they will no longer be saved, and user warning will be shown (previously saving this column would eventually lead to a segfault).

  • DataTable class was renamed into Frame.

  • “eager” evaluation engine is now the default.

  • Parameter inplace of method dt.Frame.rbind() was removed: instead you can now rbind frames to an empty frame: dt.Frame().rbind(df1, df2).

  • datatable will no longer cause the C locale settings to change upon importing.

  • reading a csv file with invalid UTF-8 characters in column names will no longer throw an exception.

  • creating a DataTable from pandas.Series with explicit colnames will no longer ignore those column names.

  • fread(fill=True) will correctly fill missing fields with NAs.

  • fread(columns=set(...)) will correctly handle the case when the input contains multiple columns with the same names.

  • fread will no longer crash if the input dataset contains invalid utf8/win1252 data in the column headers. #594 #628

  • fixed bug in exception handling, which occasionally caused empty exception messages.

  • fixed bug in fread where string fields starting with “NaN” caused an assertion error.

  • Fixed bug when saving a DataTable with unicode column names into .nff format on systems where default encoding is not unicode-aware.

  • More robust newline handling in fread. #634 #641 #647

  • Quoted fields are now correctly unquoted in fread().

  • Fixed a bug in fread which occurred if the number of rows in the CSV file was estimated too low. #664

  • Fixed fread bug where an invalid DataTable was constructed if parameter max_nrows was used and there were any string columns. #671

  • Fixed a rare bug in fread which produced error message “Jump X did not finish reading where jump X+1 started”. #682

  • Prevented memory leak when using PyObject columns in conjunction with numpy.

  • View frames can now be properly saved.

  • Fixed crash when sorting view frame by a string column.

  • Deleting 0 columns is no longer an error.

  • Rows filter now works properly when applied to a view table and using “eager” evaluation engine.

  • Computed columns expression can now be combined with rows expression, or applied to a view Frame.

Contributors

This release was created with the help of 5 people who contributed code and documentation, and 6 more people who submitted bug reports and feature requests.

Code & documentation contributors:

Issues contributors: