Next release: Version 0.11.1
Previous release: Version 0.10.1
.source contains the name of the file the frame was loaded from. If the frame was modified after loading, or if it was created dynamically to begin with, this property will return
len(DT) now works, and returns the number of columns in the Frame. This allows the Frame to be used in contexts where an iterable might be expected.
Added ability to cast string columns into numeric types: int, float or boolean. #1313
String columns now support comparison operators
String columns can now be added together, similarly to how strings can be added in Python. #1839
.colindex() now accepts a column selector f-expression as an argument.
When creating a Frame from a python list, it is now possible to explicitly specify the stype of the resulting column by “dividing” the list by the type you need:
dt.Frame(A=[1, 5, 10] / dt.int64, B=[0, 0, 0] / dt.float64)
Casting a column into its own type is now a no-op. #2425
It is now possible to create a Frame from a pandas DataFrame with Categorical columns (which will be converted into strings). #2407
.colindex() now throws a dt.exceptions.KeyError when given a column that doesn’t exist in the frame, or a dt.exceptions.IndexError if given a numeric column index outside of the allowed range. Previously it threw a ValueError in both cases.
When creating a Frame from a list containing mixed integers/floats and strings, the resulting Frame will now have stype str32. Previously an obj64 column was created instead. The new behavior is more consistent with fread’s behavior when reading CSV files.
f[:]now excludes groupby columns when used in a groupby context. #2460
Deleting a key from the Frame (del DT.key) no longer causes a segfault. #2357
Casting a 0-row str64 stype no longer goes into an infinite loop. #2369
Fixed creation of a str64 column from a Python list of strings when the total size of all strings is greater than 2GB. #2368
str32 columns such that their combined string buffers have size over 2GB now properly creates a str64 column as a result. #2367
Fixed crash when writing to CSV a frame with many boolean columns when the option quoting="all" is used. #2382
It is no longer allowed to combine
Use this function to read multiple files using a glob, or give it a list of files, or an archive containing multiple files inside, or an Excel file with multiple sheets, etc.
errors= which controls what should happen when some of the sources cannot be read. Possible values are:
"store". The latter will catch the exceptions that may occur when reading each input, and return those exception objects within the iterator. #2008
encoding which will force fread to decode the input using the specified encoding before attempting to read it. The decoding process uses standard python codecs, and is therefore single-threaded. The parameter accepts any value available via the standard python library
memory_limit which instructs fread to try to limit the amount of memory used when reading the input. This parameter is especially useful when reading files that are larger than the amount of available memory. #1750
multiple_sources which controls fread’s behavior when multiple input sources are detected (for example, if you pass a name of an archive, and the archive contains multiple files). Possible values are:
Fread now displays a progress bar when downloading data from a URL. #2441
Fread now computes NA counts of all data while reading, storing them in per-column stats. For integer and floating point columns we also compute min/max value in each column. #1097
When reading from a URL, fread will now escape url-unsafe characters in that URL, so that the user doesn’t have to.
When reading Excel files, the cells with datetime or boolean types are now handled correctly, in particular a datetime value is converted into its string representation. #1701
Fread now properly detects \r-newlines in the presence of fields with quoted
Opening a Jay file from a bytes object now produces a Frame that remains valid even after the bytes object is deleted. #2547
datatable is now fully supported on Windows.
dt.exceptions.InvalidOperationError, which can be used to signal when an operation is requested that would be illegal for the given combination of parameters.
dt.options.debug.enabled will report all calls to the internal C++ core functions, together with their timings. This may help identify performance bottlenecks, or help troubleshoot user scripts.
debug.max_arg_size allow more granular control over the logging process. #2452
ifelse(cond, expr_if_true, expr_if_false) can return one of the two values based on the condition. #2411
DT["max(x,y)"] = ifelse(f.x >= f.y, f.x, f.y)
Added 2 new fields into the
.git_date is the UTC timestamp of the git revision from which that version of datatable was built, and
.git_diff which will be non-empty for builds from code that was modified compared to the git revision they are based on.
During a fork the thread pool will now shut down completely, together with the monitor thread. The threads will then restart in both the parent and the child, when needed. #2438
Avoid a rare deadlock when creating a frame from a pandas DataFrame in a forked process, when datatable is compiled with a gcc version earlier than 7.0. #2272
Fix rare crash in the interrupt signal handler. #2282
Column names containing backticks now display properly in error messages. #2406
Fixed rare race condition when multiple threads tried to throw an exception at the same time. #2526
The exception messages when stringified no longer contain backticks. The backticks are still emitted internally to help display the error in a color-supporting terminal, but when the exception is converted into a string via repr(), these backticks will now be stripped. This change ensures that the exception message remains the same regardless of how it is rendered.
.nepochs, the number of epochs to train the model, can now be a float rather than an integer.
Fixed inconsistency in progress reporting. #2520
This release was created with the help of 9 people who contributed code and documentation, and 18 more people who submitted bug reports and feature requests.
Code & documentation contributors:
- Pasha Stetsenko
- Oleksiy Kononenko
- Samuel Oranyeli
- Pradeep Krishnamurthy
- Liu Chi
- Wes Morgan
- Juliano Faccioni
- Michal Malohlava
- Bryce Boe