Version 0.11.0
| Version 0.11.0 | |
|---|---|
| Release date: | 2020-09-19 |
| Next release: | Version 0.11.1 |
| Previous release: | Version 0.10.1 |
| Wheels | |
| MacOS | python-3.5, python-3.6, python-3.7, python-3.8 |
| Linux x86-64 | python-3.5, python-3.6, python-3.7, python-3.8 |
| Linux ppc64le | python-3.5, python-3.6, python-3.7, python-3.8 |
| Windows | python-3.5, python-3.6, python-3.7, python-3.8 |
| SDist | sources |
Frame
- Property `.source` contains the name of the file where the frame was loaded from. If the frame was modified after loading, or if it was created dynamically to begin with, this property will return `None`.
- The expression `len(DT)` now works, and returns the number of columns in the Frame. This allows the Frame to be used in contexts where an iterable might be expected.
- Added ability to cast string columns into numeric types: int, float or boolean. #1313
- String columns now support comparison operators `<`, `>`, `<=` and `>=`. #2274
- String columns can now be added together, similarly to how strings can be added in Python. #1839
- Added a new function `dt.cut()` to bin numeric data to equal-width discrete intervals. #2483
- Added a new function `dt.qcut()` to bin data to equal-population discrete intervals. #1680
- Added function `dt.math.round()` which is the equivalent of Python’s built-in `round()`. #2285
- Method `.colindex()` now accepts a column selector f-expression as an argument.
- When creating a Frame from a python list, it is now possible to explicitly specify the stype of the resulting column by “dividing” the list by the type you need:
  `dt.Frame(A=[1, 5, 10] / dt.int64, B=[0, 0, 0] / dt.float64)`
- Added new argument `bom=False` to the `.to_csv()` method. If set to `True`, it will add the Byte-Order Mark (BOM) to the output CSV file. #2379
- Casting a column into its own type is now a no-op. #2425
- It is now possible to create a Frame from a pandas DataFrame with Categorical columns (which will be converted into strings). #2407
- Method `.cbind()` now throws a `dt.exceptions.InvalidOperationError` instead of a `ValueError` if the argument frames have incompatible shapes.
- Method `.colindex()` now throws a `dt.exceptions.KeyError` when given a column that doesn’t exist in the frame, or a `dt.exceptions.IndexError` if given a numeric column index outside of the allowed range. Previously it was throwing a `ValueError` in both cases.
- When creating a Frame from a list containing mixed integers/floats and strings, the resulting Frame will now have stype `str32`. Previously an `obj64` column was created instead. The new behavior is more consistent with fread’s behavior when reading CSV files.
- Expression `f[:]` now excludes groupby columns when used in a groupby context. #2460
- Parameters `_strategy=` in `.to_csv()` and `.to_jay()` were renamed into `method=`. The old parameter name still works, so this change is not breaking.
- The behavior of the method `.sort()` is now consistent with the function `dt.sort()`. When the list of columns to sort is empty, neither will sort any columns.
- Deleting a key from the Frame (`del DT.key`) no longer causes a segfault. #2357
- Casting a 0-row `str32` column into `str64` stype no longer goes into an infinite loop. #2369
- Fixed creation of a `str64` column from a python list of strings when the total size of all strings is greater than 2GB. #2368
- Rbinding several `str32` columns such that their combined string buffers have size over 2GB now properly creates a `str64` column as a result. #2367
- Fixed crash when writing to CSV a frame with many boolean columns when the option `quoting="all"` is used. #2382
- It is no longer allowed to combine `compression="gzip"` and `append=True` in `.to_csv()`.
- Empty strings no longer get confused with NA strings in `.replace()`. #2502
- `dt.rbind()`-ing an iterator of frames created on-the-fly no longer produces undefined behavior. #2621
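The difference between equal-width binning (the idea behind `dt.cut()`) and equal-population binning (the idea behind `dt.qcut()`) can be sketched in plain Python. This is only an illustration of the two concepts, not datatable's implementation: the helper names `equal_width_bins` and `equal_population_bins` are hypothetical, and the conventions used here (zero-based bin indices, the maximum clamped into the last bin, ties broken by position) are assumptions of the sketch rather than documented behavior of either function.

```python
def equal_width_bins(values, nbins):
    """Assign each value to one of `nbins` intervals of equal width."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so there is only one meaningful bin.
        return [0] * len(values)
    width = (hi - lo) / nbins
    # The maximum would otherwise land in bin `nbins`; clamp it into the last bin.
    return [min(int((v - lo) / width), nbins - 1) for v in values]


def equal_population_bins(values, nquantiles):
    """Assign each value to one of `nquantiles` bins of nearly equal size."""
    # Rank the values; ties are broken by original position (a simplification).
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * nquantiles // len(values)
    return bins


data = [1, 2, 3, 4, 100]
width_bins = equal_width_bins(data, 2)            # [0, 0, 0, 0, 1]
population_bins = equal_population_bins(data, 2)  # [0, 0, 0, 1, 1]
```

With a single outlier, equal-width binning leaves almost everything in the first bin, while equal-population binning splits the rows roughly in half regardless of how the values are spread.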
Fread
- Added new function `iread()`, which is similar to `fread()`, but suitable for reading multiple sources at once. The function will return an iterator of Frames. Use this function to read multiple files using a glob, or give it a list of files, or an archive containing multiple files inside, or an Excel file with multiple sheets, etc.
- The function `iread()` has parameter `errors=` which controls what should happen when some of the sources cannot be read. Possible values are: `"warn"`, `"raise"`, `"ignore"` and `"store"`. The latter will catch the exceptions that may occur when reading each input, and return those exception objects within the iterator. #2008
- It is now possible to read multi-file `.tar.gz` files using `iread()`. #2392
- Added parameter `encoding` which will force fread to decode the input using the specified encoding before attempting to read it. The decoding process uses standard python codecs, and is therefore single-threaded. The parameter accepts any value available via the standard python library `codecs`. #2395
- Added parameter `memory_limit` which instructs fread to try to limit the amount of memory used when reading the input. This parameter is especially useful when reading files that are larger than the amount of available memory. #1750
- Added parameter `multiple_sources` which controls fread’s behavior when multiple input sources are detected (for example, if you pass a name of an archive, and the archive contains multiple files). Possible values are: `"warn"` (default), `"error"`, and `"ignore"`.
- Fread now displays a progress bar when downloading data from a URL. #2441
- Fread now computes NA counts of all data while reading, storing them in per-column stats. For integer and floating point columns we also compute min/max value in each column. #1097
- When reading from a URL, fread will now escape url-unsafe characters in that URL, so that the user doesn’t have to.
- When reading Excel files, the cells with datetime or boolean types are now handled correctly, in particular a datetime value is converted into its string representation. #1701
- Fread now properly detects `\r`-newlines in the presence of fields with quoted `\n`-newlines. #1343
- Opening Jay file from a bytes object now produces a Frame that remains valid even after the bytes object is deleted. #2547
- Function `fread()` now always returns a single Frame object; previously it could return a dict of Frames if multiple sources were detected. Use `iread()` if you need to read multi-source input.
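The four `errors=` modes described for `iread()` follow a common pattern for iterators over unreliable sources. The sketch below shows that pattern in plain Python; it is not datatable's code. The helper `read_all` and its `reader` argument are hypothetical, and the behavior shown for `"warn"` (emit a warning and continue with the next source) is an assumption, since the notes above only spell out what `"store"` does.

```python
import warnings


def read_all(sources, reader, errors="warn"):
    """Lazily apply `reader` to each source, handling failures per `errors`.

    Hypothetical helper for illustration; this is not datatable's code.
    """
    if errors not in ("warn", "raise", "ignore", "store"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    for source in sources:
        try:
            yield reader(source)
        except Exception as exc:
            if errors == "raise":
                raise
            if errors == "store":
                # Hand the exception object to the caller in place of a result.
                yield exc
            elif errors == "warn":
                # Assumption: warn and move on to the next source.
                warnings.warn(f"could not read {source!r}: {exc}")
            # errors == "ignore": skip the failed source silently.


# Stand-in reader: int() parses "1" and "3" but fails on "x" with ValueError.
stored = list(read_all(["1", "x", "3"], int, errors="store"))
ignored = list(read_all(["1", "x", "3"], int, errors="ignore"))
```

In `"store"` mode the caller receives one item per source and can tell successes from failures with `isinstance(item, Exception)`; in `"ignore"` mode the failed source simply does not appear in the output.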
General
- datatable is now fully supported on Windows.
- Added exception `dt.exceptions.InvalidOperationError`, which can be used to signal when an operation is requested that would be illegal for the given combination of parameters.
- New option `dt.options.debug.enabled` will report all calls to the internal C++ core functions, together with their timings. This may help identify performance bottlenecks, or help troubleshoot user scripts. Additional options `debug.logger`, `debug.report_args` and `debug.max_arg_size` allow more granular control over the logging process. #2452
- Function `ifelse(cond, expr_if_true, expr_if_false)` can return one of the two values based on the condition. #2411
  `DT["max(x,y)"] = ifelse(f.x >= f.y, f.x, f.y)`
- datatable no longer has modules `blessed` and `typesentry` as dependencies. #1677 #1535
- Added 2 new fields into the `dt.build_info` struct: `.git_date` is the UTC timestamp of the git revision from which that version of datatable was built, and `.git_diff` will be non-empty for builds from code that was modified compared to the git revision they are based on.
- During a fork the thread pool will now shut down completely, together with the monitor thread. The threads will then restart in both the parent and the child, when needed. #2438
- Internal function `dt.internal.frame_column_data_r()` now works properly with virtual columns. #2269
- Avoid rare deadlock when creating a frame from pandas DataFrame in a forked process, in the datatable compiled with gcc version before 7.0. #2272
- Fix rare crash in the interrupt signal handler. #2282
- Fixed possible crash in `rbind()` and `union()` when they were called with a string argument, or with an object that caused infinite recursion. #2386
- Column names containing backticks now display properly in error messages. #2406
- Fixed rare race condition when multiple threads tried to throw an exception at the same time. #2526
- All exceptions thrown by datatable are now declared in the `dt.exceptions` module. These exceptions are now organized to derive from the common base class `dt.exceptions.DtException`.
- The exception messages when stringified no longer contain backticks. The backticks are still emitted internally to help display the error in a color-supporting terminal, but when the exception is converted into a string via `str()` or `repr()`, these backticks will now be stripped. This change ensures that the exception message remains the same regardless of how it is rendered.
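The row-wise selection performed by `ifelse(cond, expr_if_true, expr_if_false)` can be pictured with ordinary Python lists. This is a conceptual sketch only: `ifelse_rows` is a hypothetical stand-in that works on lists, does not use datatable, and does not model NA values.

```python
def ifelse_rows(cond, if_true, if_false):
    """For each row i, pick if_true[i] where cond[i] holds, else if_false[i]."""
    return [t if c else e for c, t, e in zip(cond, if_true, if_false)]


x = [1, 5, 3]
y = [4, 2, 3]

# Mirrors ifelse(f.x >= f.y, f.x, f.y): the larger of the two values per row.
cond = [a >= b for a, b in zip(x, y)]
max_xy = ifelse_rows(cond, x, y)  # [4, 5, 3]
```

The condition is evaluated once per row, and each output row takes its value from exactly one of the two inputs.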
FTRL model
- `.nepochs`, the number of epochs to train the model, can now be a float rather than an integer.
- `.fit()` now throws `dt.exceptions.TypeError` when ltypes in the training and validation frames are not consistent.
- `.interactions` now throws a `dt.exceptions.ValueError` instead of a `dt.exceptions.TypeError` when assigning interactions having zero features.
- Fixed inconsistency in progress reporting. #2520
Contributors
This release was created with the help of 9 people who contributed code and documentation, and 18 more people who submitted bug reports and feature requests.
Code & documentation contributors:
- Pasha Stetsenko
- Oleksiy Kononenko
- Samuel Oranyeli
- Pradeep Krishnamurthy
- Liu Chi
- Wes Morgan
- Juliano Faccioni
- Michal Malohlava
- Bryce Boe
Issues contributors: