Version 0.11.0
Version 0.11.0 | |
---|---|
Release date: | 2020-09-19 |
Next release: | Version 0.11.1 |
Previous release: | Version 0.10.1 |
Wheels (MacOS) | python-3.5, python-3.6, python-3.7, python-3.8 |
Wheels (Linux x86-64) | python-3.5, python-3.6, python-3.7, python-3.8 |
Wheels (Linux ppc64le) | python-3.5, python-3.6, python-3.7, python-3.8 |
Wheels (Windows) | python-3.5, python-3.6, python-3.7, python-3.8 |
SDist | sources |
Frame
Property `Frame.source` will contain the name of the file from which the frame was loaded. If the frame was modified after loading, or if it was created dynamically to begin with, this property will return `None`.
The expression `len(DT)` now works, and returns the number of columns in the Frame. This allows the Frame to be used in contexts where an iterable might be expected.
Added ability to cast string columns into numeric types: int, float or boolean. #1313
String columns now support comparison operators `<`, `>`, `<=` and `>=`. #2274
String columns can now be added together, similarly to how strings can be added in Python. #1839
Added a new function `cut()` to bin numeric data into equal-width discrete intervals. #2483
Added a new function `qcut()` to bin data into equal-population discrete intervals. #1680
Added function `round()`, which is the equivalent of Python’s built-in `round()`. #2285
Method `colindex()` now accepts a column selector f-expression as an argument.
When creating a Frame from a python list, it is now possible to explicitly specify the stype of the resulting column by “dividing” the list by the type you need:
    dt.Frame(A=[1, 5, 10] / dt.int64, B=[0, 0, 0] / dt.float64)
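The difference between the equal-width binning of `cut()` and the equal-population binning of `qcut()` mentioned above can be illustrated without datatable. The following is only a pure-Python sketch of the two ideas: the helper names `equal_width_bins` and `equal_population_bins` are hypothetical, and datatable's own treatment of bin edges, ties and NA values may differ.

```python
def equal_width_bins(values, nbins):
    """Assign each value to one of `nbins` intervals of equal width."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0] * len(values)
    width = (hi - lo) / nbins
    # The maximum would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), nbins - 1) for v in values]


def equal_population_bins(values, nbins):
    """Assign each value to one of `nbins` groups holding roughly equal counts."""
    # Rank the values, then split the ranks evenly. Ties are broken by
    # position here, which is a simplification made for this sketch.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * nbins // len(values)
    return bins


data = [1, 2, 3, 4, 100, 200]
width_bins = equal_width_bins(data, 2)
population_bins = equal_population_bins(data, 2)
print(width_bins)       # [0, 0, 0, 0, 0, 1]
print(population_bins)  # [0, 0, 0, 1, 1, 1]
```

With skewed data such as this, equal-width bins put almost every value into the first interval, while equal-population bins split the values evenly.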
Added new argument `bom=False` to the `to_csv()` method. If set to `True`, it will add the Byte-Order Mark (BOM) to the output CSV file. #2379
Casting a column into its own type is now a no-op. #2425
It is now possible to create a Frame from a pandas DataFrame with Categorical columns (which will be converted into strings). #2407
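For readers unfamiliar with the Byte-Order Mark referred to by the `bom=` argument above, the sketch below shows what a BOM looks like at the byte level, using only the Python standard library. It assumes the UTF-8 BOM; the release notes do not say which encoding `to_csv()` uses, and this is not datatable code.

```python
import codecs

rows = "A,B\n1,2\n"

# The "utf-8-sig" codec prepends the UTF-8 byte-order mark when encoding.
with_bom = rows.encode("utf-8-sig")
without_bom = rows.encode("utf-8")

# The mark is the three bytes EF BB BF at the very start of the output.
has_bom = with_bom.startswith(codecs.BOM_UTF8)

# Decoding with the same codec strips the mark again.
roundtrip = with_bom.decode("utf-8-sig")
print(has_bom, len(with_bom) - len(without_bom))
```

Some spreadsheet applications rely on this mark to recognize a CSV file as UTF-8, which is the usual reason for writing one.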
Method `cbind()` now throws an `InvalidOperationError` instead of a `ValueError` if the argument frames have incompatible shapes.
Method `colindex()` now throws a `KeyError` when given a column that doesn’t exist in the frame, or an `IndexError` if given a numeric column index outside of the allowed range. Previously it was throwing a `ValueError` in both cases.
When creating a Frame from a list containing mixed integers/floats and strings, the resulting Frame will now have stype `str32`. Previously an `obj64` column was created instead. The new behavior is more consistent with fread’s behavior when reading CSV files.
Expression `f[:]` now excludes groupby columns when used in a groupby context. #2460
Parameters `_strategy=` in `to_csv()` and `to_jay()` were renamed into `method=`. The old parameter name still works, so this change is not breaking.
The behaviour of the method `sort()` is now consistent with the function `sort()`. When the list of columns to sort is empty, neither will sort any columns.
Deleting a key from the Frame (`del DT.key`) no longer causes a segfault. #2357
Casting a 0-row `str32` column into `str64` stype no longer goes into an infinite loop. #2369
Fixed creation of a `str64` column from a python list of strings when the total size of all strings is greater than 2GB. #2368
Rbinding several `str32` columns such that their combined string buffers have size over 2GB now properly creates a `str64` column as a result. #2367
Fixed crash when writing to CSV a frame with many boolean columns when the option `quoting="all"` is used. #2382
It is no longer allowed to combine `compression="gzip"` and `append=True` in `to_csv()`.
Empty strings no longer get confused with NA strings in `replace()`. #2502
`rbind()`-ing an iterator of frames created on-the-fly no longer produces undefined behavior. #2621
Fread
Added new function `iread()`, which is similar to `fread()`, but suitable for reading multiple sources at once. The function will return an iterator of Frames.
Use this function to read multiple files using a glob, or give it a list of files, or an archive containing multiple files inside, or an Excel file with multiple sheets, etc.
The function `iread()` has parameter `errors=` which controls what should happen when some of the sources cannot be read. Possible values are: `"warn"`, `"raise"`, `"ignore"` and `"store"`. The last of these will catch the exceptions that may occur when reading each input, and return those exception objects within the iterator. #2008
It is now possible to read multi-file `.tar.gz` files using `iread()`. #2392
Added parameter `encoding` which will force fread to decode the input using the specified encoding before attempting to read it. The decoding process uses standard python codecs, and is therefore single-threaded. The parameter accepts any value available via the standard python library `codecs`. #2395
Added parameter `memory_limit` which instructs fread to try to limit the amount of memory used when reading the input. This parameter is especially useful when reading files that are larger than the amount of available memory. #1750
Added parameter `multiple_sources` which controls fread’s behavior when multiple input sources are detected (for example, if you pass a name of an archive, and the archive contains multiple files). Possible values are: `"warn"` (default), `"error"`, and `"ignore"`.
Fread now displays a progress bar when downloading data from a URL. #2441
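The `encoding` parameter described above is said to rely on the standard Python codecs. As a rough illustration of that decode-first step, here is a standard-library sketch; the helper `decode_input` is a hypothetical name, and how fread actually wires the decoding into its reader is not shown in these notes.

```python
import codecs


def decode_input(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with a named codec before any parsing takes place."""
    # codecs.lookup() raises LookupError if the encoding name is unknown.
    codec_info = codecs.lookup(encoding)
    text, _consumed = codec_info.decode(raw)
    return text


# A small CSV payload stored as Latin-1; byte 0xE9 is not valid UTF-8 here.
raw = "id,word\n1,café\n".encode("latin-1")
text = decode_input(raw, "latin-1")
print(text)
```

Any name accepted by `codecs.lookup()` works in this sketch, which mirrors the statement that the parameter accepts any value available via the `codecs` library.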
Fread now computes NA counts of all data while reading, storing them in per-column stats. For integer and floating point columns we also compute min/max value in each column. #1097
When reading from a URL, fread will now escape url-unsafe characters in that URL, so that the user doesn’t have to.
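To show what escaping url-unsafe characters means in practice, the sketch below percent-encodes the path and query of a URL with the standard library. The helper `escape_url` and the choice of which characters to leave untouched are assumptions made for illustration; the exact rules fread applies are not given here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_url(url: str) -> str:
    """Percent-encode unsafe characters in the path and query of a URL."""
    parts = urlsplit(url)
    # Keep "/" and "%" so path separators and existing escapes are preserved.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


escaped = escape_url("https://example.com/my data/file 1.csv")
print(escaped)  # https://example.com/my%20data/file%201.csv
```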
When reading Excel files, the cells with datetime or boolean types are now handled correctly, in particular a datetime value is converted into its string representation. #1701
Fread now properly detects `\r`-newlines in the presence of fields with quoted `\n`-newlines. #1343
Opening a Jay file from a bytes object now produces a Frame that remains valid even after the bytes object is deleted. #2547
Function `fread()` now always returns a single Frame object; previously it could return a dict of Frames if multiple sources were detected. Use `iread()` if you need to read multi-source input.
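The four `errors=` policies listed for `iread()` earlier in this section can be pictured with a small generator. This is a pure-Python sketch, not datatable's implementation: `read_all` and `parse_int` are hypothetical names, and the assumption that `"warn"` and `"ignore"` skip a failed source is mine, since the notes only spell out what `"store"` does.

```python
import warnings


def read_all(sources, reader, errors="warn"):
    """Yield reader(src) for every source, handling failures per `errors`."""
    if errors not in ("warn", "raise", "ignore", "store"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    for src in sources:
        try:
            result = reader(src)
        except Exception as exc:
            if errors == "raise":
                raise
            if errors == "store":
                yield exc  # hand the exception object to the caller
            elif errors == "warn":
                warnings.warn(f"could not read {src!r}: {exc}")
            # "ignore" does nothing; in every non-raising mode move on.
            continue
        yield result


def parse_int(text):
    """Stand-in reader that fails on non-numeric input."""
    return int(text)


stored = list(read_all(["1", "x", "3"], parse_int, errors="store"))
ignored = list(read_all(["1", "x", "3"], parse_int, errors="ignore"))
print(ignored)  # [1, 3]
```

In `"store"` mode the caller can tell results from failures with `isinstance(item, Exception)` while iterating.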
General
datatable is now fully supported on Windows.
Added exception `InvalidOperationError`, which can be used to signal when an operation is requested that would be illegal for the given combination of parameters.
New option `dt.options.debug.enabled` will report all calls to the internal C++ core functions, together with their timings. This may help identify performance bottlenecks, or help troubleshoot user scripts.
Additional options `debug.logger`, `debug.report_args` and `debug.max_arg_size` allow more granular control over the logging process. #2452
Function `ifelse(cond, expr_if_true, expr_if_false)` can return one of the two values based on the condition. #2411
    DT["max(x,y)"] = ifelse(f.x >= f.y, f.x, f.y)
datatable no longer has modules `blessed` and `typesentry` as dependencies. #1677 #1535
Added 2 new fields into the `dt.build_info` struct: `.git_date` is the UTC timestamp of the git revision from which that version of datatable was built, and `.git_diff` will be non-empty for builds from code that was modified compared to the git revision they are based on.
During a fork the thread pool will now shut down completely, together with the monitor thread. The threads will then restart in both the parent and the child, when needed. #2438
Internal function `frame_column_data_r()` now works properly with virtual columns. #2269
Avoid rare deadlock when creating a frame from pandas DataFrame in a forked process, in the datatable compiled with gcc version before 7.0. #2272
Fix rare crash in the interrupt signal handler. #2282
Fixed possible crash in `rbind()` and `union()` when they were called with a string argument, or with an object that caused infinite recursion. #2386
Column names containing backticks now display properly in error messages. #2406
Fixed rare race condition when multiple threads tried to throw an exception at the same time. #2526
All exceptions thrown by datatable are now declared in the `datatable.exceptions` module. These exceptions are now organized to derive from the common base class `DtException`.
The exception messages when stringified no longer contain backticks. The backticks are still emitted internally to help display the error in a color-supporting terminal, but when the exception is converted into a string via `str()` or `repr()`, these backticks will now be stripped. This change ensures that the exception message remains the same regardless of how it is rendered.
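The two exception changes above (a common base class, and backticks stripped on stringification) can be pictured with a small stand-alone sketch. The class names `SketchDtException` and `SketchInvalidOperationError` are hypothetical stand-ins, and the code only mimics the described behavior; it is not how datatable implements it.

```python
class SketchDtException(Exception):
    """Hypothetical common base class, standing in for DtException."""

    def __init__(self, message):
        super().__init__(message)
        # The raw message may carry backticks that a terminal renderer
        # could use to highlight names.
        self.raw_message = message

    def __str__(self):
        # Strip the backticks so the text is the same however it is shown.
        return self.raw_message.replace("`", "")


class SketchInvalidOperationError(SketchDtException):
    """Hypothetical specific error derived from the common base."""


try:
    raise SketchInvalidOperationError("Cannot combine `DT1` with `DT2`")
except SketchDtException as exc:  # one handler covers every derived error
    message = str(exc)
    raw = exc.raw_message

print(message)  # Cannot combine DT1 with DT2
```

Catching the base class is the practical benefit of a shared hierarchy: a single `except` clause can handle every library-specific error.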
FTRL model
`models.Ftrl.nepochs`, the number of epochs to train the model, can now be a float rather than an integer.
`models.Ftrl.fit()` now throws `TypeError` when ltypes in the training and validation frames are not consistent.
`models.Ftrl.interactions` now throws a `ValueError` instead of a `TypeError` when assigning interactions having zero features.
Fixed inconsistency in progress reporting. #2520
Contributors
This release was created with the help of 9 people who contributed code and documentation, and 18 more people who submitted bug reports and feature requests.
Code & documentation contributors:
- Pasha Stetsenko
- Oleksiy Kononenko
- Samuel Oranyeli
- Pradeep Krishnamurthy
- Liu Chi
- Wes Morgan
- Juliano Faccioni
- Michal Malohlava
- Bryce Boe
Issues contributors: