Version 0.3.0
Version 0.3.0 | |
---|---|
Release date: | 2018-03-19 |
Next release: | Version 0.3.1 |
Previous release: | Version 0.2.2 |
Wheels | |
MacOS | python-3.5 |
Method `Frame.tonumpy()` now has argument `stype`, which will force conversion into a numpy array of the specified stype.
Enums `stype` and `ltype` that encapsulate the type system of the `datatable` module.
It is now possible to fread from a `bytes` object.
Allow columns to be renamed by setting the `names` property on the frame.
Internal “MemoryMapManager” will make datatable more robust when opening a frame with many columns on Linux systems. In particular, error 12 “not enough memory” should become much rarer now.
Number of threads used by fread can now be controlled via parameter `nthreads`.
It is now possible to supply a string argument to the `dt.DataTable` constructor, which in turn will try to interpret that argument via `fread()`.
`fread()` can now read compressed `.xz` files.
`fread()` now automatically skips Ctrl+Z / NUL characters at the end of the file.
It is now possible to create a datatable from a string numpy array.
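The two reader behaviours above (transparent `.xz` decompression and dropping trailing Ctrl+Z / NUL bytes) can be approximated with the standard library alone. This is only an illustrative sketch, not how `fread()` is implemented; the helper name `read_text_payload` is hypothetical.

```python
import lzma

# Every .xz stream starts with this six-byte magic number.
XZ_MAGIC = b"\xfd7zXZ\x00"


def read_text_payload(raw: bytes) -> bytes:
    """Decompress .xz input if needed, then drop trailing Ctrl+Z / NUL bytes."""
    if raw.startswith(XZ_MAGIC):
        raw = lzma.decompress(raw)
    # Ctrl+Z is byte 0x1A (the old DOS end-of-file marker); NUL is byte 0x00.
    return raw.rstrip(b"\x1a\x00")


csv_bytes = b"a,b\n1,2\n3,4\n"
padded = csv_bytes + b"\x1a\x00\x00"      # simulate junk at the end of the file
compressed = lzma.compress(padded)        # lzma.compress defaults to the .xz format
cleaned = read_text_payload(compressed)   # bytes ready to hand to a CSV parser
```

Only trailing bytes are removed here; Ctrl+Z or NUL bytes in the middle of the data are left untouched.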
Added parameters `skip_blank_lines`, `strip_white`, `quotechar` and `dec` to `fread()`.
Single-column files with blank lines can now be read successfully.
Fread now recognizes `\r\r\n` as a valid line ending.
Added parameters `url` and `cmd` to `fread()`, as well as the ability to detect URLs automatically. The `url` parameter downloads a file from an HTTP/HTTPS/FTP server into a temporary location and reads it from there. The `cmd` parameter executes the provided shell command and then reads the data from its stdout.
It is now possible to pass `file` objects to `fread()` (or any objects exposing method `read()`).
File path given to `fread()` can now transparently select files within `.zip` archives. This doesn’t work with archives-within-archives.
GenericReader now supports auto-detecting and reading UTF-16 files.
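To illustrate what selecting a file within a `.zip` archive by path could look like, here is a standard-library sketch. It assumes the member is addressed as `archive.zip/inner/name`; that path convention and the helper `read_path` are assumptions for illustration, not taken from the release notes. Like the feature described above, the sketch does not handle archives within archives.

```python
import os
import tempfile
import zipfile


def read_path(path: str) -> bytes:
    """Read a plain file, or a zip member addressed as 'archive.zip/inner/name'."""
    head, sep, member = path.partition(".zip/")
    archive_path = head + ".zip"
    if sep and zipfile.is_zipfile(archive_path):
        # The part after ".zip/" names the member inside the archive.
        with zipfile.ZipFile(archive_path) as archive:
            return archive.read(member)
    with open(path, "rb") as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as tmpdir:
    zip_file = os.path.join(tmpdir, "data.zip")
    with zipfile.ZipFile(zip_file, "w") as archive:
        archive.writestr("sub/train.csv", "x,y\n1,2\n")
    # Address the member through the archive path.
    result = read_path(zip_file + "/sub/train.csv")
```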
GenericReader now attempts to detect whether the input file is HTML, and if so raises an exception with an appropriate error message.
Datatable can now use either `llvm-4.0` or `llvm-5.0` depending on what the user has.
`fread()` now allows `sep=""`, causing the file to be read line-by-line.
`range` arguments can now be passed to a DataTable constructor.
Datatable will now fall back to eager execution if it cannot detect LLVM runtime.
Added simple Excel file reader.
It is now possible to select columns from DataTable by type: `df[int]` selects all integer columns from `df`.
Allow creating DataTable from list, while forcing a specific stype(s).
Added ability to delete rows from a DataTable: `del df[rows, :]`.
DataTable can now accept pandas/numpy frames with columns of float16 dtype (which will be automatically converted to `float32`).
`isna()` function now works on strings too.
`save()` is now a method of `Frame` class.
Warnings now have custom display hook.
Added global option `nthreads` which controls the number of OpenMP threads used by `datatable` for parallel execution. Example: `dt.options.nthreads = 1`.
Add method `.scalar()` to quickly convert a 1x1 Frame into a python scalar.
New methods `min1()`, `max1()`, `mean1()`, `sum1()`, `sd1()`, `countna1()` that are similar to `min()`, `max()`, etc. but return a scalar instead of a Frame (however they only work with 1-column Frames).
Implemented method `nunique()` to compute the number of unique values in each column.
Added stats functions `mode()` and `nmodal()`.
When writing “round” doubles/floats to CSV, they’ll now always have a trailing zero. For example, `[0.0, 1.0, 1e23]` now produces `"0.0,1.0,1.0e+23"` instead of `"0,1,1e+23"`.
`df.stypes` now returns a tuple of `stype` elements (previously it was returning a list of strings). Likewise, `df.types` was renamed into `df.ltypes` and now it returns a tuple of `ltype` elements instead of strings.
Parameter `colnames=` in DataTable constructor was renamed to `names=`. The old parameter may still be used, but it will result in a warning.
DataTable can no longer have duplicate column names. If such names are given, they will be mangled to make them unique, and a warning will be issued.
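The duplicate-name rule above can be pictured with a small pure-Python sketch. The release notes do not say how names are mangled, so the `.N` suffix scheme, the warning text, and the function name `make_unique` are all hypothetical.

```python
import warnings


def make_unique(names):
    """Rename repeated names so that every name in the result is distinct."""
    seen = set()
    result = []
    for name in names:
        candidate = name
        counter = 0
        # Hypothetical scheme: append ".0", ".1", ... until the name is unused.
        while candidate in seen:
            candidate = f"{name}.{counter}"
            counter += 1
        if candidate != name:
            warnings.warn(f"Duplicate column name {name!r} renamed to {candidate!r}")
        seen.add(candidate)
        result.append(candidate)
    return result


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unique = make_unique(["a", "b", "a", "a"])
```

Checking each candidate against the set of names already used keeps the result unique even when a mangled name collides with a name that was supplied explicitly.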
Special characters (in the ASCII range `\x00 - \x1F`) are no longer permitted in the column names. If encountered, they will be replaced with a dot.
Fread now ignores trailing whitespace on each line, even if `' '` separator is used.
Fread on an empty file now produces an empty DataTable, instead of an exception.
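For the column-name entry above on special characters, the replacement can be sketched with a regular expression. Whether a run of several control characters becomes one dot or several is not stated in these notes; the sketch below assumes one dot per character, and the helper name is hypothetical.

```python
import re

# ASCII control characters: code points 0x00 through 0x1F.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f]")


def replace_control_chars(name: str) -> str:
    """Replace each ASCII control character in a column name with a dot."""
    return _CONTROL_CHARS.sub(".", name)


raw_names = ["id", "first\tname", "a\nb\x00"]
cleaned_names = [replace_control_chars(n) for n in raw_names]
```

The space character (0x20) lies just outside the range, so names containing ordinary spaces are left unchanged.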
Fread’s parameter `skip_lines` was replaced with `skip_to_line`, so that it’s more in sync with the similar argument `skip_to_string`.
When saving datatable containing `obj64` columns, they will no longer be saved, and user warning will be shown (previously saving this column would eventually lead to a segfault).
`DataTable` class was renamed into `Frame`.
“eager” evaluation engine is now the default.
Parameter `inplace` of method `rbind()` was removed: instead you can now rbind frames to an empty frame: `dt.Frame().rbind(df1, df2)`.
`datatable` will no longer cause the C locale settings to change upon importing.
Reading a csv file with invalid UTF-8 characters in column names will no longer throw an exception.
Creating a `DataTable` from `pandas.Series` with explicit `colnames` will no longer ignore those column names.
`fread(fill=True)` will correctly fill missing fields with NAs.
`fread(columns=set(...))` will correctly handle the case when the input contains multiple columns with the same names.
Fread will no longer crash if the input dataset contains invalid utf8/win1252 data in the column headers. #594 #628
Fixed bug in exception handling, which occasionally caused empty exception messages.
Fixed bug in fread where string fields starting with “NaN” caused an assertion error.
Fixed bug when saving a `DataTable` with unicode column names into `.nff` format on systems where default encoding is not unicode-aware.
Quoted fields are now correctly unquoted in `fread()`.
Fixed a bug in fread which occurred if the number of rows in the CSV file was estimated too low. #664
Fixed fread bug where an invalid `DataTable` was constructed if parameter `max_nrows` was used and there were any string columns. #671
Fixed a rare bug in fread which produced error message “Jump X did not finish reading where jump X+1 started”. #682
Prevented memory leak when using `PyObject` columns in conjunction with `numpy`.
View frames can now be properly saved.
Fixed crash when sorting view frame by a string column.
Deleting 0 columns is no longer an error.
Rows filter now works properly when applied to a view table and using “eager” evaluation engine.
Computed columns expression can now be combined with rows expression, or applied to a view Frame.
Contributors
This release was created with the help of 5 people who contributed code and documentation, and 6 more people who submitted bug reports and feature requests.
Code & documentation contributors:
Issues contributors: