Using datatable

This section describes common functionality and commands that you can run in datatable.

Create Frame

You can create a Frame from a variety of sources, including numpy arrays, pandas DataFrames, and raw Python objects:

import datatable as dt
import numpy as np
np.random.seed(1)
dt.Frame(np.random.randn(1000000))

       | C0
       | float64
------ + ----------
     0 |  1.62435
     1 | -0.611756
     2 | -0.528172
     3 | -1.07297
     4 |  0.865408
     5 | -2.30154
     6 |  1.74481
     7 | -0.761207
     8 |  0.319039
     9 | -0.24937
    10 |  1.46211
    11 | -2.06014
    12 | -0.322417
    13 | -0.384054
    14 |  1.13377
     … |  …
999995 |  0.0595784
999996 |  0.140349
999997 | -0.596161
999998 |  1.18604
999999 |  0.313398

import pandas as pd
pf = pd.DataFrame({"A": range(1000)})
dt.Frame(pf)

    | A
    | int64
--- + -----
  0 |   0
  1 |   1
  2 |   2
  3 |   3
  4 |   4
  5 |   5
  6 |   6
  7 |   7
  8 |   8
  9 |   9
 10 |  10
 11 |  11
 12 |  12
 13 |  13
 14 |  14
  … |   …
995 | 995
996 | 996
997 | 997
998 | 998
999 | 999

dt.Frame({"n": [1, 3], "s": ["foo", "bar"]})

   | n      s
   | int32  str32
-- + -----  -----
 0 | 1      foo
 1 | 3      bar

Convert a Frame

Convert an existing Frame into a numpy array, a pandas DataFrame, or a pure Python object:

nparr = DT.to_numpy()
pddfr = DT.to_pandas()
pyobj = DT.to_list()

Parse Text (csv) Files

datatable provides fast and convenient parsing of text (csv) files:

DT = dt.fread("train.csv")

The datatable parser:

  • Automatically detects separators, headers, column types, quoting rules, etc.

  • Reads from files, URLs, shell command output, raw text, archives, and glob patterns

  • Provides multi-threaded file reading for maximum speed

  • Includes a progress indicator when reading large files

  • Reads both RFC4180-compliant and non-compliant files

Write the Frame

Write the Frame’s content into a csv file (also multi-threaded):

DT.to_csv("out.csv")

Save a Frame

Save a Frame into a binary format on disk, then open it later instantly, regardless of the data size:

DT.to_jay("out.jay")
DT2 = dt.open("out.jay")

Basic Frame Properties

Basic Frame properties include:

print(DT.shape)   # (nrows, ncols)
print(DT.names)   # column names
print(DT.stypes)  # column types

Compute Per-Column Summary Stats

Compute per-column summary stats using:

DT.sum()
DT.max()
DT.min()
DT.mean()
DT.sd()
DT.mode()
DT.nmodal()
DT.nunique()

Select Subsets of Rows/Columns

Select subsets of rows and/or columns using:

DT[:, "A"]          # select 1 column
DT[:10, :]          # first 10 rows
DT[::-1, "A":"D"]   # rows in reverse order, columns from A to D
DT[27, 3]           # single element in row 27, column 3 (0-based)

Delete Rows/Columns

Delete rows and/or columns using:

from datatable import f
del DT[:, "D"]       # delete column D
del DT[f.A < 0, :]   # delete rows where column A has negative values

Filter Rows

Filter rows via an expression using the following. In this example, mean, sd, and f are all symbols imported from datatable:

DT[(f.x > mean(f.y) + 2.5 * sd(f.y)) | (f.x < -mean(f.y) - sd(f.y)), :]

Compute Columnar Expressions

Compute columnar expressions using:

DT[:, {"x": f.x, "y": f.y, "x+y": f.x + f.y, "x-y": f.x - f.y}]

Sort Columns

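Sort the rows of a Frame by the values in a column using either of the following, where sort and f are symbols imported from datatable:

DT.sort("A")
DT[:, :, sort(f.A)]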

Perform Groupby Calculations

Perform groupby calculations using:

DT[:, mean(f.x), by("y")]

Append Rows/Columns

Append columns to a Frame using Frame.cbind(), and append rows using Frame.rbind():

DT1.cbind(DT2, DT3)
DT1.rbind(DT4, force=True)