Using datatable

This section describes common functionality and commands supported by datatable.

Create a Frame

A Frame can be created from a variety of sources. For instance, from a numpy array:

import datatable as dt
import numpy as np
np.random.seed(1)
NP = np.random.randn(1000000)
dt.Frame(NP)
       | C0
       | float64
------ + ----------
     0 |  1.62435
     1 | -0.611756
     2 | -0.528172
     3 | -1.07297
     4 |  0.865408
     5 | -2.30154
     6 |  1.74481
     7 | -0.761207
     8 |  0.319039
     9 | -0.24937
    10 |  1.46211
    11 | -2.06014
    12 | -0.322417
    13 | -0.384054
    14 |  1.13377
     … |  …
999995 |  0.0595784
999996 |  0.140349
999997 | -0.596161
999998 |  1.18604
999999 |  0.313398

From a pandas DataFrame:

import pandas as pd
PD = pd.DataFrame({"A": range(1000)})
dt.Frame(PD)
    | A
    | int64
--- + -----
  0 |   0
  1 |   1
  2 |   2
  3 |   3
  4 |   4
  5 |   5
  6 |   6
  7 |   7
  8 |   8
  9 |   9
 10 |  10
 11 |  11
 12 |  12
 13 |  13
 14 |  14
  … |   …
995 | 995
996 | 996
997 | 997
998 | 998
999 | 999

Or from a raw Python object:

dt.Frame({"n": [1, 3], "s": ["foo", "bar"]})
  | n      s
  | int32  str32
- + -----  -----
0 | 1      foo
1 | 3      bar

Convert a Frame

An existing frame DT can be converted to other formats, including numpy arrays, pandas DataFrames, Python objects, and CSV files:

NP = DT.to_numpy()
PD = DT.to_pandas()
PY = DT.to_list()
DT.to_csv("out.csv")

Parse CSV Files

datatable provides a fast and convenient way to parse CSV files via the dt.fread() function:

DT = dt.fread("in.csv")

The datatable parser:

  • Automatically detects separators, headers, column types, quoting rules, etc.

  • Reads from files, URLs, shell commands, raw text, archives, and glob patterns

  • Provides multi-threaded file reading for maximum speed

  • Includes a progress indicator when reading large files

  • Reads both RFC4180-compliant and non-compliant files

Save a Frame

Save a Frame to disk in the binary Jay format, then open it again later almost instantly, regardless of the data size:

DT.to_jay("out.jay")
DT2 = dt.fread("out.jay")  # dt.open() is the older, deprecated way to read Jay files

Basic Frame Properties

Basic Frame properties include:

DT.shape  # (nrows, ncols)
DT.names  # column names
DT.types  # column types

Compute Frame Statistics

Compute per-column summary statistics using the following Frame methods:

DT.sum()
DT.max()
DT.min()
DT.mean()
DT.sd()
DT.mode()
DT.nmodal()
DT.nunique()

Select Subsets of Rows or Columns

Select subsets of rows or columns using the DT[i, j, ...] selector:

DT[:, "A"]         # select a single column
DT[:10, :]         # first 10 rows
DT[::-1, "A":"D"]  # rows in reverse order, columns from A to D
DT[27, 3]          # single element in row 27, column 3 (0-based)

Delete Rows or Columns

Delete rows or columns with del:

from datatable import f
del DT[:, "D"]      # delete column D
del DT[f.A < 0, :]  # delete rows where column A has negative values

Filter Rows

Filter rows via an f-expression:

from datatable import mean, sd, f
DT[(f.A > mean(f.B) + 2.5 * sd(f.B)) | (f.A < -mean(f.B) - sd(f.B)), :]

Compute Columnar Expressions

f-expressions can also be used to compute new columns from existing ones:

DT[:, {"A": f.A, "B": f.B, "A+B": f.A + f.B, "A-B": f.A - f.B}]

Sort Columns

Sort the rows of a frame by the values in a column via Frame.sort() or via dt.sort():

DT.sort("A")
DT[:, :, dt.sort(f.A)]

Perform Groupby Calculations

Perform groupby calculations by passing dt.by() as the third item of the selector:

DT[:, mean(f.A), dt.by("B")]

Append Rows or Columns

Append rows to the existing frame by using Frame.rbind():

DT.rbind(DT2)

Append columns by using Frame.cbind():

DT.cbind(DT2)