Frame

class datatable.Frame

Two-dimensional column-oriented table of data. Each column has its own name and type. Types may vary across columns but cannot vary within each column.

Internally the data is stored as C primitives, and processed using multithreaded native C++ code.

This is the primary data structure of the datatable module.

keys()
ltypes

Tuple of the columns’ ltypes (“logical types”), one per column.

materialize()

Force all data in the Frame to be laid out physically.

In datatable, a Frame may contain “virtual” columns, i.e. columns whose data is computed on-the-fly. This gives better performance for certain types of computations, while also reducing the total memory footprint. The use of virtual columns is generally transparent to the user, and datatable will materialize them as needed.

However, there could be situations where you might want to materialize your Frame explicitly. In particular, materialization will carry out all delayed computations and break internal references to other Frames’ data. For example, if you subset a large frame to create a smaller one, then the new frame will carry an internal reference to the original, preventing it from being garbage-collected. If you materialize the small frame, the data will be physically copied, allowing the original frame’s memory to be freed.

Parameters:
  • to_memory (bool) – If True, then, in addition to de-virtualizing all columns, this method will also copy all memory-mapped columns into RAM.

    When you open a Jay file, the Frame that is created will contain memory-mapped columns whose data still resides on disk. Calling .materialize(to_memory=True) will force the data to be loaded into main memory. This may be beneficial if you are concerned about disk speed, if the file is on a removable drive, or if you want to delete the source file.

Returns: None. This operation modifies the Frame in-place.
max()
max1()
mean()
mean1()
min()
min1()
mode()
mode1()
ncols

Number of columns in the Frame

ndims

Number of dimensions in the Frame, always 2

nmodal()
nmodal1()
nrows

Number of rows in the Frame.

Assigning to this property will change the height of the Frame, either by truncating if the new number of rows is smaller than the current, or filling with NAs if the new number of rows is greater.

Increasing the number of rows of a keyed Frame is not allowed.

nunique()
nunique1()
rbind()

Append rows of frames to the current frame.

This is equivalent to list.extend() in Python: the frames are combined by rows, i.e. rbinding a frame of shape [n x k] to a Frame of shape [m x k] produces a frame of shape [(m + n) x k].

This method modifies the current frame in-place. If you do not want the current frame modified, use the dt.rbind() function instead.

If the frames being appended have columns whose types differ from those in the current frame, then these columns will be promoted to the largest of their types: bool -> int -> float -> string.

If you need to append multiple frames, it is more efficient to collect them into a list first and then do a single rbind() than to append them one-by-one.

Appending data to a frame opened from disk will force loading the current frame into memory, which may fail with an OutOfMemory exception if the frame is sufficiently big.

Parameters:
  • frames (sequence or list of Frames) – One or more frames to append. These frames should have the same columnar structure as the current frame (unless option force is used).
  • force (bool) – If True, then the frames are allowed to have mismatching sets of columns. Any gaps in the data will be filled with NAs.
  • bynames (bool) – If True (default), the columns in frames are matched by their names. For example, if one frame has columns ["colA", "colB", "colC"] and the other ["colB", "colA", "colC"], then the first two columns of the appended frame will be swapped before performing the append. However, if bynames is False, then the column names will be ignored, and the columns will be matched according to their order, i.e. the i-th column in the current frame to the i-th column in each appended frame.
sd()
sd1()
shape

Tuple with (nrows, ncols) dimensions of the Frame

sort()

Sort frame by the specified column(s).

Parameters:
  • cols (List[str | int]) – Names or indices of the columns to sort by. If no columns are given, the Frame will be sorted on all columns.
Returns: New Frame sorted by the provided column(s). The current frame remains unmodified.
stype

The common stype for all columns.

This property is well-defined only for frames where all columns share the same stype. For heterogeneous frames, accessing this property will raise an error. For 0-column frames, this property returns None.

stypes

Tuple of the columns’ stypes (“storage types”), one per column.

sum()
sum1()
to_dict()

Convert the Frame into a dictionary of lists, by columns.

Returns a dictionary with ncols entries, each being a colname: coldata pair, where colname is a string and coldata is a list of the column’s data.

Examples

>>> import datatable as dt
>>> DT = dt.Frame(A=[1, 2, 3], B=["aye", "nay", "tain"])
>>> DT.to_dict()
{'A': [1, 2, 3], 'B': ['aye', 'nay', 'tain']}
to_jay()

Save this frame to a binary file on disk, in .jay format.

Parameters:
  • path (str) – The destination file name. Although not necessary, we recommend using extension “.jay” for the file. If the file exists, it will be overwritten. If this argument is omitted, the file will be created in memory instead, and returned as a bytes object.
  • _strategy ('mmap' | 'write' | 'auto') – Which method to use for writing the file to disk. The “write” method is more portable across different operating systems, but may be slower. This parameter has no effect when path is omitted.
to_list()

Convert the Frame into a list of lists, by columns.

Returns a list of ncols lists, each inner list representing one column of the Frame.

Examples

>>> import datatable as dt
>>> DT = dt.Frame(A=[1, 2, 3], B=["aye", "nay", "tain"])
>>> DT.to_list()
[[1, 2, 3], ['aye', 'nay', 'tain']]
to_numpy()

Convert frame into a 2D numpy array, optionally forcing it into the specified stype/dtype.

In a limited set of circumstances the returned numpy array will be created as a data view, avoiding copying the data. This happens if all of these conditions are met:

  • the frame is not a view;
  • the frame has only 1 column;
  • the column’s type is not string;
  • the stype argument was not used.

In all other cases the returned numpy array will hold a copy of the frame’s data. If the frame has multiple columns of different stypes, then the values will be upcast to the smallest common stype.

If the frame has any NA values, then the returned numpy array will be an instance of numpy.ma.masked_array.

Parameters:
  • stype (datatable.stype, numpy.dtype or str) – Cast frame into this stype before converting it into a numpy array.
  • column (int) – Convert only the specified column; the returned value will be a 1D-array instead of a regular 2D-array.
to_pandas()

Convert this frame to a pandas DataFrame.

The pandas module is required to run this function.

to_tuples()

Convert the Frame into a list of tuples, by rows.

Returns a list having nrows tuples, where each tuple has length ncols and contains data from each respective row of the Frame.

Examples

>>> import datatable as dt
>>> DT = dt.Frame(A=[1, 2, 3], B=["aye", "nay", "tain"])
>>> DT.to_tuples()
[(1, 'aye'), (2, 'nay'), (3, 'tain')]
view()