datatable.Frame.__init__()

datatable.Frame(_data=None, *, names=None, stypes=None, stype=None, **cols)

Create a new Frame from a single or multiple sources.

Argument _data (or **cols) contains the source data for Frame’s columns. Column names are either derived from the data, given explicitly via the names argument, or generated automatically. Either way, the constructor ensures that column names are unique, non-empty, and do not contain certain special characters (see Name mangling for details).

Parameters

_data

Any

The first argument to the constructor represents the source from which to construct the Frame. If this argument is given, then the varkwd arguments **cols should not be used.

This argument can accept a wide range of data types; see the “Details” section below.

**cols

Any

Sequence of varkwd column initializers. The keys become column names, and the values contain column data. Using varkwd arguments is equivalent to passing a dict as the _data argument.

When varkwd initializers are used, the names parameter may not be given.

names

List[str|None]

Explicit list (or tuple) of column names. The number of elements in the list must be the same as the number of columns being constructed.

This parameter should not be used when constructing the frame from **cols.

stypes

List[stype-like] | Dict[str, stype-like]

Explicit list (or tuple) of column types. The number of elements in the list must be the same as the number of columns being constructed.

stype

stype | type

Similar to stypes, but provide a single type that will be used for all columns. This option cannot be specified together with stypes.

return

Frame

A Frame object is constructed and returned.

except

ValueError

The exception is raised if the length of the names or stypes list differs from the number of columns created, or if several columns are being created and they have incompatible lengths.

Details

The shape of the constructed Frame depends on the type of the source argument _data (or **cols). The argument _data and varkwd arguments **cols are mutually exclusive: they cannot be used at the same time. However, it is possible to use neither and construct an empty frame:

import datatable as dt

dt.Frame()       # empty 0x0 frame
dt.Frame(None)   # same
dt.Frame([])     # same

The varkwd arguments **cols can be used to construct a Frame by columns. In this case the keys become column names, and the values are column initializers. This form is mostly a convenience, equivalent to converting cols into a dict and passing it as the first argument:

dt.Frame(A = range(7),
         B = [0.1, 0.3, 0.5, 0.7, None, 1.0, 1.5],
         C = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"])
# equivalent to
dt.Frame({"A": range(7), "B": [0.1, 0.3, ...], "C": ["red", "orange", ...]})
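
The equivalence follows from how Python itself handles keyword arguments. The sketch below uses a hypothetical helper `gather` (not part of datatable) to show that the keyword form and the dict form carry the same information, and notes one case where only the dict form is available:

```python
def gather(**cols):
    # Python collects keyword arguments into an ordinary dict, in call order.
    # `gather` is a hypothetical helper used only for this illustration.
    return cols

kw = gather(A=range(3), B=["x", "y", "z"])
as_dict = {"A": range(3), "B": ["x", "y", "z"]}
print(kw == as_dict)   # True
print(list(kw))        # ['A', 'B']

# Keyword names must be valid Python identifiers, so a column name such as
# "2nd" cannot be written as a keyword and needs the dict form instead.
needs_dict_form = {"2nd": [10, 20, 30]}
```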

The argument _data accepts a wide range of input types. The following list describes possible choices:

When the source is a non-empty list containing other lists or compound objects, then each item will be interpreted as a column initializer, and the resulting frame will have as many columns as the number of items in the list.

Each element in the list must produce a single column. Thus, it is not allowed to use multi-column Frames, or multi-dimensional numpy arrays or pandas DataFrames.

>>> dt.Frame([[1, 3, 5, 7, 11],
...           [12.5, None, -1.1, 3.4, 9.17]])
   | C0     C1
-- + --  -----
 0 |  1  12.5
 1 |  3  NA
 2 |  5  -1.1
 3 |  7   3.4
 4 | 11   9.17
--
[5 rows x 2 columns]

Note that unlike pandas and numpy, we treat a list of lists as a list of columns, not a list of rows. If you need to create a Frame from a row-oriented store of data, you can use a list of dictionaries or a list of tuples as described below.
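
If row-oriented data is already held as a list of lists, one option is to transpose it into columns before passing it to the constructor. A minimal sketch in plain Python, assuming every row has the same length (the names `rows` and `columns` are illustrative):

```python
# Row-oriented data: each inner list is one record (age, name).
rows = [
    [39, "Mary"],
    [17, "Jasmine"],
    [23, "Lily"],
]

# zip(*rows) groups the i-th entries of every row together, which transposes
# the data. Each group is turned back into a list so that the result is a
# list of lists (columns) rather than a list of tuples (rows).
columns = [list(col) for col in zip(*rows)]

print(columns)   # [[39, 17, 23], ['Mary', 'Jasmine', 'Lily']]
```

The resulting `columns` list can then be used as the first argument of the constructor, optionally together with `names=["age", "name"]`.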

List[Dict]

If the source is a list of dict objects, then each element in this list is interpreted as a single row. The keys in each dictionary are column names, and the values contain contents of each individual cell.

The rows don’t have to have the same number or order of entries: all missing elements will be filled with NAs:

>>> dt.Frame([{"A": 3, "B": 7},
...           {"A": 0, "B": 11, "C": -1},
...           {"C": 5}])
   |  A   B   C
-- + --  --  --
 0 |  3   7  NA
 1 |  0  11  -1
 2 | NA  NA   5
--
[3 rows x 3 columns]

If the names parameter is given, then only the keys listed in names will be taken into account; all extra fields will be discarded.
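
The row-to-column mapping described above can be modelled in plain Python. This is an illustrative sketch only, not datatable's implementation: the helper `dicts_to_columns` is hypothetical, `None` stands in for NA, and ordering columns by first appearance of each key is an assumption of the sketch.

```python
def dicts_to_columns(rows, names=None):
    """Hypothetical model: turn a list of dict rows into named columns."""
    if names is None:
        # Collect keys in order of first appearance across all rows
        # (an assumption made for this sketch).
        names = []
        for row in rows:
            for key in row:
                if key not in names:
                    names.append(key)
    # dict.get() returns None for a missing key; None plays the role of NA.
    # Keys that are not listed in `names` are simply never looked up.
    return {name: [row.get(name) for row in rows] for name in names}

rows = [{"A": 3, "B": 7}, {"A": 0, "B": 11, "C": -1}, {"C": 5}]
all_columns = dicts_to_columns(rows)
only_a_and_c = dicts_to_columns(rows, names=["A", "C"])

print(all_columns)    # {'A': [3, 0, None], 'B': [7, 11, None], 'C': [None, -1, 5]}
print(only_a_and_c)   # {'A': [3, 0, None], 'C': [None, -1, 5]}
```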

List[Tuple]

If the source is a list of tuples, then each tuple represents a single row. All tuples must have the same size, or an exception will be raised. For example:

>>> dt.Frame([(39, "Mary"),
...           (17, "Jasmine"),
...           (23, "Lily")], names=['age', 'name'])
   | age  name
-- + ---  -------
 0 |  39  Mary
 1 |  17  Jasmine
 2 |  23  Lily
--
[3 rows x 2 columns]

If the tuples are in fact namedtuples, then the field names will be used for the column names in the resulting Frame. No check is made whether the named tuples in fact belong to the same class.
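
For readers unfamiliar with namedtuples, the sketch below shows where such field names come from on the Python side. Reading the names from the first element only is an assumption of this sketch, not a statement about datatable's internals; `Person` is an illustrative class.

```python
from collections import namedtuple

Person = namedtuple("Person", ["age", "name"])
people = [Person(39, "Mary"), Person(17, "Jasmine"), Person(23, "Lily")]

# Every namedtuple class exposes its field names through `_fields`.
column_names = list(people[0]._fields)

# A namedtuple is still a tuple, so each element is an ordinary row of values.
row_values = [tuple(p) for p in people]

print(column_names)   # ['age', 'name']
print(row_values)     # [(39, 'Mary'), (17, 'Jasmine'), (23, 'Lily')]
```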

List[Any]

If the list’s first element does not match any of the cases above, then it is considered a “list of primitives”. Such a list will be parsed as a single column.

The entries are typically bools, ints, floats, strs, or Nones; numpy scalars are also allowed. If the list has elements of heterogeneous types, then we will attempt to convert them to the smallest common stype.

If the list contains only boolean values (or Nones), then it will create a column of type bool8.

If the list contains only integers (or Nones), then the resulting column will be int8 if all integers are 0 or 1; or int32 if all entries are less than \(2^{31}\) in magnitude; otherwise int64 if all entries are less than \(2^{63}\) in magnitude; or otherwise float64.

If the list contains floats, then the resulting column will have stype float64. Both None and math.nan can be used to input NA values.

Finally, if the list contains strings, then the column produced will have stype str32 if the total size of the character data is less than 2GB, or str64 otherwise.
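
The integer rule above can be restated as a small decision function. This is a plain-Python model of the rule as worded here, not datatable's actual code; the helper name `infer_int_stype` is hypothetical, it returns stype names as strings, and it assumes the input holds only ints and Nones. The text does not cover a list consisting solely of Nones, so the sketch's answer for that case is arbitrary.

```python
def infer_int_stype(values):
    """Hypothetical model of the integer rule; returns an stype name."""
    ints = [v for v in values if v is not None]   # None is an NA, skip it
    if all(v in (0, 1) for v in ints):
        return "int8"
    if all(abs(v) < 2**31 for v in ints):
        return "int32"
    if all(abs(v) < 2**63 for v in ints):
        return "int64"
    return "float64"

print(infer_int_stype([0, 1, None, 1]))    # int8
print(infer_int_stype([1, 3, 5, 7, 11]))   # int32
print(infer_int_stype([2**31, 5]))         # int64
print(infer_int_stype([2**63]))            # float64
```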

typed_list

A typed list can be created by taking a regular list and dividing it by an stype. It behaves similarly to a simple list of primitives, except that it is parsed into the specific stype.

>>> dt.Frame([1.5, 2.0, 3.87] / dt.float32).stype
stype.float32
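
Dividing a list by a type object may look surprising, since Python lists do not define division. The syntax is possible through Python's reflected-operator protocol: when the left operand cannot handle `/`, Python asks the right operand via `__rtruediv__`. The sketch below demonstrates that protocol with two hypothetical classes, `TypeTag` and `TypedList`; they are stand-ins for illustration and do not describe how datatable implements it.

```python
class TypedList:
    """Hypothetical container pairing a plain list with a target type name."""
    def __init__(self, values, type_name):
        self.values = values
        self.type_name = type_name


class TypeTag:
    """Hypothetical stand-in for an stype; not datatable's class."""
    def __init__(self, name):
        self.name = name

    def __rtruediv__(self, other):
        # Called for `other / self` when `other` does not support "/".
        if not isinstance(other, list):
            return NotImplemented
        return TypedList(other, self.name)


float32 = TypeTag("float32")
typed = [1.5, 2.0, 3.87] / float32   # list has no "/", so TypeTag handles it

print(type(typed).__name__)   # TypedList
print(typed.type_name)        # float32
print(typed.values)           # [1.5, 2.0, 3.87]
```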

Dict[str, Any]

The keys are column names, and the values can be any objects from which a single-column frame can be constructed: list, range, np.array, single-column Frame, pandas Series, etc.

Constructing a frame from a dictionary d is exactly equivalent to calling dt.Frame(list(d.values()), names=list(d.keys())).
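
The equivalence relies on Python dicts preserving insertion order (guaranteed since Python 3.7), so the i-th key always lines up with the i-th value. A short plain-Python check, using an illustrative dictionary `d`:

```python
d = {"A": range(7), "B": [0.1, 0.3, 0.5, 0.7, None, 1.0, 1.5]}

column_initializers = list(d.values())   # what becomes the list of columns
column_names = list(d.keys())            # what becomes the names argument

# Keys and values come out in the same (insertion) order, so pairing them
# back up reproduces the original items.
print(list(zip(column_names, column_initializers)) == list(d.items()))   # True
```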

range

Same as if the range was expanded into a list of integers, except that the column created from a range is virtual and its creation time is nearly instant regardless of the range’s length.
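
Python's own range object offers a useful analogy: it stores only its start, stop and step, and computes elements on demand, so its cost does not grow with its length. The snippet below illustrates that behaviour in plain Python; it is an analogy only and says nothing about how a virtual column is represented inside datatable.

```python
big = range(10**12)        # a trillion integers, described by three numbers

print(len(big))            # 1000000000000
print(big[123_456_789])    # 123456789
print(big[-1])             # 999999999999
print(10**11 in big)       # True, decided arithmetically without iterating
```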

Frame

If the argument is a Frame, then a shallow copy of that frame will be created, same as copy().

str

If the source is a simple string, then the frame is created by fread-ing this string. In particular, if the string contains the name of a file, the data will be loaded from that file; if it is a URL, the data will be downloaded and parsed from that URL. Lastly, the string may simply contain a table of data.

>>> DT1 = dt.Frame("train.csv")
>>> DT2 = dt.Frame("""
...    Name    Age
...    Mary     39
...    Jasmine  17
...    Lily     23
... """)

pd.DataFrame | pd.Series

A pandas DataFrame (Series) will be converted into a datatable Frame. Column names will be preserved.

Column types will generally be the same, assuming they have a corresponding stype in datatable. If not, the column will be converted. For example, a pandas date/time column will be converted into a string column, while a float16 column will be converted into float32.

If a pandas frame has an object column, we will attempt to refine it into a more specific stype. In particular, we can detect a string or boolean column stored as object in pandas.

np.array

A numpy array will be converted into a Frame of the same shape (provided that it has at most 2 dimensions) and the same type.

If possible, we will create a Frame without copying the data (however, this is subject to numpy’s approval). The resulting frame will have copy-on-write semantics.

None

When the source is not given at all, a 0x0 frame will be created, unless a names parameter is provided, in which case the resulting frame will have 0 rows but as many columns as there are entries in the names list.
