datatable.Frame.__getitem__()¶
The main method for accessing data and computing on the frame. Sometimes
we also refer to it as the DT[i, j, ...]
call.
Since Python does not support keyword arguments inside square brackets,
all arguments are positional. The first is the row selector i
, the
second is the column selector j
, and the rest are optional. Thus,
DT[i, j]
selects rows i
and columns j
from frame DT
.
If an additional by
argument is present, then the selectors i
and
j
work within groups generated by the by()
expression. The sort
argument reorders the rows of the frame, and the join
argument allows
performing SQL joins between several frames.
The signature listed here is the most generic. But there are also
special-case signatures DT[j]
and DT[i, j]
described below.
Parameters¶
int
| slice
| Frame
| FExpr
| List[bool]
| List[Any]
The row selector.
If this is an integer or a slice, then the behavior is the same as in
Python when working on a list with .nrows
elements. In particular, the integer value must be within the range
[-nrows; nrows)
. On the other hand when i
is a slice, then either
its start or end or both may be safely outside the row-range of the
frame. The trivial slice :
always selects all rows.
i
may also be a single-column boolean Frame. It must have the
same number of rows as the current frame, and it serves as a mask for
which rows are to be selected: True
indicates that the row should
be included in the result, while False
and None
skips the row.
i
may also be a single-column integer Frame. Such column specifies
directly which row indices are to be selected. This is more flexible
compared to a boolean column: the rows may be repeated, reordered,
omitted, etc. All values in the column i
must be in the range
[0; nrows)
or an error will be thrown. In particular, negative
indices are not allowed. Also, if the column contains NA values, then
it would produce an “invalid row”, i.e. a row filled with NAs.
i
may also be an expression, which must evaluate into a single
column, either boolean or integer. In this case the result is the
same as described above for a single-column frame.
When i
is a list of booleans, then it is equivalent to a single-column
boolean frame. In particular, the length of the list must be equal to
.nrows
.
Finally, i
can be a list of any of the above (integers, slices, frames,
expressions, etc), in which case each element of the list is evaluated
separately and then all selected rows are put together. The list may
contain None
s, which will be simply skipped.
int
| str
| slice
| list
| dict
| type
| FExpr
| update
This argument may either select columns, or perform computations with the columns.
int
Select a single column at the specified index. A
dt.exceptions.IndexError
is raised ifj
is not in the range[-ncols; ncols)
.str
Select a single column by name. A
dt.exceptions.KeyError
is raised if the column with such a name does not exist.:
This is a trivial slice, and it means “select everything”, and is roughly equivalent to SQL’s
*
. In the simple case ofDT[i, j]
call “selecting everything” means all columns from frameDT
. However, when theby()
clause is added, then:
will now select all columns except those used in the groupby. And if the expression has ajoin()
, then “selecting everything” will produce all columns from all frames, excluding those that were duplicate during a natural join.slice[int]
An integer slice can be used to select a subset of columns. The behavior of a slice is exactly the same as in base Python.
slice[str]
A string slice is an expression like
"colA":"colZ"
. In this case all columns from"colA"
to"colZ"
inclusive are selected. And if"colZ"
appears before"colA
” in the frame, then the returned columns will be in the reverse order.Both endpoints of the slice must be valid columns (or omitted), or otherwise a
dt.exceptions.KeyError
will be raised.type
|stype
|ltype
Select only columns of the matching type.
FExpr
An expression formula is computed within the current evaluation context (i.e. it takes into account the current frame, the filter
i
, the presence of groupby/join parameters, etc). The result of this evaluation is used as-if that colum existed in the frame.List[bool]
If
j
is a list of boolean values, then it must have the length of.ncols
, and it describes which columns are to be selected into the result.List[Any]
The
j
can also be a list of elements of any other type listed above, with the only restriction that the items must be homogeneous. For example, you can mixint
s andslice[int]
s, but notint
s andFExpr
s, orint
s andstr
s.Each item in the list will be evaluated separately (as if each was the sole element in
j
), and then all the results will be put together.Dict[str, FExpr]
A dictionary can be used to select columns/expressions similarly to a list, but assigning them explicit names.
update
As a special case, the
j
argument may be theupdate()
function, which turns the selection operation into an update. That is, instead of returning the chosen rows/columns, they will be updated instead with the user-supplied values.
by
When by()
clause is present in the square brackets, the rest of the
computations are carried out within the “context of a groupby”. This
should generally be equivalent to (a) splitting the frame into separate
sub-frames corresponding to each group, (b) applying DT[i, j]
separately within each group, (c) row-binding the results for each
group. In practice the following operations are affected:
all reduction operators such as
dt.min()
ordt.sum()
now work separately within each group. Thus, instead of computing sum over the entire column, it is computed separately within each group inby()
, and the resulting column will have as many rows as the number of groups.certain
i
expressions are re-interpreted as being applied within each group. For example, ifi
is an integer or a slice, then it will now be selecting row(s) within each group.certain functions (such as
dt.shift()
) are also “group-aware”, and produce results that take into account the groupby context. Check documentation for each individual function to find out whether it has special treatment for groupby contexts.
In addition, by()
also affects the order of columns in the output
frame. Specifically, all columns listed as the groupby keys will be
automatically placed at the front of the resulting frame, and also
excluded from :
or f[:]
within j
.
sort
This argument can be used to rearrange rows in the resulting frame.
See sort()
for details.
join
Performs a JOIN operation with another frame. The
join()
clause will calculate how the rows
of the current frame match against the rows of the joined frame, and
allow you to refer to the columns of the joined frame within i
, j
or by
. In order to access columns of the joined frame use
namespace g.
.
This parameter may be listed multiple times if you need to join with
several frames. However, for the moment, g.
can only access
columns of the first joined frame.
Details¶
The order of evaluation of expressions is that first the join
clause(s)
are computed, creating a mapping between the rows of the current frame and
the joined frame(s). After that we evaluate by
+sort
. Next, the i
filter is applied creating the final index of rows that will be selected.
Lastly, we evaluate the j
part, taking into account the current groupby
and row index(es).
When evaluating j
, it is essentially converted into a tree (DAG) of
expressions, where each expression is evaluated from the bottom up. That
is, we start evaluating from the leaf nodes (which are usually column
selectors such as f[0]
), and then at each convert the set of columns
into a new set. Importantly, each subexpression node may produce columns
of 3 types: “scalar”, “grouped”, and “full-size”. Whenever subexpressions
of different levels are mixed together, they are upgraded to the highest
level. Thus, a scalar may be reused for each group, and a grouped column
can interoperate with a regular column by auto-expanding in such a way
that it becomes constant within each group.
If, after the j
is fully evaluated, it produces a column set of type
“grouped”, then the resulting frame will have as many rows as there are
groups. If, on the other hand, the column set is “full-size”, then the
resulting frame will have as many rows as the original frame.
See Also¶
DT[i, j, ...] = R
– update values in the frame.del DT[i, j, ...]
– delete rows/columns of the frame.
Extract a single column j
from the frame.
The single-argument version of DT[i, j]
works only for j
being
either an integer (indicating column index) or a string (column name).
If you need any other way of addressing column(s) of the frame, use the
more versatile DT[:, j]
form.
Parameters¶
int
| str
The index or name of a column to retrieve.
Frame
Single-column frame containing the column at the specified index or with the given name.
KeyError
| IndexError
raised if the column with the given name does not exist in the frame. |
|
raised if the column does not exist at the provided
index |