datatable.sort()¶
Sort clause for use in Frame’s square-bracket selector.
When a sort() object is present inside a DT[i, j, ...]
expression, it will sort the rows of the resulting Frame according
to the columns cols passed as the arguments to sort().
When used together with by(), the sort clause applies after the
group-by, i.e. we sort elements within each group. Note, however,
that because we use stable sorting, the operations of grouping and
sorting are commutative: the result of applying groupby and then sort
is the same as the result of sorting first and then doing groupby.
When used together with i (row filter), the i filter is
applied after the sorting. For example:
DT[:10, :, sort(f.Highscore, reverse=True)]
will select the first 10 records from the frame DT ordered by
the Highscore column.
Examples¶
from datatable import dt, f, by
DT = dt.Frame({"col1": ["A", "A", "B", None, "D", "C"],
"col2": [2, 1, 9, 8, 7, 4],
"col3": [0, 1, 9, 4, 2, 3],
"col4": [1, 2, 3, 3, 2, 1]})
DT
| col1 | col2 | col3 | col4 | ||
|---|---|---|---|---|---|
| str32 | int32 | int32 | int32 | ||
| 0 | A | 2 | 0 | 1 | |
| 1 | A | 1 | 1 | 2 | |
| 2 | B | 9 | 9 | 3 | |
| 3 | NA | 8 | 4 | 3 | |
| 4 | D | 7 | 2 | 2 | |
| 5 | C | 4 | 3 | 1 |
Sort by a single column:
DT[:, :, dt.sort("col1")]
| col1 | col2 | col3 | col4 | ||
|---|---|---|---|---|---|
| str32 | int32 | int32 | int32 | ||
| 0 | NA | 8 | 4 | 3 | |
| 1 | A | 2 | 0 | 1 | |
| 2 | A | 1 | 1 | 2 | |
| 3 | B | 9 | 9 | 3 | |
| 4 | C | 4 | 3 | 1 | |
| 5 | D | 7 | 2 | 2 |
Sort by multiple columns:
DT[:, :, dt.sort("col2", "col3")]
| col1 | col2 | col3 | col4 | ||
|---|---|---|---|---|---|
| str32 | int32 | int32 | int32 | ||
| 0 | A | 1 | 1 | 2 | |
| 1 | A | 2 | 0 | 1 | |
| 2 | C | 4 | 3 | 1 | |
| 3 | D | 7 | 2 | 2 | |
| 4 | NA | 8 | 4 | 3 | |
| 5 | B | 9 | 9 | 3 |
Sort in descending order:
DT[:, :, dt.sort(-f.col1)]
| col1 | col2 | col3 | col4 | ||
|---|---|---|---|---|---|
| str32 | int32 | int32 | int32 | ||
| 0 | NA | 8 | 4 | 3 | |
| 1 | D | 7 | 2 | 2 | |
| 2 | C | 4 | 3 | 1 | |
| 3 | B | 9 | 9 | 3 | |
| 4 | A | 2 | 0 | 1 | |
| 5 | A | 1 | 1 | 2 |
The frame can also be sorted in descending order by setting the reverse parameter to True:
DT[:, :, dt.sort("col1", reverse=True)]
| col1 | col2 | col3 | col4 | ||
|---|---|---|---|---|---|
| str32 | int32 | int32 | int32 | ||
| 0 | NA | 8 | 4 | 3 | |
| 1 | D | 7 | 2 | 2 | |
| 2 | C | 4 | 3 | 1 | |
| 3 | B | 9 | 9 | 3 | |
| 4 | A | 2 | 0 | 1 | |
| 5 | A | 1 | 1 | 2 |
By default, when sorting, null values are placed at the top; to relocate null values to the bottom, pass last to the na_position parameter:
DT[:, :, dt.sort("col1", na_position="last")]
| col1 | col2 | col3 | col4 | ||
|---|---|---|---|---|---|
| str32 | int32 | int32 | int32 | ||
| 0 | A | 2 | 0 | 1 | |
| 1 | A | 1 | 1 | 2 | |
| 2 | B | 9 | 9 | 3 | |
| 3 | C | 4 | 3 | 1 | |
| 4 | D | 7 | 2 | 2 | |
| 5 | NA | 8 | 4 | 3 |
Passing remove to na_position completely excludes any row with null values from the sorted output:
DT[:, :, dt.sort("col1", na_position="remove")]
| col1 | col2 | col3 | col4 | ||
|---|---|---|---|---|---|
| str32 | int32 | int32 | int32 | ||
| 0 | A | 2 | 0 | 1 | |
| 1 | A | 1 | 1 | 2 | |
| 2 | B | 9 | 9 | 3 | |
| 3 | C | 4 | 3 | 1 | |
| 4 | D | 7 | 2 | 2 |
Sort by multiple columns, descending and ascending order:
DT[:, :, dt.sort(-f.col2, f.col3)]
| col1 | col2 | col3 | col4 | ||
|---|---|---|---|---|---|
| str32 | int32 | int32 | int32 | ||
| 0 | B | 9 | 9 | 3 | |
| 1 | NA | 8 | 4 | 3 | |
| 2 | D | 7 | 2 | 2 | |
| 3 | C | 4 | 3 | 1 | |
| 4 | A | 2 | 0 | 1 | |
| 5 | A | 1 | 1 | 2 |
The same code above can be replicated by passing a list of booleans to reverse:
DT[:, :, dt.sort("col2", "col3", reverse=[True, False])]
| col1 | col2 | col3 | col4 | ||
|---|---|---|---|---|---|
| str32 | int32 | int32 | int32 | ||
| 0 | B | 9 | 9 | 3 | |
| 1 | NA | 8 | 4 | 3 | |
| 2 | D | 7 | 2 | 2 | |
| 3 | C | 4 | 3 | 1 | |
| 4 | A | 2 | 0 | 1 | |
| 5 | A | 1 | 1 | 2 |
In the presence of by(), sort() sorts within each group:
DT[:, :, by("col4"), dt.sort(f.col2)]
| col4 | col1 | col2 | col3 | ||
|---|---|---|---|---|---|
| int32 | str32 | int32 | int32 | ||
| 0 | 1 | A | 2 | 0 | |
| 1 | 1 | C | 4 | 3 | |
| 2 | 2 | A | 1 | 1 | |
| 3 | 2 | D | 7 | 2 | |
| 4 | 3 | NA | 8 | 4 | |
| 5 | 3 | B | 9 | 9 |