datatable.split_into_nhot()
Split and nhot-encode a single-column frame.
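The frame must have a single string column. Each value in that column is split on the separator sep, leading and trailing whitespace is trimmed from every resulting piece (label), and each unique label becomes a separate column of the output frame. In a given row, a label's column is 1 (True) if that row's value contains the label and 0 (False) otherwise.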
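The split, trim and encode steps can be modeled in plain Python to make the encoding concrete. This is a minimal sketch, not datatable's implementation: the function name split_into_nhot_sketch is hypothetical, the input is a plain list of strings rather than a Frame, missing values and empty pieces are not given any special handling, and without sorting the labels appear in first-appearance order, whereas datatable makes no ordering guarantee.

```python
from typing import Dict, List


def split_into_nhot_sketch(values: List[str], sep: str = ",",
                           sort: bool = False) -> Dict[str, List[bool]]:
    """Hypothetical pure-Python model of split-and-nhot-encode.

    Returns a mapping label -> column of booleans, one entry per input row.
    """
    # Split every value on `sep` and trim whitespace around each piece.
    rows = [[piece.strip() for piece in value.split(sep)] for value in values]

    # Collect unique labels in order of first appearance (dicts keep order).
    labels = list(dict.fromkeys(label for row in rows for label in row))
    if sort:
        labels.sort()

    # One boolean column per label: True where the row contains that label.
    row_sets = [set(row) for row in rows]
    return {label: [label in s for s in row_sets] for label in labels}


values = ["cat,dog", "mouse", "cat,mouse", "dog,rooster", "mouse,dog,cat"]
for name, column in split_into_nhot_sketch(values, sort=True).items():
    print(name, [int(x) for x in column])
# cat [1, 0, 1, 0, 1]
# dog [1, 0, 0, 1, 1]
# mouse [0, 1, 1, 0, 1]
# rooster [0, 0, 0, 1, 0]
```

Each printed list is one output column read top to bottom, so the result matches the example output at the end of this section.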
Parameters
frame (Frame): An input single-column frame. The column stype must be either str32 or str64.
sep (str): Single-character separator used for splitting. Defaults to ",", which is why the example below splits on commas without passing sep.
sort (bool): Controls whether the resulting column names, i.e. the labels, are sorted. If True, the column names are returned in alphabetical order; otherwise their order is not guaranteed, because the algorithm runs in parallel.
returns (Frame): The output frame. It has as many rows as the input frame and as many boolean columns as there are unique labels. The labels also become the output column names.
raises ValueError: If the input frame is missing or has more than one column, or if sep is not a single-character string.
raises TypeError: If the single column of frame has a non-string type.
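The documented error conditions can be summarized as a validation routine. This sketch is hypothetical: check_nhot_input is not part of datatable, a frame is modeled as a list of columns (each a plain list, with None standing for a missing value), a zero-column frame is left unchecked because the documentation does not cover it, and the order in which datatable performs these checks is an assumption.

```python
from typing import List, Optional, Sequence


def check_nhot_input(columns: Optional[Sequence[List[object]]],
                     sep: object) -> None:
    """Hypothetical checks mirroring the documented ValueError/TypeError cases."""
    # ValueError: the frame is missing or has more than one column.
    if columns is None or len(columns) > 1:
        raise ValueError("expected a single-column frame")
    # ValueError: sep must be a single-character string.
    if not isinstance(sep, str) or len(sep) != 1:
        raise ValueError("sep must be a single-character string")
    # TypeError: the column must hold strings (None models a missing value).
    if columns and any(v is not None and not isinstance(v, str)
                       for v in columns[0]):
        raise TypeError("the column must be of string type")


check_nhot_input([["cat,dog", None, "mouse"]], ",")  # passes silently
```

Modeling the stype check as a per-value isinstance test is a simplification; datatable checks the column's stype rather than inspecting individual values.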
Examples
DT = dt.Frame(["cat,dog", "mouse", "cat,mouse", "dog,rooster", "mouse,dog,cat"])
|   | C0 |
|---|---|
|   | ▪▪▪▪ |
| 0 | cat,dog |
| 1 | mouse |
| 2 | cat,mouse |
| 3 | dog,rooster |
| 4 | mouse,dog,cat |
dt.split_into_nhot(DT)
|   | cat | dog | mouse | rooster |
|---|---|---|---|---|
|   | ▪ | ▪ | ▪ | ▪ |
| 0 | 1 | 1 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 |
| 2 | 1 | 0 | 1 | 0 |
| 3 | 0 | 1 | 0 | 1 |
| 4 | 1 | 1 | 1 | 0 |