datatable.str.split_into_nhot()
Split and nhot-encode a single-column frame.
Each value in the frame's single string column is split on the provided
separator sep, surrounding whitespace is trimmed from each piece, and
the resulting pieces (labels) become the individual columns
of the output frame.
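The split, trim, and encode steps described above can be sketched in plain Python. This is only an illustration of the idea, not datatable's implementation: the helper name nhot_encode, its return shape (a list of labels plus a list of 0/1 rows), the first-appearance ordering when sort is off, and the choice to ignore empty pieces and None values are all assumptions made for this sketch.

```python
def nhot_encode(values, sep=",", sort=False):
    """Hypothetical pure-Python illustration of n-hot encoding.

    Not datatable's implementation; it only mirrors the described steps.
    """
    if not isinstance(sep, str) or len(sep) != 1:
        raise ValueError("sep must be a single-character string")

    labels = []       # unique labels, in order of first appearance
    row_labels = []   # set of labels found in each input value
    for value in values:
        pieces = set()
        if value is not None:  # assumption: missing values yield no labels
            for piece in value.split(sep):
                piece = piece.strip()  # trim surrounding whitespace
                if piece:              # assumption: empty pieces are ignored
                    pieces.add(piece)
                    if piece not in labels:
                        labels.append(piece)
        row_labels.append(pieces)

    if sort:
        labels.sort()  # alphabetical column order

    # One row per input value, one 0/1 cell per label.
    rows = [[int(label in pieces) for label in labels] for pieces in row_labels]
    return labels, rows


values = ["cat,dog", "mouse", "cat,mouse", "dog,rooster", "mouse,dog,cat"]
labels, rows = nhot_encode(values, sort=True)
```

Here each entry of rows has one cell per label, set to 1 when that label occurs in the corresponding input string. Unlike this sequential sketch, the real function runs in parallel, which is why its column order is not guaranteed unless sorting is requested.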
Parameters
frame (Frame): An input single-column frame. The column stype must be either str32 or str64.
sep (str): Single-character separator to be used for splitting.
sort (bool): An option to control whether the resulting column names, i.e. labels, should be sorted. If set to True, the column names are returned in alphabetical order; otherwise their order is not guaranteed, because the algorithm runs in parallel.
return (Frame): The output frame. It will have as many rows as the input frame, and as many boolean columns as there were unique labels found. The labels will also become the output column names.
except (ValueError | TypeError):
dt.exceptions.ValueError: Raised if the input frame is missing or has more than one column. It is also raised if sep is not a single-character string.
dt.exceptions.TypeError: Raised if the single column of frame has a non-string stype.
Examples
from datatable import dt
DT = dt.Frame(["cat,dog", "mouse", "cat,mouse", "dog,rooster", "mouse,dog,cat"])
DT
| | C0 |
|---|---|
| | str32 |
| 0 | cat,dog |
| 1 | mouse |
| 2 | cat,mouse |
| 3 | dog,rooster |
| 4 | mouse,dog,cat |
dt.str.split_into_nhot(DT)
| | cat | dog | mouse | rooster |
|---|---|---|---|---|
| | bool8 | bool8 | bool8 | bool8 |
| 0 | 1 | 1 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 |
| 2 | 1 | 0 | 1 | 0 |
| 3 | 0 | 1 | 0 | 1 |
| 4 | 1 | 1 | 1 | 0 |