datatable.split_into_nhot()

split_into_nhot(frame, sep=",", sort=False)

Split and nhot-encode a single-column frame.

The frame must have a single string column. Each value in that column is split on the separator sep, the whitespace around each resulting piece is trimmed, and the distinct pieces (labels) are converted into the individual columns of the output frame.
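The per-value step can be illustrated in plain Python. This is a minimal sketch of the split-and-trim behaviour described above, not datatable's implementation; the helper name split_labels is hypothetical, and it assumes the value is a non-missing string.

```python
from typing import List


def split_labels(value: str, sep: str = ",") -> List[str]:
    """Split one value on `sep` and trim whitespace around each piece.

    Hypothetical helper for illustration; it is not part of datatable and
    assumes `value` is a non-missing string.
    """
    return [piece.strip() for piece in value.split(sep)]


print(split_labels("mouse, dog ,cat"))  # ['mouse', 'dog', 'cat']
```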

Parameters

frame
Frame

An input single-column frame. The column stype must be either str32 or str64.

sep
str

Single-character separator to be used for splitting.

sort
bool

Whether the resulting column names (that is, the labels) should be sorted. If True, the columns are returned in alphabetical order of their names; otherwise their order is not guaranteed, because the labels are collected in parallel.
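The ordering caveat can be seen with a sequential stand-in. In this minimal sketch (the name collect_labels is hypothetical and the code is not datatable's), labels come back in order of first appearance when sort is False; datatable's parallel implementation makes no such promise, so sort=True is the way to get a predictable order. Note that Python's sorted() compares Unicode code points; whether datatable sorts identically is an assumption.

```python
from typing import Iterable, List


def collect_labels(values: Iterable[str], sep: str = ",", sort: bool = False) -> List[str]:
    """Return the unique, whitespace-trimmed labels found in `values`.

    Hypothetical sequential helper, not part of datatable.
    """
    # dict keys keep insertion order, so this records first appearances.
    seen = {}
    for value in values:
        for piece in value.split(sep):
            seen.setdefault(piece.strip(), None)
    labels = list(seen)
    # sorted() orders strings by Unicode code point.
    return sorted(labels) if sort else labels


print(collect_labels(["mouse", "dog,cat"]))             # ['mouse', 'dog', 'cat']
print(collect_labels(["mouse", "dog,cat"], sort=True))  # ['cat', 'dog', 'mouse']
```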

return
Frame

The output frame. It has as many rows as the input frame and one boolean column per unique label found; the labels become the output column names.
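The shape of the result can be sketched with plain Python containers. In this hypothetical sketch (nhot_columns is not a datatable function), a dict stands in for the output frame: each key is a label and each value is a boolean column with one entry per input value. Columns appear in first-appearance order; sorting the keys would mimic sort=True.

```python
from typing import Dict, List, Sequence


def nhot_columns(values: Sequence[str], sep: str = ",") -> Dict[str, List[bool]]:
    """Map each unique label to a boolean column with one entry per input value.

    Hypothetical pure-Python sketch, not datatable's implementation.
    """
    # Split and trim each value once.
    rows = [[piece.strip() for piece in value.split(sep)] for value in values]
    # Sets give fast per-row membership tests.
    row_sets = [set(row) for row in rows]
    # dict.fromkeys de-duplicates labels while preserving first appearance.
    labels = dict.fromkeys(label for row in rows for label in row)
    return {label: [label in s for s in row_sets] for label in labels}


cols = nhot_columns(["cat,dog", "mouse", "cat,mouse", "dog,rooster", "mouse,dog,cat"])
print(list(cols))   # ['cat', 'dog', 'mouse', 'rooster']
print(cols["cat"])  # [True, False, True, False, True]
```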

except
ValueError | TypeError
dt.exceptions.ValueError

Raised if the input frame is missing or has more than one column, or if sep is not a single-character string.

dt.exceptions.TypeError

Raised if the single column of frame has a non-string stype.
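The documented error conditions can be mirrored as a precondition check. This sketch is hypothetical: validate_nhot_args is not a datatable function, the frame is modelled as a plain list of columns, and it raises Python's built-in ValueError and TypeError where datatable raises dt.exceptions.ValueError and dt.exceptions.TypeError. Treating a zero-column frame as invalid and allowing None as a missing value are further assumptions.

```python
from typing import Optional, Sequence


def validate_nhot_args(columns: Optional[Sequence[Sequence[object]]], sep: object) -> None:
    """Check the documented preconditions of split_into_nhot (hypothetical helper)."""
    # Missing frame, or not exactly one column (zero columns rejected by assumption).
    if columns is None or len(columns) != 1:
        raise ValueError("frame must have exactly one column")
    # The docs specify a ValueError (not TypeError) for a bad sep.
    if not isinstance(sep, str) or len(sep) != 1:
        raise ValueError("sep must be a single-character string")
    # Stand-in for the str32/str64 stype check; None models a missing value.
    if not all(v is None or isinstance(v, str) for v in columns[0]):
        raise TypeError("the column must have a string stype")


validate_nhot_args([["cat,dog", "mouse"]], ",")  # passes silently
```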

Examples

>>> import datatable as dt
>>> DT = dt.Frame(["cat,dog", "mouse", "cat,mouse", "dog,rooster", "mouse,dog,cat"])
>>> DT
   | C0
   | str32
-- + -------------
 0 | cat,dog
 1 | mouse
 2 | cat,mouse
 3 | dog,rooster
 4 | mouse,dog,cat
>>> dt.split_into_nhot(DT)
   | cat    dog    mouse  rooster
   | bool8  bool8  bool8  bool8
-- + -----  -----  -----  -------
 0 |     1      1      0        0
 1 |     0      0      1        0
 2 |     1      0      1        0
 3 |     0      1      0        1
 4 |     1      1      1        0