datatable.str.split_into_nhot()

datatable.str.split_into_nhot(frame, sep=",", sort=False)


Split and nhot-encode a single-column frame.

Each value in the frame's single string column is split on the separator sep, surrounding whitespace is trimmed from each piece, and every distinct piece (label) becomes a separate column of the output frame.

Parameters

frame

Frame

An input single-column frame. The column stype must be either str32 or str64.

sep

str

Single-character separator to be used for splitting.

sort

bool

Whether the resulting column names (the labels) should be sorted. If True, the columns are returned in alphabetical order; otherwise their order is not guaranteed, because the algorithm runs in parallel.

return

Frame

The output frame. It will have as many rows as the input frame, and as many boolean columns as there were unique labels found. The labels will also become the output column names.

except

ValueError | TypeError

dt.exceptions.ValueError: Raised if the input frame is missing or it has more than one column. It is also raised if sep is not a single-character string.
dt.exceptions.TypeError: Raised if the single column of frame has non-string stype.
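
The behavior described above can be modeled in plain Python. This is only an illustrative sketch, not datatable's implementation: the helper name split_into_nhot_sketch is hypothetical, it takes a list of strings rather than a Frame, it returns a dict of boolean lists, and it assumes that empty pieces are dropped. With sort=False the sketch keeps labels in order of first appearance, whereas the real function makes no ordering guarantee.

```python
def split_into_nhot_sketch(values, sep=",", sort=False):
    """Pure-Python model of the documented semantics (hypothetical helper)."""
    # Mirrors the documented requirement that sep is a single character.
    if not isinstance(sep, str) or len(sep) != 1:
        raise ValueError("sep must be a single-character string")

    rows = []    # one set of labels per input value
    labels = []  # unique labels, in order of first appearance
    for value in values:
        # Split on the separator and trim whitespace around each piece.
        pieces = [piece.strip() for piece in value.split(sep)]
        # Assumption of this sketch: empty pieces are ignored.
        pieces = [piece for piece in pieces if piece]
        rows.append(set(pieces))
        for piece in pieces:
            if piece not in labels:
                labels.append(piece)

    if sort:
        labels.sort()

    # One boolean column per label, one entry per input row.
    return {label: [label in row for row in rows] for label in labels}


encoded = split_into_nhot_sketch(
    ["cat,dog", "mouse", "cat, mouse", "dog,rooster", "mouse,dog,cat"],
    sort=True,
)
for label, column in encoded.items():
    print(label, [int(flag) for flag in column])
```

Each key of the returned dict plays the role of an output column name, and each list has one entry per input row, matching the shape described under return.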

Examples

import datatable as dt
DT = dt.Frame(["cat,dog", "mouse", "cat,mouse", "dog,rooster", "mouse,dog,cat"])
DT

	C0
	str32
0	cat,dog
1	mouse
2	cat,mouse
3	dog,rooster
4	mouse,dog,cat
5 rows × 1 column

dt.str.split_into_nhot(DT)

	cat	dog	mouse	rooster
	bool8	bool8	bool8	bool8
0	1	1	0	0
1	0	0	1	0
2	1	0	1	0
3	0	1	0	1
4	1	1	1	0
5 rows × 4 columns