datatable.models.kfold_random()¶
Perform a randomized k-fold split of data with nrows rows into nsplits train/test subsets. The dataset itself is not passed to this function: knowing the number of rows is sufficient to decide how the data should be split.
The train/test subsets produced by this function have the following properties:
- all test folds are of approximately the same size, nrows/nsplits;
- all observations have an equal ex-ante chance of being assigned to each fold;
- the row indices in all train and test folds are sorted.
The function uses a single-pass parallelized algorithm to construct the folds.
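The properties listed above can be illustrated with a small pure-Python sketch. This is not datatable's implementation: the name `kfold_random_sketch` and its return shape (a list of `(train, test)` index lists) are assumptions made for illustration, and it uses a plain shuffle rather than the single-pass parallelized algorithm mentioned above.

```python
import random

def kfold_random_sketch(nrows, nsplits, seed=None):
    """Illustrative k-fold split (hypothetical helper, not datatable's code)."""
    rng = random.Random(seed)
    # Shuffle the row indices so that every row has the same chance
    # as any other row of landing in a given fold.
    rows = list(range(nrows))
    rng.shuffle(rows)
    folds = []
    for k in range(nsplits):
        # Fold k takes shuffled positions [lo, hi); with these bounds
        # the fold sizes differ by at most one row.
        lo = k * nrows // nsplits
        hi = (k + 1) * nrows // nsplits
        test = sorted(rows[lo:hi])               # sorted test indices
        train = sorted(rows[:lo] + rows[hi:])    # sorted remaining indices
        folds.append((train, test))
    return folds

folds = kfold_random_sketch(nrows=10, nsplits=3, seed=42)
for train, test in folds:
    print(len(train), len(test), test)
```

Each row appears in exactly one test fold, and every train fold is the complement of its test fold, which is what makes the split usable for cross-validation.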
Parameters¶
nrows: int
The number of rows in the frame that you want to split.
nsplits: int
Number of folds; must be at least 2, but not larger than nrows.
seed: int
Seed value for the random number generator used by this function. Calling the function several times with the same seed value will produce the same results each time.
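The parameter constraints and the seeding behavior can be sketched as follows. Both helpers (`check_split_args`, `seeded_permutation`) are hypothetical names introduced here, and the exception types are assumptions; the documentation above does not say how datatable reports invalid arguments.

```python
import random

def check_split_args(nrows, nsplits):
    """Hypothetical check of the documented constraint 2 <= nsplits <= nrows."""
    if not isinstance(nrows, int) or not isinstance(nsplits, int):
        raise TypeError("nrows and nsplits must be integers")
    if nsplits < 2:
        raise ValueError("nsplits must be at least 2")
    if nsplits > nrows:
        raise ValueError("nsplits must not be larger than nrows")

def seeded_permutation(nrows, seed):
    """Return a row permutation that depends only on (nrows, seed)."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    rows = list(range(nrows))
    rng.shuffle(rows)
    return rows

check_split_args(nrows=100, nsplits=5)  # valid arguments: passes silently
same = seeded_permutation(100, seed=7) == seeded_permutation(100, seed=7)
print(same)  # True
```

Using a generator that is private to the call is one way to get the "same seed, same result" behavior without being affected by other random draws in the program.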
See Also¶
kfold() – Perform k-fold split.