datatable.models.kfold_random()
Perform a randomized k-fold split of data with nrows rows into nsplits train/test subsets. The dataset itself is not passed to this function: it is sufficient to know only the number of rows in order to decide how the data should be split.
The train/test subsets produced by this function will have the following properties:
all test folds will be of approximately the same size, nrows/nsplits;
all observations have an equal ex-ante chance of being assigned to each fold;
the row indices in all train and test folds will be sorted.
The function uses a single-pass parallelized algorithm to construct the folds.
Parameters
nrows: int
The number of rows in the frame that you want to split.
nsplits: int
The number of folds; must be at least 2, but not larger than nrows.
seed: int
Seed value for the random number generator used by this function. Calling the function several times with the same seed value will produce the same results each time.
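The properties described above can be illustrated with a small pure-Python sketch. This is not datatable's implementation: the helper name kfold_random_sketch is hypothetical, it uses a plain sequential shuffle rather than the single-pass parallelized algorithm, and it returns ordinary Python lists, which is an assumption made only for the sake of the example. It is meant only to show what "approximately equal test folds", "sorted row indices", and "same seed, same result" look like in practice.

```python
import random


def kfold_random_sketch(nrows, nsplits, seed=None):
    """Hypothetical stand-in for kfold_random(); not datatable's algorithm."""
    if not 2 <= nsplits <= nrows:
        raise ValueError("nsplits must be at least 2 and not larger than nrows")

    rng = random.Random(seed)   # the same seed reproduces the same folds
    rows = list(range(nrows))
    rng.shuffle(rows)           # each row has the same chance as any other
                                # row of landing in a given fold

    folds = []
    for k in range(nsplits):
        # Cut the shuffled rows into nsplits contiguous slices whose
        # lengths differ by at most one.
        start = k * nrows // nsplits
        stop = (k + 1) * nrows // nsplits
        test = sorted(rows[start:stop])
        train = sorted(rows[:start] + rows[stop:])
        folds.append((train, test))
    return folds


folds = kfold_random_sketch(nrows=10, nsplits=3, seed=42)
test_sizes = [len(test) for _, test in folds]
```

With nrows=10 and nsplits=3 the test folds in this sketch have sizes 3, 3, and 4; every row appears in exactly one test fold, and each train fold is the complement of its test fold.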