datatable.models.kfold()
Perform k-fold split of data with nrows rows into nsplits train/test subsets. The dataset itself is not passed to this function: it is sufficient to know only the number of rows in order to decide how the data should be split.
The range [0; nrows) is split into nsplits approximately equal parts, i.e. folds, and then each i-th split uses the i-th fold as the test part and all the remaining rows as the train part. Thus, the i-th split is comprised of:
train rows: [0; i*nrows/nsplits) + [(i+1)*nrows/nsplits; nrows);
test rows: [i*nrows/nsplits; (i+1)*nrows/nsplits),
where integer division is assumed.
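The fold boundaries described above can be sketched in plain Python. This is only an illustration of the index arithmetic, not datatable's implementation: the helper name kfold_indices is hypothetical, and it returns plain lists of row indices rather than the row selectors that kfold() itself returns.

```python
def kfold_indices(nrows, nsplits):
    """Hypothetical helper: list of (train_rows, test_rows) index lists."""
    if not 2 <= nsplits <= nrows:
        raise ValueError("nsplits must be at least 2 and not larger than nrows")
    splits = []
    for i in range(nsplits):
        # Boundaries of the i-th fold, using integer division
        start = i * nrows // nsplits
        stop = (i + 1) * nrows // nsplits
        # The i-th fold is the test part; everything else is the train part
        test_rows = list(range(start, stop))
        train_rows = list(range(0, start)) + list(range(stop, nrows))
        splits.append((train_rows, test_rows))
    return splits


for train_rows, test_rows in kfold_indices(10, 3):
    print("train:", train_rows, "test:", test_rows)
```

With nrows=10 and nsplits=3 the test folds come out as rows 0-2, 3-5 and 6-9, which shows why the parts are only approximately equal when nrows is not divisible by nsplits.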
Parameters
nrows: int
The number of rows in the frame that is going to be split.
nsplits: int
Number of folds; must be at least 2, but not larger than nrows.
return: List[Tuple]
This function returns a list of nsplits tuples (train_rows, test_rows), where each component of the tuple is a row selector that can be applied to any frame with nrows rows to select the desired folds. Some of these row selectors will be simple Python ranges, others will be single-column Frame objects.
See Also
kfold_random() – Perform randomized k-fold split.