Ftrl

class datatable.models.Ftrl

Follow the Regularized Leader (FTRL) model.

The FTRL model is datatable's implementation of the FTRL-Proximal online learning algorithm for binomial logistic regression. It uses the hashing trick for feature vectorization and the Hogwild approach for parallelization. Multinomial classification and regression for continuous targets are also available, but are experimental.

See this reference for more details: https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf
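As a rough illustration of the algorithm described above, the sketch below follows the per-coordinate FTRL-Proximal update from the referenced paper, combined with a simple hashing trick. It is not datatable's code: the class name `TinyFtrl`, the use of `zlib.crc32` as the hash function, and the dict-based row format are all assumptions made for this example, and it runs single-threaded rather than with Hogwild.

```python
import math
import zlib


class TinyFtrl:
    """Illustrative FTRL-Proximal logistic regression (hypothetical helper)."""

    def __init__(self, alpha=0.005, beta=1.0, lambda1=0.0, lambda2=0.0, nbins=1024):
        self.alpha, self.beta = alpha, beta
        self.lambda1, self.lambda2 = lambda1, lambda2
        self.nbins = nbins
        self.z = [0.0] * nbins  # accumulated adjusted gradients
        self.n = [0.0] * nbins  # accumulated squared gradients

    def _bins(self, row):
        # Hashing trick: map every (column, value) pair to one of nbins slots.
        # crc32 is an assumption here, chosen only because it is deterministic.
        return [zlib.crc32(f"{name}={value}".encode()) % self.nbins
                for name, value in row.items()]

    def _weight(self, i):
        z = self.z[i]
        if abs(z) <= self.lambda1:
            return 0.0  # L1 regularization keeps this weight at exactly zero
        rate = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.lambda2
        return -(z - math.copysign(self.lambda1, z)) / rate

    @staticmethod
    def _sigmoid(score):
        score = max(min(score, 35.0), -35.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def predict_one(self, row):
        return self._sigmoid(sum(self._weight(i) for i in self._bins(row)))

    def fit_one(self, row, y):
        bins = self._bins(row)
        weights = [self._weight(i) for i in bins]
        p = self._sigmoid(sum(weights))
        g = p - y  # log-loss gradient; every hashed feature has value 1
        for i, w in zip(bins, weights):
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p


# Train on two toy rows; alpha is raised so the effect shows up quickly.
model = TinyFtrl(alpha=0.5, nbins=2**20)
for _ in range(20):
    model.fit_one({"color": "red"}, 1)
    model.fit_one({"color": "blue"}, 0)
```

Only the `z` and `n` accumulators are stored; the weights are derived from them on demand, which is why the `model` attribute described further down holds z and n coefficients rather than weights.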

Parameters:
  • alpha (float) – alpha in per-coordinate learning rate formula, defaults to 0.005.
  • beta (float) – beta in per-coordinate learning rate formula, defaults to 1.
  • lambda1 (float) – L1 regularization parameter, defaults to 0.
  • lambda2 (float) – L2 regularization parameter, defaults to 0.
  • nbins (int) – Number of bins to be used for the hashing trick, defaults to 10**6.
  • mantissa_nbits (int) – Number of bits from mantissa to be used for hashing floats, defaults to 10.
  • nepochs (int) – Number of training epochs, defaults to 1.
  • double_precision (bool) – Whether to use double precision arithmetic or not, defaults to False.
  • negative_class (bool) – Whether to create and train on a ‘negative’ class in the case of multinomial classification, defaults to False.
  • interactions (list or tuple) – A list or a tuple of interactions. In turn, each interaction should be a list or a tuple of feature names, where each feature name is a column name from the training frame.
  • model_type (str) – Model type can be one of the following: ‘binomial’ for binomial classification, ‘multinomial’ for multinomial classification, and ‘regression’ for numeric regression. Defaults to ‘auto’, meaning that the model type will be automatically selected based on the target column stype.
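To give some intuition for how nbins and mantissa_nbits could interact, here is a minimal sketch of hashing a float after discarding its low-order mantissa bits, so that nearby values fall into the same bin. This is one plausible reading of the parameter descriptions above, not datatable's actual hashing code; the helper names `float_key` and `float_bin` and the use of `zlib.crc32` are assumptions.

```python
import struct
import zlib


def float_key(value: float, mantissa_nbits: int = 10) -> int:
    """Keep the sign, the exponent, and the top mantissa_nbits mantissa bits."""
    if not 0 <= mantissa_nbits <= 52:
        raise ValueError("a double has 52 mantissa bits")
    # Reinterpret the 64-bit double as an unsigned integer.
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    # Drop the low-order mantissa bits that should not influence the hash.
    return bits >> (52 - mantissa_nbits)


def float_bin(value: float, nbins: int = 10**6, mantissa_nbits: int = 10) -> int:
    """Hash the truncated bit pattern into one of nbins bins (crc32 is an assumption)."""
    key = float_key(value, mantissa_nbits)
    return zlib.crc32(key.to_bytes(8, "little")) % nbins
```

Under this reading, a smaller mantissa_nbits groups more neighbouring floats into one feature, while a larger value treats them as distinct.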
alpha

alpha in per-coordinate learning rate formula.

beta

beta in per-coordinate learning rate formula.

colname_hashes

Column name hashes.

colnames

Column names.

double_precision

Whether to use double precision arithmetic or not.

feature_importances

Two-column frame with feature names and the corresponding feature importances normalized to [0; 1].

fit()

Train FTRL model on a dataset.

Parameters:
  • X_train (Frame) – Training frame of shape (nrows, ncols).
  • y_train (Frame) – Target frame of shape (nrows, 1).
  • X_validation (Frame) – Validation frame of shape (nrows, ncols).
  • y_validation (Frame) – Validation target frame of shape (nrows, 1).
  • nepochs_validation (float) – Parameter that specifies how often, in epoch units, validation error should be checked.
  • validation_error (float) – Early stopping threshold: training stops if the relative validation error does not improve by at least validation_error within nepochs_validation epochs.
  • validation_average_niterations (int) – Number of iterations that is used to calculate average loss. Here, each iteration corresponds to nepochs_validation epochs.
Returns:
  • A tuple of two elements, (epoch, loss), where epoch is the epoch at which model fitting stopped and loss is the final loss. When a validation dataset is not provided, the returned epoch is equal to nepochs and loss is float('nan').
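The early stopping behaviour described by nepochs_validation, validation_error and validation_average_niterations can be sketched roughly as follows. This is an interpretation of the parameter descriptions, not datatable's implementation; the function `fit_with_early_stopping` and its two callbacks, `train_chunk` and `validation_loss`, are hypothetical names introduced for this example.

```python
import math
from collections import deque


def fit_with_early_stopping(train_chunk, validation_loss, nepochs,
                            nepochs_validation=1.0, validation_error=0.01,
                            validation_average_niterations=1):
    """Train in chunks of nepochs_validation epochs and stop on a plateau.

    train_chunk(n) trains for n epochs; validation_loss() returns the
    current loss on the validation set. Both are hypothetical callbacks.
    """
    recent = deque(maxlen=validation_average_niterations)
    previous = math.inf  # averaged loss at the previous validation check
    epoch, loss = 0.0, math.nan
    while epoch < nepochs:
        step = min(nepochs_validation, nepochs - epoch)
        train_chunk(step)
        epoch += step
        recent.append(validation_loss())
        loss = sum(recent) / len(recent)  # average over the last few checks
        if math.isfinite(previous) and previous > 0:
            improvement = (previous - loss) / previous
            if improvement < validation_error:
                break  # relative improvement too small: stop early
        previous = loss
    return epoch, loss
```

The sketch covers only the case where a validation set is supplied; without one, the documented behaviour is simply to run all nepochs and return float('nan') as the loss.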

interactions

A list or a tuple of interactions. In turn, each interaction should be a list or a tuple of feature names, where each feature name is a column name from the training frame.
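The expected shape of interactions, a list or tuple whose items are themselves lists or tuples of column names, can be illustrated with a small sketch that turns each interaction into one combined feature key for a row. How datatable actually combines the features internally is not stated here, so the joining scheme and the helper name `interaction_keys` are assumptions.

```python
def interaction_keys(row, interactions):
    """Build one combined feature key per interaction for a single row.

    row is a dict mapping column names to values; interactions is a list
    or tuple of lists/tuples of column names (hypothetical helper).
    """
    keys = []
    for interaction in interactions:
        if not isinstance(interaction, (list, tuple)):
            raise TypeError("each interaction must be a list or a tuple")
        missing = [name for name in interaction if name not in row]
        if missing:
            raise ValueError(f"unknown feature names: {missing}")
        # Join the participating features into a single crossed feature.
        keys.append(":".join(f"{name}={row[name]}" for name in interaction))
    return keys


row = {"site": "a.com", "device": "phone", "hour": 13}
crossed = interaction_keys(row, [["site", "device"], ("device", "hour")])
```

Each resulting key could then be hashed into a bin in the same way as an ordinary feature.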

labels

Frame of labels used for classification.

lambda1

L1 regularization parameter.

lambda2

L2 regularization parameter.

mantissa_nbits

Number of bits from mantissa to be used for hashing floats.

model

Model frame of shape (nbins, 2 * nlabels), where nlabels is the total number of labels the model was trained on, and nbins is the number of bins used for the hashing trick. Odd frame columns contain z model coefficients, and even columns n model coefficients.

model_type

The type of the model FTRL should build: ‘binomial’ for binomial classification, ‘multinomial’ for multinomial classification, ‘regression’ for numeric regression, or ‘auto’ for automatic model type detection based on the target column stype. Default value is ‘auto’.

model_type_trained

The model type FTRL has built: ‘regression’, ‘binomial’, ‘multinomial’, or ‘none’ for an untrained model.

nbins

Number of bins to be used for the hashing trick.

negative_class

Whether to create and train on a ‘negative’ class in the case of multinomial classification.

nepochs

Number of training epochs.

params

FTRL model parameters.

predict()

Make predictions for a dataset.

Parameters:
  • X (Frame) – Frame of shape (nrows, ncols) to make predictions for. It should have the same number of columns as the training frame.
Returns:
  • A new frame of shape (nrows, nlabels) with the predicted probabilities for each row of frame X and each label the model was trained for.

reset()

Reset FTRL model by clearing all the model weights, labels and feature importance information.

Parameters: None
Returns: None