Forust

Python API Reference

The GradientBooster class is currently the only public-facing class in the package, and can be used to train gradient boosted decision tree ensembles with multiple objective functions.

GradientBooster

GradientBooster(*, objective_type: str = 'LogLoss', iterations: int = 100, learning_rate: float = 0.3, max_depth: int = 5, max_leaves: int = sys.maxsize, l1: float = 0.0, l2: float = 1.0, gamma: float = 0.0, max_delta_step: float = 0.0, min_leaf_weight: float = 1.0, base_score: float = 0.5, nbins: int = 256, parallel: bool = True, allow_missing_splits: bool = True, monotone_constraints: Union[dict[Any, int], None] = None, subsample: float = 1.0, top_rate: float = 0.1, other_rate: float = 0.2, colsample_bytree: float = 1.0, seed: int = 0, missing: float = np.nan, create_missing_branch: bool = False, sample_method: str | None = None, grow_policy: str = 'DepthWise', evaluation_metric: str | None = None, early_stopping_rounds: int | None = None, initialize_base_score: bool = True, terminate_missing_features: Iterable[Any] | None = None, missing_node_treatment: str = 'None', log_iterations: int = 0, feature_importance_method: str = 'Gain', force_children_to_bound_parent: bool = False)

Gradient Booster Class, used to generate gradient boosted decision tree ensembles.

Parameters:

  • objective_type (str, default: 'LogLoss' ) –

    The name of the objective function used to optimize. Valid options include "LogLoss" to use logistic loss as the objective function (binary classification), or "SquaredLoss" to use squared error as the objective function (continuous regression). Defaults to "LogLoss".

  • iterations (int, default: 100 ) –

    Total number of trees to train in the ensemble. Defaults to 100.

  • learning_rate (float, default: 0.3 ) –

    Step size to use at each iteration. Each leaf weight is multiplied by this number. The smaller the value, the more conservative the weights will be. Defaults to 0.3.

  • max_depth (int, default: 5 ) –

    Maximum depth of an individual tree. Valid values are 0 to infinity. Defaults to 5.

  • max_leaves (int, default: maxsize ) –

    Maximum number of leaves allowed on a tree. Valid values are 0 to infinity. This is the total number of final nodes. Defaults to sys.maxsize.

  • l1 (float, default: 0.0 ) –

    L1 regularization term applied to the weights of the tree. Valid values are 0 to infinity. Defaults to 0.0.

  • l2 (float, default: 1.0 ) –

    L2 regularization term applied to the weights of the tree. Valid values are 0 to infinity. Defaults to 1.0.

  • gamma (float, default: 0.0 ) –

    The minimum amount of loss required to further split a node. Valid values are 0 to infinity. Defaults to 0.0.

  • max_delta_step (float, default: 0.0 ) –

    Maximum delta step allowed at each leaf. This is the maximum magnitude a leaf can take. Setting this to 0 results in no constraint. Defaults to 0.0.

  • min_leaf_weight (float, default: 1.0 ) –

    Minimum sum of the hessian values of the loss function required to be in a node. Defaults to 1.0.

  • base_score (float, default: 0.5 ) –

    The initial prediction value of the model. If initialize_base_score is set to True the base_score will automatically be updated based on the objective function at fit time. Defaults to 0.5.

  • nbins (int, default: 256 ) –

    Number of bins to calculate to partition the data. Setting this to a smaller number will result in faster training time, while potentially sacrificing accuracy. If there are more bins than unique values in a column, all unique values will be used. Defaults to 256.

  • parallel (bool, default: True ) –

    Should multiple cores be used when training and predicting with this model? Defaults to True.

  • allow_missing_splits (bool, default: True ) –

    Allow splits to be made such that all missing values go down one branch and all non-missing values go down the other, if this results in the greatest reduction of loss. If this is False, splits will only be made on non-missing values. If create_missing_branch is set to True, setting this parameter to True will allow the missing branch to be split further; if this parameter is False, the missing branch will always be a terminal node. Defaults to True.

  • monotone_constraints (dict[Any, int], default: None ) –

    Constraints that are used to enforce a specific relationship between the training features and the target variable. A dictionary should be provided where the keys are the feature index values if the model will be fit on a numpy array, or the feature names if it will be fit on a pandas DataFrame. The values of the dictionary should be an integer value of -1, 1, or 0 to specify the relationship that should be estimated between the respective feature and the target variable. Use a value of -1 to enforce a negative relationship, 1 to enforce a positive relationship, and 0 to enforce no specific relationship at all. Features not included in the mapping will not have any constraint applied. If None is passed, no constraints will be enforced on any variable. Defaults to None.

  • subsample (float, default: 1.0 ) –

    Percent of records to randomly sample at each iteration when training a tree. Defaults to 1.0, meaning all data is used for training.

  • top_rate (float, default: 0.1 ) –

    Used only with the "goss" sample_method. The retention ratio of large gradient data. Defaults to 0.1.

  • other_rate (float, default: 0.2 ) –

    Used only with the "goss" sample_method. The retention ratio of small gradient data. Defaults to 0.2.

  • colsample_bytree (float, default: 1.0 ) –

    Specify the fraction of columns that should be sampled at each iteration, valid values are in the range (0.0,1.0].

  • seed (integer, default: 0 ) –

    Integer value used to seed any randomness used in the algorithm. Defaults to 0.

  • missing (float, default: nan ) –

    Value to consider missing, when training and predicting with the booster. Defaults to np.nan.

  • create_missing_branch (bool, default: False ) –

    An experimental parameter that, if True, will create a separate branch for missing values, creating a ternary tree; the missing node will be given the same weight value as the parent node. If this parameter is False, missing values will be sent down either the left or right branch, creating a binary tree. Defaults to False.

  • sample_method (str | None, default: None ) –

    Optional string value used to determine the method used to sample the data while training. If this is None, no sample method will be used. If the subsample parameter is less than 1 and no sample_method is provided, the sample_method will automatically be set to "random". Valid options are "goss" and "random". Defaults to None.

  • grow_policy (str, default: 'DepthWise' ) –

    Optional string value that controls the way new nodes are added to the tree. Choices are DepthWise to split at nodes closest to the root, or LossGuide to split at nodes with the highest loss change.

  • evaluation_metric (str | None, default: None ) –

    Optional string value used to define an evaluation metric that will be calculated at each iteration if an evaluation_dataset is provided at fit time. The metric can be one of "AUC", "LogLoss", "RootMeanSquaredLogError", or "RootMeanSquaredError". If no evaluation_metric is passed, but an evaluation_dataset is passed, then "LogLoss" will be used with the "LogLoss" objective function, and "RootMeanSquaredLogError" will be used with "SquaredLoss".

  • early_stopping_rounds (int | None, default: None ) –

    If this is specified, and an evaluation_dataset is passed during fit, then training will be cut short if no improvement in the evaluation_metric is seen within this many iterations of training.

  • initialize_base_score (bool, default: True ) –

    If True, the base_score will be calculated at fit time using the sample_weight and y data, in accordance with the requested objective_type. This will result in the passed base_score value being overridden.

  • terminate_missing_features (set[Any], default: None ) –

    An optional iterable of features (either strings, or integer values specifying the feature indices if numpy arrays are used for fitting) for which the missing node will always be terminated, even if allow_missing_splits is set to True. This parameter is only valid if create_missing_branch is also True.

  • missing_node_treatment (str, default: 'None' ) –

    Method for selecting the weight for the missing node, if create_missing_branch is set to True. Defaults to "None". Valid options are:

    • "None": Calculate missing node weight values without any constraints.
    • "AssignToParent": Assign the weight of the missing node to that of the parent.
    • "AverageLeafWeight": After training each tree, starting from the bottom of the tree, assign the missing node weight to the weighted average of the left and right child nodes. Next assign the parent to the weighted average of the children nodes. This is performed recursively up through the entire tree. This is performed as a post processing step on each tree after it is built, and prior to updating the predictions for which to train the next tree.
    • "AverageNodeWeight": Set the missing node to be equal to the weighted average weight of the left and the right nodes.
  • log_iterations (int, default: 0 ) –

    Setting this to a value (N) other than zero will result in information being logged every N iterations. The logs can be interacted with directly via the Python logging module. For an example of how to utilize the logging information, see the Logging output section below.

  • feature_importance_method (str, default: 'Gain' ) –

    The feature importance method type that will be used to calculate the feature_importances_ attribute on the booster.

  • force_children_to_bound_parent (bool, default: False ) –

    Setting this parameter to True will restrict the children nodes, so that they always contain the parent node inside of their range. Without setting this, it's possible that both the left and the right nodes could be greater than, or less than, the parent node. Defaults to False.

Raises:

  • TypeError

    Raised if an invalid dtype is passed.

Example

Once the booster has been initialized, it can be fit on a provided dataset and target variable. After fitting, the model can be used to predict on a dataset. In the case of this example, the predictions are the log odds of a given record being 1.

# Small example dataset
from seaborn import load_dataset

df = load_dataset("titanic")
X = df.select_dtypes("number").drop(columns=["survived"])
y = df["survived"]

# Initialize a booster with defaults.
from forust import GradientBooster
model = GradientBooster(objective_type="LogLoss")
model.fit(X, y)

# Predict on data
model.predict(X.head())
# array([-1.94919663,  2.25863229,  0.32963671,  2.48732194, -3.00371813])

# predict contributions
model.predict_contributions(X.head())
# array([[-0.63014213,  0.33880048, -0.16520798, -0.07798772, -0.85083578,
#        -1.07720813],
#       [ 1.05406709,  0.08825999,  0.21662544, -0.12083538,  0.35209258,
#        -1.07720813],
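
As a further illustration of the constructor, the sketch below shows a booster configured with non-default sampling, using only parameters documented above; the parameter values themselves are purely illustrative.

# Sketch: a booster configured to use GOSS sampling.
goss_model = GradientBooster(
    objective_type="LogLoss",
    learning_rate=0.1,
    sample_method="goss",
    top_rate=0.1,        # retention ratio of large gradient data
    other_rate=0.2,      # retention ratio of small gradient data
    colsample_bytree=0.8,
)
goss_model.fit(X, y)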

best_iteration property

best_iteration: int | None

Get the best iteration if early_stopping_rounds was used when fitting.

Returns:

  • int | None

    int | None: The best iteration, or None if early_stopping_rounds wasn't used.

prediction_iteration property

prediction_iteration: int | None

The prediction_iteration that will be used when predicting; up to this many trees will be used.

Returns:

  • int | None

    int | None: The iteration if this is set, otherwise None, in which case all trees will be used.

number_of_trees property

number_of_trees: int

The number of trees in the model.

Returns:

  • int ( int ) –

    The total number of trees in the model.

fit

fit(X: FrameLike, y: ArrayLike, sample_weight: Union[ArrayLike, None] = None, evaluation_data: None | list[tuple[FrameLike, ArrayLike, ArrayLike] | tuple[FrameLike, ArrayLike]] = None) -> GradientBooster

Fit the gradient booster on a provided dataset.

Parameters:

  • X (FrameLike) –

    Either a pandas DataFrame, or a 2 dimensional numpy array.

  • y (ArrayLike) –

    Either a pandas Series, or a 1 dimensional numpy array. If "LogLoss" was the objective type specified, then this should only contain 1 or 0 values, where 1 is the positive class being predicted. If "SquaredLoss" is the objective type, then any continuous variable can be provided.

  • sample_weight (Union[ArrayLike, None], default: None ) –

    Instance weights to use when training the model. If None is passed, a weight of 1 will be used for every record. Defaults to None.

  • evaluation_data (tuple[FrameLike, ArrayLike, ArrayLike] | tuple[FrameLike, ArrayLike], default: None ) –

    An optional list of tuples, where each tuple should contain a dataset, an equal length target array, and optionally an equal length sample weight array. If this is provided, metric values will be calculated at each iteration of training. If early_stopping_rounds is supplied, the last entry of this list will be used to determine if performance has improved over the last set of iterations; if no improvement is seen within early_stopping_rounds iterations, training will be cut short.
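
Example

A sketch of fitting with sample weights and evaluation data, reusing the X and y from the class example above; the weights used here are purely illustrative (a weight of 1.0 for every record).

import numpy as np

weights = np.ones(X.shape[0])

model = GradientBooster(objective_type="LogLoss")
model.fit(
    X,
    y,
    sample_weight=weights,
    evaluation_data=[(X, y, weights)],
)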

predict

predict(X: FrameLike, parallel: Union[bool, None] = None) -> np.ndarray

Predict with the fitted booster on new data.

Parameters:

  • X (FrameLike) –

    Either a pandas DataFrame, or a 2 dimensional numpy array.

  • parallel (Union[bool, None], default: None ) –

    Optionally specify if the predict function should run in parallel on multiple threads. If None is passed, the parallel attribute of the booster will be used. Defaults to None.

Returns:

  • ndarray

    np.ndarray: Returns a numpy array of the predictions.
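
Example

With the "LogLoss" objective the returned values are log odds; as noted in the predict_contributions documentation below, they can be converted to probabilities with the logistic function. A sketch reusing the fitted model and X from the class example above:

import numpy as np

log_odds = model.predict(X.head())

# Convert the log odds into probabilities of the positive class.
probabilities = 1 / (1 + np.exp(-log_odds))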

predict_contributions

predict_contributions(X: FrameLike, method: str = 'Average', parallel: Union[bool, None] = None) -> np.ndarray

Predict with the fitted booster on new data, returning the feature contribution matrix. The last column is the bias term.

When predicting with the data, the maximum iteration that will be used can be set using the set_prediction_iteration method. If early_stopping_rounds has been set, this will default to the best iteration; otherwise all of the trees will be used.

If early stopping was used, the evaluation history can be retrieved with the get_evaluation_history method.

Parameters:

  • X (FrameLike) –

    Either a pandas DataFrame, or a 2 dimensional numpy array.

  • method (str, default: 'Average' ) –

    Method to calculate the contributions, available options are:

    • "Average": If this option is specified, the average internal node values are calculated, this is equivalent to the approx_contribs parameter in XGBoost.
    • "Shapley": Using this option will calculate contributions using the tree shap algorithm.
    • "Weight": This method will use the internal leaf weights, to calculate the contributions. This is the same as what is described by Saabas here.
    • "BranchDifference": This method will calculate contributions by subtracting the weight of the node the record will travel down by the weight of the other non-missing branch. This method does not have the property where the contributions summed is equal to the final prediction of the model.
    • "MidpointDifference": This method will calculate contributions by subtracting the weight of the node the record will travel down by the mid-point between the right and left node weighted by the cover of each node. This method does not have the property where the contributions summed is equal to the final prediction of the model.
    • "ModeDifference": This method will calculate contributions by subtracting the weight of the node the record will travel down by the weight of the node with the largest cover (the mode node). This method does not have the property where the contributions summed is equal to the final prediction of the model.
    • "ProbabilityChange": This method is only valid when the objective type is set to "LogLoss". This method will calculate contributions as the change in a records probability of being 1 moving from a parent node to a child node. The sum of the returned contributions matrix, will be equal to the probability a record will be 1. For example, given a model, model.predict_contributions(X, method="ProbabilityChange") == 1 / (1 + np.exp(-model.predict(X)))
  • parallel (Union[bool, None], default: None ) –

    Optionally specify if the predict function should run in parallel on multiple threads. If None is passed, the parallel attribute of the booster will be used. Defaults to None.

Returns:

  • ndarray

    np.ndarray: Returns a numpy array of the predicted contributions.
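
Example

As a quick sanity check of the contribution matrix, for methods other than "BranchDifference", "MidpointDifference", and "ModeDifference" (as noted above), the per-feature contributions plus the bias term in the last column should sum to the raw prediction. A sketch reusing the fitted model and X from the class example above:

import numpy as np

contributions = model.predict_contributions(X.head(), method="Average")
predictions = model.predict(X.head())

# The row-wise sums of the contributions (including the bias term in the
# last column) should match the raw predictions.
np.allclose(contributions.sum(axis=1), predictions)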

predict_leaf_indices

predict_leaf_indices(X: FrameLike) -> np.ndarray

Predict the leaf indices for each tree. This will be the node ID number, which can be used to identify the leaf node a record will fall into for each tree; this can be paired directly with the trees_to_dataframe output. The data returned will be a matrix, where each column corresponds to a tree, thus the data will be of the shape (rows in X, prediction_iteration).

Parameters:

  • X (FrameLike) –

    Either a pandas DataFrame, or a 2 dimensional numpy array.

Returns:

  • ndarray

    np.ndarray: Returns a numpy array of the predicted leaf indices.
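
Example

A sketch of the shape of the returned matrix, reusing the fitted model and X from the class example above:

leaf_indices = model.predict_leaf_indices(X.head())

# One row per record in X, and one column per tree used when predicting,
# i.e. a shape of (5, number of trees used for prediction) here.
leaf_indices.shape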

set_prediction_iteration

set_prediction_iteration(iteration: int)

Set the iteration that should be used when predicting. If early_stopping_rounds has been set, this will default to the best iteration, otherwise all of the trees will be used.

Parameters:

  • iteration (int) –

    Iteration number to use; this will use all trees up to this index. Setting this to 10 would result in trees 0 through 9 being used for predictions.
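
Example

A sketch of limiting predictions to the first 10 trees, reusing the fitted model and X from the class example above:

# Use only trees 0 through 9 when predicting.
model.set_prediction_iteration(10)
early_predictions = model.predict(X.head())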

partial_dependence

partial_dependence(X: FrameLike, feature: Union[str, int], samples: int | None = 100, exclude_missing: bool = True, percentile_bounds: tuple[float, float] = (0.2, 0.98)) -> np.ndarray

Calculate the partial dependence values of a feature. For each unique value of the feature, this gives the estimate of the predicted value for that feature, with the effects of all features averaged out. This information gives an estimate of how a given feature impacts the model.

Parameters:

  • X (FrameLike) –

    Either a pandas DataFrame, or a 2 dimensional numpy array. This should be the same data passed into the model's fit or predict methods, with the columns in the same order.

  • feature (Union[str, int]) –

    The feature for which to calculate the partial dependence values. This can be the name of a column, if the provided X is a pandas DataFrame, or the index of the feature.

  • samples (int | None, default: 100 ) –

    Number of evenly spaced samples to select. If None is passed all unique values will be used. Defaults to 100.

  • exclude_missing (bool, default: True ) –

    Should missing values be excluded from the feature? Defaults to True.

  • percentile_bounds (tuple[float, float], default: (0.2, 0.98) ) –

    Upper and lower percentiles to start at when calculating the samples. Defaults to (0.2, 0.98), capping the samples selected at the 20th and 98th percentiles respectively.

Raises:

  • ValueError

    An error will be raised if the provided X parameter is not a pandas DataFrame, and a string is provided for the feature.

Returns:

  • ndarray

    np.ndarray: A 2 dimensional numpy array, where the first column is the sorted unique values of the feature, and the second column is the partial dependence value for each feature value.

Example

This information can be plotted to visualize how a feature is used in the model, like so.

from seaborn import lineplot
import matplotlib.pyplot as plt

pd_values = model.partial_dependence(X=X, feature="age", samples=None)

fig = lineplot(x=pd_values[:,0], y=pd_values[:,1],)
plt.title("Partial Dependence Plot")
plt.xlabel("Age")
plt.ylabel("Log Odds")

We can see how this is impacted if a model is created where a specific constraint is applied to the feature using the monotone_constraints parameter.

model = GradientBooster(
    objective_type="LogLoss",
    monotone_constraints={"age": -1},
)
model.fit(X, y)

pd_values = model.partial_dependence(X=X, feature="age")
fig = lineplot(
    x=pd_values[:, 0],
    y=pd_values[:, 1],
)
plt.title("Partial Dependence Plot with Monotonicity")
plt.xlabel("Age")
plt.ylabel("Log Odds")

calculate_feature_importance

calculate_feature_importance(method: str = 'Gain', normalize: bool = True) -> dict[int, float] | dict[str, float]

Feature importance values can be calculated with the calculate_feature_importance method. This function will return a dictionary of the features and their importance values. It should be noted that if a feature was never used for splitting, it will not be returned in the importance dictionary.

Parameters:

  • method (str, default: 'Gain' ) –

    Variable importance method. Defaults to "Gain". Valid options are:

    • "Weight": The number of times a feature is used to split the data across all trees.
    • "Gain": The average split gain across all splits the feature is used in.
    • "Cover": The average coverage across all splits the feature is used in.
    • "TotalGain": The total gain across all splits the feature is used in.
    • "TotalCover": The total coverage across all splits the feature is used in.
  • normalize (bool, default: True ) –

    Should the importance be normalized to sum to 1? Defaults to True.

Returns:

  • dict[int, float] | dict[str, float]

    dict[str, float]: Variable importance values, for features present in the model.

Example
model.calculate_feature_importance("Gain")
# {
#   'parch': 0.0713072270154953,
#   'age': 0.11609109491109848,
#   'sibsp': 0.1486879289150238,
#   'fare': 0.14309120178222656,
#   'pclass': 0.5208225250244141
# }

text_dump

text_dump() -> list[str]

Return all of the trees of the model in text form.

Returns:

  • list[str]

    list[str]: A list of strings, where each string is a text representation of the tree.

Example:

model.text_dump()[0]
# 0:[0 < 3] yes=1,no=2,missing=2,gain=91.50833,cover=209.388307
#       1:[4 < 13.7917] yes=3,no=4,missing=4,gain=28.185467,cover=94.00148
#             3:[1 < 18] yes=7,no=8,missing=8,gain=1.4576768,cover=22.090348
#                   7:[1 < 17] yes=15,no=16,missing=16,gain=0.691266,cover=0.705011
#                         15:leaf=-0.15120,cover=0.23500
#                         16:leaf=0.154097,cover=0.470007

json_dump

json_dump() -> str

Return the booster object as a string.

Returns:

  • str ( str ) –

    The booster dumped as a json object in string form.
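
Example

The returned string can be parsed with the standard library json module; a sketch reusing the fitted model from the class example above.

import json

# Assuming the top level of the dump is a JSON object, its structure can
# be inspected directly.
model_json = json.loads(model.json_dump())
list(model_json.keys())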

load_booster classmethod

load_booster(path: str) -> GradientBooster

Load a booster object that was saved with the save_booster method.

Parameters:

  • path (str) –

    Path to the saved booster file.

Returns:

  • GradientBooster –

    GradientBooster: The booster object loaded from the provided path.

save_booster

save_booster(path: str)

Save a booster object; the underlying representation is a json file.

Parameters:

  • path (str) –

    Path to save the booster object.
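
Example

A sketch of a save and load round trip, reusing the fitted model and X from the class example above; the file name is purely illustrative.

import numpy as np

model.save_booster("model.json")

# Load the booster back from disk and confirm it produces the same
# predictions as the original model.
loaded = GradientBooster.load_booster("model.json")
np.allclose(loaded.predict(X.head()), model.predict(X.head()))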

insert_metadata

insert_metadata(key: str, value: str)

Insert data into the model's metadata; this will be saved on the booster object.

Parameters:

  • key (str) –

    Key to give the inserted value in the metadata.

  • value (str) –

    String value to assign to the key.

get_metadata

get_metadata(key: str) -> str

Get the value associated with a given key in the booster's metadata.

Parameters:

  • key (str) –

    Key of item in metadata.

Returns:

  • str ( str ) –

    Value associated with the provided key in the booster's metadata.
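
Example

A sketch of inserting and then retrieving a metadata value with insert_metadata and get_metadata; the key and value used here are purely illustrative.

model.insert_metadata("training-data-version", "2023-01-01")
model.get_metadata("training-data-version")
# '2023-01-01'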

get_evaluation_history

get_evaluation_history() -> np.ndarray | None

Get the results of the evaluation_metric calculated on the evaluation_dataset passed to fit, at each iteration. If no evaluation_dataset was passed, this will return None.

Returns:

  • ndarray | None

    np.ndarray | None: A numpy array equal to the shape of the number of evaluation datasets passed, and the number of trees in the model.

Example
model = GradientBooster(objective_type="LogLoss")
model.fit(X, y, evaluation_data=[(X, y)])

model.get_evaluation_history()[0:3]

# array([[588.9158873 ],
#        [532.01055803],
#        [496.76933646]])

get_best_iteration

get_best_iteration() -> int | None

Get the best iteration if early_stopping_rounds was used when fitting.

Returns:

  • int | None

    int | None: The best iteration, or None if early_stopping_rounds wasn't used.
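
Example

A sketch of fitting with early stopping and then inspecting the best iteration, reusing the X and y from the class example above; the parameter values are purely illustrative.

model = GradientBooster(
    objective_type="LogLoss",
    iterations=1000,
    early_stopping_rounds=5,
)
model.fit(X, y, evaluation_data=[(X, y)])

# The best iteration seen on the evaluation data, or None if
# early_stopping_rounds was not used.
model.get_best_iteration()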

get_params

get_params(deep=True) -> dict[str, Any]

Get all of the parameters for the booster.

Parameters:

  • deep (bool, default: True ) –

    This argument does nothing, and is simply here for scikit-learn compatibility. Defaults to True.

Returns:

  • dict[str, Any]

    dict[str, Any]: The parameters of the booster.
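
Example

A sketch of inspecting and updating the booster's parameters with the scikit-learn style get_params and set_params methods (set_params is documented below); the values used are purely illustrative.

model = GradientBooster()
model.set_params(learning_rate=0.1, max_depth=3)

model.get_params()["learning_rate"]
# 0.1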

set_params

set_params(**params: Any) -> GradientBooster

Set the parameters of the booster; this has the same effect as re-instantiating the booster.

Returns:

  • GradientBooster –

    GradientBooster: The booster with the newly set parameters.

get_node_lists

get_node_lists(map_features_names: bool = True) -> list[list[Node]]

Return the tree structures representation as a list of python objects.

Parameters:

  • map_features_names (bool, default: True ) –

    Should the feature names be mapped to strings, if a pandas DataFrame was used for fitting. Defaults to True.

Returns:

  • list[list[Node]]

    list[list[Node]]: A list of lists where each sub list is a tree, with all of its respective nodes.

Example

This can be run directly to get the tree structure as python objects.

fmod = GradientBooster(max_depth=2)
fmod.fit(X, y=y)

fmod.get_node_lists()[0]

# [Node(num=0, weight_value...,
# Node(num=1, weight_value...,
# Node(num=2, weight_value...,
# Node(num=3, weight_value...,
# Node(num=4, weight_value...,
# Node(num=5, weight_value...,
# Node(num=6, weight_value...,]

trees_to_dataframe

trees_to_dataframe() -> pd.DataFrame

Return the tree structure as a pandas DataFrame object.

Returns:

  • DataFrame

    pd.DataFrame: Trees in a pandas dataframe.

Example

This can be used directly to print out the tree structure as a pandas dataframe. For leaf nodes, the "Gain" column will be replaced with the weight value.

model.trees_to_dataframe().head()
   Tree  Node   ID Feature    Split  Yes   No Missing     Gain    Cover
0     0     0  0-0  pclass        3  0-1  0-2     0-2  91.5083  209.388
1     0     1  0-1    fare  13.7917  0-3  0-4     0-4  28.1855  94.0015

Logging output

Info is logged while the model is being trained if the log_iterations parameter is set to a value greater than 0 while fitting the booster. The logs can be printed to stdout while training like so.

import logging
logging.basicConfig()
logging.getLogger().setLevel(logging.INFO)

fmod = GradientBooster(log_iterations=1)
fmod.fit(X, y, evaluation_data=[(X, y)])

# INFO:forust_ml.gradientbooster:Iteration 0 evaluation data values: 0.2828
# INFO:forust_ml.gradientbooster:Completed iteration 0 of 10
# INFO:forust_ml.gradientbooster:Iteration 1 evaluation data values: 0.2807
# INFO:forust_ml.gradientbooster:Completed iteration 1 of 10
# INFO:forust_ml.gradientbooster:Iteration 2 evaluation data values: 0.2787
# INFO:forust_ml.gradientbooster:Completed iteration 2 of 10

The log output can also be captured in a file using the logging.basicConfig() filename option.

import logging
logging.basicConfig(filename="training-info.log")
logging.getLogger().setLevel(logging.INFO)

fmod = GradientBooster(log_iterations=10)
fmod.fit(X, y, evaluation_data=[(X, y)])