Cross Validation





Kerry Back

  • Cross-validation (CV) is a way to choose optimal hyperparameters using the training data
  • Split the training data into subsets, e.g., A, B, C, D, E
  • Define a finite set of hyperparameter combinations (a grid) to choose from
    • Example: {"max_depth": [3, 4], "learning_rate": [0.05, 0.1]}
    • Example: {"hidden_layer_sizes": [[4, 2], [8, 4, 2], [16, 8, 4]]}

  • Use one of the subsets (e.g., A) as the validation set
  • Train with each of the hyperparameter combinations on the union of the remaining subsets (e.g., B ∪ C ∪ D ∪ E)
  • Score each trained model on A
  • Repeat with B as the validation set, etc.
  • For each hyperparameter combination, end up with as many validation scores as there are subsets

  • Average the validation scores to get a single score for each hyperparameter combination
  • Choose the hyperparameters with the highest average score
  • All of this together is “search over the grid using cross-validation to find the best hyperparameters”
  • It is implemented by scikit-learn's GridSearchCV class
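The steps above can be sketched by hand with scikit-learn's KFold before handing the job to GridSearchCV. This is a minimal illustration on synthetic data (the data and the random seeds are assumptions, not the course dataset):

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

# synthetic stand-in for the training data
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# the grid: every combination of the listed values
grid = {"max_depth": [3, 4], "learning_rate": [0.05, 0.1]}
combos = [dict(zip(grid, vals)) for vals in product(*grid.values())]

# split the training data into 5 subsets (the A, B, C, D, E of the slide)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

avg_scores = {}
for params in combos:
    scores = []
    for train_idx, val_idx in kf.split(X):
        # train on the union of the other subsets, score on the held-out one
        model = GradientBoostingRegressor(random_state=0, **params)
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[val_idx], y[val_idx]))
    # average the 5 validation scores into one score per combination
    avg_scores[tuple(params.items())] = np.mean(scores)

# choose the combination with the highest average score
best = max(avg_scores, key=avg_scores.get)
```

GridSearchCV performs exactly this loop internally, in addition refitting the winning combination on the full training data.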

Example

  • Same data as in 3a-trees
    • agr, bm, idiovol, mom12m, roeq
    • data = 2021-12 (training data)
  • Quantile transform features and ret
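The quantile transform mentioned above can be done with scikit-learn's QuantileTransformer. A minimal sketch with hypothetical stand-in data (the real features are agr, bm, idiovol, mom12m, roeq, and in practice the transform is typically applied within each date rather than to the pooled panel):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

# hypothetical data standing in for the features and the return
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(500, 6)),
    columns=["agr", "bm", "idiovol", "mom12m", "roeq", "ret"],
)

# map each column to a uniform [0, 1] distribution based on its ranks
qt = QuantileTransformer(output_distribution="uniform", n_quantiles=100)
transformed = pd.DataFrame(qt.fit_transform(df), columns=df.columns)
```

Rank-based transforms like this reduce the influence of outliers, which is one reason they are common with noisy return data.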

Cross validation for gradient boosting

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
  "max_depth": [3, 4], 
  "learning_rate": [0.05, 0.1]
}

cv = GridSearchCV(
  estimator=GradientBoostingRegressor(),
  param_grid=param_grid,
)

_ = cv.fit(Xtrain, ytrain)
pd.DataFrame(cv.cv_results_).iloc[:, 4:]

  param_learning_rate param_max_depth                                   params  split0_test_score  split1_test_score  split2_test_score  split3_test_score  split4_test_score  mean_test_score  std_test_score  rank_test_score
0                0.05               3  {'learning_rate': 0.05, 'max_depth': 3}           0.215701           0.203541           0.125528           0.050929           0.171487         0.153437        0.060000                1
1                0.05               4  {'learning_rate': 0.05, 'max_depth': 4}           0.192419           0.200274           0.111733           0.019720           0.196899         0.144209        0.070421                2
2                0.10               3   {'learning_rate': 0.1, 'max_depth': 3}           0.173772           0.180420           0.121137           0.033443           0.152488         0.132252        0.053554                3
3                0.10               4   {'learning_rate': 0.1, 'max_depth': 4}           0.166758           0.179369           0.089260          -0.007269           0.160962         0.117816        0.070011                4
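After fitting, the winning combination and its mean validation score are available as attributes of the fitted GridSearchCV object. A self-contained sketch (synthetic data stands in for Xtrain and ytrain, which are assumptions here):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# synthetic stand-in for the training data
Xtrain, ytrain = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

cv = GridSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_grid={"max_depth": [3, 4], "learning_rate": [0.05, 0.1]},
)
cv.fit(Xtrain, ytrain)

# the best hyperparameter combination and its mean validation score
print(cv.best_params_, cv.best_score_)

# by default, best_estimator_ has been refit on the full training data
preds = cv.best_estimator_.predict(Xtrain)
```

Using best_estimator_ for prediction means no training data is wasted: cross-validation only chooses the hyperparameters, and the final model is trained on everything.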