Cross Validation





Kerry Back

  • Cross-validation (CV) is a way to choose optimal hyperparameters using the training data
  • Split the training data into subsets, e.g., A, B, C, D, E
  • Define a finite set of hyperparameter combinations (a grid) to choose from
    • Example: {"max_depth": [3, 4], "learning_rate": [0.05, 0.1]}
    • Example: {"hidden_layer_sizes": [[4, 2], [8, 4, 2], [16, 8, 4]]}

  • Use one of the subsets (e.g., A) as the validation set
  • Train with each of the hyperparameter combinations on the union of the remaining subsets (e.g., B ∪ C ∪ D ∪ E)
  • Score each trained model on A
  • Repeat with B as the validation set, etc.
  • For each hyperparameter combination, end up with as many validation scores as there are subsets

  • Average the validation scores to get a single score for each hyperparameter combination
  • Choose the hyperparameters with the highest average score
  • All of this together is “search over the grid using cross-validation to find the best hyperparameters”
  • It is implemented by scikit-learn's GridSearchCV class
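The steps above can be sketched by hand with scikit-learn's KFold before handing the job to GridSearchCV. This is a minimal illustration on synthetic data (the data and the random seeds are assumptions, not the course dataset):

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

# synthetic stand-in for the training data
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# the grid: every combination of the listed values
grid = {"max_depth": [3, 4], "learning_rate": [0.05, 0.1]}
combos = [dict(zip(grid, vals)) for vals in product(*grid.values())]

# split the training data into 5 subsets (the A, B, C, D, E of the slide)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

avg_scores = {}
for params in combos:
    scores = []
    for train_idx, val_idx in kf.split(X):
        # train on the union of the other subsets, score on the held-out one
        model = GradientBoostingRegressor(random_state=0, **params)
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[val_idx], y[val_idx]))
    # average the 5 validation scores into one score per combination
    avg_scores[tuple(params.items())] = np.mean(scores)

# choose the combination with the highest average score
best = max(avg_scores, key=avg_scores.get)
```

GridSearchCV performs exactly this loop internally, in addition refitting the winning combination on the full training data.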

Example

  • Same data as in 3a-trees
    • agr, bm, idiovol, mom12m, roeq
    • data = 2021-12 (training data)
  • Quantile transform features and ret
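The quantile transform mentioned above can be done with scikit-learn's QuantileTransformer. A minimal sketch with hypothetical stand-in data (the real features are agr, bm, idiovol, mom12m, roeq, and in practice the transform is typically applied within each date rather than to the pooled panel):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

# hypothetical data standing in for the features and the return
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(500, 6)),
    columns=["agr", "bm", "idiovol", "mom12m", "roeq", "ret"],
)

# map each column to a uniform [0, 1] distribution based on its ranks
qt = QuantileTransformer(output_distribution="uniform", n_quantiles=100)
transformed = pd.DataFrame(qt.fit_transform(df), columns=df.columns)
```

Rank-based transforms like this reduce the influence of outliers, which is one reason they are common with noisy return data.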

Cross validation for gradient boosting

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
  "max_depth": [3, 4], 
  "learning_rate": [0.05, 0.1]
}

cv = GridSearchCV(
  estimator=GradientBoostingRegressor(),
  param_grid=param_grid,
)

_ = cv.fit(Xtrain, ytrain)
pd.DataFrame(cv.cv_results_).iloc[:, 4:]

  param_learning_rate param_max_depth                                   params  split0_test_score  split1_test_score  split2_test_score  split3_test_score  split4_test_score  mean_test_score  std_test_score  rank_test_score
0                0.05               3  {'learning_rate': 0.05, 'max_depth': 3}           0.215701           0.203541           0.125528           0.050929           0.171487         0.153437        0.060000                1
1                0.05               4  {'learning_rate': 0.05, 'max_depth': 4}           0.192419           0.200274           0.111733           0.019720           0.196899         0.144209        0.070421                2
2                0.10               3   {'learning_rate': 0.1, 'max_depth': 3}           0.173772           0.180420           0.121137           0.033443           0.152488         0.132252        0.053554                3
3                0.10               4   {'learning_rate': 0.1, 'max_depth': 4}           0.166758           0.179369           0.089260          -0.007269           0.160962         0.117816        0.070011                4
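After fitting, the winning combination and its mean validation score are available as attributes of the fitted GridSearchCV object. A self-contained sketch (synthetic data stands in for Xtrain and ytrain, which are assumptions here):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# synthetic stand-in for the training data
Xtrain, ytrain = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

cv = GridSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_grid={"max_depth": [3, 4], "learning_rate": [0.05, 0.1]},
)
cv.fit(Xtrain, ytrain)

# the best hyperparameter combination and its mean validation score
print(cv.best_params_, cv.best_score_)

# by default, best_estimator_ has been refit on the full training data
preds = cv.best_estimator_.predict(Xtrain)
```

Using best_estimator_ for prediction means no training data is wasted: cross-validation only chooses the hyperparameters, and the final model is trained on everything.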