Some Cross Validation Results

Kerry Back

  • Same features as in 3a-trees
    • agr, bm, idiovol, mom12m, roeq
  • Quantile transform features and ret in each cross-section
  • Cross-validate on one year (2019) just for illustration
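The per-cross-section quantile transform mentioned above can be sketched as follows. This is a hypothetical toy panel (the dates, values, and two-column subset are made up for illustration; the real data would include all five features):

```python
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

# toy panel: two dates with three stocks each
df = pd.DataFrame({
    "date": ["2019-01", "2019-01", "2019-01", "2019-02", "2019-02", "2019-02"],
    "mom12m": [0.10, -0.05, 0.30, 0.02, 0.15, -0.20],
    "ret": [0.01, -0.02, 0.05, 0.00, 0.03, -0.01],
})

# map each column to uniform [0, 1] ranks separately within each date
qt = QuantileTransformer(output_distribution="uniform", n_quantiles=3)
transformed = df.groupby("date")[["mom12m", "ret"]].transform(
    lambda col: qt.fit_transform(col.to_frame()).flatten()
)
```

Transforming within each date removes cross-sectional scale differences over time, so the model only sees each stock's relative standing on each date.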

Custom Scorer

  • For picking stocks based on return predictions, the important issue is whether stocks predicted to have higher returns actually do have higher returns.
  • The numerical values of the predictions and the numerical errors matter less.
  • To calculate the extent to which higher predictions correspond to higher returns, Spearman’s rank correlation is useful.

from scipy.stats import spearmanr
from sklearn.metrics import make_scorer

# score = Spearman rank correlation between actual and predicted returns
# (.statistic requires scipy >= 1.9; older versions use .correlation)
scorer = make_scorer(
    lambda a, b: spearmanr(a, b).statistic,
    greater_is_better=True
)
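As a quick illustration of why rank correlation fits this use case, predictions that preserve the ordering of realized returns score perfectly even when their scale is wildly off, while predictions that scramble the ordering are penalized (a toy example, not from the original data):

```python
import numpy as np
from scipy.stats import spearmanr

actual = np.array([0.01, 0.03, 0.07, 0.10])    # realized returns
predicted = np.array([1.0, 2.0, 50.0, 51.0])   # same ordering, wild scale
rho = spearmanr(actual, predicted).statistic   # perfect rank correlation: 1.0

scrambled = np.array([2.0, 1.0, 51.0, 50.0])   # ordering partly broken
rho2 = spearmanr(actual, scrambled).statistic  # lower: 0.6
```

A squared-error score would heavily penalize the first prediction vector despite its perfect ordering, which is why a rank-based scorer suits stock selection.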

GridSearchCV for Random Forest

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

model = RandomForestRegressor()

cv = GridSearchCV(
    model,
    param_grid={
        "max_depth": range(1, 11)
    },
    scoring=scorer
)
_ = cv.fit(X, y)
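After fitting, the winning depth and its mean cross-validated score can be read off `best_params_` and `best_score_`. A self-contained sketch, with synthetic data standing in for the real `X` and `y` (the grid is shrunk to keep it fast):

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

scorer = make_scorer(lambda a, b: spearmanr(a, b).statistic)

# synthetic stand-ins for the feature matrix and returns
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)

cv = GridSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_grid={"max_depth": range(1, 4)},
    scoring=scorer,
)
_ = cv.fit(X, y)

best_depth = cv.best_params_["max_depth"]  # depth with highest mean CV score
mean_score = cv.best_score_                # that depth's mean rank correlation
```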

GridSearchCV for Multi-Layer Perceptron

from sklearn.neural_network import MLPRegressor

param_grid = {
    "hidden_layer_sizes": [
        (16, 8, 4, 2),
        (16, 8, 4),
        (8, 4, 2),
        (16, 8),
        (16, 4),
        (8, 4),
        (4, 4),
        (4, 2)
    ]
}

cv = GridSearchCV(
    MLPRegressor(max_iter=500),
    param_grid=param_grid,
    scoring=scorer
)
_ = cv.fit(X, y)

RandomizedSearchCV for Gradient Boosting

from scipy.stats import uniform
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# continuous uniform distribution on [0, 0.2] for the learning rate
u = uniform(scale=0.2)

cv = RandomizedSearchCV(
    GradientBoostingRegressor(),
    param_distributions={
        "learning_rate": u,
        "max_depth": range(2, 10, 2)
    },
    scoring=scorer,
)
_ = cv.fit(X, y)
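A quick sanity check on the distribution being sampled: `uniform(scale=0.2)` has `loc=0` by default, so its support is [0, 0.2], and every `learning_rate` candidate that `RandomizedSearchCV` draws (by default `n_iter=10` of them) falls in that range:

```python
from scipy.stats import uniform

u = uniform(scale=0.2)  # loc=0 by default, so support is [0, 0.2]
samples = u.rvs(size=1000, random_state=0)
lo, hi = samples.min(), samples.max()
```

Unlike the grid searches above, the randomized search samples the continuous learning-rate range rather than enumerating a fixed list of values.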