Some Cross Validation Results

Kerry Back

  • Same features as in 3a-trees
    • agr, bm, idiovol, mom12m, roeq
  • Quantile transform features and ret in each cross-section
  • Cross-validate on one year (2019) just for illustration
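The per-cross-section quantile transform mentioned above can be sketched as follows. This is a hypothetical toy panel (the dates, values, and two-column subset are made up for illustration; the real data would include all five features):

```python
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

# toy panel: two dates with three stocks each
df = pd.DataFrame({
    "date": ["2019-01", "2019-01", "2019-01", "2019-02", "2019-02", "2019-02"],
    "mom12m": [0.10, -0.05, 0.30, 0.02, 0.15, -0.20],
    "ret": [0.01, -0.02, 0.05, 0.00, 0.03, -0.01],
})

# map each column to uniform [0, 1] ranks separately within each date
qt = QuantileTransformer(output_distribution="uniform", n_quantiles=3)
transformed = df.groupby("date")[["mom12m", "ret"]].transform(
    lambda col: qt.fit_transform(col.to_frame()).flatten()
)
```

Transforming within each date removes cross-sectional scale differences over time, so the model only sees each stock's relative standing on each date.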

Custom Scorer

  • For picking stocks based on return predictions, the important issue is whether stocks predicted to have higher returns actually do have higher returns.
  • The numerical values of the predictions and the numerical errors matter less.
  • To calculate the extent to which higher predictions correspond to higher returns, Spearman’s rank correlation is useful.

from scipy.stats import spearmanr
from sklearn.metrics import make_scorer

# score = Spearman rank correlation between actual and predicted returns
# (.statistic requires scipy >= 1.9; older versions use .correlation)
scorer = make_scorer(
    lambda a, b: spearmanr(a, b).statistic,
    greater_is_better=True
)
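As a quick illustration of why rank correlation fits this use case, predictions that preserve the ordering of realized returns score perfectly even when their scale is wildly off, while predictions that scramble the ordering are penalized (a toy example, not from the original data):

```python
import numpy as np
from scipy.stats import spearmanr

actual = np.array([0.01, 0.03, 0.07, 0.10])    # realized returns
predicted = np.array([1.0, 2.0, 50.0, 51.0])   # same ordering, wild scale
rho = spearmanr(actual, predicted).statistic   # perfect rank correlation: 1.0

scrambled = np.array([2.0, 1.0, 51.0, 50.0])   # ordering partly broken
rho2 = spearmanr(actual, scrambled).statistic  # lower: 0.6
```

A squared-error score would heavily penalize the first prediction vector despite its perfect ordering, which is why a rank-based scorer suits stock selection.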

GridSearchCV for Random Forest

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

model = RandomForestRegressor()

cv = GridSearchCV(
    model,
    param_grid={
        "max_depth": range(1, 11)
    },
    scoring=scorer
)
_ = cv.fit(X, y)
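After fitting, the winning depth and its mean cross-validated score can be read off `best_params_` and `best_score_`. A self-contained sketch, with synthetic data standing in for the real `X` and `y` (the grid is shrunk to keep it fast):

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

scorer = make_scorer(lambda a, b: spearmanr(a, b).statistic)

# synthetic stand-ins for the feature matrix and returns
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)

cv = GridSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),
    param_grid={"max_depth": range(1, 4)},
    scoring=scorer,
)
_ = cv.fit(X, y)

best_depth = cv.best_params_["max_depth"]  # depth with highest mean CV score
mean_score = cv.best_score_                # that depth's mean rank correlation
```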

GridSearchCV for Multi-Layer Perceptron

from sklearn.neural_network import MLPRegressor

param_grid = {
    "hidden_layer_sizes": [
        (16, 8, 4, 2),
        (16, 8, 4),
        (8, 4, 2),
        (16, 8),
        (16, 4),
        (8, 4),
        (4, 4),
        (4, 2)
    ]
}

cv = GridSearchCV(
    MLPRegressor(max_iter=500),
    param_grid=param_grid,
    scoring=scorer
)
_ = cv.fit(X, y)

RandomizedSearchCV for Gradient Boosting

from scipy.stats import uniform
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# continuous uniform distribution on [0, 0.2] for the learning rate
u = uniform(scale=0.2)

cv = RandomizedSearchCV(
    GradientBoostingRegressor(),
    param_distributions={
        "learning_rate": u,
        "max_depth": range(2, 10, 2)
    },
    scoring=scorer,
)
_ = cv.fit(X, y)
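A quick sanity check on the distribution being sampled: `uniform(scale=0.2)` has `loc=0` by default, so its support is [0, 0.2], and every `learning_rate` candidate that `RandomizedSearchCV` draws (by default `n_iter=10` of them) falls in that range:

```python
from scipy.stats import uniform

u = uniform(scale=0.2)  # loc=0 by default, so support is [0, 0.2]
samples = u.rvs(size=1000, random_state=0)
lo, hi = samples.min(), samples.max()
```

Unlike the grid searches above, the randomized search samples the continuous learning-rate range rather than enumerating a fixed list of values.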