Additional Considerations
Kerry Back
Stock universe
Which size group of stocks do we want to trade?
Small caps, mid caps, and/or large caps?
Our data only has NYSE and Nasdaq listed stocks, so the very smallest public stocks are excluded.
Impose a price filter?
Price > $5.00 is imposed in our data to rule out “penny stocks”
Industry or sector focus?
Buy industries or best stocks in each industry?
By including industry dummies, our models have a chance of finding best stocks in each industry
Including deviations of characteristics from industry means or medians in each cross-section may also be helpful
Return prediction models do not include risk analysis. Imposing some industry balance could be useful to control risks.
Add squares and products of features
We can put scikit-learn’s PolynomialFeatures in a pipeline to add squares and products (or cubes, etc.)
In a linear model,
Adding industry dummy variables
\(\times\)
features allows the feature slope coefficients to vary by industry
Adding products of features produces a model like
\[b_1 x_1 + b_2 x_2 + c x_1x_2 = (b_1 + cx_2) x_1 + b_2x_2\]
Train and predict in a loop
Loop over dates
At each date,
Train on past
Predict for upcoming period
Sort and form portfolios based on predictions
Record the return of each portfolio over the upcoming period
Model and hyperparameters
Linear regression is a poor model. Penalized (regularized) regression is a bit better.
Random forests, boosted forests, and neural nets are substantially better
Need to tune hyperparameters
Could apply GridSearchCV at each training date to find best hyperparameters for past data
Maybe use correlation between predicted and actual as score rather than MSE.