Additional Considerations

Kerry Back

Stock universe

Which size group of stocks do we want to trade?
- Small caps, mid caps, and/or large caps?
- Our data only has NYSE and Nasdaq listed stocks, so the very smallest public stocks are excluded.
Impose a price filter?
- A filter of price > $5.00 is imposed in our data to rule out “penny stocks.”
Industry or sector focus?
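
As a rough illustration of the universe choices above, the sketch below applies a price filter and assigns size groups with pandas. The column names (`ticker`, `price`, `mktcap`), the toy numbers, and the market-cap breakpoints are all assumptions made for this example. They are not taken from the course data.

```python
import pandas as pd

# Hypothetical snapshot of one cross-section; market caps in $ millions (made up).
df = pd.DataFrame({
    "ticker": ["A", "B", "C", "D", "E", "F"],
    "price": [3.0, 12.0, 48.0, 7.5, 150.0, 4.99],
    "mktcap": [50, 400, 9_000, 1_200, 250_000, 80],
})

# Price filter: drop stocks trading at or below $5.00.
universe = df[df["price"] > 5.00].copy()

# Size groups from illustrative breakpoints (assumed, not a standard definition).
universe["size_group"] = pd.cut(
    universe["mktcap"],
    bins=[0, 2_000, 10_000, float("inf")],
    labels=["small", "mid", "large"],
)

# Keep only one size group if that is the universe we want to trade.
small_caps = universe[universe["size_group"] == "small"]
```

In practice the breakpoints might instead be percentiles of market cap at each date.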

Buy industries or best stocks in each industry?

By including industry dummies, our models have a chance of finding the best stocks in each industry.
Including deviations of characteristics from industry means or medians in each cross-section may also be helpful.
Return prediction models do not include risk analysis. Imposing some industry balance could be useful to control risks.
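
A minimal sketch of the two feature ideas above: industry dummies and deviations from the industry median within each cross-section. The panel, the column names (`date`, `industry`, `bm`), and the values are hypothetical.

```python
import pandas as pd

# Hypothetical toy panel: two dates, two industries, one characteristic ("bm").
df = pd.DataFrame({
    "date": ["2020-01"] * 4 + ["2020-02"] * 4,
    "industry": ["tech", "tech", "energy", "energy"] * 2,
    "bm": [0.2, 0.4, 1.0, 1.4, 0.3, 0.5, 0.8, 1.2],
})

# Deviation of the characteristic from its industry median at each date.
industry_median = df.groupby(["date", "industry"])["bm"].transform("median")
df["bm_dev"] = df["bm"] - industry_median

# Industry dummy variables, one column per industry.
dummies = pd.get_dummies(df["industry"], prefix="ind", dtype=float)

# Feature matrix combining the raw characteristic, its deviation, and the dummies.
X = pd.concat([df[["bm", "bm_dev"]], dummies], axis=1)
```

Using `"mean"` in place of `"median"` in `transform` gives deviations from industry means instead.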

Add squares and products of features

We can put scikit-learn’s PolynomialFeatures in a pipeline to add squares and products (or cubes, etc.)
In a linear model,
- Adding industry dummy variables \(\times\) features allows the feature slope coefficients to vary by industry
- Adding products of features allows the slope on one feature to depend on another feature, as in

\[b_1 x_1 + b_2 x_2 + c x_1x_2 = (b_1 + cx_2) x_1 + b_2x_2\]

Train and predict in a loop

Loop over dates
At each date,
- Train on past
- Predict for upcoming period
- Sort and form portfolios based on predictions
- Record the return of each portfolio over the upcoming period
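
The loop above might be sketched as follows on a simulated panel. Everything here is an assumption for illustration: the column names, the use of `Ridge`, the 24-month initial training window, and the convention that `ret` in a row is the return over the period that follows that row's features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Simulated panel: 36 months, 50 stocks, two features; "ret" is the
# return over the period following the features (assumed convention).
rng = np.random.default_rng(0)
dates = pd.period_range("2015-01", periods=36, freq="M")
rows = []
for d in dates:
    x = rng.normal(size=(50, 2))
    ret = 0.02 * x[:, 0] + rng.normal(scale=0.05, size=50)
    rows.append(pd.DataFrame({"date": d, "x1": x[:, 0], "x2": x[:, 1], "ret": ret}))
df = pd.concat(rows, ignore_index=True)

features = ["x1", "x2"]
results = []
for d in dates[24:]:                      # loop over dates after an initial window
    past = df[df["date"] < d]             # train on the past only
    now = df[df["date"] == d].copy()
    model = Ridge(alpha=1.0).fit(past[features], past["ret"])
    now["pred"] = model.predict(now[features])          # predict for upcoming period
    now["port"] = pd.qcut(now["pred"], 5, labels=False) # sort into quintile portfolios
    port_ret = now.groupby("port")["ret"].mean()        # equal-weighted portfolio returns
    port_ret.name = d
    results.append(port_ret)

# One row per date, one column per portfolio (0 = lowest predictions, 4 = highest).
portfolio_returns = pd.DataFrame(results)
```

With real data, the rows in `past` would also need to exclude any observation whose return was not yet realized at the training date.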

Model and hyperparameters

Linear regression is a poor model. Penalized (regularized) regression is a bit better.
Random forests, boosted forests, and neural nets are substantially better.
Hyperparameters need to be tuned.
- Could apply GridSearchCV at each training date to find the best hyperparameters for past data.
- Consider using the correlation between predicted and actual returns as the score rather than MSE.
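
One way to tune with a correlation score is to wrap a custom function with `make_scorer` and pass it to `GridSearchCV`. The sketch below does this on simulated data. The function name `corr_score`, the choice of random forest, and the grid values are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

def corr_score(y_true, y_pred):
    """Correlation between actual and predicted values (hypothetical helper)."""
    # Correlation is undefined if either series is constant; score it as 0.
    if np.std(y_true) == 0 or np.std(y_pred) == 0:
        return 0.0
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Simulated stand-in for the past data available at one training date.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 0.02 * X[:, 0] + rng.normal(scale=0.05, size=300)

search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, 4], "min_samples_leaf": [5, 20]},
    scoring=make_scorer(corr_score),  # higher correlation is better
    cv=3,
)
search.fit(X, y)
best = search.best_params_
```

Inside the train-and-predict loop, `search.fit` would be called on the past data at each date and `search.predict` used for the upcoming period. This multiplies the run time by roughly the number of grid points times the number of folds.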