Additional Considerations





Kerry Back

Stock universe

  • Which size group of stocks do we want to trade?
    • Small caps, mid caps, and/or large caps?
    • Our data only has NYSE and Nasdaq listed stocks, so the very smallest public stocks are excluded.
  • Impose a price filter?
    • Our data already imposes price > $5.00 to rule out “penny stocks.”
  • Industry or sector focus?
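
A minimal sketch of how a price filter and size groups might be applied to one date's cross-section with pandas. The column names (`ticker`, `price`, `marketcap`), the toy values, and the tercile cutoffs are all assumptions for illustration; other breakpoints may be preferable in practice.

```python
import pandas as pd

# Hypothetical cross-section for a single date; names and values are made up.
df = pd.DataFrame({
    "ticker": ["A", "B", "C", "D", "E", "F"],
    "price": [3.0, 12.0, 45.0, 8.0, 150.0, 4.5],
    "marketcap": [50.0, 400.0, 3000.0, 900.0, 50000.0, 120.0],  # $ millions
})

# Price filter: keep only stocks priced above $5.00.
universe = df[df["price"] > 5.00].copy()

# Size groups from market-cap terciles within the filtered cross-section
# (terciles are an assumption, not a standard definition of small/mid/large).
universe["size_group"] = pd.qcut(
    universe["marketcap"], q=3, labels=["small", "mid", "large"]
)

# Keep only the groups we want to trade, here small and mid caps.
tradable = universe[universe["size_group"].isin(["small", "mid"])]
print(tradable)
```

In a full panel, the same steps would be applied within each date, for example with `groupby("date")`.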

Buy industries or best stocks in each industry?

  • By including industry dummies, our models have a chance of finding the best stocks in each industry
  • Including deviations of characteristics from industry means or medians in each cross-section may also be helpful
  • Return prediction models do not include risk analysis. Imposing some industry balance could be useful to control risks.
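
The two feature ideas above could be sketched as follows. The panel, the column names (`date`, `industry`, `bm`), and the choice of the median are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical panel: two dates, two industries, one characteristic ("bm").
df = pd.DataFrame({
    "date": ["2020-01"] * 4 + ["2020-02"] * 4,
    "industry": ["tech", "tech", "energy", "energy"] * 2,
    "bm": [0.2, 0.4, 1.0, 1.4, 0.3, 0.5, 0.8, 1.2],
})

# Deviation of the characteristic from its industry median,
# computed separately within each date's cross-section.
industry_median = df.groupby(["date", "industry"])["bm"].transform("median")
df["bm_dev"] = df["bm"] - industry_median

# Industry dummy variables (one column per industry).
dummies = pd.get_dummies(df["industry"], prefix="ind", dtype=float)

# Feature matrix: raw characteristic, industry-adjusted version, and dummies.
X = pd.concat([df[["bm", "bm_dev"]], dummies], axis=1)
print(X)
```

In a linear model with an intercept, one dummy column would typically be dropped to avoid perfect collinearity; tree-based models do not need that step.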

Add squares and products of features

  • We can put scikit-learn’s PolynomialFeatures in a pipeline to add squares and products (or cubes, etc.)
  • In a linear model,
    • Adding industry dummy variables \(\times\) features allows the feature slope coefficients to vary by industry
    • Adding products of features lets the slope on one feature depend on another feature. With two features, the slope on \(x_1\) becomes \(b_1 + cx_2\):

\[b_1 x_1 + b_2 x_2 + c x_1x_2 = (b_1 + cx_2) x_1 + b_2x_2\]

Train and predict in a loop

  • Loop over dates
  • At each date,
    • Train on past
    • Predict for upcoming period
    • Sort and form portfolios based on predictions
    • Record the return of each portfolio over the upcoming period
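
The loop above might look roughly like this on a simulated panel. Everything here is an assumption for illustration: the monthly dates, the feature names, the simulated returns, the `Ridge` model, the 12-month minimum training window, and the use of equal-weighted quintile portfolios. Each row is assumed to hold features known at the start of the month together with the return earned over that month.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
dates = pd.period_range("2015-01", periods=36, freq="M")
n_stocks = 50
features = ["x1", "x2"]

# Simulated panel: returns depend weakly on the features plus noise.
frames = []
for d in dates:
    X = rng.normal(size=(n_stocks, 2))
    ret = 0.02 * X[:, 0] - 0.01 * X[:, 1] + 0.05 * rng.normal(size=n_stocks)
    frames.append(pd.DataFrame({"date": d, "x1": X[:, 0], "x2": X[:, 1], "ret": ret}))
panel = pd.concat(frames, ignore_index=True)

min_train = 12  # months of history required before the first prediction
records = []
for d in dates[min_train:]:
    past = panel[panel["date"] < d]            # train on the past only
    current = panel[panel["date"] == d].copy()  # upcoming period

    model = Ridge(alpha=1.0).fit(past[features], past["ret"])
    current["pred"] = model.predict(current[features])

    # Sort into quintile portfolios on predictions (0 = lowest, 4 = highest).
    current["port"] = pd.qcut(current["pred"], 5, labels=False)

    # Record the equal-weighted return of each portfolio over the period.
    port_ret = current.groupby("port")["ret"].mean()
    port_ret.name = d
    records.append(port_ret)

portfolio_returns = pd.DataFrame(records)  # rows: dates, columns: portfolios
spread = portfolio_returns[4] - portfolio_returns[0]
print(portfolio_returns.mean())
print("mean top-minus-bottom return:", spread.mean())
```

This version retrains on an expanding window at every date. A rolling window, or retraining less often, would be a straightforward variation.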

Model and hyperparameters

  • Linear regression is a poor model. Penalized (regularized) regression is a bit better.
  • Random forests, boosted forests, and neural nets are substantially better.
  • Need to tune hyperparameters
    • Could apply GridSearchCV at each training date to find best hyperparameters for past data
    • Maybe use the correlation between predicted and actual returns as the score rather than MSE.
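
One possible way to combine the two tuning ideas above is to pass a custom correlation scorer to `GridSearchCV`. The helper `corr_score`, the simulated data, the random forest, and the small `max_depth` grid are assumptions for illustration. In the training loop, `X` and `y` would be the past data available at each training date.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV


def corr_score(y_true, y_pred):
    """Pearson correlation between actual and predicted values.

    Returns 0.0 when either input is constant, since the correlation
    is undefined in that case.
    """
    if np.std(y_true) == 0 or np.std(y_pred) == 0:
        return 0.0
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Simulated stand-in for "past data" at one training date.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 0.5 * X[:, 0] + rng.normal(size=300)

search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, 4, 6]},
    scoring=make_scorer(corr_score),  # higher correlation is better
    cv=3,
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
print("best max_depth:", best_depth)
print("cross-validated correlation:", search.best_score_)
```

The default folds ignore time ordering. For panel data sorted by date, a splitter such as `TimeSeriesSplit` could be passed as `cv` instead, though which split is appropriate depends on the data.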