Backtest: Looping





Kerry Back

  • We’ll stick with monthly portfolio formation.
  • At the beginning of each month, choose what seems to be the best portfolio (possibly including shorts) based on the data available at that time.
  • Hold the portfolio for a month and record the return.
  • Rinse and repeat.
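The monthly cycle above can be sketched on toy data. This is only an illustration under stated assumptions: the frame `toy`, its numbers, and the helper `long_short_return` are hypothetical, and "best portfolio" is taken here to mean equal-weight long the highest predictions and short the lowest, which is one choice among many.

```python
import pandas as pd

# Hypothetical toy data: 2 months x 4 tickers, each with a prediction
# made at the start of the month and the return realized over the month.
index = pd.MultiIndex.from_product(
    [["2020-01", "2020-02"], ["A", "B", "C", "D"]], names=["date", "ticker"]
)
toy = pd.DataFrame(
    {
        "predict": [0.3, -0.1, 0.2, 0.0, -0.2, 0.4, 0.1, 0.0],
        "ret": [0.05, -0.02, 0.01, 0.00, 0.03, 0.02, -0.01, 0.00],
    },
    index=index,
)

def long_short_return(month: pd.DataFrame, n: int = 1) -> float:
    """Equal-weight long the n highest predictions and short the n lowest."""
    ranked = month.sort_values("predict")
    return ranked["ret"].iloc[-n:].mean() - ranked["ret"].iloc[:n].mean()

# One portfolio return per month: form, hold, record, repeat.
monthly_returns = pd.Series(
    {date: long_short_return(month) for date, month in toy.groupby(level="date")}
)
print(monthly_returns)
```

In January the sketch is long A and short B; in February it is long B and short A.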

  • Example: Jan 1, 2020. All data prior to 2020 is regarded as training data.
  • We can use it to train models, compare hyperparameters, and compare models.
  • To compare hyperparameters and models and avoid over-fitting, we should split the training data into train and test, or cross-validate.
  • Running an extensive model and hyperparameter search every month over a long time period would be slow.
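A minimal sketch of comparing hyperparameters with a holdout split inside the training data. Everything here is an assumption made for illustration: the data are simulated, ridge regression (solved in closed form with NumPy) stands in for whatever model is being tuned, and the candidate penalties in `alphas` are arbitrary.

```python
import numpy as np

# Simulated "training data": 400 observations, 5 features.
rng = np.random.default_rng(0)
n, k = 400, 5
X = rng.normal(size=(n, k))
beta = np.array([0.5, -0.3, 0.0, 0.0, 0.2])
y = X @ beta + rng.normal(scale=1.0, size=n)

# Split the training data itself: earlier rows to fit, later rows to validate.
split = int(0.75 * n)
X_fit, y_fit = X[:split], y[:split]
X_val, y_val = X[split:], y[split:]

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients (no intercept)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Score each candidate hyperparameter on the validation rows only.
alphas = [0.01, 1.0, 100.0]
val_mse = {}
for alpha in alphas:
    coef = ridge_fit(X_fit, y_fit, alpha)
    val_mse[alpha] = float(np.mean((y_val - X_val @ coef) ** 2))

best_alpha = min(val_mse, key=val_mse.get)
print(val_mse, best_alpha)
```

Repeating a search like this at every month of a long backtest multiplies the cost by the number of months and candidates, which is the slowdown the last bullet refers to.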

  • Given limited computer resources, we will simplify. We will just train each month, rather than train/validate/test each month.
  • To simplify further, we will train only in some months and use each trained model to form predictions for the months that follow, until the next training date. Example: train every five years:
df = df.set_index(["date", "ticker"])
dates = ["2005-01", "2010-01", "2015-01", "2020-01", "3000-01"]  # last entry is a far-future end marker
for train_date, end_date in zip(dates[:-1], dates[1:]):
  # train on data dated before train_date
  # predict for dates from train_date up to but not including end_date
  # store predictions in a (date, ticker) indexed series
  ...
df["predict"] = predictions
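The `zip(dates[:-1], dates[1:])` idiom in the loop header pairs each training date with the next one, so each pair marks one prediction window. A small standalone sketch (the name `windows` is introduced here only for illustration):

```python
dates = ["2005-01", "2010-01", "2015-01", "2020-01", "3000-01"]

# Pair each date with its successor: (train_date, end_date).
windows = list(zip(dates[:-1], dates[1:]))

for train_date, end_date in windows:
    print(f"train on data before {train_date}; "
          f"predict for {train_date} <= date < {end_date}")
```

Five dates give four windows, and the far-future last date means the final window covers everything from 2020-01 onward.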

import pandas as pd

# Assumes df is indexed by (date, ticker) and that features and pipe are already defined.
predictions = []
for train_date, end_date in zip(dates[:-1], dates[1:]):
  date_level = df.index.get_level_values("date")
  before_train = date_level < train_date
  before_end = date_level < end_date
  train = df[before_train]               # all rows dated before train_date
  test = df[~before_train & before_end]  # train_date <= date < end_date
  pipe.fit(train[features], train["ret"])
  pred = pipe.predict(test[features])
  predictions.append(pd.Series(pred, index=test.index))
# Combine all windows; rows before the first training date get no prediction (NaN).
df["predict"] = pd.concat(predictions)
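The final assignment relies on pandas aligning the prediction series to `df` by its (date, ticker) index rather than by row position. A small sketch with made-up numbers (the values and the name `backtest` are hypothetical) shows the effect, including the missing predictions for dates before the first training date:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["2004-12", "2005-01", "2005-02"], ["A", "B"]], names=["date", "ticker"]
)
df = pd.DataFrame({"ret": [0.01, 0.02, -0.01, 0.03, 0.00, 0.02]}, index=index)

# Hypothetical predictions covering only 2005-01 onward, deliberately
# listed in a different row order than df.
pred_index = pd.MultiIndex.from_tuples(
    [("2005-02", "B"), ("2005-02", "A"), ("2005-01", "B"), ("2005-01", "A")],
    names=["date", "ticker"],
)
predictions = pd.Series([0.4, 0.3, 0.2, 0.1], index=pred_index)

df["predict"] = predictions  # matched on (date, ticker), not on position

# Rows without a prediction (here 2004-12) are NaN and can be dropped
# before forming portfolios.
backtest = df.dropna(subset=["predict"])
print(df)
```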