Transforming Features and Target

Kerry Back

Example of untransformed data

  • Same data as in the last session: roeq and ret for 2020-01

Scikit-learn transformers

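  • StandardScaler: y = (x - x.mean()) / x.std(ddof=0), i.e., subtract the mean and divide by the population standard deviation
  • PowerTransformer: nonlinear transformations (Yeo-Johnson by default, or Box-Cox) chosen to make the data approximately normal
  • QuantileTransformer: maps each value through the empirical distribution function
    • output_distribution="uniform" (the default) maps data to [0, 1]
    • output_distribution="normal" maps data to a standard normal distribution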
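A minimal sketch comparing the three transformers on one feature. The data are synthetic (a lognormal, right-skewed variable standing in for something like roeq), not the session's data. It checks that StandardScaler matches the formula with ddof=0, shows that PowerTransformer reduces skewness while StandardScaler cannot (a linear rescaling leaves skewness unchanged), and confirms that uniform quantile output lies in [0, 1].

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer, QuantileTransformer, StandardScaler

# Synthetic right-skewed feature (lognormal); a stand-in, not the roeq data
rng = np.random.default_rng(0)
X = pd.DataFrame({"x": rng.lognormal(mean=0.0, sigma=1.0, size=1000)})

# StandardScaler: subtract the mean, divide by the population std (ddof=0)
z = StandardScaler().fit_transform(X)[:, 0]
manual = ((X["x"] - X["x"].mean()) / X["x"].std(ddof=0)).to_numpy()

# PowerTransformer: Yeo-Johnson by default, then standardized to mean 0, std 1
p = PowerTransformer().fit_transform(X)[:, 0]

# QuantileTransformer with the default uniform output: values land in [0, 1]
u = QuantileTransformer(output_distribution="uniform").fit_transform(X)[:, 0]

print("max |StandardScaler - formula|:", np.abs(z - manual).max())
print("skew after StandardScaler:  ", round(pd.Series(z).skew(), 2))
print("skew after PowerTransformer:", round(pd.Series(p).skew(), 2))
print("uniform output range:", u.min(), u.max())
```

Note that pandas' Series.std() defaults to ddof=1, so the plain x.std() formula differs slightly from what StandardScaler computes.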

Normal quantile transformer example

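import pandas as pd
import plotly.express as px
from sklearn.preprocessing import QuantileTransformer

# df is the DataFrame from last session with columns roeq and ret (2020-01)
qt = QuantileTransformer(output_distribution="normal")
d = qt.fit_transform(df[["roeq", "ret"]])  # returns a NumPy array
d = pd.DataFrame(d, columns=["roeq", "ret"], index=df.index)
px.scatter_matrix(d)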

Transforming cross sections

  • We should usually fit the transformer on each cross-section (each month) separately rather than on the pooled panel
  • Transforming returns month by month removes the effect of the overall market being up or down that month
  • Transforming features month by month removes time trends in the features
  • To do this, group the data by month and call fit_transform on each group inside the groupby
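A minimal sketch of the month-by-month transform, assuming a long panel with a date column plus roeq and ret. The panel here is synthetic (made-up stocks, months, and market moves), and the helper qt_normal is a hypothetical name. A fresh normal-output QuantileTransformer is fit on each month via groupby(...).apply, so every month's transformed returns are centered near zero even though raw monthly averages differ, and ranks within each month are preserved.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

# Synthetic panel: 200 stocks x 3 months; names mirror roeq/ret but the data are made up
rng = np.random.default_rng(0)
n_stocks = 200
months = ["2020-01", "2020-02", "2020-03"]
market = np.repeat([0.05, -0.10, 0.02], n_stocks)  # common market move in each month
df = pd.DataFrame({
    "date": np.repeat(months, n_stocks),
    "roeq": rng.standard_t(3, size=n_stocks * len(months)),
    "ret": market + 0.1 * rng.standard_t(3, size=n_stocks * len(months)),
})

cols = ["roeq", "ret"]


def qt_normal(group: pd.DataFrame) -> pd.DataFrame:
    """Fit a fresh normal-output QuantileTransformer on one month's cross-section."""
    # n_quantiles may not exceed the number of rows in the month
    qt = QuantileTransformer(output_distribution="normal", n_quantiles=len(group))
    return pd.DataFrame(qt.fit_transform(group), index=group.index, columns=group.columns)


# group_keys=False keeps the original row index so the result aligns with df
transformed = df.groupby("date", group_keys=False)[cols].apply(qt_normal)
df = df.join(transformed.add_suffix("_qt"))

# Raw monthly mean returns differ; transformed monthly means are close to zero
print(df.groupby("date")[["ret", "ret_qt"]].mean().round(3))
```

Fitting inside the groupby means each month's quantiles come only from that month's stocks, which is what removes the month-level shifts; fitting once on the pooled panel would leave them in.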