Neural Nets

Kerry Back

Multi-layer perceptrons

  • A multi-layer perceptron (MLP) consists of “neurons” arranged in layers.
  • A neuron is a mathematical function. It takes inputs \(x_1, \ldots, x_n\), calculates a function \(y=f(x_1, \ldots, x_n)\), and passes \(y\) to the neurons in the next layer.
  • The inputs in the first layer are the features.
  • The inputs in successive layers are the calculations from the prior layer.
  • The last layer produces the output. For regression, it is a single neuron.


  • inputs \(x_1, x_2, x_3, x_4\)
  • variables \(y_1, \ldots, y_5\) are calculated in hidden layer
  • output depends on \(y_1, \ldots, y_5\)

Rectified linear units

  • The usual function for the neurons (except in the last layer) is

\[ y = \max(0,b+w_1x_1 + \cdots + w_nx_n)\]

  • Parameters \(b\) (called bias) and \(w_1, \ldots, w_n\) (called weights) are different for different neurons.
  • This function is called a rectified linear unit (ReLU).
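
A minimal sketch of one such neuron in Python, to make the formula concrete. The function name `relu_neuron` and the weights, bias, and inputs are hypothetical values chosen only for illustration.

```python
import numpy as np

def relu_neuron(x, w, b):
    """Return max(0, b + w_1 x_1 + ... + w_n x_n) for one neuron."""
    return max(0.0, b + float(np.dot(w, x)))

# Hypothetical weights and bias, not taken from any fitted model
w = np.array([0.5, -1.0, 2.0])
b = -1.0

# Weighted sum is -1 + 1 + 0 + 2 = 2, so the neuron passes 2 forward
y_on = relu_neuron(np.array([2.0, 0.0, 1.0]), w, b)

# Weighted sum is -1 + 0 - 1 + 0 = -2, so the output is truncated to 0
y_off = relu_neuron(np.array([0.0, 1.0, 0.0]), w, b)

print(y_on, y_off)
```

The second call shows the truncation: a negative weighted sum is replaced by zero.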

Analogy to neurons firing

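  • If \(w_i>0\), then a larger \(x_i\) raises the sum \(b+w_1x_1 + \cdots + w_nx_n\), and \(y>0\) only when the weighted inputs are large enough to push that sum above zero.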
  • A neuron fires when it is sufficiently stimulated by signals from other neurons (in prior layer).

Output function

  • The output neuron does not apply the \(\max(0, \cdot)\) truncation, so the output can be negative.
  • For regression problems, it is linear:

\[z = b+w_1y_1 + \cdots + w_ny_n\]

  • For classification, there is a linear function for each class and the prediction is the class with the largest value.
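
The pieces so far can be put together in a short forward pass, sketched here with NumPy for a network with 4 inputs and one hidden layer of 5 ReLU neurons. All weights are random placeholders rather than fitted values, and the choice of 3 classes in the classification part is an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four input features (random placeholders)
x = rng.normal(size=4)

# Hidden layer: 5 neurons, each with its own bias and 4 weights
W1 = rng.normal(size=(5, 4))
b1 = rng.normal(size=5)
y = np.maximum(0.0, b1 + W1 @ x)   # ReLU applied neuron by neuron

# Regression output: one linear function of y_1, ..., y_5, no truncation
w_out = rng.normal(size=5)
b_out = rng.normal()
z = b_out + w_out @ y

# Classification output: one linear function per class (3 classes assumed);
# the prediction is the class with the largest value
W_cls = rng.normal(size=(3, 5))
b_cls = rng.normal(size=3)
scores = b_cls + W_cls @ y
predicted_class = int(np.argmax(scores))

print(y, z, predicted_class)
```

The hidden-layer values are never negative, while `z` and `scores` can take either sign because the output layer is linear.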

Deep versus shallow learning

  • Deep learning means a neural network with many layers. It is behind facial recognition, self-driving cars, …
  • Shallow learning seems to work better for return prediction.
  • This is probably due to the high noise-to-signal ratio in returns.


  • Same data as in 3a-trees
    • agr, bm, idiovol, mom12m, roeq
    • training data = 2021-12
    • test data = 2022-01
  • Quantile transform features and ret in each cross-section
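
One way to transform within each cross-section is a percentile rank by date, sketched below with pandas. This is a stand-in under stated assumptions, not necessarily the transform used for these slides (scikit-learn's `QuantileTransformer` applied date by date is another option). The panel is synthetic, and the names `panel` and `ranked` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
features = ["agr", "bm", "idiovol", "mom12m", "roeq"]
cols = features + ["ret"]

# Synthetic panel: 2 months x 50 hypothetical stocks (not real data)
panel = pd.DataFrame(rng.normal(size=(100, len(cols))), columns=cols)
panel.insert(0, "date", np.repeat(["2021-12", "2022-01"], 50))

# Percentile rank of each column within each date: values in (0, 1]
ranked = panel.groupby("date")[cols].rank(pct=True)

print(ranked.describe().loc[["min", "max"]])
```

After the transform, every feature and the return have the same distribution in every month, which removes outliers and differences in scale across cross-sections.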

Fitting a neural network

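from sklearn.neural_network import MLPRegressor

# Assumes df is indexed by month and features is the list of feature names
Xtrain = df.loc['2021-12'][features]
ytrain = df.loc['2021-12']["ret"]

# Two hidden layers with 4 and 2 neurons
model = MLPRegressor(
  hidden_layer_sizes=(4, 2),
)
model.fit(Xtrain, ytrain)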
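
For readers without the data set at hand, here is a self-contained sketch of the same fit on synthetic data. The arrays stand in for the five features and the return and are not real data; `random_state` and `max_iter` are illustrative settings, not values from the slides.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins for the training and test months (not real data):
# a weak signal in the first feature plus a lot of noise
X_train = rng.uniform(-0.5, 0.5, size=(500, 5))
y_train = 0.1 * X_train[:, 0] + rng.normal(scale=0.5, size=500)
X_test = rng.uniform(-0.5, 0.5, size=(200, 5))
y_test = 0.1 * X_test[:, 0] + rng.normal(scale=0.5, size=200)

# Two hidden layers with 4 and 2 neurons
model = MLPRegressor(hidden_layer_sizes=(4, 2), random_state=0, max_iter=1000)
model.fit(X_train, y_train)

predictions = model.predict(X_test)

# score() returns R-squared; compare in-sample and out-of-sample fit
print("train R^2:", model.score(X_train, y_train))
print("test R^2:", model.score(X_test, y_test))

# Weight matrices connect 5 inputs -> 4 neurons -> 2 neurons -> 1 output
print([c.shape for c in model.coefs_])
```

Comparing the train and test scores is the usual way to judge whether a given choice of `hidden_layer_sizes` is too complex for the data.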

Complexity and scores