A multi-layer perceptron (MLP) consists of “neurons” arranged in layers.
A neuron is a mathematical function. It takes inputs \(x_1, \ldots, x_n\), computes \(y=f(x_1, \ldots, x_n)\), and passes \(y\) to the neurons in the next layer.
The inputs in the first layer are the features.
The inputs in successive layers are the outputs of the prior layer.
The last layer produces the output; for regression it is a single neuron.
Illustration
Inputs \(x_1, x_2, x_3, x_4\) feed a hidden layer that computes \(y_1, \ldots, y_5\); the output depends on \(y_1, \ldots, y_5\).
Rectified linear units
The usual function for the neurons (except in the last layer) is
\[ y = \max(0,b+w_1x_1 + \cdots + w_nx_n)\]
Parameters \(b\) (called the bias) and \(w_1, \ldots, w_n\) (called the weights) are different for different neurons.
This function is called a rectified linear unit (ReLU).
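As a quick illustration (a minimal sketch, not from the original notes; the inputs, weights, and bias are made-up numbers), a single ReLU neuron can be computed with numpy:

```python
import numpy as np

def relu_neuron(x, w, b):
    """One neuron: y = max(0, b + w_1 x_1 + ... + w_n x_n)."""
    return np.maximum(0.0, b + w @ x)

x = np.array([0.5, -1.2, 0.3])   # inputs from the prior layer
w = np.array([1.0, 0.4, -2.0])   # weights, one per input
b = 0.1                          # bias

print(relu_neuron(x, w, b))      # 0.0 here, since b + w·x = -0.48 < 0
```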
Analogy to neurons firing
If the weights \(w_i\) are positive, then \(y>0\) only when the inputs \(x_i\) are large enough that \(b+w_1x_1 + \cdots + w_nx_n>0\).
This mimics a biological neuron, which fires when it is sufficiently stimulated by signals from other neurons (in the prior layer).
Output function
The output function has no truncation at zero, so the output can be negative.
For regression problems, it is linear:
\[z = b+w_1y_1 + \cdots + w_ny_n\]
For classification, there is a linear function for each class, and the predicted class is the one with the largest value.
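Putting the pieces together, here is a minimal numpy sketch of the illustrated network (four inputs, five hidden ReLU neurons, one linear output). The weights below are random placeholders, not fitted values:

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder parameters for a 4 -> 5 -> 1 network
W1 = rng.normal(size=(5, 4))   # hidden layer: 5 neurons, 4 inputs each
b1 = rng.normal(size=5)        # hidden-layer biases
w2 = rng.normal(size=5)        # output weights, one per hidden neuron
b2 = rng.normal()              # output bias

def forward(x):
    y = np.maximum(0.0, b1 + W1 @ x)   # hidden layer: ReLU neurons
    return b2 + w2 @ y                 # output: linear, can be negative

z = forward(np.array([0.2, -0.7, 1.5, 0.0]))

# for classification with K classes, the last layer would instead compute
# scores = b_out + W_out @ y  (shape K), with prediction np.argmax(scores)
```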
Deep versus shallow learning
Deep learning means a neural network with many layers. It is behind facial recognition, self-driving cars, …
Shallow learning (a network with only one or a few layers) seems to work better for return prediction.
This is probably due to the high noise-to-signal ratio in returns.
Example
Same data as in 3a-trees
Features: agr, bm, idiovol, mom12m, roeq
training data = 2021-12
test data = 2022-01
Quantile transform the features and ret in each cross-section.
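A sketch of how this setup might look with scikit-learn. The synthetic DataFrame, the column layout, and the single hidden layer of 8 neurons are placeholder assumptions, not the actual code or data from 3a-trees:

```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import QuantileTransformer

features = ["agr", "bm", "idiovol", "mom12m", "roeq"]

# synthetic stand-in for the data: one training month and one test month
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(rng.normal(size=(2 * n, len(features))), columns=features)
df["ret"] = rng.normal(size=2 * n)
df["date"] = ["2021-12"] * n + ["2022-01"] * n

def to_quantiles(s):
    # rank-based transform of one column within one cross-section (month)
    qt = QuantileTransformer(n_quantiles=min(1000, len(s)))
    return qt.fit_transform(s.to_frame()).ravel()

cols = features + ["ret"]
df[cols] = df.groupby("date")[cols].transform(to_quantiles)

train = df[df.date == "2021-12"]
test = df[df.date == "2022-01"]

# a shallow network: one hidden layer of ReLU neurons (size is a guess)
mlp = MLPRegressor(hidden_layer_sizes=(8,), max_iter=1000, random_state=0)
mlp.fit(train[features], train["ret"])
predictions = mlp.predict(test[features])
```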