Split data sequentially into subsets based on the value of a single feature
Above a threshold into one group
Below the threshold into the other
Prediction in each subset is the plurality class (for classification) or the cell mean (for regression).
Each split is chosen to minimize impurity (for classification) or, usually, mean squared error (for regression).
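A minimal sketch of a single split, to make the idea concrete. It assumes the candidate thresholds are the midpoints between adjacent distinct feature values, uses the sum of squared errors around the cell mean for regression, and uses size-weighted Gini impurity as one possible impurity measure for classification. The helper names (sse, gini, best_split) and the toy data are hypothetical, for illustration only; library implementations are more elaborate.

```python
from collections import Counter


def sse(ys):
    """Sum of squared errors around the cell mean (regression loss)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def gini(ys):
    """Gini impurity of a cell, weighted by cell size (classification loss)."""
    if not ys:
        return 0.0
    n = len(ys)
    return n * (1.0 - sum((c / n) ** 2 for c in Counter(ys).values()))


def best_split(x, y, loss):
    """Try each midpoint between adjacent distinct values of one feature.

    Returns (threshold, total_loss) for the split with the smallest
    combined loss over the "below" and "above" groups.
    """
    values = sorted(set(x))
    best = (None, float("inf"))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        below = [yi for xi, yi in zip(x, y) if xi <= threshold]
        above = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = loss(below) + loss(above)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy data, made up for illustration (not from the course database)
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y_reg = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
y_cls = ["a", "a", "a", "b", "b", "b"]

reg_threshold, reg_loss = best_split(x, y_reg, sse)
cls_threshold, cls_loss = best_split(x, y_cls, gini)
```

A full tree repeats this search over every feature, keeps the best feature and threshold, and then recurses within each of the two resulting groups until a stopping rule (such as a maximum depth) is reached.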
Example: train from 2021-12, predict for 2022-01
Get data from the SQL database
df = pd.read_sql(
    """
    select ticker, date, ag, bm, idiovol, mom12m, roeq, ret
    from data
    where date in ('2021-12', '2022-01')
    """,
    conn)
features = ["ag", "bm", "idiovol", "mom12m", "roeq"]