How do we do time series cross-validation in machine learning (XGBoost etc.)? I’ve done high-level econometrics, so I’m more interested in the applied code than the theory.
There are many snippets of the process, but no information on the whole pipeline (and even fewer examples use multivariate series).
CV + tuning → best model → forecast.
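To make that pipeline concrete, here is a rough sketch of the kind of thing I mean. It is only an illustration under assumptions: the data is synthetic, `make_lagged` is a hypothetical helper I made up, and scikit-learn's `GradientBoostingRegressor` stands in for xgboost so the example has fewer dependencies. `TimeSeriesSplit` keeps every validation fold strictly after its training fold.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit


def make_lagged(data, target_col, n_lags):
    """Hypothetical helper: the row for time t holds all series at t-1..t-n_lags."""
    T = data.shape[0]
    X = np.hstack([data[n_lags - k : T - k] for k in range(1, n_lags + 1)])
    y = data[n_lags:, target_col]
    return X, y


# Synthetic multivariate series (assumption): column 0 depends on lagged column 1.
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 3))
data[1:, 0] += 0.5 * data[:-1, 1]

N_LAGS = 3
X, y = make_lagged(data, target_col=0, n_lags=N_LAGS)

# CV + tuning: folds respect time order, so no future data leaks into training.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)  # refit=True by default: best model is refit on all of X, y

# Forecast: build the feature row for the first unseen step from the last N_LAGS rows.
x_next = np.hstack([data[-k] for k in range(1, N_LAGS + 1)]).reshape(1, -1)
forecast = search.predict(x_next)
print(search.best_params_, forecast)
```

The lag ordering in `x_next` has to match the ordering inside `make_lagged`, otherwise the forecast is built from misaligned features.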
Bonus marks for ensembles, e.g. using the predictions as part of a 2nd round of learning with further CV (stacking).
I’d really appreciate any code or resources on the matter.
I’ve been using sktime to do a lot of the sliding-window and grid-search stuff.
My problem is a 1-step forecast, so for cross-validation I build a time series of 1-step predictions. It’s a much harsher score than the one that comes out of grid search, and I use it as my model validation score.
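Roughly, the 1-step prediction series is built like the sketch below. This is a simplified illustration, not my actual sktime code: it uses an expanding window, a synthetic AR(1) series, and a scikit-learn `Ridge` model as a placeholder, and `walk_forward_one_step` is a name I made up. The model is refit at every step using only data before `t`, then asked for the prediction at `t`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge


def walk_forward_one_step(model, X, y, min_train):
    """Refit on rows [0, t) and predict row t, for every t >= min_train."""
    preds = np.full(len(y), np.nan)
    for t in range(min_train, len(y)):
        fitted = clone(model).fit(X[:t], y[:t])  # only past data
        preds[t] = fitted.predict(X[t : t + 1])[0]
    return preds


# Synthetic AR(1) series (assumption), with one lag as the only feature.
rng = np.random.default_rng(1)
series = np.zeros(150)
for t in range(1, len(series)):
    series[t] = 0.7 * series[t - 1] + rng.standard_normal()
X = series[:-1].reshape(-1, 1)
y = series[1:]

MIN_TRAIN = 50
preds = walk_forward_one_step(Ridge(alpha=1.0), X, y, MIN_TRAIN)

# Validation score over the out-of-sample part of the series only.
errors = y[MIN_TRAIN:] - preds[MIN_TRAIN:]
rmse = float(np.sqrt(np.mean(errors**2)))
print(rmse)
```

Swapping `X[:t]` for `X[t - window : t]` would turn this into a sliding window instead of an expanding one.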
Now I’m working on the ensemble part of the problem. To ensemble, I use the predictions/errors as inputs into another cross-validated model.
I’m learning sktime pipelines so I can build many models off different datasets or subsets, then combine the smaller models with a second layer of models.
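A minimal sketch of that second layer, under assumptions: the base models (`Ridge`, `DecisionTreeRegressor`), the `LinearRegression` meta-model, the synthetic data, and the `walk_forward` helper are all placeholders I picked for illustration. The point is that the meta-model only ever sees base predictions that were themselves out-of-sample, and it is evaluated walk-forward too.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor


def walk_forward(model, X, y, start):
    """Expanding-window 1-step predictions; entries before `start` stay NaN."""
    preds = np.full(len(y), np.nan)
    for t in range(start, len(y)):
        preds[t] = clone(model).fit(X[:t], y[:t]).predict(X[t : t + 1])[0]
    return preds


# Synthetic AR(1) series (assumption) with two lags as features.
rng = np.random.default_rng(2)
series = np.zeros(202)
for t in range(1, len(series)):
    series[t] = 0.6 * series[t - 1] + rng.standard_normal()
X = np.column_stack([series[1:-1], series[:-2]])
y = series[2:]

# Layer 1: out-of-sample predictions from each base model.
BASE_START = 40
base_models = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=3, random_state=0)]
base_preds = np.column_stack(
    [walk_forward(m, X, y, BASE_START) for m in base_models]
)

# Layer 2: the meta-model is trained on base predictions, never on in-sample fits.
Z = base_preds[BASE_START:]
y_meta = y[BASE_START:]
META_START = 30
stacked = walk_forward(LinearRegression(), Z, y_meta, META_START)


def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))


# Compare everything over the same final window.
stack_rmse = rmse(stacked[META_START:], y_meta[META_START:])
base_rmse = [rmse(Z[META_START:, j], y_meta[META_START:]) for j in range(Z.shape[1])]
print(stack_rmse, base_rmse)
```

Base-model errors could be appended as extra columns of `Z` in the same way, as long as each error is lagged so that only information available before time `t` is used.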
P.S. Do you have any material on time series PCA? How do I build a series of the first or second principal component? A simple time-series PCA over a large period would do. Bonus karma for anything with rolling PCA, preferably a PCA that occurs at each time period (if that’s even possible). If I use a GARCH and DCC model, surely I can get a covariance matrix at each point in time, then take the eigenvalues. What’s tripping me up is the reconstruction of the 1st, 2nd, etc. components’ time series.
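My best guess at the reconstruction so far, as a sketch only: at each time `t`, take the eigenvectors of that period's covariance matrix and project the (demeaned) observation at `t` onto them, so the k-th component's series is the k-th projection over time. Assumptions here: a rolling sample covariance stands in for the DCC-GARCH covariance, the data is synthetic with one common factor, `rolling_pca_scores` is a made-up name, and the sign alignment between consecutive periods is a heuristic I added because eigenvector signs are arbitrary.

```python
import numpy as np


def rolling_pca_scores(returns, window, n_components):
    """Per-period PCA: project x_t onto eigenvectors of a covariance built from the past."""
    T, _ = returns.shape
    scores = np.full((T, n_components), np.nan)
    explained = np.full((T, n_components), np.nan)
    prev_vecs = None
    for t in range(window, T):
        past = returns[t - window : t]    # strictly before t: no look-ahead
        mean = past.mean(axis=0)
        cov = np.cov(past, rowvar=False)  # a DCC-GARCH covariance for time t could go here
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
        order = np.argsort(eigvals)[::-1][:n_components]
        vals, vecs = eigvals[order], eigvecs[:, order]
        if prev_vecs is not None:
            # Heuristic: flip signs so each eigenvector points the same way as last period.
            flip = np.sign(np.sum(vecs * prev_vecs, axis=0))
            flip[flip == 0] = 1.0
            vecs = vecs * flip
        prev_vecs = vecs
        scores[t] = (returns[t] - mean) @ vecs  # component values at time t
        explained[t] = vals / eigvals.sum()     # share of variance per component
    return scores, explained


# Synthetic data (assumption): one common factor plus idiosyncratic noise.
rng = np.random.default_rng(3)
T_OBS, WINDOW = 300, 60
factor = rng.standard_normal(T_OBS)
loadings = np.array([1.0, 0.8, 0.6, 0.4])
returns = factor[:, None] * loadings + 0.3 * rng.standard_normal((T_OBS, 4))

scores, explained = rolling_pca_scores(returns, WINDOW, n_components=2)
pc1 = scores[WINDOW:, 0]  # time series of the first component
pc2 = scores[WINDOW:, 1]  # time series of the second component
print(explained[WINDOW:].mean(axis=0))
```

Without the sign alignment, the component series can flip sign from one period to the next even when the underlying structure has not changed. I am less sure how to handle components swapping order when two eigenvalues are close.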