Handling Massive Proportion of Missing Labels in Multivariate Long-Term Time Series Forecasting

Published under licence by IOP Publishing Ltd
Citation Jr Cristovão Iglesias et al 2021 J. Phys.: Conf. Ser. 2090 012170 DOI 10.1088/1742-6596/2090/1/012170

Abstract

Training Deep Learning (DL) models with missing labels is a challenge in diverse engineering applications. Missing-value imputation methods have been proposed to address this problem, but their performance degrades under a Massive Proportion of Missing Labels (MPML). This paper presents an approach for handling MPML in Multivariate Long-Term Time Series Forecasting. It is a two-step process in which interpolation (using Gaussian Process Regression (GPR) and domain knowledge from experts) is separated from the prediction model to enable the integration of prior domain knowledge. First, the GPR generates a set of samples of possible interpolations of the missing outputs, based on the domain knowledge. Second, the observed input sensor data and the interpolated labels from the GPR are used to train the prediction model. We evaluated our approach by developing a soft sensor, using one real dataset, to forecast the biomass during recombinant adeno-associated virus (rAAV) production in bioreactors. Our experimental results demonstrate the potential of the approach through quantitative evaluation of the generated forecasts in a case where training a DL model would otherwise be extremely difficult due to MPML.
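The two-step process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy only, and the logistic prior mean (standing in for expert domain knowledge), the RBF kernel hyperparameters, the synthetic sensor data, and the least-squares regressor (standing in for the DL prediction model) are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(a, b, length_scale=10.0, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of time points."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def prior_mean(t):
    """Hypothetical expert prior: a logistic growth curve (assumption)."""
    return 5.0 / (1.0 + np.exp(-0.1 * (t - 50.0)))

def gpr_label_samples(t_obs, y_obs, t_all, n_samples=5, noise=1e-2):
    """Step 1: draw plausible label trajectories from the GP posterior."""
    # Covariance of the observed labels, with observation-noise variance added.
    K = rbf_kernel(t_obs, t_obs) + noise * np.eye(len(t_obs))
    K_s = rbf_kernel(t_all, t_obs)
    K_ss = rbf_kernel(t_all, t_all)
    L = np.linalg.cholesky(K)
    # Posterior mean: prior mean plus a correction from the observed residuals.
    resid = y_obs - prior_mean(t_obs)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))
    mean = prior_mean(t_all) + K_s @ alpha
    # Posterior covariance, symmetrized and jittered for numerical stability.
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    cov = 0.5 * (cov + cov.T) + 1e-6 * np.eye(len(t_all))
    return rng.multivariate_normal(mean, cov, size=n_samples)

# Synthetic stand-in data: 100 time steps, labels observed at only 5 of them.
t_all = np.arange(100, dtype=float)
true_y = prior_mean(t_all) + 0.3 * np.sin(t_all / 8.0)
X = np.column_stack([
    true_y + 0.05 * rng.standard_normal(100),        # hypothetical sensor 1
    0.5 * true_y + 0.05 * rng.standard_normal(100),  # hypothetical sensor 2
    t_all / 100.0,                                   # normalized time
])
obs_idx = np.array([0, 25, 50, 75, 99])

# Step 1: sample interpolated label trajectories over the full time axis.
samples = gpr_label_samples(t_all[obs_idx], true_y[obs_idx], t_all)

# Step 2: train the prediction model on sensor inputs and interpolated labels.
# Each sampled trajectory contributes one copy of the inputs to the training set.
A = np.column_stack([X, np.ones(len(t_all))])  # add a bias column
A_aug = np.tile(A, (samples.shape[0], 1))
y_aug = samples.reshape(-1)
weights, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)

pred = A @ weights
rmse = float(np.sqrt(np.mean((pred - true_y) ** 2)))
print(f"RMSE against the synthetic ground truth: {rmse:.3f}")
```

Keeping the interpolation separate from the predictor means the prior mean and kernel can be revised with expert input without touching the forecasting model, and several sampled trajectories can be used so that the predictor sees the uncertainty in the interpolated labels rather than a single imputed curve.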

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.