05 Error Metrics¶
[Estimated execution time: 15 min]
The toolkit can evaluate standard error metrics automatically. This notebook continues with the air quality data set used in the previous tutorial, 04 Model Training.
import mogptk
import torch
import numpy as np
import pandas as pd
torch.manual_seed(1);
Air Quality MOGP¶
For this tutorial we will use the air quality dataset. The data set contains hourly averaged responses from an array of five metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The device was located in the field in a significantly polluted area of an Italian city. Data were recorded for one year starting in March 2004, representing the longest freely available recordings of a deployed air quality chemical sensor device.
We will only use five columns: CO(GT), NMHC(GT), C6H6(GT), NOx(GT), NO2(GT). For more information on data loading check out the tutorial 01 Data Loading. For more information on data handling check out the tutorial 02 Data Preparation.
For each sensor the minimum value is -200, which is also the value reported when a measurement fails. We will ignore these entries by converting them to NaN.
df = pd.read_csv('data/AirQualityUCI.csv', delimiter=';')
# Replace missing values with NaN
df.replace(-200.0, np.nan, inplace=True)
# The first two columns are date and time;
# we combine them into a single column in datetime format
df['Date'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%d/%m/%Y %H.%M.%S')
# Define an initial date to compare all others against
ini_date = pd.Timestamp('2004-03-10 00:00:00.0')
# Get elapsed hours
df['Time'] = (df['Date'] - ini_date) / pd.Timedelta(hours=1)
# Use only the first eight days of data
df2 = df[df['Date'] < pd.Timestamp('2004-03-19 00:00:00.0')]
We remove additional data to simulate sensor failure. For each channel we first remove 50% of the observations at random and then remove complete ranges, so that the missing sections can be reconstructed from the other channels through the learned cross-correlations.
We also apply data transformations as described in the data preparation and transformation tutorial 02 Data Preparation, where each channel is detrended and normalized so that it has zero mean and unit variance.
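As a rough illustration (not mogptk's implementation), the two transforms applied in the next cell amount to subtracting a degree-1 polynomial trend and then standardizing each channel; the helper name below is hypothetical:
import numpy as np

def detrend_and_standardize(x, y):
    # Hypothetical helper: fit and subtract a linear (degree-1) trend,
    # then rescale to zero mean and unit variance, mirroring what
    # TransformDetrend(degree=1) followed by TransformStandard() does per channel
    trend = np.polyval(np.polyfit(x, y, deg=1), x)
    residual = y - trend
    return (residual - residual.mean()) / residual.std()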
cols = ['CO(GT)', 'NMHC(GT)', 'C6H6(GT)', 'NOx(GT)', 'NO2(GT)']
dataset = mogptk.LoadDataFrame(df2, x_col='Time', y_col=cols)
for channel in dataset:
    channel.remove_randomly(pct=0.5)
# drop relative ranges to simulate sensor failure
dataset[0].remove_relative_range(0.2, 0.3)
dataset[1].remove_relative_range(0.8, 1.0)
dataset[2].remove_relative_range(0.9, 1.0)
dataset[3].remove_relative_range(0.8, 1.0)
dataset[4].remove_relative_range(0.0, 0.2)
for channel in dataset:
    channel.transform(mogptk.TransformDetrend(degree=1))
    channel.transform(mogptk.TransformStandard())
dataset.plot();
method = 'Adam'
iters = 1000
lr = 0.015
Independent channels with spectral mixture kernels¶
For each channel we will use four mixture components (Q=4).
sm = mogptk.SM(dataset, Q=4)
sm.init_parameters('BNSE')
sm.train(method=method, lr=lr, iters=iters, verbose=True, error='MAE', plot=True);
Starting optimization using Adam
‣ Model: Exact
‣ Kernel: IndependentMultiOutputKernel
‣ Likelihood: GaussianLikelihood
‣ Channels: 5
‣ Parameters: 65
‣ Training points: 401
‣ Iterations: 1000

0/1000 0:00:01 loss= 493.812 error= 33.5505 (warmup)
30/1000 0:00:10 loss= 482.051 error= 34.6115
330/1000 0:00:20 loss= 417.233 error= 33.6955
636/1000 0:00:30 loss= 400.423 error= 32.9908
951/1000 0:00:40 loss= 395.632 error= 32.8146
1000/1000 0:00:41 loss= 395.184 error= 32.7584
Optimization finished in 41.631 seconds
Evaluating errors¶
Given a test input and output, the mogptk.error function calculates:
- Mean absolute error (MAE)
- Mean absolute percentage error (MAPE)
- Root mean squared error (RMSE)
If only the raw values $y_{true} - y_{pred}$ are desired, the flag simple must be passed.
Multiple models can be passed at once for the same test set (X, Y). The result is a list with one element per model, where each element is another list of length equal to the number of channels, holding the error for that model and channel. This makes it possible to obtain errors for multiple models on the same test set, even when the channels have different numbers of test points.
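For reference, these three metrics correspond to the following standard definitions; this is a plain NumPy sketch of what the reported values mean, not mogptk's own code:
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    # Mean absolute percentage error (expressed in percent)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    # Root mean squared error
    return np.sqrt(np.mean((y_true - y_pred) ** 2))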
sm.plot_prediction(title='SM on Air Quality Data');
mogptk.error(sm, disp=True, per_channel=False);
| Name | MAE | MAPE | RMSE |
|---|---|---|---|
| SM | 32.758392 | 51.100423 | 60.292392 |
Multi Output Spectral Mixture (MOSM)¶
Next we use the multi output spectral mixture kernel (Parra and Tobar, 2017).
mosm = mogptk.MOSM(dataset, Q=4)
mosm.init_parameters('BNSE')
mosm.train(method=method, lr=lr, iters=iters, verbose=True, error='MAE', plot=True)
mosm.plot_prediction(title='MOSM on Air Quality Data');
Starting optimization using Adam
‣ Model: Exact
‣ Kernel: MultiOutputSpectralMixtureKernel
‣ Likelihood: GaussianLikelihood
‣ Channels: 5
‣ Parameters: 105
‣ Training points: 401
‣ Iterations: 1000

0/1000 0:00:04 loss= 465.023 error= 31.6777 (warmup)
2/1000 0:00:20 loss= 466.379 error= 32.2202
3/1000 0:00:20 loss= 464.399 error= 31.9518
122/1000 0:00:30 loss= 418.562 error= 29.907
251/1000 0:00:40 loss= 391.92 error= 30.59
374/1000 0:00:50 loss= 373.975 error= 30.546
504/1000 0:01:00 loss= 366.26 error= 30.8178
629/1000 0:01:10 loss= 363.238 error= 30.9285
751/1000 0:01:20 loss= 362.178 error= 31.0244
881/1000 0:01:30 loss= 361.64 error= 31.1205
1000/1000 0:01:39 loss= 361.385 error= 31.2423
Optimization finished in 1 minute 39 seconds
Cross Spectral Mixture (CSM)¶
Then we use the cross spectral mixture kernel (Ulrich et al., 2015).
csm = mogptk.CSM(dataset, Q=4)
csm.init_parameters()
csm.train(method=method, lr=lr, iters=iters, verbose=True, error='MAE', plot=True)
csm.plot_prediction(title='CSM on Air Quality Data');
Starting optimization using Adam
‣ Model: Exact
‣ Kernel: MixtureKernel.CrossSpectralKernel
‣ Likelihood: GaussianLikelihood
‣ Channels: 5
‣ Parameters: 53
‣ Training points: 401
‣ Iterations: 1000

0/1000 0:00:09 loss= 458.857 error= 28.2007 (warmup)
1/1000 0:00:10 loss= 460.084 error= 30.3136 (warmup)
2/1000 0:00:59 loss= 454.891 error= 29.0431
3/1000 0:00:59 loss= 454.519 error= 28.7162
4/1000 0:00:59 loss= 455.341 error= 28.9863
5/1000 0:01:00 loss= 454.292 error= 29.0775
6/1000 0:01:00 loss= 452.94 error= 29.1527
66/1000 0:01:10 loss= 431.544 error= 28.33
127/1000 0:01:20 loss= 414.01 error= 27.9179
192/1000 0:01:30 loss= 399.045 error= 27.5828
255/1000 0:01:40 loss= 388.186 error= 27.3395
318/1000 0:01:50 loss= 380.639 error= 27.1727
380/1000 0:02:00 loss= 375.885 error= 27.056
441/1000 0:02:10 loss= 373.094 error= 26.9746
501/1000 0:02:20 loss= 371.484 error= 26.9425
561/1000 0:02:30 loss= 370.54 error= 26.922
619/1000 0:02:40 loss= 369.962 error= 26.9104
680/1000 0:02:50 loss= 369.578 error= 26.8815
740/1000 0:03:00 loss= 369.217 error= 26.9088
799/1000 0:03:10 loss= 368.964 error= 26.9079
858/1000 0:03:20 loss= 368.749 error= 26.9212
919/1000 0:03:30 loss= 368.565 error= 26.9302
979/1000 0:03:40 loss= 368.414 error= 26.9437
1000/1000 0:03:43 loss= 368.367 error= 26.9469
Optimization finished in 3 minutes 43 seconds
Spectral Mixture - Linear Model of Coregionalization (SM-LMC)¶
Lastly we fit the spectral mixture linear model of coregionalization (Wilson, 2014).
smlmc = mogptk.SM_LMC(dataset, Q=4)
smlmc.init_parameters()
smlmc.train(method=method, lr=lr, iters=iters, verbose=True, error='MAE', plot=True)
smlmc.plot_prediction(title='SM-LMC on Air Quality Data');
Starting optimization using Adam
‣ Model: Exact
‣ Kernel: LinearModelOfCoregionalizationKernel
‣ Likelihood: GaussianLikelihood
‣ Channels: 5
‣ Parameters: 33
‣ Training points: 401
‣ Iterations: 1000

0/1000 0:00:06 loss= 462.438 error= 27.8478 (warmup)
2/1000 0:00:45 loss= 461.582 error= 27.9806
3/1000 0:00:45 loss= 461.076 error= 27.8748
4/1000 0:00:45 loss= 460.679 error= 27.8053
5/1000 0:00:45 loss= 460.307 error= 27.7862
45/1000 0:00:50 loss= 444.873 error= 27.584
139/1000 0:01:00 loss= 415.802 error= 27.2393
232/1000 0:01:10 loss= 397.499 error= 27.1011
325/1000 0:01:20 loss= 387.23 error= 27.0715
418/1000 0:01:30 loss= 382.125 error= 27.0886
510/1000 0:01:40 loss= 379.879 error= 27.0907
617/1000 0:01:50 loss= 378.659 error= 27.0857
721/1000 0:02:00 loss= 378.005 error= 27.0828
820/1000 0:02:10 loss= 377.559 error= 27.0787
913/1000 0:02:20 loss= 377.219 error= 27.078
1000/1000 0:02:29 loss= 376.999 error= 27.0684
Optimization finished in 2 minutes 29 seconds
Convolutional Gaussian (CONV)¶
Finally we fit the convolutional Gaussian kernel, which constructs each channel by convolving a shared latent process with a channel-specific Gaussian smoothing kernel.
conv = mogptk.CONV(dataset, Q=4)
conv.init_parameters()
conv.train(method=method, lr=lr, iters=iters, verbose=True, error='MAE', plot=True)
conv.plot_prediction(title='CONV on Air Quality Data');
Starting optimization using Adam
‣ Model: Exact
‣ Kernel: MixtureKernel.GaussianConvolutionProcessKernel
‣ Likelihood: GaussianLikelihood
‣ Channels: 5
‣ Parameters: 49
‣ Training points: 401
‣ Iterations: 1000

0/1000 0:00:06 loss= 596.674 error= 23.5805 (warmup)
2/1000 0:00:33 loss= 595.608 error= 23.5612
3/1000 0:00:34 loss= 595.075 error= 23.5515
4/1000 0:00:34 loss= 594.543 error= 23.5418
36/1000 0:00:40 loss= 577.754 error= 23.2108
90/1000 0:00:50 loss= 550.194 error= 22.6027
146/1000 0:01:00 loss= 521.932 error= 21.8585
200/1000 0:01:10 loss= 494.961 error= 20.9856
254/1000 0:01:20 loss= 468.328 error= 20.0551
308/1000 0:01:30 loss= 442.135 error= 19.1864
366/1000 0:01:40 loss= 414.681 error= 18.3462
422/1000 0:01:50 loss= 389.117 error= 17.6593
479/1000 0:02:00 loss= 364.429 error= 17.0883
533/1000 0:02:10 loss= 342.67 error= 16.6215
588/1000 0:02:20 loss= 322.49 error= 16.2286
645/1000 0:02:30 loss= 303.941 error= 15.9123
699/1000 0:02:40 loss= 288.696 error= 15.6565
754/1000 0:02:50 loss= 275.501 error= 15.4196
811/1000 0:03:00 loss= 264.203 error= 15.207
865/1000 0:03:10 loss= 255.502 error= 15.0134
919/1000 0:03:20 loss= 248.387 error= 14.7984
974/1000 0:03:30 loss= 242.306 error= 14.5389
1000/1000 0:03:34 loss= 239.709 error= 14.4087
Optimization finished in 3 minutes 35 seconds
Compare errors¶
We take the mean MAE, MAPE, and RMSE over all channels and compare the models.
mogptk.error(sm, mosm, csm, smlmc, conv, disp=True)
| Name | MAE | MAPE | RMSE |
|---|---|---|---|
| SM | 32.758392 | 51.100423 | 60.292392 |
| MOSM | 31.242253 | 47.680504 | 57.020081 |
| CSM | 26.946854 | 38.145051 | 52.273070 |
| SM-LMC | 27.068414 | 38.755937 | 51.631192 |
| CONV | 14.408723 | 19.680302 | 26.118271 |
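If per-channel errors are needed, passing per_channel=True should return the nested structure described above: one list per model with one entry per channel. A hedged sketch; the exact type of each channel entry (e.g. a dict mapping metric names to values) should be verified by inspecting the result:
# Per-channel errors for each model; `cols` holds the channel names defined earlier
errors = mogptk.error(sm, mosm, csm, smlmc, conv, per_channel=True)
for name, model_errors in zip(['SM', 'MOSM', 'CSM', 'SM-LMC', 'CONV'], errors):
    # model_errors has one entry per channel, in the same order as the dataset channels
    for col, channel_error in zip(cols, model_errors):
        print(name, col, channel_error)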