hetGPy Intro: Background and Basic Usage
This notebook gives a high-level overview of the basics of
hetGPyWe discuss the core model objects, regression, and prediction
This document assumes some prior knowledge of Gaussian Process modeling. For those looking for a refresher, some suggested sources are Ch.2 of Rassmussen and Williams (2006), Ch. 5 of Gramacy (2020), and Ch. 2-3 of Garnett (2023).
What is hetGPy?
hetGPy is a Python library for homoskedastic and heteroskedastic Gaussian Process modeling. It is a Python port of the hetGP R package from Binois and Gramacy (2018). The package also includes functions for sequential design and Bayesian Optimization.
Why is a Python port needed when we have rpy2 to call R code from Python?
For the basic use cases of Gaussian Process modeling, most of the time you can wrap the R code from hetGP via the rpy2 Python library. However, there are three principle motivations for having a Python port:
Environment restrictions and sequential design:
Gaussian Process modeling is especially useful for computer experiments and Bayesian Optimization. In both domains, it is common to work with expensive simulation functions which are run on high-performance computing clusters, and many of these simulators are written in Python. Thus, when working on computing clusters, it is advantageous to use one type of software rather than trying to build multiple computing environments. Additionally, in
rpy2the R objects live in R’s memory space, and their size is unknown to Python. For large-scale analyses, this could prove problematic.In many cases, computer experiments and Bayesian Optimization rely on sequential design: initialzing some set of inputs, acquiring (usually expensive) data, and then choosing the next set of experiments via some criteria or acquisition function. Thus, keeping the entire analysis in one language (in this case, Python) is desirable for human-out-of-the-loop workflows.
Opportunity for extensions:
The core maximum likelihood routines in
hetGPandhetGPyrely on hand-coded likelihoods and gradients, and are thus restricted to a subset of closed-form covariance kernels (in this case, Gaussian and Matern kernels (\(\nu = 3/2, 5/2\))). State of the art Python libraries, such as pytorch and jax utilize autodifferentiation methods, which would allow for a more flexible use of arbitrary covariance kernels inhetGPy.Availability of high-performance optimization functions:
Most of the computation in
hetGPyis the optimization of the maximum likelihood which relies on the L-BFGS-B algorithm, a standard constrained optimization routine. The definitive L-BFGS-B implementation is by Nocedal and Morales (2011) in Fortran, which is callable from scipy but not from base R.
Basic Setup
The central objects that performs Gaussian process regeression (GPR) in
hetGPyare thehomGPandhetGPclasses for homoskedastic and heteroskedasic regression.
First, we show homGP modeling on noiseless data, noisy data, and finally on heteroskedastic noise.
Noiseless case
[1]:
from hetgpy import homGP
import numpy as np
import matplotlib.pyplot as plt
xgrid = np.linspace(0,2*np.pi, 100).reshape(-1,1)
xs = xgrid.squeeze() # for plotting
X = np.linspace(0,2*np.pi, 10).reshape(-1,1)
Z = np.sin(X).squeeze()
To train a GPR, instantiate the model object (
homGPorhetGP) and call the maximum likelihood estimation function (mlewhich aliases the functionsmleHomGPandmleHetGPwhich are the equivalent function names in thehetGPR package).After training, call
predicton the model object to return the mean and variance predictions.
[2]:
# now train a homoskedastic Gaussian Process Regression
%config InlineBackend.figure_formats = ['svg']
fig, ax = plt.subplots()
ax.scatter(X, Z, color = 'k', label = 'Data')
ax.legend(edgecolor='black')
ax.set_xlabel('X'); ax.set_ylabel('sin(X)')
GP = homGP()
GP.mle(
X = X,
Z = Z,
covtype = "Gaussian",
lower = np.array([0.1]),
upper = np.array([2.0])
)
preds = GP.predict(x = xgrid)
ax.plot(xs, preds['mean'], color = 'blue', label = 'Mean')
ax.legend();
Fast Estimation under Replication
hetGPyis designed to handle replicationConsider the same example, except this time with some replication (5 each) and noise
Like its R counterpart,
hetGPydoes training on the order of the unique input locations \(n\) rather than the full dataset \(N\). When replication is high, as is common in stochastic computer experiments, this can yield much shorter training times.
[3]:
rand = np.random.default_rng(seed = 42)
X = np.linspace(0,2*np.pi, 10)
X = np.repeat(X,repeats = 5).reshape(-1,1)
Z = np.sin(X).squeeze()
Z += 0.2 * rand.normal(loc = 0,size = len(Z))
fig, ax = plt.subplots()
ax.plot(xs, np.sin(xs), label = 'True', color = 'red', alpha = 0.4)
ax.scatter(X, Z, color = 'k', label = 'Data')
ax.legend(edgecolor='black')
ax.set_xlabel('X'); ax.set_ylabel('sin(X)');
GP = homGP()
GP.mle(
X = X,
Z = Z,
covtype = "Gaussian",
lower = np.array([0.1]),
upper = np.array([2.0])
)
preds = GP.predict(x = xgrid, interval = 'predictive', interval_lower = 0.05, interval_upper = 0.95)
interval = preds['predictive_interval']
ax.plot(xs, preds['mean'], color = 'blue', label = 'Mean (homGP)')
ax.plot(xs, interval['lower'], color = 'blue', label = '90% Predictive Interval (homGP)', linestyle = 'dashed')
ax.plot(xs, interval['upper'], color = 'blue',linestyle = 'dashed')
ax.legend(edgecolor='black');
Heteroskedastic Gaussian Process Regression
We can also perform heteroskedastic GPR. Consider the mcycle motorocycle simulation data from R (available in hetGPy) which is known to have heteroskedasticity.
The major advantage of a heteroskedastic approach is the improved uncertainty quantification, in that we estimate a wider predictive interval in the higher noise regions, and a narrow one in lower noise regions, especially compared to a homoskedastic model.
We model the noise as a series of latent variables at each unique input via a second (homoskedastic) GP. The model for the noise process is from Binois and Gramacy (2018) and is also given in Gramacy (2020), Ch. 10 (equation 10.7):
\begin{align*} \Lambda_n = C_{(\delta)} K_{(\delta)}^{-1} \Delta_n, \text{ where } K_{(\delta)} = (C_{(\delta)} + g_{(\delta)} A_n^{-1}). \end{align*}
Where \(\Lambda_n\) is the prediction of the noise, \(C_{(\delta)}\) is the covariance matrix, \(\Delta_n\) is the estimated variance at each input, and \(A_n^{-1}\) is the inverse of the number of replicates for each input location (stored in a diagonal matrix).
[4]:
from hetgpy import hetGP
from hetgpy.example_data import mcycle
m = mcycle()
X = m['times']
Z = m['accel']
xgrid = np.linspace(0,60,301).reshape(-1,1) # predictive grid
xs = xgrid.squeeze()
hom, het = homGP(), hetGP()
# homGP
hom.mle(
X = X,
Z = Z,
covtype = "Matern5_2",
lower = np.array([0.1]),
upper = np.array([50.0])
)
preds_hom = hom.predict(x = xgrid, interval = 'predictive', interval_lower = 0.05, interval_upper = 0.95)
interval_hom = preds_hom['predictive_interval']
# hetGP
het.mle(
X = X,
Z = Z,
covtype = "Matern5_2",
lower = np.array([0.1]),
upper = np.array([50.0])
)
preds_het = het.predict(x = xgrid, interval = 'predictive', interval_lower = 0.05, interval_upper = 0.95)
interval_het = preds_het['predictive_interval']
fig, ax = plt.subplots(nrows=2,figsize=(9,6),sharey=True)
ax[0].scatter(X.squeeze(),Z,label='Data',color='black')
ax[1].scatter(X.squeeze(),Z,color='black')
ax[0].plot(xs, preds_hom['mean'], color = 'blue', label = 'homGP')
ax[0].plot(xs, interval_hom['lower'], color = 'blue', linestyle = 'dashed')
ax[0].plot(xs, interval_hom['upper'], color = 'blue',linestyle = 'dashed')
ax[1].plot(xs, preds_het['mean'], color = 'red', label = 'hetGP')
ax[1].plot(xs, interval_het['lower'], color = 'red', linestyle = 'dashed')
ax[1].plot(xs, interval_het['upper'], color = 'red',linestyle = 'dashed')
fig.legend(edgecolor='black',loc='upper center',ncol=3,bbox_to_anchor=(0.5,1.05))
[ax[i].set_xlabel('times (milliseconds)') for i in range(2)];
[ax[i].set_ylabel('accel (g)') for i in range(2)];
fig.tight_layout()
Specifying initializations for the hyperparamters
Specify starting values for the hyperparameters via the
init = {param:value}argument
For example, to initialize the lengthscale \(\theta\) at \(\theta_0=10\):
[5]:
GP = hetGP()
GP.mle(
X = X,
Z = Z,
lower = np.array([1.0]),
upper = np.array([10]),
init = {'theta': np.array([10.0])}
)
print(f"Lengthscale: {GP.theta}")
print(f"Log likelihood: {GP.ll}")
Lengthscale: [9.95168406]
Log likelihood: -499.2919739824025
Specifying known values of hyperparameters
You can also override the hyperparameter learning via the
knownargument
For example, to fix the lengthscale of the mean process at \(\theta=3\) (and note the lower fit based on log-likelood):
[ ]:
GP = hetGP()
GP.mle(
X = X,
Z = Z,
lower = np.array([1.0]),
upper = np.array([10]),
known = {'theta': np.array([3.0])}
)
print(f"Lengthscale: {GP.theta}")
print(f"Log likelihood: {GP.ll}")
Lengthscale: [3.]
Log likelihood: -513.7520035600228