`profit.sur.gp.backend.gp_functions`

Collection of functions for the Custom GPSurrogate.

Module Contents

Functions

`optimize`(xtrain, ytrain, a0, kernel[, fixed_sigma_n, ...])	Finds optimal hyperparameters from initial array a0, sorted as [length_scale, scale, noise].
`solve_cholesky`(L, b)	Solves a linear equation with a lower triangular matrix L from the Cholesky decomposition.
`negative_log_likelihood_cholesky`(hyp, X, y, kernel[, ...])	Computes the negative log likelihood using the Cholesky decomposition of the covariance matrix.
`negative_log_likelihood`(hyp, X, y, kernel[, ...])	Computes the negative log likelihood either by a Cholesky- or an Eigendecomposition.
`invert_cholesky`(L)	Inverts a positive-definite matrix based on a Cholesky decomposition.
`invert`(K[, neig, tol, eps, max_iter])	Inverts a positive-definite matrix using either a Cholesky- or an Eigendecomposition.
`marginal_variance_BBQ`(Xtrain, ytrain, Xpred, kernel, ...)	Calculates the marginal variance to infer the next point in active learning.
`predict_f`(hyp, x, y, xtest, kernel[, return_full_cov, ...])	Predicts values given only a set of hyperparameters and a kernel.

profit.sur.gp.backend.gp_functions.optimize(xtrain, ytrain, a0, kernel, fixed_sigma_n=False, eval_gradient=False, return_hess_inv=False)[source]

Finds optimal hyperparameters from initial array a0, sorted as [length_scale, scale, noise].

The objective function, which is minimized, is the negative log likelihood. The hyperparameters are transformed logarithmically before the optimization to enhance its flexibility. They are transformed back afterwards.

Parameters:

xtrain (ndarray) –
ytrain (ndarray) –
a0 (ndarray) – Flat array of initial hyperparameters.
kernel (function) – Function of the kernel, not the actual matrix.
fixed_sigma_n (bool) – If the noise \(\sigma_n\) should be fixed during optimization.
eval_gradient (bool) – Whether the analytic derivatives of the kernel and likelihood are used in the optimization.
return_hess_inv (bool) – If True, returns the inverse Hessian matrix from the optimization result. This is important for advanced active learning.

Returns:

A tuple containing:

opt_hyperparameters (ndarray): Flat array of optimized hyperparameters.
hess_inv (scipy.optimize.LbfgsInvHessProduct): Inverse Hessian matrix in the form of a scipy linear

operator. If return_hess_inv is False, only the optimized hyperparameters are returned.

Return type:

tuple

profit.sur.gp.backend.gp_functions.solve_cholesky(L, b)[source]

Solves a linear equation with a lower triangular matrix L from the Cholesky decomposition.

In the context of GP’s the \(L\) is the Cholesky decomposition of the training kernel matrix and \(b\) is the training output data.

\[ \begin{equation} \alpha = L^T L b \end{equation} \]

Parameters:

L (ndarray) – Lower triangular matrix.
b (ndarray) – A vector.

Returns:

ndarray

profit.sur.gp.backend.gp_functions.negative_log_likelihood_cholesky(hyp, X, y, kernel, eval_gradient=False, log_scale_hyp=False, fixed_sigma_n=False)[source]

Computes the negative log likelihood using the Cholesky decomposition of the covariance matrix.

The calculation follows Rasmussen&Williams 2006, p. 19, 113-114.

\[\begin{split} \begin{align} NL &= \frac{1}{2} y^T \alpha + tr(\log(L)) + \frac{n}{2} \log(2 \pi) \\ \frac{dNL}{d\theta} &= \frac{1}{2} tr\left( (K_y^{-1} - \alpha \alpha^T) \frac{\partial K}{\partial \theta} \right) \\ \alpha &= K_y^{-1} y \end{align} \end{split}\]

Parameters:

hyp (ndarray) – Flat hyperparameter array [length_scale, sigma_f, sigma_n]. They can also be already log transformed.
X (ndarray) – Training input points.
y (ndarray) – Observed training output.
kernel (function) – Function to build the covariance matrix.
eval_gradient (bool) – If the analytic gradient of the negative log likelihood w.r.t. the hyperparameters should be returned.
log_scale_hyp (bool) – Whether the hyperparameters are log transformed. This is important for the gradient calculation.
fixed_sigma_n (bool) – If the noise \(\sigma_n\) is kept fixed. In this case, there is no gradient with respect to sigma_n.

Returns:

A tuple containing:

nll (float): The negative log likelihood.
dnll (ndarray): The derivative of the negative log likelihood w.r.t. to the hyperparameters.

If eval_gradient is False, only nll is returned.

Return type:

tuple

profit.sur.gp.backend.gp_functions.negative_log_likelihood(hyp, X, y, kernel, eval_gradient=False, log_scale_hyp=False, fixed_sigma_n_value=None, neig=0, max_iter=1000)[source]

Computes the negative log likelihood either by a Cholesky- or an Eigendecomposition.

First, the Cholesky decomposition is tried. If this results in a LinAlgError, the biggest eigenvalues are calculated until convergence or until the maximum iterations are reached. The eigenvalues are cut off at \(1e-10\) due ensure numerical stability.

\[\begin{split} \begin{align} NL &= \frac{1}{2} \left( y^T \alpha + tr(\log(\lambda)) + n_{eig} \log(2 \pi) \right) \\ \frac{dNL}{d\theta} &= \frac{1}{2} tr\left( (K_y^{-1} - \alpha \alpha^T) \frac{\partial K}{\partial \theta} \right) \\ \alpha &= v (\lambda^{-1} (v^T y)) \end{align} \end{split}\]

Parameters:

hyp (ndarray) – Flat hyperparameter array [length_scale, sigma_f, sigma_n].
X (ndarray) – Training input points.
y (ndarray) – Observed training output.
kernel (function) – Function to build the covariance matrix.
eval_gradient (bool) – If the analytic gradient of the negative log likelihood w.r.t. the hyperparameters should be returned.
log_scale_hyp (bool) – Whether the hyperparameters are log transformed. This is important for the gradient calculation.
fixed_sigma_n_value (float) – The value of the fixed noise \(\sigma_n\). If it should be optimizied as well, this should be None.
neig (int) – Initial number of eigenvalues to calculate if the Cholesky decomposition is not successful. This is doubled during every iteration.
max_iter (int) – Maximum number of iterations of the eigenvalue solver until convergence must be reached.

Returns:

A tuple containing:

nll (float): The negative log likelihood.
dnll (ndarray): The derivative of the negative log likelihood w.r.t. to the hyperparameters.

If eval_gradient is False, only nll is returned.

Return type:

tuple

profit.sur.gp.backend.gp_functions.invert_cholesky(L)[source]

Inverts a positive-definite matrix based on a Cholesky decomposition.

This is used to invert the covariance matrix.

Parameters:: L (ndarray) – Lower triangular matrix from a Cholesky decomposition.
Returns:: Inverse of the matrix \(L^T L\)
Return type:: ndarray

profit.sur.gp.backend.gp_functions.invert(K, neig=0, tol=1e-10, eps=1e-06, max_iter=1000)[source]

Inverts a positive-definite matrix using either a Cholesky- or an Eigendecomposition.

The solution method depends on the rapidness of decay of the eigenvalues.

Parameters:

K (np.ndarray) – Kernel matrix.
neig (int) – Initial number of eigenvalues to calculate if the Cholesky decomposition is not successful. This is doubled during every iteration.
tol (float) – Convergence criterion for the eigenvalues.
eps (float) – Small number to be added to diagonal of kernel matrix to ensure positive definiteness.
max_iter (int) – Maximum number of iterations of the eigenvalue solver until convergence must be reached.

Returns:

Inverse covariance matrix.

Return type:

np.ndarray

profit.sur.gp.backend.gp_functions.marginal_variance_BBQ(Xtrain, ytrain, Xpred, kernel, hyperparameters, hess_inv, fixed_sigma_n=False, alpha=None, predictive_variance=0)[source]

Calculates the marginal variance to infer the next point in active learning.

The calculation follows Osborne (2012).

\(\tilde{V}\) … Marginal covariance matrix \(\hat{V}\) … Predictive variance \(\frac{dm}{d\theta}\) … Derivative of the predictive mean w.r.t. the hyperparameters \(H\) … Hessian matrix

\[ \begin{equation} \tilde{V} = \left( \frac{dm}{d\theta} \right) H^{-1} \left( \frac{dm}{d\theta} \right)^T \end{equation} \]

Parameters:

Xtrain (ndarray) – Input training points.
ytrain (ndarray) – Observed output daat.
Xpred (ndarray) – Possible prediction input points.
kernel (function) – Function to build the covariance matrix.
hyperparameters (dict) – Dictionary of the hyperparameters.
hess_inv (ndarray) – Inverse Hessian matrix.
fixed_sigma_n (bool) – If the noise \(\sigma_n\) was fixed during optimization. Then, the Hessian has to be padded with zeros.
alpha (ndarray) – If available, the result of \(K_y^{-1} y\), else None.
predictive_variance (ndarray) – Predictive variance only. This is added to the marginal variance.

Returns:

Sum of the actual marginal variance and the predictive variance.

Return type:

ndarray

profit.sur.gp.backend.gp_functions.predict_f(hyp, x, y, xtest, kernel, return_full_cov=False, neig=0)[source]

Predicts values given only a set of hyperparameters and a kernel.

This function is independent of the surrogates and used to quickly predict a function output with specific hyperparameters.

The calculation follows Rasmussen&Williams, p. 16, eq. 23-24.

Parameters:

hyp (ndarray) – Flat array of hyperparameters, sorted as [length_scale, sigma_f, sigma_n].
x (ndarray) – Input training points.
y (ndarray) – Observed output data.
xtest (ndarray) – Prediction points.
kernel (function) – Function to build the covariance matrix.
return_full_cov (bool) – If True, returns the full covariance matrix, otherwise only its diagonal.
neig (int) – Initial number of eigenvalues to be computed during the inversion of the covariance matrix.

Returns:

A tuple containing:

fstar (ndarray): Posterior mean.
vstar (ndarray): Posterior covariance matrix or its diagonal.

Return type:

tuple

profit.sur.gp.backend.gp_functions

Module Contents

Functions

`profit.sur.gp.backend.gp_functions`