profit.sur.gp.backend.gp_functions

Collection of functions for the Custom GPSurrogate.

Module Contents

Functions

optimize(xtrain, ytrain, a0, kernel[, fixed_sigma_n, ...])

Finds optimal hyperparameters from initial array a0, sorted as [length_scale, scale, noise].

solve_cholesky(L, b)

Solves a linear equation with a lower triangular matrix L from the Cholesky decomposition.

negative_log_likelihood_cholesky(hyp, X, y, kernel[, ...])

Computes the negative log likelihood using the Cholesky decomposition of the covariance matrix.

negative_log_likelihood(hyp, X, y, kernel[, ...])

Computes the negative log likelihood either by a Cholesky- or an Eigendecomposition.

invert_cholesky(L)

Inverts a positive-definite matrix based on a Cholesky decomposition.

invert(K[, neig, tol, eps, max_iter])

Inverts a positive-definite matrix using either a Cholesky- or an Eigendecomposition.

marginal_variance_BBQ(Xtrain, ytrain, Xpred, kernel, ...)

Calculates the marginal variance to infer the next point in active learning.

predict_f(hyp, x, y, xtest, kernel[, return_full_cov, ...])

Predicts values given only a set of hyperparameters and a kernel.

profit.sur.gp.backend.gp_functions.optimize(xtrain, ytrain, a0, kernel, fixed_sigma_n=False, eval_gradient=False, return_hess_inv=False)[source]

Finds optimal hyperparameters from initial array a0, sorted as [length_scale, scale, noise].

The objective function, which is minimized, is the negative log likelihood. The hyperparameters are transformed logarithmically before the optimization to enhance its flexibility. They are transformed back afterwards.

Parameters:
  • xtrain (ndarray) –

  • ytrain (ndarray) –

  • a0 (ndarray) – Flat array of initial hyperparameters.

  • kernel (function) – Function of the kernel, not the actual matrix.

  • fixed_sigma_n (bool) – If the noise \(\sigma_n\) should be fixed during optimization.

  • eval_gradient (bool) – Whether the analytic derivatives of the kernel and likelihood are used in the optimization.

  • return_hess_inv (bool) – If True, returns the inverse Hessian matrix from the optimization result. This is important for advanced active learning.

Returns:

A tuple containing:
  • opt_hyperparameters (ndarray): Flat array of optimized hyperparameters.

  • hess_inv (scipy.optimize.LbfgsInvHessProduct): Inverse Hessian matrix in the form of a scipy linear

operator. If return_hess_inv is False, only the optimized hyperparameters are returned.

Return type:

tuple

profit.sur.gp.backend.gp_functions.solve_cholesky(L, b)[source]

Solves a linear equation with a lower triangular matrix L from the Cholesky decomposition.

In the context of GP’s the \(L\) is the Cholesky decomposition of the training kernel matrix and \(b\) is the training output data.

\[ \begin{equation} \alpha = L^T L b \end{equation} \]

Parameters:
  • L (ndarray) – Lower triangular matrix.

  • b (ndarray) – A vector.

Returns:

ndarray

profit.sur.gp.backend.gp_functions.negative_log_likelihood_cholesky(hyp, X, y, kernel, eval_gradient=False, log_scale_hyp=False, fixed_sigma_n=False)[source]

Computes the negative log likelihood using the Cholesky decomposition of the covariance matrix.

The calculation follows Rasmussen&Williams 2006, p. 19, 113-114.

\[\begin{split} \begin{align} NL &= \frac{1}{2} y^T \alpha + tr(\log(L)) + \frac{n}{2} \log(2 \pi) \\ \frac{dNL}{d\theta} &= \frac{1}{2} tr\left( (K_y^{-1} - \alpha \alpha^T) \frac{\partial K}{\partial \theta} \right) \\ \alpha &= K_y^{-1} y \end{align} \end{split}\]

Parameters:
  • hyp (ndarray) – Flat hyperparameter array [length_scale, sigma_f, sigma_n]. They can also be already log transformed.

  • X (ndarray) – Training input points.

  • y (ndarray) – Observed training output.

  • kernel (function) – Function to build the covariance matrix.

  • eval_gradient (bool) – If the analytic gradient of the negative log likelihood w.r.t. the hyperparameters should be returned.

  • log_scale_hyp (bool) – Whether the hyperparameters are log transformed. This is important for the gradient calculation.

  • fixed_sigma_n (bool) – If the noise \(\sigma_n\) is kept fixed. In this case, there is no gradient with respect to sigma_n.

Returns:

A tuple containing:
  • nll (float): The negative log likelihood.

  • dnll (ndarray): The derivative of the negative log likelihood w.r.t. to the hyperparameters.

If eval_gradient is False, only nll is returned.

Return type:

tuple

profit.sur.gp.backend.gp_functions.negative_log_likelihood(hyp, X, y, kernel, eval_gradient=False, log_scale_hyp=False, fixed_sigma_n_value=None, neig=0, max_iter=1000)[source]

Computes the negative log likelihood either by a Cholesky- or an Eigendecomposition.

First, the Cholesky decomposition is tried. If this results in a LinAlgError, the biggest eigenvalues are calculated until convergence or until the maximum iterations are reached. The eigenvalues are cut off at \(1e-10\) due ensure numerical stability.

\[\begin{split} \begin{align} NL &= \frac{1}{2} \left( y^T \alpha + tr(\log(\lambda)) + n_{eig} \log(2 \pi) \right) \\ \frac{dNL}{d\theta} &= \frac{1}{2} tr\left( (K_y^{-1} - \alpha \alpha^T) \frac{\partial K}{\partial \theta} \right) \\ \alpha &= v (\lambda^{-1} (v^T y)) \end{align} \end{split}\]

Parameters:
  • hyp (ndarray) – Flat hyperparameter array [length_scale, sigma_f, sigma_n].

  • X (ndarray) – Training input points.

  • y (ndarray) – Observed training output.

  • kernel (function) – Function to build the covariance matrix.

  • eval_gradient (bool) – If the analytic gradient of the negative log likelihood w.r.t. the hyperparameters should be returned.

  • log_scale_hyp (bool) – Whether the hyperparameters are log transformed. This is important for the gradient calculation.

  • fixed_sigma_n_value (float) – The value of the fixed noise \(\sigma_n\). If it should be optimizied as well, this should be None.

  • neig (int) – Initial number of eigenvalues to calculate if the Cholesky decomposition is not successful. This is doubled during every iteration.

  • max_iter (int) – Maximum number of iterations of the eigenvalue solver until convergence must be reached.

Returns:

A tuple containing:
  • nll (float): The negative log likelihood.

  • dnll (ndarray): The derivative of the negative log likelihood w.r.t. to the hyperparameters.

If eval_gradient is False, only nll is returned.

Return type:

tuple

profit.sur.gp.backend.gp_functions.invert_cholesky(L)[source]

Inverts a positive-definite matrix based on a Cholesky decomposition.

This is used to invert the covariance matrix.

Parameters:

L (ndarray) – Lower triangular matrix from a Cholesky decomposition.

Returns:

Inverse of the matrix \(L^T L\)

Return type:

ndarray

profit.sur.gp.backend.gp_functions.invert(K, neig=0, tol=1e-10, eps=1e-06, max_iter=1000)[source]

Inverts a positive-definite matrix using either a Cholesky- or an Eigendecomposition.

The solution method depends on the rapidness of decay of the eigenvalues.

Parameters:
  • K (np.ndarray) – Kernel matrix.

  • neig (int) – Initial number of eigenvalues to calculate if the Cholesky decomposition is not successful. This is doubled during every iteration.

  • tol (float) – Convergence criterion for the eigenvalues.

  • eps (float) – Small number to be added to diagonal of kernel matrix to ensure positive definiteness.

  • max_iter (int) – Maximum number of iterations of the eigenvalue solver until convergence must be reached.

Returns:

Inverse covariance matrix.

Return type:

np.ndarray

profit.sur.gp.backend.gp_functions.marginal_variance_BBQ(Xtrain, ytrain, Xpred, kernel, hyperparameters, hess_inv, fixed_sigma_n=False, alpha=None, predictive_variance=0)[source]

Calculates the marginal variance to infer the next point in active learning.

The calculation follows Osborne (2012).

\(\tilde{V}\) … Marginal covariance matrix \(\hat{V}\) … Predictive variance \(\frac{dm}{d\theta}\) … Derivative of the predictive mean w.r.t. the hyperparameters \(H\) … Hessian matrix

\[ \begin{equation} \tilde{V} = \left( \frac{dm}{d\theta} \right) H^{-1} \left( \frac{dm}{d\theta} \right)^T \end{equation} \]

Parameters:
  • Xtrain (ndarray) – Input training points.

  • ytrain (ndarray) – Observed output daat.

  • Xpred (ndarray) – Possible prediction input points.

  • kernel (function) – Function to build the covariance matrix.

  • hyperparameters (dict) – Dictionary of the hyperparameters.

  • hess_inv (ndarray) – Inverse Hessian matrix.

  • fixed_sigma_n (bool) – If the noise \(\sigma_n\) was fixed during optimization. Then, the Hessian has to be padded with zeros.

  • alpha (ndarray) – If available, the result of \(K_y^{-1} y\), else None.

  • predictive_variance (ndarray) – Predictive variance only. This is added to the marginal variance.

Returns:

Sum of the actual marginal variance and the predictive variance.

Return type:

ndarray

profit.sur.gp.backend.gp_functions.predict_f(hyp, x, y, xtest, kernel, return_full_cov=False, neig=0)[source]

Predicts values given only a set of hyperparameters and a kernel.

This function is independent of the surrogates and used to quickly predict a function output with specific hyperparameters.

The calculation follows Rasmussen&Williams, p. 16, eq. 23-24.

Parameters:
  • hyp (ndarray) – Flat array of hyperparameters, sorted as [length_scale, sigma_f, sigma_n].

  • x (ndarray) – Input training points.

  • y (ndarray) – Observed output data.

  • xtest (ndarray) – Prediction points.

  • kernel (function) – Function to build the covariance matrix.

  • return_full_cov (bool) – If True, returns the full covariance matrix, otherwise only its diagonal.

  • neig (int) – Initial number of eigenvalues to be computed during the inversion of the covariance matrix.

Returns:

A tuple containing:
  • fstar (ndarray): Posterior mean.

  • vstar (ndarray): Posterior covariance matrix or its diagonal.

Return type:

tuple