catlearn.regression.gpfunctions

catlearn.regression.gpfunctions.covariance

Generation of covariance matrix.

catlearn.regression.gpfunctions.covariance.get_covariance(kernel_list, log_scale, matrix1, matrix2=None, regularization=None, eval_gradients=False)

Return the covariance matrix of the training dataset.

Parameters:
  • kernel_list (dict of dicts) – A dict containing all dictionaries for the kernels.
  • log_scale (boolean) – Flag to define if the hyperparameters are log scale.
  • matrix1 (list) – A list of the training fingerprint vectors.
  • matrix2 (list or None) – A list of the test fingerprint vectors.
  • regularization (None or float) – Smoothing parameter for the Gram matrix.
  • eval_gradients (boolean) – Analytical gradients of the training features can be included.
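A minimal sketch of how a covariance matrix can be assembled from several kernels, assuming each kernel's contribution is summed and the regularization is added to the diagonal of the square training covariance only. The helper names `gaussian_block` and `covariance_sketch`, and the dictionary keys `type`, `width` and `scaling`, are hypothetical illustrations rather than the catlearn implementation; the sketch also treats kernel_list as a list of kernel dictionaries and handles only a Gaussian kernel.

```python
import numpy as np
from scipy.spatial import distance


def gaussian_block(width, m1, m2=None):
    """ARD Gaussian kernel; width holds one length scale per feature."""
    if m2 is None:
        m2 = m1
    d2 = distance.cdist(m1 / width, m2 / width, metric="sqeuclidean")
    return np.exp(-0.5 * d2)


def covariance_sketch(kernel_list, matrix1, matrix2=None, regularization=None):
    """Hypothetical: sum each kernel's contribution, then regularize."""
    matrix1 = np.asarray(matrix1, dtype=float)
    m2 = None if matrix2 is None else np.asarray(matrix2, dtype=float)
    n1 = matrix1.shape[0]
    n2 = n1 if m2 is None else m2.shape[0]
    cov = np.zeros((n1, n2))
    for kdict in kernel_list:
        # Only the Gaussian case is sketched here.
        if kdict["type"] != "gaussian":
            raise NotImplementedError(kdict["type"])
        width = np.asarray(kdict["width"], dtype=float)
        scaling = kdict.get("scaling", 1.0)
        cov += scaling * gaussian_block(width, matrix1, m2)
    # Assumed: regularization goes on the diagonal of the training covariance.
    if regularization is not None and m2 is None:
        cov += regularization * np.eye(n1)
    return cov


train = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
test = np.array([[0.5, 0.5]])
kernels = [{"type": "gaussian", "width": [1.0, 1.0], "scaling": 1.0}]
K = covariance_sketch(kernels, train, regularization=1e-3)  # shape (3, 3)
k_star = covariance_sketch(kernels, train, test)             # shape (3, 1)
```

Summing kernels keeps the result a valid covariance, which is why combining several kernel dictionaries in one list is a natural design.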

catlearn.regression.gpfunctions.default_scale

Scale everything within regression functions.

class catlearn.regression.gpfunctions.default_scale.ScaleData(train_features, train_targets)

Bases: object

Class to perform default scaling in the regression functions.

Standardizes both the features and the targets; predictions can then be rescaled to the original target units before being returned. The scaling parameters can be accessed from the class with:

    ScaleData.feature_data['mean']

This can be accessed from the gp with:

    gp = GaussianProcess(…)
    gp.scaling.feature_data['mean']

rescale_targets(predictions)

Rescale predictions.

Parameters: predictions (list) – The predicted values from the GP.
Returns: p – The rescaled predictions.
Return type: array
test(test_features)

Scale the test features.

Parameters: test_features (array) – Feature matrix for the test data.
Returns: scaled_features – The scaled features for the test data.
Return type: array
train()

Scale the training features and targets.

Returns:
  • feature_data (array) – The scaled features for the training data.
  • target_data (array) – The scaled targets for the training data.
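The standardize-then-rescale cycle described above can be sketched as follows. `StandardScalerSketch` is a hypothetical stand-in for ScaleData: only `feature_data['mean']` is confirmed by the text above, while the `'std'` key, the use of the population standard deviation, and the guard for constant columns are assumptions of this sketch.

```python
import numpy as np


class StandardScalerSketch:
    """Hypothetical stand-in for ScaleData: standardize features and targets."""

    def __init__(self, train_features, train_targets):
        self.train_features = np.asarray(train_features, dtype=float)
        self.train_targets = np.asarray(train_targets, dtype=float)
        std = self.train_features.std(axis=0)
        # Assumed guard: constant features would otherwise divide by zero.
        std[std == 0.0] = 1.0
        self.feature_data = {"mean": self.train_features.mean(axis=0), "std": std}
        self.target_data = {"mean": self.train_targets.mean(),
                            "std": float(self.train_targets.std()) or 1.0}

    def train(self):
        """Return the scaled training features and targets."""
        f = (self.train_features - self.feature_data["mean"]) / self.feature_data["std"]
        t = (self.train_targets - self.target_data["mean"]) / self.target_data["std"]
        return f, t

    def test(self, test_features):
        """Scale test features with the training statistics."""
        test_features = np.asarray(test_features, dtype=float)
        return (test_features - self.feature_data["mean"]) / self.feature_data["std"]

    def rescale_targets(self, predictions):
        """Map scaled predictions back to the original target units."""
        p = np.asarray(predictions, dtype=float)
        return p * self.target_data["std"] + self.target_data["mean"]


scaler = StandardScalerSketch([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]], [2.0, 4.0, 6.0])
features, targets = scaler.train()
restored = scaler.rescale_targets(targets)
```

Test features are scaled with the training statistics, never their own, so that training and test data share one coordinate system.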

catlearn.regression.gpfunctions.hyperparameter_scaling

Utility to scale hyperparameters.

catlearn.regression.gpfunctions.hyperparameter_scaling.hyperparameters(scaling, kernel_list)

Scale the hyperparameters.

catlearn.regression.gpfunctions.hyperparameter_scaling.rescale_hyperparameters(scaling, kernel_list)

Rescale hyperparameters.

catlearn.regression.gpfunctions.io

Functions to read and write models to file.

catlearn.regression.gpfunctions.io.read(filename, ext='pkl')

Read a saved GaussianProcess model object from file.

Parameters:
  • filename (str) – The name of the save file.
  • ext (str) – Format of the saved GP file, either pkl or hdf5. Default is pkl.
Returns:

model – Python GaussianProcess object.

Return type:

obj

catlearn.regression.gpfunctions.io.read_train_data(filename)

Function to read raw training data.

Parameters: filename (str) – The name of the save file.
Returns:
  • train_features (array) – Array of the training features.
  • train_targets (list) – A list of the training targets.
  • regularization (float) – The regularization parameter.
  • kernel_list (list) – The list containing the dictionaries for the kernels.
catlearn.regression.gpfunctions.io.write(filename, model, ext='pkl')

Write a GaussianProcess model object to file.

Parameters:
  • filename (str) – The name of the save file.
  • model (obj) – Python GaussianProcess object.
  • ext (str) – Format to save GP, can be pkl or hdf5. Default is pkl.
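A pickle round trip for the pkl format can be sketched with the standard library alone. `write_sketch` and `read_sketch` are hypothetical helpers, not catlearn's io functions; appending `.pkl` to the file name is an assumption of this sketch, the hdf5 path is not covered, and a plain dict stands in for a GaussianProcess object.

```python
import os
import pickle
import tempfile


def write_sketch(filename, model):
    """Pickle a model object to '<filename>.pkl' (hypothetical helper)."""
    with open(filename + ".pkl", "wb") as f:
        pickle.dump(model, f)


def read_sketch(filename):
    """Load a model object previously written by write_sketch."""
    with open(filename + ".pkl", "rb") as f:
        return pickle.load(f)


# Round trip with a plain dict standing in for a GaussianProcess object.
model = {"kernel_list": [{"type": "gaussian", "width": [1.0]}], "regularization": 1e-3}
path = os.path.join(tempfile.mkdtemp(), "gp_model")
write_sketch(path, model)
loaded = read_sketch(path)
```

`pickle.load` can execute arbitrary code while unpickling, so only read model files from sources you trust.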
catlearn.regression.gpfunctions.io.write_train_data(filename, train_features, train_targets, regularization, kernel_list)

Function to write raw training data.

Parameters:
  • filename (str) – The name of the save file.
  • train_features (array) – Array of the training features.
  • train_targets (list) – A list of the training targets.
  • regularization (float) – The regularization parameter.
  • kernel_list (list) – The list containing dictionaries for the kernels.

catlearn.regression.gpfunctions.kernel_scaling

Function to scale kernel hyperparameters.

catlearn.regression.gpfunctions.kernel_scaling.kernel_scaling(scale_data, kernel_list, rescale)

Base hyperparameter scaling function.

Parameters:
  • scale_data (object) – Output from the default scaling function.
  • kernel_list (list) – List containing all dictionaries for the kernels.
  • rescale (boolean) – Flag for whether to scale or rescale the data.
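The docstring does not say how widths change when the data is scaled. For a Gaussian kernel under standardization, x = mean + std * z, so (x - x') / w equals (z - z') / (w / std): a width w in original units corresponds to w / std in scaled units, and rescaling multiplies by std again. The sketch below checks that identity numerically; `scale_width` and `rescale_width` are hypothetical names, and whether kernel_scaling applies exactly this rule is an assumption.

```python
import numpy as np


def scale_width(width, feature_std):
    """Width in standardized units, given a width in original units."""
    return np.asarray(width, dtype=float) / np.asarray(feature_std, dtype=float)


def rescale_width(scaled_width, feature_std):
    """Inverse of scale_width: return to original feature units."""
    return np.asarray(scaled_width, dtype=float) * np.asarray(feature_std, dtype=float)


def gauss(x1, x2, width):
    """Gaussian kernel value between two feature vectors."""
    return np.exp(-0.5 * np.sum(((x1 - x2) / width) ** 2))


rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[2.0, 0.5], size=(10, 2))
mean, std = X.mean(axis=0), X.std(axis=0)
Z = (X - mean) / std
width = np.array([1.5, 0.3])
k_original = gauss(X[0], X[1], width)
k_scaled = gauss(Z[0], Z[1], scale_width(width, std))
```

Because the kernel value is unchanged, hyperparameters optimized on scaled data can be converted back without refitting.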

catlearn.regression.gpfunctions.kernel_setup

Functions to prepare and return kernel data.

catlearn.regression.gpfunctions.kernel_setup.kdict2list(kdict, N_D=None)

Return ordered list of hyperparameters.

Assumes the function is given a dictionary containing the properties of a single kernel. The dictionary must contain either the key ‘hyperparameters’ or ‘theta’ holding a list of hyperparameters, or the key ‘type’ holding the kernel type name as a string, together with ‘width’ for a ‘gaussian’ or ‘laplacian’ kernel, or ‘degree’ and ‘slope’ for a ‘quadratic’ kernel.

Parameters:
  • kdict (dict) – A kernel dictionary containing the key ‘type’ and optional keys containing the hyperparameters of the kernel.
  • N_D (None or int) – The number of descriptors, used if it is not specified in the kernel dict by the length of the lists of hyperparameters.
catlearn.regression.gpfunctions.kernel_setup.kdicts2list(kernel_list, N_D=None)

Return ordered list of hyperparameters given the kernel dictionary.

The kernel dictionary must contain one or more dictionaries, each specifying the type and hyperparameters.

Parameters:
  • kernel_list (dict) – A dictionary containing kernel dictionaries.
  • N_D (int) – The number of descriptors, used if it is not specified in the kernel dict by the length of the lists of hyperparameters.
catlearn.regression.gpfunctions.kernel_setup.list2kdict(hyperparameters, kernel_list)

Return updated kernel dictionary with updated hyperparameters from list.

Assumes an ordered list of hyperparameters and the previous kernel dictionary. The kernel dictionary must contain a dictionary for each kernel type, in the same order as their respective hyperparameters appear in the list hyperparameters.

Parameters:
  • hyperparameters (list) – All hyperparameters listed in the order they are specified in the kernel dictionary.
  • kernel_list (dict) – A dictionary containing kernel dictionaries.
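The ordering contract shared by kdicts2list and list2kdict can be illustrated with a round trip: flatten every kernel's hyperparameters into one vector, then write a modified vector back in the same order. `HYPER_KEYS`, `kdicts2list_sketch` and `list2kdict_sketch` are hypothetical; the key names follow the ‘width’, ‘slope’ and ‘degree’ keys described above, and representing kernel_list as a list of dicts is an assumption.

```python
import numpy as np

# Hypothetical mapping from kernel type to its hyperparameter keys, in order.
HYPER_KEYS = {
    "gaussian": ["width"],
    "laplacian": ["width"],
    "quadratic": ["slope", "degree"],
}


def kdicts2list_sketch(kernel_list):
    """Flatten the hyperparameters of every kernel into one ordered array."""
    theta = []
    for kdict in kernel_list:
        for key in HYPER_KEYS[kdict["type"]]:
            theta.extend(np.atleast_1d(kdict[key]).tolist())
    return np.array(theta)


def list2kdict_sketch(hyperparameters, kernel_list):
    """Write a flat hyperparameter list back into copies of the kernel dicts."""
    hyperparameters = list(hyperparameters)
    updated, i = [], 0
    for kdict in kernel_list:
        new = dict(kdict)  # copy, so the input dicts are left untouched
        for key in HYPER_KEYS[kdict["type"]]:
            n = np.atleast_1d(kdict[key]).size
            new[key] = hyperparameters[i:i + n]
            i += n
        updated.append(new)
    return updated


kernels = [{"type": "gaussian", "width": [1.0, 2.0]},
           {"type": "quadratic", "slope": [0.5], "degree": [3.0]}]
theta = kdicts2list_sketch(kernels)
updated = list2kdict_sketch(theta * 2, kernels)
```

A flat vector is what a generic optimizer works with, which is why the two conversions must agree on ordering.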
catlearn.regression.gpfunctions.kernel_setup.prepare_kernels(kernel_list, regularization_bounds, eval_gradients, N_D)

Format the kernel list and store the bounds for optimization.

Parameters:
  • kernel_list (list) – List containing all dictionaries for the kernels.
  • regularization_bounds (tuple) – Bounds for the regularization parameter; can be used to change the default bounds.
  • eval_gradients (boolean) – Flag to change kernel setup based on gradients being defined.
  • N_D (int) – Number of dimensions of the original data.

catlearn.regression.gpfunctions.kernels

Contains kernel functions and gradients of kernels.

catlearn.regression.gpfunctions.kernels.AA_kernel(theta, log_scale, m1, m2=None, eval_gradients=False)

Generate the covariance between data with an Aitchison & Aitken kernel.

Parameters:
  • theta (list) – [l, n, c]
  • log_scale (boolean) – Scaling hyperparameters in the kernel can be useful for optimization.
  • m1 (list) – A list of the training fingerprint vectors.
  • m2 (list or None) – A list of the test fingerprint vectors (optional).
Returns:

k – The covariance matrix.

Return type:

array
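The Aitchison & Aitken kernel is designed for categorical features: each feature contributes (1 - λ) when two samples share a category and λ / (c - 1) otherwise, where c is the number of categories, and the per-feature values are multiplied. The sketch below implements that textbook form; `aa_kernel_sketch`, its `lam` and `n_categories` arguments, and the assumption that all features share the same number of categories are illustrative. How these map onto the [l, n, c] entries of theta is not asserted here.

```python
import numpy as np


def aa_kernel_sketch(lam, n_categories, m1, m2=None):
    """Aitchison & Aitken kernel for categorical features (product form).

    Each feature contributes (1 - lam) when the categories match and
    lam / (n_categories - 1) otherwise.
    """
    m1 = np.asarray(m1)
    m2 = m1 if m2 is None else np.asarray(m2)
    # equal[i, j, d] is True when sample i and sample j share category d.
    equal = m1[:, None, :] == m2[None, :, :]
    per_feature = np.where(equal, 1.0 - lam, lam / (n_categories - 1))
    return per_feature.prod(axis=2)


X = np.array([[0, 1], [0, 2], [1, 1]])
K = aa_kernel_sketch(0.2, 3, X)
```

With λ = 0.2 and c = 3, a full match gives 0.8 × 0.8 = 0.64 and one mismatched feature gives 0.8 × 0.1 = 0.08.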

catlearn.regression.gpfunctions.kernels.constant_kernel(theta, log_scale, m1, m2=None, eval_gradients=False)

Return constant to add to the kernel.

Parameters:
  • theta (list) – A list containing the constant.
  • log_scale (boolean) – Scaling hyperparameters in the kernel can be useful for optimization.
  • eval_gradients (boolean) – Analytical gradients of the training features can be included.
  • m1 (list) – A list of the training fingerprint vectors.
  • m2 (list or None) – A list of the test fingerprint vectors (optional).
Returns:

k – The covariance matrix.

Return type:

array

catlearn.regression.gpfunctions.kernels.constant_multi_kernel(theta, log_scale, m1, m2=None, eval_gradients=True)

Return constant to add to the kernel.

Parameters:
  • theta (list) – A list containing the constants.
  • log_scale (boolean) – Scaling hyperparameters in the kernel can be useful for optimization.
  • eval_gradients (boolean) – Analytical gradients of the training features can be included.
  • m1 (list) – A list of the training fingerprint vectors.
  • m2 (list or None) – A list of the test fingerprint vectors (optional).
Returns:

k – The covariance matrix.

Return type:

array

catlearn.regression.gpfunctions.kernels.gaussian_dk_dwidth(k, m1, kwidth, log_scale=False)

Return gradient of the gaussian kernel with respect to the j’th width.

Parameters:
  • k (array) – n by n array. The (not scaled) gaussian kernel.
  • m1 (list) – A list of the training fingerprint vectors.
  • kwidth (float) – The full list of widths
  • log_scale (boolean) – Scaling hyperparameters in the kernel can be useful for optimization.
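For a Gaussian kernel k = exp(-½ Σ_d (x_d - x'_d)² / w_d²), the derivative with respect to width w_j is k (x_j - x'_j)² / w_j³, and with log-scaled widths the chain rule multiplies this by w_j. The sketch checks the analytical form against a central finite difference. `dk_dwidth_sketch` takes the full width array and an index j, which is a hypothetical signature that differs from the documented one.

```python
import numpy as np
from scipy.spatial import distance


def gaussian_k(m1, width):
    """Square Gaussian kernel matrix for the rows of m1."""
    d2 = distance.squareform(distance.pdist(m1 / width, metric="sqeuclidean"))
    return np.exp(-0.5 * d2)


def dk_dwidth_sketch(k, m1, width, j, log_scale=False):
    """Analytical derivative of the Gaussian kernel w.r.t. width j."""
    diff2 = (m1[:, None, j] - m1[None, :, j]) ** 2
    grad = k * diff2 / width[j] ** 3
    if log_scale:
        # Chain rule: dk/dlog(w) = w * dk/dw.
        grad = grad * width[j]
    return grad


rng = np.random.default_rng(1)
m1 = rng.normal(size=(5, 3))
width = np.array([0.8, 1.2, 2.0])
k = gaussian_k(m1, width)
analytic = dk_dwidth_sketch(k, m1, width, j=1)

# Central finite difference on width[1] for comparison.
h = 1e-6
wp, wm = width.copy(), width.copy()
wp[1] += h
wm[1] -= h
numeric = (gaussian_k(m1, wp) - gaussian_k(m1, wm)) / (2 * h)
```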
catlearn.regression.gpfunctions.kernels.gaussian_kernel(theta, log_scale, m1, m2=None, eval_gradients=False)

Generate the covariance between data with a Gaussian kernel.

Parameters:
  • theta (list) – A list of widths for each feature.
  • log_scale (boolean) – Scaling hyperparameters in the kernel can be useful for optimization.
  • eval_gradients (boolean) – Analytical gradients of the training features can be included.
  • m1 (list) – A list of the training fingerprint vectors.
  • m2 (list or None) – A list of the test fingerprint vectors (optional).
Returns:

k – The covariance matrix.

Return type:

array
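A Gaussian kernel with one width per feature (an ARD squared-exponential kernel) can be sketched with SciPy's distance functions. `gaussian_kernel_sketch` is hypothetical. The convention that theta holds log-widths when log_scale is True is an assumption about how the flag is used, and eval_gradients is not handled.

```python
import numpy as np
from scipy.spatial import distance


def gaussian_kernel_sketch(theta, log_scale, m1, m2=None):
    """ARD Gaussian kernel: exp(-0.5 * sum_d ((x_d - x'_d) / w_d) ** 2)."""
    width = np.asarray(theta, dtype=float)
    if log_scale:
        # Assumed convention: theta holds log-widths during optimization.
        width = np.exp(width)
    m1 = np.asarray(m1, dtype=float)
    if m2 is None:
        # Condensed pairwise distances, expanded back to a square matrix.
        d2 = distance.squareform(distance.pdist(m1 / width, metric="sqeuclidean"))
    else:
        m2 = np.asarray(m2, dtype=float)
        d2 = distance.cdist(m1 / width, m2 / width, metric="sqeuclidean")
    return np.exp(-0.5 * d2)


X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel_sketch([1.0, 2.0], False, X)
K_log = gaussian_kernel_sketch(np.log([1.0, 2.0]), True, X)
```

Optimizing log-widths keeps the widths positive without explicit bounds, which is one common reason for a log_scale option.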

catlearn.regression.gpfunctions.kernels.gaussian_xx_gradients(m1, kwidth, k)

Gradient for k(x, x).

Parameters:
  • m1 (array) – Feature matrix.
  • kwidth (list) – List of lengthscales for the gaussian kernel.
  • k (array) – Upper left portion of the overall covariance matrix.
catlearn.regression.gpfunctions.kernels.gaussian_xxp_gradients(m1, m2, kwidth, k)

Gradient for k(x, x’).

Parameters:
  • m1 (array) – Feature matrix.
  • m2 (array) – Feature matrix typically associated with the test data.
  • kwidth (list) – List of lengthscales for the gaussian kernel.
  • k (array) – Upper left portion of the overall covariance matrix.
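These gradient functions are documented only by their arguments. For a Gaussian kernel with per-feature lengthscales, the standard identity ∂k(x, x') / ∂x_d = -(x_d - x'_d) / w_d² · k(x, x') is the building block for gradient-enhanced covariance matrices. The sketch checks that identity numerically for a single pair of points. `k_pair` and `dk_dx_sketch` are hypothetical, and how catlearn arranges such terms into blocks of the covariance matrix is not shown.

```python
import numpy as np


def k_pair(x, xp, width):
    """Gaussian kernel value for a single pair of points."""
    return np.exp(-0.5 * np.sum(((x - xp) / width) ** 2))


def dk_dx_sketch(x, xp, width):
    """Gradient of the Gaussian k(x, x') with respect to x."""
    return -(x - xp) / width ** 2 * k_pair(x, xp, width)


x = np.array([0.3, -1.0, 2.0])
xp = np.array([1.0, 0.5, 1.5])
width = np.array([1.0, 2.0, 0.5])
analytic = dk_dx_sketch(x, xp, width)

# Central finite differences along each coordinate direction.
h = 1e-6
numeric = np.array([
    (k_pair(x + h * e, xp, width) - k_pair(x - h * e, xp, width)) / (2 * h)
    for e in np.eye(3)
])
```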
catlearn.regression.gpfunctions.kernels.laplacian_dk_dwidth(k, m1, kwidth, log_scale=False)
catlearn.regression.gpfunctions.kernels.laplacian_kernel(theta, log_scale, m1, m2=None, eval_gradients=False)

Generate the covariance between data with a laplacian kernel.

Parameters:
  • theta (list) – A list of widths for each feature.
  • log_scale (boolean) – Scaling hyperparameters in the kernel can be useful for optimization.
  • m1 (list) – A list of the training fingerprint vectors.
  • m2 (list or None) – A list of the test fingerprint vectors (optional).
Returns:

k – The covariance matrix.

Return type:

array
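The docstring does not state the formula. A common form of the Laplacian kernel with per-feature widths is k(x, x') = exp(-Σ_d |x_d - x'_d| / w_d), and this sketch assumes that form. `laplacian_kernel_sketch` is hypothetical and omits log_scale handling.

```python
import numpy as np
from scipy.spatial import distance


def laplacian_kernel_sketch(width, m1, m2=None):
    """Laplacian kernel exp(-sum_d |x_d - x'_d| / w_d) (assumed form)."""
    width = np.asarray(width, dtype=float)
    m1 = np.asarray(m1, dtype=float)
    m2 = m1 if m2 is None else np.asarray(m2, dtype=float)
    # City-block (L1) distance on width-scaled features.
    d1 = distance.cdist(m1 / width, m2 / width, metric="cityblock")
    return np.exp(-d1)


X = np.array([[0.0, 0.0], [1.0, 1.0]])
K = laplacian_kernel_sketch([1.0, 2.0], X)
```

Compared with the Gaussian kernel, the L1 distance decays more slowly in the tails and is not differentiable where x = x'.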

catlearn.regression.gpfunctions.kernels.linear_kernel(theta, log_scale, m1, m2=None, eval_gradients=False)

Generate the covariance between data with a linear kernel.

Parameters:
  • theta (list) – A list containing the constant offset.
  • log_scale (boolean) – Scaling hyperparameters in the kernel can be useful for optimization.
  • eval_gradients (boolean) – Analytical gradients of the training features can be included.
  • m1 (list) – A list of the training fingerprint vectors.
  • m2 (list or None) – A list of the test fingerprint vectors (optional).
Returns:

k – The covariance matrix.

Return type:

array

catlearn.regression.gpfunctions.kernels.noise_multi_kernel(theta, log_scale, m1, m2=None, eval_gradients=False)

Return constant to add to the kernel.

Parameters:
  • theta (list) – A list containing the constants to be added to the diagonal of the covariance matrix.
  • eval_gradients (boolean) – Analytical gradients of the training features can be included.
  • m1 (list) – A list of the training fingerprint vectors.
  • m2 (list or None) – A list of the test fingerprint vectors (optional).
Returns:

k – The covariance matrix.

Return type:

array

catlearn.regression.gpfunctions.kernels.quadratic_dk_ddegree(k, m1, degree, log_scale=False)
catlearn.regression.gpfunctions.kernels.quadratic_dk_dslope(k, m1, slope, log_scale=False)
catlearn.regression.gpfunctions.kernels.quadratic_kernel(theta, log_scale, m1, m2=None, eval_gradients=False)

Generate the covariance between data with a quadratic kernel.

Parameters:
  • theta (list) – A list containing the slope and degree of the quadratic kernel.
  • log_scale (boolean) – Scaling hyperparameters in the kernel can be useful for optimization.
  • m1 (list) – A list of the training fingerprint vectors.
  • m2 (list or None) – A list of the test fingerprint vectors (optional).
Returns:

k – The covariance matrix.

Return type:

array

catlearn.regression.gpfunctions.kernels.scaled_sqe_kernel(theta, log_scale, m1, m2=None, eval_gradients=False)

Generate the covariance between data with a Gaussian kernel.

Parameters:
  • theta (list) – A list of hyperparameters.
  • log_scale (boolean) – Scaling hyperparameters in the kernel can be useful for optimization.
  • m1 (list) – A list of the training fingerprint vectors.
  • m2 (list or None) – A list of the test fingerprint vectors (optional).
Returns:

k – The covariance matrix.

Return type:

array

catlearn.regression.gpfunctions.kernels.sqe_kernel(theta, log_scale, m1, m2=None, eval_gradients=False)

Generate the covariance between data with a Gaussian kernel.

Parameters:
  • theta (list) – A list of widths for each feature.
  • log_scale (boolean) – Scaling hyperparameters in the kernel can be useful for optimization.
  • m1 (list) – A list of the training fingerprint vectors.
  • m2 (list or None) – A list of the test fingerprint vectors (optional).
Returns:

k – The covariance matrix.

Return type:

array

catlearn.regression.gpfunctions.log_marginal_likelihood

Log marginal likelihood calculator function.

catlearn.regression.gpfunctions.log_marginal_likelihood.dK_dtheta_j(theta, train_matrix, kernel_list, Q)

Return the Jacobian of the log marginal likelihood.

This is calculated with respect to the hyperparameters, as in Equation 5.9 of C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, 2006.

Parameters:
  • theta (list) – A list containing the hyperparameters.
  • train_matrix (list) – A list of the training fingerprint vectors.
  • kernel_list (list) – A list of kernel dictionaries.
  • Q (array) –
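For reference, Equation 5.9 of Rasmussen and Williams, with α = K⁻¹y, reads as follows. The matrix inside the final trace has the same shape as Q, but whether Q holds αα^T − K⁻¹ or its negative (the module works with the negative log marginal likelihood) is not stated here.

```latex
\frac{\partial}{\partial \theta_j} \log p(\mathbf{y} \mid X, \boldsymbol{\theta})
  = \frac{1}{2} \mathbf{y}^\top K^{-1} \frac{\partial K}{\partial \theta_j} K^{-1} \mathbf{y}
  - \frac{1}{2} \operatorname{tr}\!\left( K^{-1} \frac{\partial K}{\partial \theta_j} \right)
  = \frac{1}{2} \operatorname{tr}\!\left( \left( \boldsymbol{\alpha} \boldsymbol{\alpha}^\top - K^{-1} \right) \frac{\partial K}{\partial \theta_j} \right),
  \qquad \boldsymbol{\alpha} = K^{-1} \mathbf{y}
```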
catlearn.regression.gpfunctions.log_marginal_likelihood.log_marginal_likelihood(theta, train_matrix, targets, kernel_list, scale_optimizer, eval_gradients, cinv=None, eval_jac=False)

Return the negative of the log marginal likelihood.

See Equation 5.8 in C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, 2006.

Parameters:
  • theta (list) – A list containing the hyperparameters.
  • train_matrix (list) – A list of the training fingerprint vectors.
  • targets (list) – A list of target values.
  • kernel_list (list) – A list of kernel dictionaries.
  • scale_optimizer (boolean) – Flag to define if the hyperparameters are log scale for optimization.
  • eval_gradients (boolean) – Flag to specify whether to compute gradients in covariance.
  • cinv (array) – Pre-computed inverted covariance matrix.
  • eval_jac (boolean) – Flag to specify whether to calculate gradients for hyperparameter optimization.
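Equation 5.8 gives log p(y | X) = -½ yᵀK⁻¹y - ½ log|K| - (n/2) log 2π for a zero-mean GP. The sketch below evaluates its negative through a Cholesky factorization instead of an explicit inverse such as cinv, a swapped-in technique that is usually more stable. `neg_log_marginal_likelihood_sketch` is hypothetical and takes a ready-made covariance K rather than theta and a kernel list.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def neg_log_marginal_likelihood_sketch(K, y):
    """Negative of Eq. 5.8: 0.5 y^T K^-1 y + 0.5 log|K| + n/2 log(2 pi)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    c, lower = cho_factor(K, lower=True)
    alpha = cho_solve((c, lower), y)  # alpha = K^-1 y
    # log|K| = 2 * sum(log(diag(L))) for the Cholesky factor L.
    log_det = 2.0 * np.sum(np.log(np.diag(c)))
    return 0.5 * y @ alpha + 0.5 * log_det + 0.5 * n * np.log(2.0 * np.pi)


K = np.array([[1.0, 0.5], [0.5, 1.0]]) + 1e-3 * np.eye(2)
y = np.array([0.3, -0.2])
nlml = neg_log_marginal_likelihood_sketch(K, y)
```

Minimizing this quantity over the hyperparameters is equivalent to maximizing the evidence, which is why the module returns the negative.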

catlearn.regression.gpfunctions.sensitivity

Function performing GP sensitivity analysis.

class catlearn.regression.gpfunctions.sensitivity.SensitivityAnalysis(train_matrix, train_targets, test_matrix, kernel_list, init_reg=0.001, init_width=10.0)

Bases: object

Perform sensitivity analysis to estimate important features.

backward_selection(predict=False, test_targets=None, selection=None)

Feature selection with backward elimination.

Parameters:
  • predict (boolean) – Specify whether to make predictions on test data.
  • test_targets (list) – A list of test targets to calculate errors, if known.
  • selection (int, list) – Specify the number or range of features to consider.
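Backward elimination repeatedly drops the feature whose removal hurts a score the least, until the requested number of features remains. The sketch below is a generic greedy version; `backward_elimination_sketch` and `lstsq_error` are hypothetical, and a least-squares residual stands in for the GP-based error that SensitivityAnalysis would presumably use.

```python
import numpy as np


def backward_elimination_sketch(X, y, score, n_keep):
    """Greedy backward elimination over feature columns.

    score(X_subset, y) must return an error to minimize (lower is better).
    At each step the feature whose removal gives the lowest error is dropped.
    """
    remaining = list(range(X.shape[1]))
    history = []
    while len(remaining) > n_keep:
        trials = []
        for f in remaining:
            subset = [c for c in remaining if c != f]
            trials.append((score(X[:, subset], y), f))
        best_error, worst_feature = min(trials)
        remaining.remove(worst_feature)
        history.append((worst_feature, best_error))
    return remaining, history


def lstsq_error(X, y):
    """Hypothetical score: mean squared residual of a least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ coef - y) ** 2))


rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2]  # only features 0 and 2 matter
kept, history = backward_elimination_sketch(X, y, lstsq_error, n_keep=2)
```

Each step refits the model once per remaining feature, so the cost grows roughly quadratically with the number of features.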

catlearn.regression.gpfunctions.uncertainty

Function performing uncertainty analysis.

catlearn.regression.gpfunctions.uncertainty.get_uncertainty(kernel_list, test_fp, ktb, cinv, log_scale)

Function to calculate uncertainty.

Parameters:
  • kernel_list (list) – List containing all dictionaries for the kernels.
  • test_fp (array) – Test feature set.
  • ktb (array) – Covariance matrix for test and training data.
  • cinv (array) – Inverted covariance matrix for the training dataset.
  • log_scale (boolean) – Flag to define if the hyperparameters are log scale.
Returns:

uncertainty – The uncertainty on each prediction in the test data. By default, this includes a measure of the noise on the data.

Return type:

list
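The standard GP predictive variance is var(x*) = k(x*, x*) - k*ᵀ K⁻¹ k*, where k* holds the test-train covariances (ktb) and K⁻¹ is the inverted training covariance (cinv). The sketch adds the noise term back to mirror the note that the uncertainty includes a measure of the noise on the data; exactly how catlearn adds it is an assumption. `rbf` and `predictive_std_sketch` are hypothetical helpers using a unit-amplitude Gaussian kernel.

```python
import numpy as np
from scipy.spatial import distance


def rbf(a, b, width):
    """Unit-amplitude Gaussian kernel between the rows of a and b."""
    return np.exp(-0.5 * distance.cdist(a / width, b / width, "sqeuclidean"))


def predictive_std_sketch(train, test, width, noise):
    """Posterior standard deviation of a zero-mean GP with an RBF kernel.

    var(x*) = k(x*, x*) - k*^T K^-1 k* + noise, where K includes the noise
    on its diagonal; the final + noise term adds the observation noise back.
    """
    K = rbf(train, train, width) + noise * np.eye(len(train))
    cinv = np.linalg.inv(K)                # inverted training covariance
    ktb = rbf(test, train, width)          # test-train covariance
    prior_var = np.ones(len(test))         # k(x*, x*) = 1 for this RBF
    var = prior_var - np.einsum("ij,jk,ik->i", ktb, cinv, ktb) + noise
    return np.sqrt(np.maximum(var, 0.0))


train = np.array([[0.0], [1.0], [2.0]])
test = np.array([[1.0], [10.0]])
std = predictive_std_sketch(train, test, width=np.array([0.5]), noise=1e-6)
```

The uncertainty is close to zero at a training point and returns to the prior standard deviation far from the data.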