catlearn.active_learning package

Submodules

catlearn.active_learning.acquisition_functions module

GP acquisition functions.

catlearn.active_learning.acquisition_functions.EI(y_best, predictions, uncertainty, objective='max')

Return the expected improvement (EI) acquisition function.

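Parameters:
  • y_best (float) – Condition, i.e. the best target value observed so far.
  • predictions (list) – Predicted means.
  • uncertainty (list) – Uncertainties associated with the predictions.
  • objective (str) – Optimization direction, 'max' or 'min'.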
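For reference, the textbook expected-improvement formula for Gaussian predictions can be sketched with the standard library alone. This is a minimal illustration, not catlearn's code: the function name expected_improvement and the helpers _norm_pdf/_norm_cdf are hypothetical, and it assumes each uncertainty is a standard deviation and that 'min' mirrors 'max' by measuring improvement below y_best.

```python
import math


def _norm_pdf(z):
    # Standard normal probability density function.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    # Standard normal cumulative distribution function via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(y_best, predictions, uncertainty, objective='max'):
    """Hypothetical sketch of textbook EI; uncertainty is assumed to be a std. dev."""
    scores = []
    for mu, sigma in zip(predictions, uncertainty):
        # Improvement is measured in the direction of the objective.
        gain = mu - y_best if objective == 'max' else y_best - mu
        if sigma <= 0.0:
            # Without uncertainty, EI reduces to the non-negative deterministic gain.
            scores.append(max(gain, 0.0))
            continue
        z = gain / sigma
        scores.append(gain * _norm_cdf(z) + sigma * _norm_pdf(z))
    return scores
```

A prediction equal to y_best still scores sigma * pdf(0) > 0, which is how EI rewards uncertain candidates.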
catlearn.active_learning.acquisition_functions.PI(y_best, predictions, uncertainty, objective)

Return the probability of improvement (PI) acquisition function.

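Parameters:
  • y_best (float) – Condition, i.e. the best target value observed so far.
  • predictions (list) – Predicted means.
  • uncertainty (list) – Uncertainties associated with the predictions.
  • objective (str) – Optimization direction, 'max' or 'min'.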
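Probability of improvement is the Gaussian tail probability of beating y_best. The sketch below uses the textbook formula rather than catlearn's implementation; the name probability_of_improvement is hypothetical, and it assumes strictly positive standard deviations as uncertainties.

```python
import math


def probability_of_improvement(y_best, predictions, uncertainty, objective='max'):
    """Hypothetical sketch of textbook PI; each sigma must be > 0."""
    scores = []
    for mu, sigma in zip(predictions, uncertainty):
        # Signed distance past y_best in the direction of the objective.
        gain = mu - y_best if objective == 'max' else y_best - mu
        z = gain / sigma
        # Standard normal CDF evaluated at z.
        scores.append(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
    return scores
```

Unlike EI, PI ignores how large the improvement is, so it tends to favour safe, small steps.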
catlearn.active_learning.acquisition_functions.UCB(predictions, uncertainty, objective='max', kappa=1.5)

Return the upper confidence bound (UCB) acquisition function.

Parameters:
  • predictions (list) – Predicted means.
  • uncertainty (list) – Uncertainties associated with the predictions.
  • objective (str) – Optimization direction, 'max' or 'min'.
  • kappa (float) – Constant that controls the exploitation/exploration ratio in UCB.
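The role of kappa can be seen in a minimal sketch of the standard confidence-bound score. The name upper_confidence_bound is hypothetical, and the 'min' branch (negating the lower bound so that higher scores are always better) is an assumption about the convention, not confirmed catlearn behaviour.

```python
def upper_confidence_bound(predictions, uncertainty, objective='max', kappa=1.5):
    """Hypothetical sketch: mean plus kappa times uncertainty."""
    if objective == 'max':
        # Larger kappa gives more weight to uncertain points (exploration).
        return [mu + kappa * sigma for mu, sigma in zip(predictions, uncertainty)]
    # Assumed convention for minimization: negate the lower confidence bound
    # so that a higher score still marks a more attractive candidate.
    return [-(mu - kappa * sigma) for mu, sigma in zip(predictions, uncertainty)]
```

With kappa = 0 the score is just the predicted mean (pure exploitation).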
catlearn.active_learning.acquisition_functions.classify(classifier, train_atoms, test_atoms, targets, predictions, uncertainty, train_features=None, test_features=None, objective='max', k_means=3, kappa=1.5, metrics=['optimistic', 'UCB', 'EI', 'PI'])

Classify ranked predictions based on acquisition function.

Parameters:
  • classifier (func) – User defined function to classify an atoms object.
  • train_atoms (list) – List of atoms objects from training data upon which to base classification.
  • test_atoms (list) – List of atoms objects from test data upon which to base classification.
  • targets (list) – List of known target values.
  • predictions (list) – List of predictions from the GP.
  • uncertainty (list) – List of variances on the GP predictions.
  • train_features (array) – Feature matrix for the training data.
  • test_features (array) – Feature matrix for the test data.
  • objective (str) – Optimization direction, 'max' or 'min'.
  • k_means (int) – Number of clusters to generate with clustering.
  • kappa (float) – Constant that controls the exploitation/exploration ratio in UCB.
  • metrics (list) – List of strings. Accepted values are ‘cdf’, ‘UCB’, ‘EI’, ‘PI’, ‘optimistic’ and ‘pdf’.
Returns:

res – A dictionary of lists containing the fitness of each test point for the different acquisition functions.

Return type:

dict

catlearn.active_learning.acquisition_functions.cluster(train_features, targets, test_features, predictions, k_means=3)

Penalize test points that are too clustered.

Parameters:
  • train_features (array) – Feature matrix for the training data.
  • targets (list) – Training targets.
  • test_features (array) – Feature matrix for the test data.
  • predictions (list) – Predicted means.
  • k_means (int) – Number of clusters.
catlearn.active_learning.acquisition_functions.optimistic(y_best, predictions, uncertainty)

Find predictions that will optimistically lead to progress.

Parameters:
  • y_best (float) – Condition
  • predictions (list) – Predicted means.
  • uncertainty (list) – Uncertainties associated with the predictions.
catlearn.active_learning.acquisition_functions.optimistic_proximity(y_best, predictions, uncertainty)

Return uncertainties minus distances to y_best.

Parameters:
  • y_best (float) – Condition
  • predictions (list) – Predicted means.
  • uncertainty (list) – Uncertainties associated with the predictions.
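Read literally, the description above translates into a one-line score per prediction. This sketch assumes "distance" means the absolute difference |prediction - y_best|; the function is an illustration, not the library source.

```python
def optimistic_proximity(y_best, predictions, uncertainty):
    """Hypothetical sketch: uncertainty minus absolute distance to y_best."""
    # Positive scores mark points whose uncertainty band could reach y_best.
    return [sigma - abs(mu - y_best) for mu, sigma in zip(predictions, uncertainty)]
```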
catlearn.active_learning.acquisition_functions.probability_density(y_best, predictions, uncertainty)

Return probability densities at y_best.

Parameters:
  • y_best (float) – Condition
  • predictions (list) – Predicted means.
  • uncertainty (list) – Uncertainties associated with the predictions.
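Assuming each prediction is a Gaussian with mean equal to the prediction and standard deviation equal to its uncertainty, the density at y_best can be sketched as follows. The function body is illustrative only, and sigma must be strictly positive.

```python
import math


def probability_density(y_best, predictions, uncertainty):
    """Hypothetical sketch: Gaussian pdf of each prediction evaluated at y_best."""
    densities = []
    for mu, sigma in zip(predictions, uncertainty):
        z = (y_best - mu) / sigma
        densities.append(math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi)))
    return densities
```

Points predicted close to y_best with small uncertainty receive the highest density.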
catlearn.active_learning.acquisition_functions.proximity(y_best, predictions, uncertainty=None)

Return negative distances to y_best.

Parameters:
  • y_best (float) – Condition
  • predictions (list) – Predicted means.
  • uncertainty (list) – Uncertainties associated with the predictions.
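A minimal sketch of the negative-distance score, assuming absolute distance. Since the signature defaults uncertainty to None, the sketch accepts it but does not use it; that reading is an assumption.

```python
def proximity(y_best, predictions, uncertainty=None):
    """Hypothetical sketch: negative absolute distance to y_best."""
    # uncertainty is accepted for a uniform signature but ignored here.
    return [-abs(mu - y_best) for mu in predictions]
```

The maximum possible score is 0, reached by a prediction exactly at y_best.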
catlearn.active_learning.acquisition_functions.random_acquisition(y_best, predictions, uncertainty=None)

Return random numbers for control experiments.

Parameters:
  • y_best (float) – Condition
  • predictions (list) – Predicted means.
  • uncertainty (list) – Uncertainties associated with the predictions.
catlearn.active_learning.acquisition_functions.rank(targets, predictions, uncertainty, train_features=None, test_features=None, objective='max', k_means=3, kappa=1.5, metrics=['optimistic', 'UCB', 'EI', 'PI'])

Rank predictions based on acquisition function.

Parameters:
  • targets (list) – List of known target values.
  • predictions (list) – List of predictions from the GP.
  • uncertainty (list) – List of variances on the GP predictions.
  • train_features (array) – Feature matrix for the training data.
  • test_features (array) – Feature matrix for the test data.
  • objective (str) – Optimization direction, 'max' or 'min'.
  • k_means (int) – Number of clusters to generate with clustering.
  • kappa (float) – Constant that controls the exploitation/exploration ratio in UCB.
  • metrics (list) – List of strings. Accepted values are ‘cdf’, ‘UCB’, ‘EI’, ‘PI’, ‘optimistic’ and ‘pdf’.
Returns:

res – A dictionary of lists containing the fitness of each test point for the different acquisition functions.

Return type:

dict
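The returned dictionary maps each metric name to one fitness value per test point. A hypothetical helper, top_candidates, shows one way to pick the most promising test points from it, assuming a higher fitness means a more attractive candidate. The scores in the usage line are toy values, not output from rank.

```python
def top_candidates(res, metric, n=1):
    """Hypothetical helper: indices of the n highest-fitness test points for one metric."""
    scores = res[metric]
    # Sort test-point indices by fitness, best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]


# Toy dictionary with the same layout as the documented return value.
res = {'UCB': [0.2, 1.4, 0.9], 'EI': [0.05, 0.30, 0.41]}
```

Different metrics can disagree: here UCB prefers point 1 while EI prefers point 2.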

catlearn.active_learning.algorithm module

Class to automate building a surrogate model.

class catlearn.active_learning.algorithm.ActiveLearning(surrogate_model, train_data, target)

Bases: object

Active learning class, intended for screening or optimizing in a predefined and finite search space.

acquire(unlabeled_data, batch_size=1)

Return indices of datapoints to acquire, from a predefined, finite search space.

Parameters:
  • unlabeled_data (array) – Data matrix representing an unlabeled search space.
  • batch_size (int) – Number of training points to acquire (move from test to training) in every iteration.
Returns:

  • to_acquire (list) – Row indices of unlabeled data to acquire.
  • score – User defined output from predict.
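The acquire-then-retrain cycle that this class automates can be sketched as a plain loop over row indices. Everything here is hypothetical: active_learning_loop is not part of catlearn, and score_fn stands in for fitting the surrogate model on the training rows, predicting the unlabeled rows and applying an acquisition function.

```python
def active_learning_loop(targets, score_fn, initial_subset, batch_size=1, n_max=None):
    """Hypothetical sketch of pool-based active learning over a finite search space."""
    n_max = len(targets) if n_max is None else n_max
    train = list(initial_subset)
    in_train = set(train)
    unlabeled = [i for i in range(len(targets)) if i not in in_train]
    history = []
    while unlabeled and len(train) < n_max:
        # score_fn(train_idx, train_targets, unlabeled_idx) -> one score per unlabeled row.
        scores = score_fn(train, [targets[i] for i in train], unlabeled)
        # Keep the best-scoring batch without exceeding n_max training points.
        n_take = min(batch_size, n_max - len(train))
        ranked = sorted(range(len(unlabeled)), key=lambda j: scores[j], reverse=True)
        picked = [unlabeled[j] for j in ranked[:n_take]]
        # "Label" the picked rows: move them from the unlabeled pool to training.
        train.extend(picked)
        picked_set = set(picked)
        unlabeled = [i for i in unlabeled if i not in picked_set]
        history.append(picked)
    return train, history
```

For example, with six rows, initial_subset=[0], batch_size=2 and a score_fn that simply prefers higher row indices, the batches acquired are [5, 4], [3, 2] and [1].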

ensemble_test(size, initial_subset=None, batch_size=1, n_max=None, seed_list=None, nprocs=None)

Return a 3D array of test results for a surrogate model, where the additional dimension spans the ensemble of tests.

Parameters:
  • size (int) – How many tests to run.
  • initial_subset (list) – Row indices of data to train on in the first iteration.
  • batch_size (int) – Number of training points to acquire (move from test to training) in every iteration.
  • n_max (int) – Max number of training points to test.
  • seed_list (list) – List of integer seeds for shuffling training data.
  • nprocs (int) – Number of processors for parallelization.
Returns:

ensemble – Array of test results with shape (size, iterations, number of metrics).

Return type:

array

test_acquisition(initial_subset=None, batch_size=1, n_max=None, seed=None)

Return an array of test results for a surrogate model.

Parameters:
  • initial_subset (list) – Row indices of data to train on in the first iteration.
  • batch_size (int) – Number of training points to acquire (move from test to training) in every iteration.
  • n_max (int) – Max number of training points to test.
  • seed (int) – Integer seed for shuffling training data.

Module contents