catlearn.active_learning package
Submodules
catlearn.active_learning.acquisition_functions module
GP acquisition functions.

catlearn.active_learning.acquisition_functions.EI(y_best, predictions, uncertainty, objective='max')
Return the expected improvement (EI) acquisition function.
Parameters:  y_best (float) – Condition to improve upon, typically the best target value observed so far.
 predictions (list) – Predicted means.
 uncertainty (list) – Uncertainties associated with the predictions.
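For reference, the textbook expected improvement for a Gaussian predictive distribution is EI = (mu - y_best) * Phi(z) + sigma * phi(z) with z = (mu - y_best) / sigma when maximizing (signs flipped when minimizing). The sketch below is a minimal standard-library illustration of that formula, not catlearn's own implementation; the helper name expected_improvement is hypothetical, and it assumes uncertainty holds standard deviations and that objective is either 'max' or 'min'.

```python
import math


def _norm_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(y_best, predictions, uncertainty, objective='max'):
    """Textbook EI per test point (hypothetical helper, not the catlearn code)."""
    scores = []
    for mu, sigma in zip(predictions, uncertainty):
        # Signed improvement over y_best in the direction of the objective.
        gain = mu - y_best if objective == 'max' else y_best - mu
        if sigma <= 0.0:
            # Without uncertainty, EI is just the non-negative improvement.
            scores.append(max(gain, 0.0))
            continue
        z = gain / sigma
        scores.append(gain * _norm_cdf(z) + sigma * _norm_pdf(z))
    return scores
```

For example, expected_improvement(0.0, [0.0], [1.0]) gives 1/sqrt(2*pi), roughly 0.399: when the mean sits exactly at y_best, all of the expected gain comes from the uncertainty term.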

catlearn.active_learning.acquisition_functions.PI(y_best, predictions, uncertainty, objective)
Probability of improvement (PI) acquisition function.
Parameters:  y_best (float) – Condition to improve upon, typically the best target value observed so far.
 predictions (list) – Predicted means.
 uncertainty (list) – Uncertainties associated with the predictions.
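Probability of improvement is conventionally Phi(z), the Gaussian probability that the outcome beats y_best. A minimal sketch of that standard formula follows; the name probability_of_improvement is hypothetical, the 'max'/'min' handling and the treatment of zero uncertainty are assumptions of this sketch, and uncertainty is assumed to hold standard deviations.

```python
import math


def probability_of_improvement(y_best, predictions, uncertainty, objective='max'):
    """Textbook PI per test point (hypothetical helper, not the catlearn code)."""
    scores = []
    for mu, sigma in zip(predictions, uncertainty):
        # Signed improvement over y_best in the direction of the objective.
        gain = mu - y_best if objective == 'max' else y_best - mu
        if sigma <= 0.0:
            # Deterministic prediction: improvement is either certain or impossible.
            scores.append(1.0 if gain > 0.0 else 0.0)
            continue
        # Phi(gain / sigma): probability that the true value beats y_best.
        scores.append(0.5 * (1.0 + math.erf(gain / (sigma * math.sqrt(2.0)))))
    return scores
```

A prediction whose mean equals y_best scores exactly 0.5, whatever its uncertainty.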

catlearn.active_learning.acquisition_functions.UCB(predictions, uncertainty, objective='max', kappa=1.5)
Upper confidence bound (UCB) acquisition function.
Parameters:  predictions (list) – Predicted means.
 uncertainty (list) – Uncertainties associated with the predictions.
 kappa (float) – Constant that controls the exploitation/exploration ratio in UCB.
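The role of kappa is easiest to see in the standard form mu + kappa * sigma: kappa = 0 ranks purely on the predicted mean (exploitation), while a large kappa favours uncertain points (exploration). The sketch below is illustrative only; the name upper_confidence_bound is hypothetical, and negating the lower bound for 'min' so that larger scores stay better is an assumption, not confirmed catlearn behaviour.

```python
def upper_confidence_bound(predictions, uncertainty, objective='max', kappa=1.5):
    """Hypothetical UCB sketch: larger scores mark more promising points."""
    if objective == 'max':
        # Optimistic estimate of the maximum: mean plus kappa standard deviations.
        return [mu + kappa * sigma for mu, sigma in zip(predictions, uncertainty)]
    # For minimization, take the lower bound mu - kappa*sigma and negate it so
    # that larger scores are still better (an assumption of this sketch).
    return [-(mu - kappa * sigma) for mu, sigma in zip(predictions, uncertainty)]
```

With the default kappa=1.5, a point predicted at 1.0 with uncertainty 2.0 scores 4.0 when maximizing.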

catlearn.active_learning.acquisition_functions.classify(classifier, train_atoms, test_atoms, targets, predictions, uncertainty, train_features=None, test_features=None, objective='max', k_means=3, kappa=1.5, metrics=['optimistic', 'UCB', 'EI', 'PI'])
Classify ranked predictions based on the acquisition functions.
Parameters:  classifier (func) – User-defined function to classify an atoms object.
 train_atoms (list) – List of atoms objects from training data upon which to base classification.
 test_atoms (list) – List of atoms objects from test data upon which to base classification.
 targets (list) – List of known target values.
 predictions (list) – List of predictions from the GP.
 uncertainty (list) – List of variances of the GP predictions.
 train_features (array) – Feature matrix for the training data.
 test_features (array) – Feature matrix for the test data.
 k_means (int) – Number of clusters to generate.
 kappa (float) – Constant that controls the exploitation/exploration ratio in UCB.
 metrics (list) – List of strings. Accepted values are ‘cdf’, ‘UCB’, ‘EI’, ‘PI’, ‘optimistic’ and ‘pdf’.
Returns: res – A dictionary of lists containing the fitness of each test point for the different acquisition functions.
Return type: dict

catlearn.active_learning.acquisition_functions.cluster(train_features, targets, test_features, predictions, k_means=3)
Penalize test points that are too clustered.
Parameters:  train_features (array) – Feature matrix for the training data.
 targets (list) – Training targets.
 test_features (array) – Feature matrix for the test data.
 predictions (list) – Predicted means.
 k_means (int) – Number of clusters.

catlearn.active_learning.acquisition_functions.optimistic(y_best, predictions, uncertainty)
Find predictions that will optimistically lead to progress.
Parameters:  y_best (float) – Condition
 predictions (list) – Predicted means.
 uncertainty (list) – Uncertainties associated with the predictions.

catlearn.active_learning.acquisition_functions.optimistic_proximity(y_best, predictions, uncertainty)
Return uncertainties minus distances to y_best.
Parameters:  y_best (float) – Condition
 predictions (list) – Predicted means.
 uncertainty (list) – Uncertainties associated with the predictions.
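Taken literally, the description above scores each point as its uncertainty minus its distance to y_best, so a point far from y_best can still rank well if it is uncertain enough. A minimal sketch, assuming the distance is the absolute difference between prediction and y_best; the name optimistic_proximity_sketch is hypothetical.

```python
def optimistic_proximity_sketch(y_best, predictions, uncertainty):
    """Uncertainty minus the absolute distance between prediction and y_best."""
    return [sigma - abs(mu - y_best) for mu, sigma in zip(predictions, uncertainty)]
```

With y_best = 1.0 and unit uncertainties, predictions 0.5 and 3.0 score 0.5 and -1.0 respectively.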

catlearn.active_learning.acquisition_functions.probability_density(y_best, predictions, uncertainty)
Return probability densities at y_best.
Parameters:  y_best (float) – Condition
 predictions (list) – Predicted means.
 uncertainty (list) – Uncertainties associated with the predictions.
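Assuming each prediction is a Gaussian with mean mu and standard deviation sigma, the density at y_best is exp(-z^2 / 2) / (sigma * sqrt(2*pi)) with z = (y_best - mu) / sigma. The sketch below evaluates that expression; the Gaussian form, the standard-deviation reading of uncertainty, and the name probability_density_sketch are all assumptions.

```python
import math


def probability_density_sketch(y_best, predictions, uncertainty):
    """Gaussian density of each predictive distribution, evaluated at y_best."""
    densities = []
    for mu, sigma in zip(predictions, uncertainty):
        # Standardized distance of y_best from the predicted mean (sigma > 0).
        z = (y_best - mu) / sigma
        densities.append(math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi)))
    return densities
```

Doubling sigma halves the peak density, so confident predictions centred on y_best score highest.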

catlearn.active_learning.acquisition_functions.proximity(y_best, predictions, uncertainty=None)
Return negative distances to y_best.
Parameters:  y_best (float) – Condition
 predictions (list) – Predicted means.
 uncertainty (list) – Uncertainties associated with the predictions.
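Negating the distance turns "closest to y_best" into "highest score", so the same descending sort works for every acquisition function. A minimal sketch, assuming the distance is the absolute difference; the name proximity_sketch is hypothetical, and uncertainty is accepted but unused, mirroring the optional argument in the signature.

```python
def proximity_sketch(y_best, predictions, uncertainty=None):
    """Negative absolute distance to y_best; uncertainty is ignored."""
    return [-abs(mu - y_best) for mu in predictions]
```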

catlearn.active_learning.acquisition_functions.random_acquisition(y_best, predictions, uncertainty=None)
Return random numbers for control experiments.
Parameters:  y_best (float) – Condition
 predictions (list) – Predicted means.
 uncertainty (list) – Uncertainties associated with the predictions.
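A random baseline gives each test point an independent score, so any real acquisition function should beat it on average. The sketch below is illustrative; the name random_acquisition_sketch and its seed argument are hypothetical additions for reproducible control runs and are not part of the documented signature.

```python
import random


def random_acquisition_sketch(y_best, predictions, uncertainty=None, seed=None):
    """Random scores in [0, 1) as a baseline; y_best and uncertainty are ignored."""
    rng = random.Random(seed)  # private generator, reproducible for a fixed seed
    return [rng.random() for _ in predictions]
```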

catlearn.active_learning.acquisition_functions.rank(targets, predictions, uncertainty, train_features=None, test_features=None, objective='max', k_means=3, kappa=1.5, metrics=['optimistic', 'UCB', 'EI', 'PI'])
Rank predictions based on acquisition function.
Parameters:  targets (list) – List of known target values.
 predictions (list) – List of predictions from the GP.
 uncertainty (list) – List of variance on the GP predictions.
 train_features (array) – Feature matrix for the training data.
 test_features (array) – Feature matrix for the test data.
 k_means (int) – Number of clusters to generate.
 kappa (float) – Constant that controls the exploitation/exploration ratio in UCB.
 metrics (list) – List of strings. Accepted values are ‘cdf’, ‘UCB’, ‘EI’, ‘PI’, ‘optimistic’ and ‘pdf’.
Returns: res – A dictionary of lists containing the fitness of each test point for the different acquisition functions.
Return type: dict
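The docstring does not say how the returned dictionary is assembled. As a hedged sketch of the general idea only, the code below builds a dictionary keyed by metric name with one fitness list per metric, then orders test points from best to worst. The names rank_sketch and order are hypothetical, only two metrics are shown, and taking y_best as the best known target for the objective is an assumption.

```python
def rank_sketch(targets, predictions, uncertainty, objective='max', kappa=1.5):
    """Hypothetical ranking: score every test point with a few metrics.

    Assumes y_best is the best known target for the objective and that
    uncertainty holds standard deviations.
    """
    y_best = max(targets) if objective == 'max' else min(targets)
    sign = 1.0 if objective == 'max' else -1.0
    return {
        # Upper confidence bound, oriented so that larger is always better.
        'UCB': [sign * mu + kappa * s for mu, s in zip(predictions, uncertainty)],
        # Negative absolute distance to the best known target.
        'proximity': [-abs(mu - y_best) for mu in predictions],
    }


def order(scores):
    """Indices of test points sorted from best (highest score) to worst."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Different metrics can disagree: with targets [0.0, 1.0], predictions [0.5, 3.0, 1.0] and uncertainties [1.0, 0.0, 0.2], UCB prefers point 1 while proximity prefers point 2.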
catlearn.active_learning.algorithm module
Class to automate building a surrogate model.

class catlearn.active_learning.algorithm.ActiveLearning(surrogate_model, train_data, target)
Bases: object
Active learning class, intended for screening or optimizing in a predefined and finite search space.

acquire(unlabeled_data, batch_size=1)
Return indices of data points to acquire from a predefined, finite search space.
Parameters:  unlabeled_data (array) – Data matrix representing an unlabeled search space.
 batch_size (int) – Number of training points to acquire (move from test to training) in every iteration.
Returns:  to_acquire (list) – Row indices of unlabeled data to acquire.
 score – User-defined output from predict.
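Selecting a batch from a finite search space usually reduces to taking the batch_size highest-scoring rows. A minimal sketch of that step, assuming one acquisition score per row of unlabeled_data (for example, the output of an acquisition function); the name acquire_sketch and the scores argument are hypothetical.

```python
def acquire_sketch(scores, batch_size=1):
    """Return row indices of the batch_size highest-scoring unlabeled points."""
    # Python's sort is stable, so tied scores keep their original row order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]
```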

ensemble_test(size, initial_subset=None, batch_size=1, n_max=None, seed_list=None, nprocs=None)
Return a 3D array of test results for a surrogate model, adding a dimension that spans the ensemble of tests.
Parameters:  size (int) – How many tests to run.
 initial_subset (list) – Row indices of data to train on in the first iteration.
 batch_size (int) – Number of training points to acquire (move from test to training) in every iteration.
 n_max (int) – Max number of training points to test.
 seed_list (list) – List of integer seeds for shuffling training data.
 nprocs (int) – Number of processors for parallelization.
Returns: ensemble – Array of shape (size, iterations, number of metrics) holding the test results.
Return type: array
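The shape (size, iterations, number of metrics) arises from repeating one test per seed and stacking the per-run tables. The sketch below shows only that stacking under stated assumptions: run_once and ensemble_sketch are hypothetical, each run uses random acquisition on a fixed toy target list with the seed shuffling the data, a single metric (best target found so far) is recorded, and the nprocs parallelization is not shown.

```python
import random


def run_once(targets, batch_size, n_max, seed):
    """Hypothetical single test with random acquisition (a control run).

    Returns one row per iteration holding one metric, the best target seen
    so far, i.e. an iterations-by-metrics table.
    """
    order = list(range(len(targets)))
    random.Random(seed).shuffle(order)  # the seed only shuffles the data
    return [[max(targets[i] for i in order[:n_train])]
            for n_train in range(batch_size, n_max + 1, batch_size)]


def ensemble_sketch(targets, size, batch_size=1, n_max=None, seed_list=None):
    """Repeat run_once size times: result is size x iterations x metrics."""
    n_max = len(targets) if n_max is None else n_max
    seed_list = list(range(size)) if seed_list is None else seed_list
    return [run_once(targets, batch_size, n_max, seed) for seed in seed_list[:size]]
```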

test_acquisition(initial_subset=None, batch_size=1, n_max=None, seed=None)
Return an array of test results for a surrogate model.
Parameters:  initial_subset (list) – Row indices of data to train on in the first iteration.
 batch_size (int) – Number of training points to acquire (move from test to training) in every iteration.
 n_max (int) – Max number of training points to test.
