catlearn.active_learning package¶
Submodules¶
catlearn.active_learning.acquisition_functions module¶
GP acquisition functions.
-
catlearn.active_learning.acquisition_functions.
EI
(y_best, predictions, uncertainty, objective='max')¶ Return the expected improvement acquisition function.
Parameters: - y_best (float) – Reference value to compare predictions against, typically the best target value known so far.
- predictions (list) – Predicted means.
- uncertainty (list) – Uncertainties associated with the predictions.
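For orientation, the textbook expected improvement formula can be sketched in a few lines. This is a minimal illustration, not CatLearn's implementation: the function name `expected_improvement` is hypothetical, `uncertainty` is assumed to hold standard deviations, and `objective` is assumed to take the values 'max' or 'min'.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(y_best, predictions, uncertainty, objective="max"):
    """Textbook EI for each (mean, standard deviation) pair.

    Assumption: `uncertainty` holds standard deviations, not variances.
    """
    scores = []
    for mu, sigma in zip(predictions, uncertainty):
        # Signed improvement of the predicted mean over the reference value.
        delta = mu - y_best if objective == "max" else y_best - mu
        if sigma <= 0.0:
            # Without uncertainty, EI reduces to the plain improvement.
            scores.append(max(delta, 0.0))
            continue
        z = delta / sigma
        scores.append(delta * normal_cdf(z) + sigma * normal_pdf(z))
    return scores
```

A point whose mean equals `y_best` still scores above zero when its uncertainty is nonzero, which is what makes the criterion exploratory.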
-
catlearn.active_learning.acquisition_functions.
PI
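(y_best, predictions, uncertainty, objective)¶ Return the probability of improvement acquisition function.
Parameters: - y_best (float) – Reference value to compare predictions against, typically the best target value known so far.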
- predictions (list) – Predicted means.
- uncertainty (list) – Uncertainties associated with the predictions.
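The standard probability of improvement is the Gaussian probability that a prediction beats the reference value. The sketch below shows that textbook form under stated assumptions: the name `probability_of_improvement` is hypothetical, `uncertainty` is treated as a standard deviation, and `objective` is assumed to be 'max' or 'min'. It may differ from what the library computes internally.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(y_best, predictions, uncertainty,
                               objective="max"):
    """Textbook PI for each (mean, standard deviation) pair."""
    scores = []
    for mu, sigma in zip(predictions, uncertainty):
        # Signed improvement of the predicted mean over the reference value.
        delta = mu - y_best if objective == "max" else y_best - mu
        if sigma <= 0.0:
            # A certain prediction either improves or it does not.
            scores.append(1.0 if delta > 0.0 else 0.0)
        else:
            scores.append(normal_cdf(delta / sigma))
    return scores
```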
-
catlearn.active_learning.acquisition_functions.
UCB
(predictions, uncertainty, objective='max', kappa=1.5)¶ Return the upper confidence bound acquisition function.
Parameters: - predictions (list) – Predicted means.
- uncertainty (list) – Uncertainties associated with the predictions.
- kappa (float) – Constant that controls the exploitation/exploration trade-off in UCB.
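The role of kappa is easiest to see in the usual form of the bound, mean plus kappa times uncertainty. The following is a sketch of that general idea rather than the library's code: the name `upper_confidence_bound` is hypothetical, and the handling of a 'min' objective (negating the mean so that higher scores remain better) is an assumption.

```python
def upper_confidence_bound(predictions, uncertainty, objective="max",
                           kappa=1.5):
    """Score each point by its optimistic bound; higher is better."""
    scores = []
    for mu, sigma in zip(predictions, uncertainty):
        # For minimization, flip the sign of the mean (assumed convention).
        signed_mean = mu if objective == "max" else -mu
        scores.append(signed_mean + kappa * sigma)
    return scores
```

With kappa=0 the ranking follows the predicted means alone (pure exploitation); larger values increasingly favor uncertain points (exploration).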
-
catlearn.active_learning.acquisition_functions.
classify
(classifier, train_atoms, test_atoms, targets, predictions, uncertainty, train_features=None, test_features=None, objective='max', k_means=3, kappa=1.5, metrics=['optimistic', 'UCB', 'EI', 'PI'])¶ Classify ranked predictions based on acquisition function.
Parameters: - classifier (func) – User defined function to classify an atoms object.
- train_atoms (list) – List of atoms objects from training data upon which to base classification.
- test_atoms (list) – List of atoms objects from test data upon which to base classification.
- targets (list) – List of known target values.
- predictions (list) – List of predictions from the GP.
- uncertainty (list) – List of variances of the GP predictions.
- train_features (array) – Feature matrix for the training data.
- test_features (array) – Feature matrix for the test data.
- k_means (int) – Number of clusters to generate with clustering.
- kappa (float) – Constant that controls the exploitation/exploration trade-off in UCB.
- metrics (list) – List of strings. Accepted values are ‘cdf’, ‘UCB’, ‘EI’, ‘PI’, ‘optimistic’ and ‘pdf’.
Returns: res – A dictionary of lists containing the fitness of each test point for the different acquisition functions.
Return type: dict
-
catlearn.active_learning.acquisition_functions.
cluster
(train_features, targets, test_features, predictions, k_means=3)¶ Penalize test points that are too clustered.
Parameters: - train_features (array) – Feature matrix for the training data.
- targets (list) – Training targets.
- test_features (array) – Feature matrix for the test data.
- predictions (list) – Predicted means.
- k_means (int) – Number of clusters.
-
catlearn.active_learning.acquisition_functions.
optimistic
(y_best, predictions, uncertainty)¶ Find predictions that will optimistically lead to progress.
Parameters: - y_best (float) – Reference value to compare predictions against, typically the best target value known so far.
- predictions (list) – Predicted means.
- uncertainty (list) – Uncertainties associated with the predictions.
-
catlearn.active_learning.acquisition_functions.
optimistic_proximity
(y_best, predictions, uncertainty)¶ Return uncertainties minus distances to y_best.
Parameters: - y_best (float) – Reference value from which distances are measured.
- predictions (list) – Predicted means.
- uncertainty (list) – Uncertainties associated with the predictions.
-
catlearn.active_learning.acquisition_functions.
probability_density
(y_best, predictions, uncertainty)¶ Return probability densities at y_best.
Parameters: - y_best (float) – Reference value at which the probability densities are evaluated.
- predictions (list) – Predicted means.
- uncertainty (list) – Uncertainties associated with the predictions.
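One plausible reading of "probability densities at y_best" is the Gaussian density of each prediction evaluated at the reference value. The sketch below follows that reading; the name `density_at` is hypothetical, and it assumes every uncertainty is a strictly positive standard deviation.

```python
import math


def density_at(y_best, predictions, uncertainty):
    """Density of N(mean, sigma^2) at y_best for each prediction.

    Assumption: every sigma is a strictly positive standard deviation.
    """
    scores = []
    for mu, sigma in zip(predictions, uncertainty):
        z = (y_best - mu) / sigma
        scores.append(math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi)))
    return scores
```

Under this reading, the score is highest for confident predictions that sit close to `y_best`.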
-
catlearn.active_learning.acquisition_functions.
proximity
(y_best, predictions, uncertainty=None)¶ Return negative distances to y_best.
Parameters: - y_best (float) – Reference value from which distances are measured.
- predictions (list) – Predicted means.
- uncertainty (list) – Uncertainties associated with the predictions.
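The two distance-based descriptions ("negative distances to y_best" and "uncertainties minus distances to y_best") translate directly into code. This is a literal sketch of those one-line summaries, assuming the distance is the absolute difference; the function names are hypothetical and the library's own versions may differ in detail.

```python
def negative_distance(y_best, predictions):
    """Negative absolute distance of each prediction to y_best."""
    return [-abs(y_best - mu) for mu in predictions]


def uncertainty_minus_distance(y_best, predictions, uncertainty):
    """Uncertainty minus absolute distance to y_best, per prediction."""
    return [sigma - abs(y_best - mu)
            for mu, sigma in zip(predictions, uncertainty)]
```

Both scores peak for predictions that land on `y_best`; the second additionally rewards uncertain points.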
-
catlearn.active_learning.acquisition_functions.
random_acquisition
(y_best, predictions, uncertainty=None)¶ Return random numbers for control experiments.
Parameters: - y_best (float) – Condition
- predictions (list) – Predicted means.
- uncertainty (list) – Uncertainties associated with the predictions.
-
catlearn.active_learning.acquisition_functions.
rank
(targets, predictions, uncertainty, train_features=None, test_features=None, objective='max', k_means=3, kappa=1.5, metrics=['optimistic', 'UCB', 'EI', 'PI'])¶ Rank predictions based on acquisition function.
Parameters: - targets (list) – List of known target values.
- predictions (list) – List of predictions from the GP.
- uncertainty (list) – List of variances of the GP predictions.
- train_features (array) – Feature matrix for the training data.
- test_features (array) – Feature matrix for the test data.
- k_means (int) – Number of clusters to generate with clustering.
- kappa (float) – Constant that controls the exploitation/exploration trade-off in UCB.
- metrics (list) – List of strings. Accepted values are ‘cdf’, ‘UCB’, ‘EI’, ‘PI’, ‘optimistic’ and ‘pdf’.
Returns: res – A dictionary of lists containing the fitness of each test point for the different acquisition functions.
Return type: dict
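Since the result is a dictionary of per-metric fitness lists, a small helper can turn it into candidate indices. The sketch below assumes that a higher fitness is better; the helper `top_candidates` is hypothetical and the values in `res` are made up for illustration.

```python
def top_candidates(res, metric, n=1):
    """Return the indices of the n test points with the highest fitness."""
    scores = res[metric]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]


# Made-up fitness values for three test points.
res = {"UCB": [0.2, 1.7, 0.9], "EI": [0.05, 0.40, 0.45]}

best_by_ucb = top_candidates(res, "UCB", n=2)  # [1, 2]
best_by_ei = top_candidates(res, "EI", n=1)    # [2]
```

Different metrics can disagree on the best candidate, as they do here, so it can be worth comparing several before choosing what to evaluate next.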
catlearn.active_learning.algorithm module¶
Class to automate building a surrogate model.
-
class
catlearn.active_learning.algorithm.
ActiveLearning
(surrogate_model, train_data, target)¶ Bases:
object
Active learning class, intended for screening or optimizing in a predefined and finite search space.
-
acquire
(unlabeled_data, batch_size=1)¶ Return indices of datapoints to acquire, from a predefined, finite search space.
Parameters: - unlabeled_data (array) – Data matrix representing an unlabeled search space.
- initial_subset (list) – Row indices of data to train on in the first iteration.
- batch_size (int) – Number of training points to acquire (move from test to training) in every iteration.
Returns: - to_acquire (list) – Row indices of unlabeled data to acquire.
- score – User defined output from predict.
-
ensemble_test
(size, initial_subset=None, batch_size=1, n_max=None, seed_list=None, nprocs=None)¶ Return a 3d array of test results for a surrogate model. The third dimension expands the ensemble of tests.
Parameters: - size (int) – How many tests to run.
- initial_subset (list) – Row indices of data to train on in the first iteration.
- batch_size (int) – Number of training points to acquire (move from test to training) in every iteration.
- n_max (int) – Max number of training points to test.
- seed_list (list) – List of integer seeds for shuffling training data.
- nprocs (int) – Number of processors for parallelization.
Returns: ensemble – Array of test results with shape (size, iterations, number of metrics).
Return type: array
-
test_acquisition
(initial_subset=None, batch_size=1, n_max=None, seed=None)¶ Return an array of test results for a surrogate model.
Parameters: - initial_subset (list) – Row indices of data to train on in the first iteration.
- batch_size (int) – Number of training points to acquire (move from test to training) in every iteration.
- n_max (int) – Max number of training points to test.
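The parameters above suggest a pool-based loop: train on an initial subset, score the remaining points, move a batch into the training set, and repeat until `n_max` is reached. The sketch below illustrates that general procedure only. The toy nearest-neighbor surrogate, the greedy acquisition rule, and the name `run_acquisition_test` are all assumptions made for illustration, not part of CatLearn, and `initial_subset` must be non-empty here.

```python
import random


def nearest_neighbor_predict(train_x, train_y, x):
    """Toy surrogate: return the target of the closest training point."""
    j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[j]


def run_acquisition_test(data, target, initial_subset, batch_size=1,
                         n_max=None, seed=None):
    """Return (training set size, mean absolute error) per iteration."""
    rng = random.Random(seed)
    train = list(initial_subset)
    in_train = set(train)
    pool = [i for i in range(len(data)) if i not in in_train]
    rng.shuffle(pool)  # in this sketch the seed only affects tie-breaking
    n_max = len(data) if n_max is None else min(n_max, len(data))
    history = []
    while pool and len(train) < n_max:
        train_x = [data[i] for i in train]
        train_y = [target[i] for i in train]
        predicted = {i: nearest_neighbor_predict(train_x, train_y, data[i])
                     for i in pool}
        # Test metric: error on the points that are still unlabeled.
        error = sum(abs(predicted[i] - target[i]) for i in pool) / len(pool)
        history.append((len(train), error))
        # Greedy acquisition: take the points with the highest prediction.
        pool.sort(key=lambda i: predicted[i], reverse=True)
        n_take = min(batch_size, n_max - len(train))
        train.extend(pool[:n_take])
        pool = pool[n_take:]
    return history


data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
target = [x * x for x in data]
history = run_acquisition_test(data, target, initial_subset=[0],
                               batch_size=2, seed=0)
```

Repeating such a run over a list of seeds and stacking the histories gives an ensemble of test results, which is the idea behind `ensemble_test`.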
-