catlearn.featurize package

Submodules

catlearn.featurize.adsorbate_prep module

This function constructs a dictionary with abinitio_energies.

Input:
  • fname (str) – path/filename of ase.db file
  • selection (list) – ase.db selection
catlearn.featurize.adsorbate_prep.ads_index(atoms)

Returns a list of indices of atoms belonging to the adsorbate. These are defined as atoms that do not belong to the slab.

Parameters:atoms (ase atoms object) –

The atoms object must have the key ‘slab_atoms’ in atoms.subsets:

  • ’slab_atoms’ : list
    indices of atoms belonging to the slab
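
The complement rule described above can be illustrated without CatLearn or ASE. The following is a minimal sketch, not the library's implementation; the helper name complement_indices and the 6-atom example are hypothetical.

```python
def complement_indices(n_atoms, known_indices):
    """Return all indices in range(n_atoms) that are not in known_indices."""
    known = set(known_indices)
    return [i for i in range(n_atoms) if i not in known]


# Hypothetical structure with 6 atoms, where atoms 0-3 form the slab.
slab_atoms = [0, 1, 2, 3]
ads_atoms = complement_indices(6, slab_atoms)
print(ads_atoms)
```

The same helper applied to a list of adsorbate indices would give the slab indices instead.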
catlearn.featurize.adsorbate_prep.attach_cations(atoms, anion_number=8)

Attaches list of cation and anion atomic indices.

Parameters:
  • atoms (object) – ase.Atoms object.
  • anion_number (int) – Atomic number of the anion of this chalcogenide.
catlearn.featurize.adsorbate_prep.auto_layers(atoms, miller=(0, 0, 1))

Returns two arrays describing which layer each atom belongs to and the distance between each layer and the origin. Assumes the tolerance corresponds to the average atomic radius of the slab.

Parameters:atoms (object) –

The atoms object must have the following keys in atoms.subsets:

’slab_atoms’ : list
indices of atoms belonging to the slab
catlearn.featurize.adsorbate_prep.autogen_info(images)

Return a list of atoms objects with atomic group information attached to atoms.subsets. This information is needed by some functions in the AdsorbateFingerprintGenerator.

Parameters:images (list) – list of atoms objects representing adsorbates on slabs. No further information is required in atoms.subsets.
catlearn.featurize.adsorbate_prep.catalysis_hub_to_info(images)
catlearn.featurize.adsorbate_prep.check_reconstructions(image_pairs)

Return a list of database ids for adsorbate/slab structures that have a reconstructed slab with respect to the reference slab.

Parameters:image_pairs (list) – List of tuples containing pairs of ASE atoms objects. The first element in each tuple must represent an adsorbate*slab structure and the second element must represent a slab.
catlearn.featurize.adsorbate_prep.compare_slab_connectivity(atoms, reference_atoms)

Return a boolean indicating whether an adsorbate has caused a slab to reconstruct and change its connectivity.

Parameters:
  • atoms (object) – ASE atoms object with connectivity and ‘slab_atoms’ subsets attached. This represents an adsorbate*slab structure.
  • reference_atoms (object) – ASE atoms object with connectivity and ‘slab_atoms’ subsets attached. This represents a slab structure.
Returns:

identical – Whether the connectivities within the slabs are identical.

Return type:

boolean
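
One way to picture this comparison is to restrict each connectivity matrix to its slab atoms and test the two sub-matrices for equality. The sketch below is an assumption about how such a check could look, not CatLearn's code; it uses nested lists instead of atoms objects, the helper names are hypothetical, and it assumes the slab atoms are ordered identically in both structures.

```python
def slab_submatrix(connectivity, slab_atoms):
    """Keep only the rows and columns of the slab atoms."""
    return [[connectivity[i][j] for j in slab_atoms] for i in slab_atoms]


def same_slab_connectivity(conn, slab_atoms, ref_conn, ref_slab_atoms):
    """True if the slab-only connectivity equals that of the reference slab."""
    return (slab_submatrix(conn, slab_atoms)
            == slab_submatrix(ref_conn, ref_slab_atoms))


# Reference slab: a chain of three atoms, 0-1-2.
ref_conn = [[0, 1, 0],
            [1, 0, 1],
            [0, 1, 0]]

# Adsorbate (atom 3) bonded to slab atom 2; slab bonds unchanged.
intact = [[0, 1, 0, 0],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [0, 0, 1, 0]]

# Same adsorbate, but the 0-1 slab bond has been broken.
reconstructed = [[0, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]]

print(same_slab_connectivity(intact, [0, 1, 2], ref_conn, [0, 1, 2]))
print(same_slab_connectivity(reconstructed, [0, 1, 2], ref_conn, [0, 1, 2]))
```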

catlearn.featurize.adsorbate_prep.connectivity2ads_index(atoms, species)

Return the indexes of atoms from the global list of adsorbate symbols.

Parameters:
  • atoms (object) – ASE atoms object with connectivity attached. This represents an adsorbate*slab structure.
  • species (str) – chemical formula of the adsorbate.
catlearn.featurize.adsorbate_prep.connectivity_termination(atoms)

Return lists bulk, term and subsurf containing atom indices belonging to those subsets of a surface atoms object. This function relies on the connectivity of the atoms.

Parameters:atoms (object) –

atoms.connectivity should be a connectivity matrix. The atoms object must have the following keys in atoms.subsets:

’slab_atoms’ : list
indices of atoms belonging to the slab
catlearn.featurize.adsorbate_prep.constraints_termination(atoms)

Return lists bulk, term and subsurf containing atom indices belonging to those subsets of a surface atoms object. This function relies on the connectivity of the atoms and assumes that bulk atoms are those that are constrained in the first constraint.

Parameters:atoms (object) –

atoms.connectivity should be a connectivity matrix. The atoms object must have the following keys in atoms.subsets:

’slab_atoms’ : list
indices of atoms belonging to the slab.
catlearn.featurize.adsorbate_prep.detect_adsorbate(atoms)

Return a list of indices of atoms belonging to an adsorbate.

Parameters:atoms (object) – An ase atoms object.
catlearn.featurize.adsorbate_prep.detect_termination(atoms)

Returns three lists: the first contains indices of bulk atoms, the second contains indices of atoms in the second outermost layer, and the last contains indices of atoms in the outermost layer, or termination, of the slab.

Parameters:atoms (object.) –

The atoms object must have the following keys in atoms.subsets:

’slab_atoms’ : list
indices of atoms belonging to the slab
catlearn.featurize.adsorbate_prep.formula2ads_index(atoms, species)

Return the indexes of atoms, which have symbols matching the chemical formula of the adsorbate. This function will not work for adsorbates containing the same elements as the slab.

Parameters:
  • atoms (ase atoms object.) – atoms.info must be a dictionary containing the key ‘key_value_pairs’, which is expected to contain CatMAP standard adsorbate structure key value pairs. See the ase db to catmap module in catmap. The key value pair ‘species’ must be the chemical formula of the adsorbate.
  • species (str) – chemical formula of the adsorbate.
catlearn.featurize.adsorbate_prep.info2primary_index(atoms)

Returns lists identifying the nearest neighbors of the adsorbate atoms.

Parameters:atoms (ase atoms object.) –

The atoms object must have the following keys in atoms.subsets:

’ads_atoms’ : list
indices of atoms belonging to the adsorbate
’slab_atoms’ : list
indices of atoms belonging to the slab
catlearn.featurize.adsorbate_prep.last2ads_index(atoms, species)

Return the indexes of the last n atoms in the atoms object, where n is the length of the composition of the adsorbate species. This function will work on atoms objects, where the slab was set up first, and the adsorbate was added after.

Parameters:
  • atoms (ase atoms object.) – atoms.info must be a dictionary containing the key ‘key_value_pairs’, which is expected to contain CatMAP standard adsorbate structure key value pairs. See the ase db to catmap module in catmap. The key value pair ‘species’ must be the chemical formula of the adsorbate.
  • species (str) – chemical formula of the adsorbate.
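
The "last n atoms" rule can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, and the formula parser only handles simple formulas without brackets, such as 'CH3' or 'OOH'.

```python
import re


def count_atoms(formula):
    """Count atoms in a simple chemical formula (no brackets or charges)."""
    total = 0
    # Each match is an element symbol followed by an optional integer count.
    for _symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += int(count) if count else 1
    return total


def last_n_indices(n_atoms, species):
    """Indices of the last n atoms, where n is the adsorbate atom count."""
    n_ads = count_atoms(species)
    return list(range(n_atoms - n_ads, n_atoms))


# Hypothetical 20-atom structure: 16 slab atoms followed by a CH3 adsorbate.
print(last_n_indices(20, "CH3"))
```

As the description notes, this only gives the right answer when the adsorbate atoms were appended after the slab atoms.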
catlearn.featurize.adsorbate_prep.layers2ads_index(atoms, species)

Returns the indexes of atoms in layers exceeding the number of layers stored in the key value pair ‘layers’.

Parameters:
  • atoms (ase atoms object.) – atoms.info must be a dictionary containing the key ‘key_value_pairs’, which is expected to contain CatMAP standard adsorbate structure key value pairs. See the ase db to catmap module in catmap. The key value pair ‘species’ must be the chemical formula of the adsorbate and ‘layers’ must be an integer.
  • species (str) – chemical formula of the adsorbate.
catlearn.featurize.adsorbate_prep.layers_termination(atoms, miller=(0, 0, 1))

Return lists bulk, term and subsurf containing atom indices belonging to those subsets of a surface atoms object. This function relies on ase.atoms.get_layers, default atomic radii, and a slab oriented in the xy plane, where the termination in the z+ direction is the surface.

Parameters:atoms (object) –

The atoms object must have the following keys in atoms.subsets:

’slab_atoms’ : list
indices of atoms belonging to the slab.
catlearn.featurize.adsorbate_prep.slab_index(atoms)

Returns a list of indices of atoms belonging to the slab. These are defined as atoms that do not belong to the adsorbate.

Parameters:atoms (ase atoms object) –

The atoms object must have the key ‘ads_atoms’ in atoms.subsets:

  • ’ads_atoms’ : list
    indices of atoms belonging to the adsorbate
catlearn.featurize.adsorbate_prep.slab_positions2ads_index(atoms, slab, species)

Return the indexes of adsorbate atoms identified by comparing positions to a reference slab structure.

Parameters:atoms (object) –
catlearn.featurize.adsorbate_prep.sym2ads_index(atoms, ads_syms)

Return the indexes of atoms from the global list of adsorbate symbols.

Parameters:atoms (object) – An ase atoms object.
catlearn.featurize.adsorbate_prep.tags2ads_index(atoms)

Return the indexes of atoms that atoms.tags labels as adsorbate atoms.

Parameters:atoms (object) – An ase atoms object. atoms.tags must label adsorbate atoms with 0 or negative numbers.
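
The tag convention stated above (adsorbate atoms carry 0 or negative tags) can be sketched on a plain list of tags. The helper name and the example tags are hypothetical; this is not the library's code.

```python
def nonpositive_tag_indices(tags):
    """Return indices whose tag is 0 or negative."""
    return [i for i, tag in enumerate(tags) if tag <= 0]


# Hypothetical tags: three slab layers (tags 3, 2, 1) and two adsorbate atoms.
tags = [3, 3, 2, 2, 1, 1, 0, 0]
print(nonpositive_tag_indices(tags))
```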
catlearn.featurize.adsorbate_prep.tags_termination(atoms)

Return lists bulk, term and subsurf containing atom indices belonging to those subsets of a surface atoms object. CatKit and ase.build contain functions that by default store this information in tags.

Parameters:atoms (object) – the termination atoms should have tag=1 and subsequent layers should be tagged in increasing order.
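
A tag-based split into bulk, termination and subsurface might look like the sketch below. It assumes that tag 1 marks the termination, tag 2 the subsurface layer, and any higher tag the bulk; only the meaning of tag 1 is stated above, so the rest is an assumption. The helper name is hypothetical.

```python
def split_by_tags(tags):
    """Split atom indices into (bulk, term, subsurf) using layer tags."""
    term = [i for i, t in enumerate(tags) if t == 1]     # outermost layer
    subsurf = [i for i, t in enumerate(tags) if t == 2]  # assumed: tag 2
    bulk = [i for i, t in enumerate(tags) if t > 2]      # assumed: tags > 2
    return bulk, term, subsurf


# Hypothetical four-layer slab with one adsorbate atom (tag 0).
layer_tags = [4, 4, 3, 3, 2, 2, 1, 1, 0]
bulk, term, subsurf = split_by_tags(layer_tags)
print(bulk, term, subsurf)
```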
catlearn.featurize.adsorbate_prep.termination_info(images)

Return a list of atoms objects with attached information about the slab termination, the slab second outermost layer and the bulk slab compositions.

Parameters:images (list) –

list of atoms objects representing adsorbates on slabs. The atoms objects must have the following keys in atoms.subsets:

  • ’ads_atoms’ : list
    indices of atoms belonging to the adsorbate
  • ’slab_atoms’ : list
    indices of atoms belonging to the slab
catlearn.featurize.adsorbate_prep.z2ads_index(atoms, species)

Returns the indexes of the n atoms with the highest position in the z direction, where n is the number of atoms in the chemical formula from the species key value pair.

Parameters:
  • atoms (ase atoms object.) – atoms.info must be a dictionary containing the key ‘key_value_pairs’, which is expected to contain CatMAP standard adsorbate structure key value pairs. See the ase db to catmap module in catmap. The key value pair ‘species’ must be the chemical formula of the adsorbate.
  • species (str) – chemical formula of the adsorbate.
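
Selecting the n highest atoms along z can be sketched on a plain list of z coordinates. The helper name and the coordinates are hypothetical, and n is passed in directly rather than derived from a formula.

```python
def highest_z_indices(z_coords, n_ads):
    """Return the sorted indices of the n_ads atoms with the largest z."""
    order = sorted(range(len(z_coords)), key=lambda i: z_coords[i])
    # Slice from an explicit start so that n_ads == 0 gives an empty list.
    return sorted(order[len(order) - n_ads:])


# Hypothetical z coordinates; atoms 0 and 3 sit highest.
z_coords = [7.1, 0.0, 2.1, 6.0, 4.2]
print(highest_z_indices(z_coords, 2))
```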

catlearn.featurize.asap_wrapper module

catlearn.featurize.base module

Base class for the feature generators.

This is inherited by the other fingerprint generators and allows access to a number of useful and commonly used functions. Standard functionality that is implemented and applicable to more than one of the other classes should be put here.

class catlearn.featurize.base.BaseGenerator(**kwargs)

Bases: object

Base class for feature generation.

get_all_distances(candidate)

Function to return the atomic distances.

Parameters:candidate (object) – Target data object from which to get the atomic distances.
get_atomic_numbers(candidate)

Function to return the atomic numbers.

Parameters:candidate (object) – Target data object from which to get the atomic numbers.
get_masses(candidate)

Function to return the atomic masses.

Parameters:candidate (object) – Target data object from which to get the atomic masses.
get_neighborlist(candidate)

Function to return the neighborlist.

It will check to see if the neighbor list is stored in the data object. If not it will generate the neighborlist from scratch.

Parameters:candidate (object) – Target data object from which to get the neighbor list.
get_positions(candidate)

Function to return the atomic coordinates.

Parameters:candidate (object) – Target data object from which to get the atomic coordinates.
make_neighborlist(candidate, neighbor_number=1)

Function to generate the neighborlist.

Parameters:
  • candidate (object) – Target data object on which to generate neighbor list.
  • dx (dict) – Buffer to calculate nearest neighbor pairs in dict format: dx = {atomic_number: buffer}.
  • neighbor_number (int) – Neighbor shell.
catlearn.featurize.base.check_labels(labels, result, atoms)

Check that two lists have the same length. If not, print an informative error message containing a database id if present.

Parameters:
  • labels (list) – A list of feature names.
  • result (list) – A fingerprint.
  • atoms (object) – A single atoms object.

catlearn.featurize.neighbor_matrix module

Functions to build a neighbor matrix feature representation.

catlearn.featurize.neighbor_matrix.connection_dict(atoms, periodic=False, dx=0.2, neighbor_number=1, reuse_nl=False)

Generate a dict of atom connections.

Parameters:
  • atoms (object) – Target ase atoms object on which to build the connections matrix.
  • periodic (boolean) – Specify whether to use the periodic neighborlist generator. The non-periodic method is faster and is used by default.
  • dx (float) – Buffer to calculate nearest neighbor pairs.
  • neighbor_number (int) – Neighbor shell.
  • reuse_nl (boolean) – Whether to reuse a previously stored neighborlist if available.
catlearn.featurize.neighbor_matrix.connection_matrix(atoms, periodic=False, dx=0.2, neighbor_number=1, reuse_nl=False)

Generate a connections matrix from an atoms object.

Parameters:
  • atoms (object) – Target ase atoms object on which to build the connections matrix.
  • periodic (boolean) – Specify whether to use the periodic neighborlist generator. The non-periodic method is faster and is used by default.
  • dx (float) – Buffer to calculate nearest neighbor pairs.
  • neighbor_number (int) – Neighbor shell.
  • reuse_nl (boolean) – Whether to reuse a previously stored neighborlist if available.
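
To make the role of the dx buffer concrete, here is a minimal non-periodic sketch that marks two atoms as connected when their distance is within the sum of their radii plus dx. How CatLearn actually applies dx, which radii it uses, and what it stores on the diagonal are not stated here, so all of that is an assumption; the function name, positions and radii are hypothetical.

```python
import math


def simple_connection_matrix(positions, radii, dx=0.2):
    """Build a symmetric 0/1 matrix from pairwise distances (no periodicity)."""
    n = len(positions)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            cutoff = radii[i] + radii[j] + dx
            if math.dist(positions[i], positions[j]) <= cutoff:
                matrix[i][j] = matrix[j][i] = 1
    return matrix


# Two atoms 1.1 apart and a third atom far away; radii are placeholders.
positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.1), (0.0, 0.0, 5.0)]
radii = [0.7, 0.7, 0.7]
for row in simple_connection_matrix(positions, radii):
    print(row)
```

A larger dx makes the criterion more permissive, so more pairs are counted as neighbors.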
catlearn.featurize.neighbor_matrix.neighbor_features(atoms, property=None, periodic=False, dx=0.2, neighbor_number=1, reuse_nl=False)

Generate predefined features from atoms objects.

Parameters:
  • atoms (object) – The target ase atoms object.
  • property (list) – List of the target properties from mendeleev.
  • periodic (boolean) – Specify whether to use the periodic neighborlist generator. The non-periodic method is faster and is used by default.
  • dx (float) – Buffer to calculate nearest neighbor pairs.
  • neighbor_number (int) – Neighbor shell.
  • reuse_nl (boolean) – Whether to reuse a previously stored neighborlist if available.
catlearn.featurize.neighbor_matrix.property_matrix(atoms, property)

Generate a property matrix based on the atomic types.

Parameters:
  • atoms (object) – The target ase atoms object.
  • property (str) – The target property from mendeleev.

catlearn.featurize.periodic_table_data module

Function pulling atomic data for elements.

This is typically used in conjunction with other fingerprint generators to combine general atomic data with more specific properties.

catlearn.featurize.periodic_table_data.default_catlearn_radius(z)

Return the default CatLearn covalent radius of element z.

Parameters:z (int) – Atomic number.
catlearn.featurize.periodic_table_data.get_mendeleev_params(atomic_number, params=None)

Return a list of generic parameters about an atom.

Parameters:
  • atomic_number (list or int) – An atomic number.
  • params (list of str) – Extra Mendeleev parameters to be returned in the list. For a full list see here - https://goo.gl/G4eTvu
Returns:

var – All parameters of the element with specified atomic number.

Return type:

list

catlearn.featurize.periodic_table_data.get_radius(z, params=['atomic_radius', 'covalent_radius_cordero'])

Return a metric of atomic radius.

Parameters:
  • z (int) – Atomic number.
  • params (list) – Atomic radius metrics in order of preference. The first successful value will be returned.
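
The "first successful value" behaviour can be sketched with a plain dictionary standing in for the element data. The helper name is hypothetical and the numbers are placeholders, not real radii; treating a missing or None entry as unsuccessful is an assumption.

```python
def first_available(values, params):
    """Return the first non-missing value, trying params in order."""
    for key in params:
        value = values.get(key)
        if value is not None:
            return value
    raise KeyError("none of the requested parameters is available")


preference = ['atomic_radius', 'covalent_radius_cordero']

# Placeholder data: the preferred metric is missing, so the fallback is used.
element_data = {'atomic_radius': None, 'covalent_radius_cordero': 1.0}
print(first_available(element_data, preference))
```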
catlearn.featurize.periodic_table_data.list_mendeleev_params(numbers, params=None)

Return an n by p array, containing p parameters of n atoms.

Parameters:
  • numbers (list) – atomic numbers.
  • params (list) – elemental parameters.
catlearn.featurize.periodic_table_data.make_labels(params, prefix, suffix)

Return a list of feature labels.

Parameters:
  • params (list) – Parameter keys.
  • prefix (str) – Prepended to each parameter key.
  • suffix (str) – Appended to end of each parameter key.
Returns:

labels

Return type:

list
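
Label construction of this kind can be sketched as simple string concatenation. Whether the library inserts separators between prefix, key and suffix is not stated here, so the sketch assumes none; the helper name and the example keys are hypothetical.

```python
def build_labels(params, prefix, suffix):
    """Wrap each parameter key with the given prefix and suffix."""
    return [prefix + str(key) + suffix for key in params]


labels = build_labels(['atomic_number', 'en_pauling'], 'mean_', '_term')
print(labels)
```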

catlearn.featurize.periodic_table_data.n_outer(econf)
catlearn.featurize.periodic_table_data.stat_mendeleev_params(composition, params=None)

Return an n by p array, containing p parameters of n atoms and stoichiometry weights associated with the unique elements in the formula.

Parameters:
  • composition (str) – chemical composition formula. Floats are accepted.
  • params (list) – elemental parameters.

catlearn.featurize.setup module

Functions to set up fingerprint vectors.

class catlearn.featurize.setup.FeatureGenerator(atom_types=None, atom_len=None, nprocs=1, **kwargs)

Bases: catlearn.fingerprint.adsorbate.AdsorbateFingerprintGenerator, catlearn.fingerprint.particle.ParticleFingerprintGenerator, catlearn.fingerprint.standard.StandardFingerprintGenerator, catlearn.fingerprint.graph.GraphFingerprintGenerator, catlearn.fingerprint.bulk.BulkFingerprintGenerator, catlearn.fingerprint.convoluted.ConvolutedFingerprintGenerator, catlearn.fingerprint.chalcogenide.ChalcogenideFingerprintGenerator, catlearn.fingerprint.catapp.CatappFingerprintGenerator, catlearn.fingerprint.molecule.AutoCorrelationFingerprintGenerator

Feature generator class.

It is sometimes necessary to normalize the length of feature vectors when data is supplied with variable numbers of atoms or elemental types. If this is the case, use the normalize_features function.

In this class, there are functions to take a data object and return a feature vector. This is done with the return_vec function. The names of the descriptors in the feature vector can be accessed with the return_names function.

The class inherits the actual generator functions from the [NAME]FingerprintGenerator classes. Additional variables are passed as kwargs.

featurize_atomic_pairs(candidates)

Featurize pairs of atoms by their elements and pair distances, in order to optimize the bond classifier.

Parameters:candidates (list of atoms objects.) –
Returns:data – Data matrix.
Return type:array
get_dataframe(candidates, vec_names)

Sequentially combine feature vectors. Padding handled automatically.

Parameters:
  • candidates (list or dict) – Atoms objects to construct fingerprints for.
  • vec_names (list) – List of fingerprinting functions.
Returns:

df – Fingerprint dataframe with n rows and m columns (n, m) where n is the number of candidates and m is the summed number of features from all fingerprint classes supplied.

Return type:

DataFrame

normalize_features(train_candidates, test_candidates=None)

Function to attach feature data to class.

Currently the function attaches data on all elemental types present in the data as well as the maximum number of atoms in a data object.

Parameters:
  • train_candidates (list) – List of atoms objects.
  • test_candidates (list) – List of atoms objects.
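
The two quantities mentioned above, the set of elemental types and the maximum number of atoms, can be sketched as follows. Each candidate is represented here as a plain list of atomic numbers rather than an atoms object, and the function returns the values instead of attaching them to a class; both choices, and the function name, are assumptions for illustration.

```python
def collect_normalization_data(train_candidates, test_candidates=None):
    """Return (sorted unique atomic numbers, largest atom count)."""
    candidates = list(train_candidates) + list(test_candidates or [])
    atom_types = sorted({z for numbers in candidates for z in numbers})
    atom_len = max(len(numbers) for numbers in candidates)
    return atom_types, atom_len


# Hypothetical candidates given as lists of atomic numbers.
train = [[78, 78, 8], [78, 1]]
test = [[29, 29, 29, 29]]
atom_types, atom_len = collect_normalization_data(train, test)
print(atom_types, atom_len)
```

Knowing these two values up front is what allows feature vectors from structures of different size or composition to be padded to a common length.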
return_names(vec_names)

Function to return a list of feature names.

Parameters:vec_names (list) – List of fingerprinting functions.
Returns:fingerprint_vector – Name array.
Return type:ndarray
return_vec(candidates, vec_names)

Sequentially combine feature vectors. Padding handled automatically.

Parameters:
  • candidates (list or dict) – Atoms objects to construct fingerprints for.
  • vec_names (list) – List of fingerprinting functions.
Returns:

vector – Fingerprint array (n, m) where n is the number of candidates and m is the summed number of features from all fingerprint classes supplied.

Return type:

ndarray
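
The sequential combination can be pictured as calling each fingerprinting function on a candidate and concatenating the results into one row. The sketch below omits padding and uses nested lists instead of an ndarray; the two toy fingerprint functions and the combiner name are hypothetical, and candidates are plain lists of atomic numbers.

```python
def size_fingerprint(candidate):
    """Toy fingerprint: number of atoms."""
    return [len(candidate)]


def z_range_fingerprint(candidate):
    """Toy fingerprint: smallest and largest atomic number."""
    return [min(candidate), max(candidate)]


def combine_vectors(candidates, vec_names):
    """Concatenate the output of every function, one row per candidate."""
    return [
        [value for func in vec_names for value in func(candidate)]
        for candidate in candidates
    ]


candidates = [[78, 8], [29, 29, 1]]
matrix = combine_vectors(candidates, [size_fingerprint, z_range_fingerprint])
print(matrix)
```

The result has one row per candidate and one column per feature summed over all supplied functions, matching the (n, m) shape described above.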

catlearn.featurize.setup.default_fingerprinters(generator, data_type)

Return a list of generators.

Parameters:
  • generator (object) – FeatureGenerator object
  • data_type (str) – ‘bulk’, ‘adsorbates’ or ‘fragment’
Returns:

vec_name – List of fingerprinting classes.

Return type:

list of / single vec class(es)

catlearn.featurize.slab_utilities module

catlearn.featurize.slab_utilities.is_metal(chemical_symbol)

Checks whether a string is the chemical symbol of a metal.

Parameters:chemical_symbol (string) – The element name.
Returns:metal – Whether it’s a metal.
Return type:Boolean
catlearn.featurize.slab_utilities.is_oxide(atoms)

Checks whether an atoms object is an oxide.

Parameters:atoms (object) – ASE atoms object.
Returns:oxide – Whether it is likely an oxide.
Return type:Boolean
catlearn.featurize.slab_utilities.slab_layers(atoms, max_layers=20, tolerance=0.5)

Return a number of layers given a slab.

Parameters:
  • atoms (object) – ASE atoms object.
  • max_layers (int) – Maximum number of layers expected.
  • tolerance (float) – Convergence criterion for clustering, based on the pooled standard deviation of z-coordinates. Suggested: 0.5 for oxides, 0.2 for metals.
Returns:

  • layer_avg_z (list) – List of average z-values of all layers.
  • layer_atoms (list of list) – Each sublist contains the atom indices of the atoms in that layer.
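
To show the shape of the two return values, the sketch below groups atoms into layers by sorting z coordinates and starting a new layer whenever the gap to the previous atom exceeds the tolerance. This gap-based grouping is a deliberately simplified stand-in, not the clustering on pooled standard deviation described above; the function name and coordinates are hypothetical.

```python
def group_layers(z_coords, tolerance=0.5):
    """Group atom indices into layers using gaps in sorted z coordinates."""
    order = sorted(range(len(z_coords)), key=lambda i: z_coords[i])
    layer_atoms = []
    for i in order:
        # Join the current layer if this atom is close to the previous one.
        if layer_atoms and z_coords[i] - z_coords[layer_atoms[-1][-1]] <= tolerance:
            layer_atoms[-1].append(i)
        else:
            layer_atoms.append([i])
    layer_avg_z = [sum(z_coords[i] for i in layer) / len(layer)
                   for layer in layer_atoms]
    return layer_avg_z, layer_atoms


# Hypothetical slab with three layers.
z_coords = [0.0, 0.25, 2.0, 2.25, 4.0]
layer_avg_z, layer_atoms = group_layers(z_coords)
print(layer_avg_z)
print(layer_atoms)
```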

catlearn.featurize.slab_utilities.stoichiometry(atoms)

Return a dictionary with the number of atoms of each element in the atoms object.

Parameters:atoms (object) – ASE atoms object.
Returns:num_dict – The first entry is the total number of atoms. Each following entry has the element as key and the number of atoms of that element as value.
Return type:dictionary
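
The structure of such a dictionary can be sketched from a list of chemical symbols. The key used for the total ('total' here) is a hypothetical placeholder, since the actual key name is not given above, and the function name is likewise an assumption.

```python
from collections import Counter


def count_elements(symbols):
    """Return a dict with the total atom count followed by per-element counts."""
    num_dict = {'total': len(symbols)}  # hypothetical key name for the total
    num_dict.update(Counter(symbols))
    return num_dict


# Hypothetical TiO2 formula unit.
print(count_elements(['Ti', 'O', 'O']))
```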

Module contents