catlearn.cross_validation

Cross validation functions.

catlearn.cross_validation.hierarchy_cv

Cross-validation routines to work with the feature database.

class catlearn.cross_validation.hierarchy_cv.Hierarchy(file_name, db_name, table='FingerVector', file_format='pickle')

Bases: object

Class to form the hierarchy cross-validation setup.

This class is used to cross-validate with respect to data size. The initial dataset is split in two and the resulting subsets are split further until a minimum size is reached. Predictions are made on all subsets of the data, giving an averaged error and certainty at each data size.
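
A minimal usage sketch is given below. The file names are hypothetical, and it is assumed that the feature database already exists (e.g. written with todb) and that split_index stores the generated split in file_name for load_split to read back:

    from catlearn.cross_validation.hierarchy_cv import Hierarchy

    # Point the hierarchy at an existing feature database (names hypothetical).
    hv = Hierarchy(file_name='hierarchy.pickle', db_name='fpv_store.sqlite',
                   table='FingerVector', file_format='pickle')

    # Build subsets between 50 and 1000 data points, then reload the saved split.
    hv.split_index(min_split=50, max_split=1000)
    index_split = hv.load_split()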

get_subset_data(index_split, indicies, split=None)

Make an array with training data according to the index.

Parameters:
  • index_split (array) – Array with the index data.
  • indicies (array) – Index used to generate data.
globalscaledata(index_split)

Make an array with all data.

Parameters:
  • index_split (array) – Array with the index data.
load_split()

Function to load the split from file.

split_index(min_split, max_split=None, all_index=None)

Function to split up the db index to form subsets of data.

Parameters:
  • min_split (int) – Minimum size of a data subset.
  • max_split (int) – Maximum size of a data subset.
  • all_index (list) – List of indices in the feature database.
split_predict(index_split, predict, **kwargs)

Function to make predictions looping over all subsets of data.

Parameters:
  • index_split (dict) – All data for the split.
  • predict (function) – The prediction function. Must return a dict with 'result' in it (see the sketch below).
Returns:

  • result (list) – A list of averaged errors for each subset of data.
  • size (list) – A list of data sizes corresponding to the errors list.

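A sketch of a compatible prediction function is shown below. Only the requirement that the returned dict contains 'result' comes from this documentation; the call signature with separate training and test data, and the trivial mean-value baseline, are assumptions for illustration:

    import numpy as np

    def predict(train_features, train_targets, test_features, test_targets):
        """Hypothetical prediction function for split_predict."""
        # Trivial baseline: predict the mean of the training targets.
        prediction = np.full(len(test_targets), np.mean(train_targets))
        error = np.abs(prediction - np.asarray(test_targets))
        return {'result': np.mean(error), 'size': len(train_targets)}

    # res = hv.split_predict(index_split, predict)
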
todb(features, targets)

Function to convert numpy arrays to a basic database.

transform_output(data)

Function to compile results into a format for plotting the average error; see the plotting sketch below.

Parameters:
  • data (dict) – The dictionary output from the split_predict function.
Returns:
  • size (list) – A list of the data sizes used in the CV.
  • error (list) – A list of the mean errors at each data size.

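A plotting sketch, assuming res holds the output of split_predict from the sketch above and that the size and error lists are returned as a pair:

    import matplotlib.pyplot as plt

    size, error = hv.transform_output(res)

    plt.plot(size, error, 'o-')
    plt.xlabel('Data size')
    plt.ylabel('Mean error')
    plt.show()
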
catlearn.cross_validation.k_fold_cv

Set up the k-fold array split for cross-validation.

catlearn.cross_validation.k_fold_cv.k_fold(features, targets=None, nsplit=3, fix_size=None)

Routine to split the feature matrix and return sublists; see the example below.

Parameters:
  • features (array) – An n, d feature array.
  • targets (list) – A list of target values.
  • nsplit (int) – The number of bins that the data should be divided into.
  • fix_size (int) – Define a fixed sample size, e.g. nsplit=5 with fix_size=100 generates 5 splits of 100 data points each. Default is None, meaning all available data is divided into nsplit parts.
Returns:

  • features (list) – A list of feature arrays of length nsplit.
  • targets (list) – A list of target lists of length nsplit.

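A minimal example on random data, assuming the feature and target splits are returned as a pair:

    import numpy as np
    from catlearn.cross_validation.k_fold_cv import k_fold

    features = np.random.random((90, 10))   # 90 data points, 10 descriptors
    targets = list(np.random.random(90))

    fsplit, tsplit = k_fold(features, targets=targets, nsplit=3)
    # fsplit and tsplit each contain 3 folds of 30 data points.
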
catlearn.cross_validation.k_fold_cv.read_split(fname, fformat='pickle')

Function to read the k-fold split from file.

Parameters:
  • fname (str) – The name of the file to read.
  • fformat (str) – File format to read from. Can be json or pickle, default is pickle.
Returns:

  • features (list) – A list of feature arrays of length nsplit.
  • targets (list) – A list of target lists of length nsplit.

catlearn.cross_validation.k_fold_cv.write_split(features, targets, fname, fformat='pickle')

Function to write the k-fold split to file.

Parameters:
  • features (array) – An n, d feature array.
  • targets (list) – A list of target values.
  • fname (str) – The name of the file to write.
  • fformat (str) – File format to write to. Can be json or pickle, default is pickle.

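A round-trip sketch is given below. The file name is hypothetical, and it is assumed that write_split simply stores the split passed to it, so that read_split returns the same feature and target lists:

    import numpy as np
    from catlearn.cross_validation.k_fold_cv import k_fold, read_split, write_split

    features = np.random.random((90, 10))
    targets = list(np.random.random(90))
    fsplit, tsplit = k_fold(features, targets=targets, nsplit=3)

    write_split(fsplit, tsplit, fname='kfold_split', fformat='pickle')
    fsplit_loaded, tsplit_loaded = read_split(fname='kfold_split', fformat='pickle')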