catlearn.cross_validation
catlearn.cross_validation.hierarchy_cv
Cross validation routines to work with feature database.
-
class catlearn.cross_validation.hierarchy_cv.Hierarchy(file_name, db_name, table='FingerVector', file_format='pickle')
Bases: object
Class to set up hierarchy cross-validation.
This class is used to cross-validate with respect to data size. The initial dataset is split in two, and each resulting subset is split further until a minimum size is reached. Predictions are made on all subsets of data, giving an averaged error and certainty at each data size.
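The splitting scheme described above can be sketched in plain Python. This is only an illustration of repeated halving, not the Hierarchy implementation: the function name `hierarchy_split`, the returned dict layout, and the stopping rule are assumptions made for the example.

```python
def hierarchy_split(indices, min_split):
    """Halve every subset repeatedly until a half would be smaller than min_split.

    Hypothetical helper, not part of catlearn. Returns a dict that maps the
    split level (1 = first split) to the list of index subsets at that level.
    """
    if min_split < 1:
        raise ValueError("min_split must be at least 1")
    levels = {}
    current = [list(indices)]
    level = 1
    while True:
        halves = []
        for subset in current:
            half = len(subset) // 2
            halves.extend([subset[:half], subset[half:]])
        # Stop before creating any subset below the minimum size.
        if min(len(s) for s in halves) < min_split:
            break
        levels[level] = halves
        current = halves
        level += 1
    return levels


splits = hierarchy_split(range(16), min_split=4)
for level, subsets in splits.items():
    print(level, [len(s) for s in subsets])
```

Each level halves the data size, so errors averaged per level show how a model behaves as the training set shrinks.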
-
get_subset_data(index_split, indicies, split=None)
Make array with training data according to index.
Parameters:
- index_split (array) – Array with the index data.
- indicies (array) – Index used to generate data.
-
globalscaledata(index_split)
Make an array with all data.
Parameters: index_split (array) – Array with the index data.
-
load_split()
Function to load the split from file.
-
split_index(min_split, max_split=None, all_index=None)
Function to split up the db index to form subsets of data.
Parameters:
- min_split (int) – Minimum size of a data subset.
- max_split (int) – Maximum size of a data subset.
- all_index (list) – List of indices in the feature database.
-
split_predict(index_split, predict, **kwargs)
Function to make predictions, looping over all subsets of data.
Parameters:
- index_split (dict) – All data for the split.
- predict (function) – The prediction function. Must return a dict containing the key ‘result’.
Returns:
- result (list) – A list of averaged errors for each subset of data.
- size (list) – A list of data sizes corresponding to the errors list.
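As a rough sketch of the idea behind split_predict, the example below averages a prediction error over all subsets of the same size. It does not reproduce the catlearn data layout: the `(train, test)` pairs, the two-argument `predict` signature, and the helper names are assumptions for illustration. The only detail taken from the description above is that the prediction function returns a dict with a 'result' entry.

```python
from collections import defaultdict
from statistics import mean


def average_error_by_size(subsets, predict):
    """Call predict on every (train, test) pair and average 'result' per train size.

    Hypothetical helper, not part of catlearn.
    """
    errors = defaultdict(list)
    for train, test in subsets:
        output = predict(train, test)
        errors[len(train)].append(output['result'])
    size = sorted(errors)
    result = [mean(errors[s]) for s in size]
    return result, size


def mean_predictor(train, test):
    """Toy predictor: guess the training mean, report the mean absolute error."""
    guess = mean(train)
    return {'result': mean(abs(t - guess) for t in test)}


subsets = [
    ([1.0, 3.0], [2.0]),
    ([2.0, 4.0], [5.0]),
    ([1.0, 2.0, 3.0, 6.0], [3.0]),
]
result, size = average_error_by_size(subsets, mean_predictor)
print(result, size)
```

The two returned lists line up index by index, so they can be passed directly to a plotting routine as error against data size.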
-
todb(features, targets)
Function to convert numpy arrays to basic db.
-
transform_output(data)
Function to compile results in a format for plotting average error.
Parameters: data (dict) – The dictionary output from the split_predict function.
Returns:
- size (list) – A list of the data sizes used in the CV.
- error (list) – A list of the mean errors at each data size.
-
catlearn.cross_validation.k_fold_cv
Setup k-fold array split for cross validation.
-
catlearn.cross_validation.k_fold_cv.k_fold(features, targets=None, nsplit=3, fix_size=None)
Routine to split feature matrix and return sublists.
Parameters:
- features (array) – An n, d feature array.
- targets (list) – A list of target values.
- nsplit (int) – The number of bins that data should be divided into.
- fix_size (int) – A fixed sample size for each bin, e.g. nsplit=5 with fix_size=100 generates a 5 x 100 data split. Default is None, in which case all available data is divided into nsplit bins.
Returns:
- features (list) – A list of feature arrays of length nsplit.
- targets (list) – A list of target lists of length nsplit.
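The effect of nsplit and fix_size can be illustrated with a standard-library sketch. It is not the catlearn routine: the name `k_fold_sketch`, the `seed` argument, the shuffling step, and the way leftover samples are spread over the first bins are all assumptions chosen for the example.

```python
import random


def k_fold_sketch(features, targets=None, nsplit=3, fix_size=None, seed=0):
    """Shuffle sample indices and cut them into nsplit bins.

    Hypothetical stand-in for illustration only, not the catlearn function.
    """
    if nsplit < 1:
        raise ValueError("nsplit must be at least 1")
    index = list(range(len(features)))
    random.Random(seed).shuffle(index)

    if fix_size is not None:
        # Every bin holds exactly fix_size samples; surplus data is unused.
        if nsplit * fix_size > len(index):
            raise ValueError("not enough data for nsplit bins of fix_size")
        bounds = [(i * fix_size, (i + 1) * fix_size) for i in range(nsplit)]
    else:
        # Use all data; the first `extra` bins receive one additional sample.
        base, extra = divmod(len(index), nsplit)
        bounds, start = [], 0
        for i in range(nsplit):
            stop = start + base + (1 if i < extra else 0)
            bounds.append((start, stop))
            start = stop

    f_split = [[features[j] for j in index[a:b]] for a, b in bounds]
    if targets is None:
        return f_split
    t_split = [[targets[j] for j in index[a:b]] for a, b in bounds]
    return f_split, t_split


features = [[i, 2 * i] for i in range(10)]
targets = list(range(10))
f_all, t_all = k_fold_sketch(features, targets, nsplit=3)
f_fix, t_fix = k_fold_sketch(features, targets, nsplit=2, fix_size=3)
print([len(f) for f in f_all], [len(f) for f in f_fix])
```

Features and targets are cut with the same shuffled index, so row i of each feature bin still corresponds to entry i of the matching target bin.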
-
catlearn.cross_validation.k_fold_cv.read_split(fname, fformat='pickle')
Function to read the k-fold split from file.
Parameters:
- fname (str) – The name of the file to read.
- fformat (str) – File format to read from. Can be json or pickle, default is pickle.
Returns:
- features (list) – A list of feature arrays of length nsplit.
- targets (list) – A list of target lists of length nsplit.
-
catlearn.cross_validation.k_fold_cv.write_split(features, targets, fname, fformat='pickle')
Function to write the k-fold split to file.
Parameters:
- features (array) – An n, d feature array.
- targets (list) – A list of target values.
- fname (str) – The name of the file to write.
- fformat (str) – File format to write to. Can be json or pickle, default is pickle.
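A write and read round trip in both formats might look like the sketch below. It uses only the standard pickle and json modules; the helper names, the on-disk layout (a dict with 'features' and 'targets' keys), and the use of fname as the full path are assumptions, so files produced this way are not guaranteed to match what catlearn writes.

```python
import json
import os
import pickle
import tempfile


def write_split_sketch(features, targets, fname, fformat='pickle'):
    """Store a split in one file. Hypothetical layout, not the catlearn one."""
    data = {'features': features, 'targets': targets}
    if fformat == 'pickle':
        with open(fname, 'wb') as handle:
            pickle.dump(data, handle)
    elif fformat == 'json':
        # json cannot serialise numpy arrays, so plain lists are expected here.
        with open(fname, 'w') as handle:
            json.dump(data, handle)
    else:
        raise ValueError("fformat must be 'json' or 'pickle'")


def read_split_sketch(fname, fformat='pickle'):
    """Load a split written by write_split_sketch."""
    if fformat == 'pickle':
        with open(fname, 'rb') as handle:
            data = pickle.load(handle)
    elif fformat == 'json':
        with open(fname, 'r') as handle:
            data = json.load(handle)
    else:
        raise ValueError("fformat must be 'json' or 'pickle'")
    return data['features'], data['targets']


features = [[[0.0, 1.0], [2.0, 3.0]], [[4.0, 5.0]]]
targets = [[0.5, 1.5], [2.5]]
loaded = {}
with tempfile.TemporaryDirectory() as tmp:
    for fmt in ('pickle', 'json'):
        path = os.path.join(tmp, 'split.' + fmt)
        write_split_sketch(features, targets, path, fformat=fmt)
        loaded[fmt] = read_split_sketch(path, fformat=fmt)
print(loaded['json'] == loaded['pickle'])
```

Pickle preserves arbitrary Python objects but should only be loaded from trusted files, while json is portable and human-readable at the cost of accepting only basic types.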
Cross validation functions.