catlearn.learning_curve

catlearn.learning_curve.data_process

Processing of data for HierarchyValidation.

class catlearn.learning_curve.data_process.data_process(features, min_split, max_split, scale=True, normalization=True, ridge=True, loocv=True, batchfarm=False)

Bases: object

Class to glue together the different functions used for HierarchyValidation.

This class picks up data from HierarchyValidation. If requested, the data is then modified with “feature_preprocess” and “predict”. The data is then fitted with a regression model, for example with “ridge_regression”, and the error of the fit is measured.

average_nested(Y, X)

Calculate statistics for the prediction.

Parameters:
  • data_size (list) – Data sizes at which the predictions were made.
  • p_error (list) – Errors of the predictions at those data sizes.
get_statistic(data_size, p_error)

Generate statistics for the prediction.

Parameters:
  • data_size (list) – Data sizes at which the predictions were made.
  • p_error (list) – Errors of the predictions at those data sizes.
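The kind of summary such a method might produce can be sketched in plain Python. This is only an illustration under stated assumptions: `summarize_errors` is a hypothetical helper, not part of CatLearn, and grouping the errors by data size before taking the mean and standard deviation is an assumed reading of "statistics".

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_errors(data_size, p_error):
    """Group prediction errors by data size and summarize each group.

    Hypothetical helper, not CatLearn's implementation.
    Returns a list of (size, mean_error, std_error) tuples sorted by size.
    """
    grouped = defaultdict(list)
    for size, err in zip(data_size, p_error):
        grouped[size].append(err)
    return [
        (size, mean(errs), pstdev(errs))
        for size, errs in sorted(grouped.items())
    ]


# Two predictions at each of two data sizes (made-up illustrative values).
stats = summarize_errors([10, 10, 20, 20], [0.5, 0.7, 0.3, 0.5])
```

Each tuple in `stats` holds one data size with the mean and population standard deviation of its errors, which is the shape of data a learning-curve plot with error bars would need.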
globalscaling(globalscaledata, train_features)

All sub-groups of the training data are scaled in the same way.

Parameters:
  • globalscaledata (string) – The data will be scaled globally if requested.
prediction_error(test_features, test_targets, coef, s_tar, m_tar)

Calculate the error of the prediction with the model.

Parameters:
  • test_features (array) – Independent data for testing the model.
  • test_targets (array) – Dependent data for testing the model.
  • coef (array) – The coefficients that make up the model.
  • s_tar (string) – Standard deviation or (max-min) of the dependent train_targets.
  • m_tar (array) – Mean of the dependent train_targets.
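As a rough illustration of how these parameters could fit together, the sketch below predicts with a linear model in scaled units, maps the prediction back to the original target units with `s_tar` and `m_tar`, and reports the root-mean-square error. The function name `prediction_rmse`, the absence of an intercept term, and the choice of RMSE as the error measure are all assumptions; the documentation above does not say which error CatLearn computes.

```python
import numpy as np


def prediction_rmse(test_features, test_targets, coef, s_tar, m_tar):
    """Return the RMSE of a linear model in the original target units.

    Hypothetical helper, not CatLearn's implementation. Assumes the model
    was trained on targets scaled as (target - m_tar) / s_tar.
    """
    test_features = np.asarray(test_features, dtype=float)
    test_targets = np.asarray(test_targets, dtype=float)
    # Predict in scaled units, then undo the target scaling.
    scaled_prediction = test_features @ np.asarray(coef, dtype=float)
    prediction = scaled_prediction * s_tar + m_tar
    return float(np.sqrt(np.mean((prediction - test_targets) ** 2)))


error = prediction_rmse(
    [[1.0, 0.0], [0.0, 1.0]],  # test features
    [3.0, 7.0],                # test targets in original units
    [1.0, 2.0],                # model coefficients
    s_tar=2.0,
    m_tar=1.0,
)
```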
scaling_data(train_features, train_targets, test_features, s_tar, m_tar, s_feat, m_feat)

Scaling the data if requested.

Parameters:
  • train_features (array) – Independent data used to train the model.
  • train_targets (array) – Dependent data used to train the model.
  • test_features (array) – Independent data used to test the model.
  • s_tar (array) – Standard deviation or (max-min) of the dependent train_targets.
  • m_tar (array) – Mean of the dependent train_targets.
  • s_feat (array) – Standard deviation or (max-min) of the independent train_features.
  • m_feat (array) – Mean of the independent train_features.
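The two scaling options mentioned in the parameter descriptions, standard deviation or (max-min), can be sketched with NumPy as follows. This is a minimal sketch, not CatLearn's code: `scale_with_train_stats` is a hypothetical name, and mapping `normalization=True` to the (max-min) variant is an assumption. The point it illustrates is that the mean and spread come from the training data only and are then reused for the test data.

```python
import numpy as np


def scale_with_train_stats(train, test, normalization=True):
    """Scale train and test arrays with statistics taken from train only.

    Hypothetical helper, not CatLearn's implementation.
    normalization=True divides by (max - min); otherwise the standard
    deviation is used (standardization).
    """
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    m = train.mean(axis=0)
    if normalization:
        s = train.max(axis=0) - train.min(axis=0)
    else:
        s = train.std(axis=0)
    # Leave constant columns unscaled instead of dividing by zero.
    s = np.where(s == 0, 1.0, s)
    return (train - m) / s, (test - m) / s, s, m


train_scaled, test_scaled, s_feat, m_feat = scale_with_train_stats(
    [[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]],
    [[2.0, 40.0]],
)
```

Applying the training statistics to the test data keeps information about the test set out of the model.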

catlearn.learning_curve.feature_selection

Feature selection with lasso.

class catlearn.learning_curve.feature_selection.feature_selection(train_features, train_targets)

Bases: object

Class for selecting features.

Used with hierarchy cross-validation.

alpha_finder(feat_vec, alpha_vec, feat)

Find the alpha corresponding to the number of features.

Parameters:
  • feat_vec (list) – Features within the interval.
  • alpha_vec (list) – Alphas within the interval.
  • feat (int) – The group of features searched for.
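One way to picture this lookup is a scan over paired lists, returning the alpha at which the wanted number of features was retained. The sketch below is only an illustration: `find_alpha` is a hypothetical function, and it assumes that `feat_vec[i]` holds the number of features kept by the lasso fit at `alpha_vec[i]`.

```python
def find_alpha(feat_vec, alpha_vec, feat):
    """Return the first alpha whose fit kept exactly `feat` features.

    Hypothetical helper, not CatLearn's implementation.
    Returns None when no alpha in the interval gives that feature count.
    """
    for n_feat, alpha in zip(feat_vec, alpha_vec):
        if n_feat == feat:
            return alpha
    return None


# Feature counts shrink as the L1 penalty grows (made-up values).
alpha_for_three = find_alpha([5, 4, 3, 1], [0.0, 0.1, 0.2, 0.3], feat=3)
alpha_for_two = find_alpha([5, 4, 3, 1], [0.0, 0.1, 0.2, 0.3], feat=2)
```

A `None` result, as for `alpha_for_two` here, corresponds to the case where the interval is too coarse and needs refining.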
alpha_refinment(alpha, feat, splits=10, refsteps=1, upper=1.5)

Find a more stringent alpha for the number of features searched for.

Parameters:
  • alpha (int) – Initial alpha found for the number of features searched for. Will be used as a lower limit.
  • feat (int) – The number of features searched for.
  • splits (int) – Increase in the number of alphas under inspection within the interval.
  • refsteps (int) – Number of refinements.
  • upper – Multiple of alpha to use as the upper limit.
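The refinement interval described by `alpha` and `upper` can be sketched as an evenly spaced grid from `alpha` to `upper * alpha`. This is an assumption-laden illustration: `refined_alphas` is a hypothetical function, and both the even spacing and the use of `splits` as the number of grid points are guesses at what the method does, not something the documentation states.

```python
def refined_alphas(alpha, splits=10, upper=1.5):
    """Return `splits` evenly spaced alphas from alpha to upper * alpha.

    Hypothetical helper, not CatLearn's implementation.
    """
    if splits < 2:
        raise ValueError("splits must be at least 2")
    low, high = alpha, upper * alpha
    step = (high - low) / (splits - 1)
    return [low + i * step for i in range(splits)]


grid = refined_alphas(1.0, splits=3, upper=1.5)
```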
feature_inspection(lower=0, upper=1, interval=100, alpha_list=None)

Generate the interval used to search for the alpha.

Parameters:
  • lower (int) – Lower bound for the interval search.
  • upper (int) – Upper bound for the interval search.
  • interval (int) – Number of alphas in interval inspected.
interval_modifier(feat_vec, alpha_vec, feat, splits, int_expand)

Modify the interval under inspection by reduction or expansion.

Parameters:
  • feat_vec (list) – Features within the interval.
  • alpha_vec (list) – Alphas within the interval.
  • feat (int) – The group of features searched for.
  • splits (int) – Increase in the number of alphas under inspection within the interval.
  • int_expand (int) – Number of times the number of alphas in the interval is increased.
selection(select_limit)

Select the feature(s) that work best with L1.

catlearn.learning_curve.learning_curve

Generate the learning curve.

class catlearn.learning_curve.learning_curve.LearningCurve(nprocs=1)

Bases: object

Learning curve class. Test a model while varying the density of the training data.

run(model, train, target, test, test_target, step=1, min_data=2)

Evaluate a model versus training data size.

Parameters:
  • model (object) –

    A function that will train or load a regression model or classifier and make predictions for testing. model should accept the parameters:

      train_features : array
      test_features : array
      train_targets : list
      test_targets : list

    model should return either a float or a list of floats. The float or the first value of the list will be used as the fitness score.

  • train (array) – An n, d array of training examples.
  • target (list) – A list of the training target values.
  • test (array) – An n, d array of test data.
  • test_target (list) – A list of the test target values.
  • step (int) – Increment the data set size by this many examples.
  • min_data (int) – Smallest number of training examples to test.
  • min_data (int) – Smallest number of training examples to test.
Returns:

output – Each row is the output from the model object.

Return type:

array
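A `model` callable with the documented parameter order can be sketched as below, together with a hand-written loop that mimics the idea of a learning curve. Everything here is illustrative: `ridge_rmse_model` and `manual_learning_curve` are hypothetical names, the closed-form ridge fit and its regularization strength are arbitrary choices, and taking growing prefixes of the training data is an assumption about how subsets are chosen, not a description of what `run` does internally.

```python
import numpy as np


def ridge_rmse_model(train_features, test_features, train_targets, test_targets):
    """Fit ridge regression in closed form and return the test RMSE as a float."""
    X = np.asarray(train_features, dtype=float)
    y = np.asarray(train_targets, dtype=float)
    reg = 1e-3  # hypothetical regularization strength
    coef = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ y)
    pred = np.asarray(test_features, dtype=float) @ coef
    residual = pred - np.asarray(test_targets, dtype=float)
    return float(np.sqrt(np.mean(residual ** 2)))


def manual_learning_curve(model, train, target, test, test_target,
                          step=1, min_data=2):
    """Call `model` on growing prefixes of the training data (illustrative only)."""
    rows = []
    for n in range(min_data, len(train) + 1, step):
        score = model(train[:n], test, target[:n], test_target)
        rows.append((n, score))
    return rows


# Synthetic, noise-free linear data so the error shrinks with more examples.
rng = np.random.default_rng(0)
true_coef = np.array([1.0, -2.0, 0.5])
train = rng.normal(size=(20, 3))
target = train @ true_coef
test = rng.normal(size=(5, 3))
test_target = test @ true_coef

curve = manual_learning_curve(ridge_rmse_model, train, target, test,
                              test_target, step=3, min_data=2)
```

With CatLearn installed, the same function could presumably be passed to `LearningCurve(nprocs=1).run(ridge_rmse_model, train, target, test, test_target, step=3, min_data=2)`, following the signature documented above.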

catlearn.learning_curve.learning_curve.feature_frequency(cv, features, min_split, max_split, smallest=False, new_data=True, ridge=True, scale=True, globalscale=True, normalization=True, featselect_featvar=False, featselect_featconst=True, select_limit=None, feat_sub=15)

Function to extract raw data from the database.

Parameters:
  • features (int) – Number of features used for regression.
  • min_split (int) – Number of data splits in the smallest sub-set.
  • max_split (int) – Number of data splits in the largest sub-set.
  • new_data (string) – Use new data or the previous data.
  • ridge (string) – Ridge regularizer is default. If False, lasso is used.
  • scale (string) – Whether the data should be scaled.
  • globalscale (string) – Whether to use global scaling.
  • normalization (string) – If scaled, whether the data is normalized or standardized. Normalized is default.
  • feature_selection (string) – Whether to use feature selection with ridge, or plain vanilla ridge.
  • select_limit (int) – Maximum number of features used for feature selection.
catlearn.learning_curve.learning_curve.hierarchy(cv, features, min_split, max_split, new_data=True, ridge=True, scale=True, globalscale=True, normalization=True, featselect_featvar=False, featselect_featconst=True, select_limit=None, feat_sub=15)

Start the hierarchy.

Parameters:
  • features (int) – Number of features used for regression.
  • min_split (int) – Number of data splits in the smallest sub-set.
  • max_split (int) – Number of data splits in the largest sub-set.
  • new_data (string) – Use new data or the previous data.
  • ridge (string) – Ridge regularizer is default. If False, lasso is used.
  • scale (string) – Whether the data should be scaled.
  • globalscale (string) – Whether to use global scaling.
  • normalization (string) – If scaled, whether the data is normalized or standardized. Normalized is default.
  • feature_selection (string) – Whether to use feature selection with ridge, or plain vanilla ridge.
  • select_limit (int) – Maximum number of features used for feature selection.

catlearn.learning_curve.placeholder

Placeholder for now.

class catlearn.learning_curve.placeholder.placeholder(PC, index_split, hv, indicies, hier_level, featselect_featvar, featselect_featconst, s_feat, m_feat, feat_sub=15, s_tar=None, m_tar=None, select_limit=None, selected_features=None, glob_feat1=None, glob_tar1=None, new_training=True)

Bases: object

Used to make the hierarchy easier to follow.

Placeholder for now.

get_data_scale(split, set_size=None, p_error=None, result=None)

Get the data for each sub-set of data and scale it accordingly.

Parameters:
  • split (int) – Which sub-set of data within the hierarchy level.
  • result (list) – Contains all the coefficients and omega2 for all training data.
  • set_size (list) – Size of the sub-set of data/features on which the model is based.
  • p_error (list) – The prediction error for plain vanilla ridge.
getstats()

Used to get features for the frequency plots.

predict_subsets(result=None, set_size=None, p_error=None)

Run the prediction on each sub-set of data at the hierarchy level.

Parameters:
  • result (list) – Contains all the coefficients and omega2 for all training data.
  • set_size (list) – Size of the sub-set of data/features on which the model is based.
  • p_error (list) – The prediction error for plain vanilla ridge.
reg_data_var(train_features, train_targets, test_features, test_targets, ridge, set_size, p_error, result)

Ridge regression and calculation of prediction error.

Parameters:
  • train_features (array) – Independent data used to train the model.
  • train_targets (array) – Dependent data used to train the model.
  • test_features (array) – Independent data used to test the model.
  • test_targets (array) – Dependent data used to test the model.
  • ridge (object) – Generates the model based on the training data.
  • set_size (list) – Size of the sub-set of data/features on which the model is based.
  • p_error (list) – The prediction error for plain vanilla ridge.
  • result (list) – Contains all the coefficients and omega2 for all training data.
reg_feat_var(train_features, train_targets, test_features, test_targets, ridge, set_size, p_error, result)

Regression within a dataset with varying features.

Parameters:
  • train_features (array) – Independent data used to train the model.
  • train_targets (array) – Dependent data used to train the model.
  • test_features (array) – Independent data used to test the model.
  • test_targets (array) – Dependent data used to test the model.
  • ridge (object) – Generates the model based on the training data.
  • p_error (list) – The prediction error for feature selection corresponding to the different feature sets.
  • set_size (list) – The different data/feature sets used for feature selection.
  • result (list) – Contains all the coefficients and omega2 for all training data.

catlearn.learning_curve.pltfile