catlearn.learning_curve¶

catlearn.learning_curve.data_process¶

Processing of data for HierarchyValidation.

class catlearn.learning_curve.data_process.data_process(features, min_split, max_split, scale=True, normalization=True, ridge=True, loocv=True, batchfarm=False)¶

Bases: object

Class that glues together the different functions used for HierarchyValidation.

This class picks up data from HierarchyValidation. If requested, the data is modified with “feature_preprocess” and “predict”. The data is then fitted with a regression model, for example “ridge_regression”, and the error of the fit is measured.

average_nested(Y, X)¶

Calculate statistics for prediction.

Parameters:

data_size (list) – Data sizes for which the predictions were made.
p_error (list) – Errors of the predictions made at those data sizes.

get_statistic(data_size, p_error)¶

Generate statistics for prediction.

Parameters:

data_size (list) – Data sizes for which the predictions were made.
p_error (list) – Errors of the predictions made at those data sizes.

globalscaling(globalscaledata, train_features)¶

Scale all sub-groups of the training data in the same way.

Parameters:	globalscaledata (string) – The data will be scaled globally if requested.

prediction_error(test_features, test_targets, coef, s_tar, m_tar)¶

Calculate the error of the prediction with the model.

Parameters:

test_features (array) – Independent data for testing the model.
test_targets (array) – Dependent data for testing the model.
coef (array) – The coefficients that make up the model.
s_tar (string) – Standard deviation or (max-min) for the dependent train_targets.
m_tar (array) – Mean for the dependent train_targets.
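As a rough illustration of how a prediction error could be computed from the quantities listed above, the sketch below applies a linear model (`coef`) to the test features, maps the predictions back to the original target scale with `s_tar` and `m_tar`, and reports the root-mean-square error. The helper name `rmse_of_linear_model`, the choice of RMSE, the scalar `s_tar` and `m_tar`, and the unscaling convention are all assumptions made for illustration; the CatLearn implementation may differ.

```python
import math


def rmse_of_linear_model(test_features, test_targets, coef, s_tar, m_tar):
    """Hypothetical helper: RMSE of a linear model on unscaled targets."""
    squared_errors = []
    for row, target in zip(test_features, test_targets):
        # Linear prediction in the scaled target space.
        scaled_prediction = sum(x * c for x, c in zip(row, coef))
        # Assumed convention: undo the target scaling before comparing.
        prediction = scaled_prediction * s_tar + m_tar
        squared_errors.append((prediction - target) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


test_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
test_targets = [12.0, 14.0, 19.0]
error = rmse_of_linear_model(test_features, test_targets,
                             coef=[1.0, 2.0], s_tar=2.0, m_tar=10.0)
```

In this toy data the first two predictions are exact and the third is off by 3, so the error is dominated by a single test point.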

scaling_data(train_features, train_targets, test_features, s_tar, m_tar, s_feat, m_feat)¶

Scaling the data if requested.

Parameters:

train_features (array) – Independent data used to train the model.
train_targets (array) – Dependent data used to train the model.
test_features (array) – Independent data used to test the model.
s_tar (array) – Standard deviation or (max-min) for the dependent train_targets.
m_tar (array) – Mean for the dependent train_targets.
s_feat (array) – Standard deviation or (max-min) for the independent train_features.
m_feat (array) – Mean for the independent train_features.
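The parameter descriptions above suggest that each column is centered on its mean and divided by either its standard deviation or its range (max-min). A minimal sketch of that idea follows, using statistics taken from the training set only. The helper `scale_columns`, the `normalization` flag selecting the range as divisor, and the guard for constant columns are assumptions for illustration and are not taken from the CatLearn source.

```python
import statistics


def scale_columns(train_features, test_features, normalization=True):
    """Hypothetical helper: scale both sets with training-set statistics."""
    columns = list(zip(*train_features))
    m_feat = [statistics.mean(col) for col in columns]
    if normalization:
        # Assumed meaning of normalization: divide by the range (max - min).
        s_feat = [max(col) - min(col) for col in columns]
    else:
        # Standardization: divide by the population standard deviation.
        s_feat = [statistics.pstdev(col) for col in columns]
    # Guard against constant columns, which would give a zero divisor.
    s_feat = [s if s != 0 else 1.0 for s in s_feat]

    def apply(rows):
        return [[(x - m) / s for x, m, s in zip(row, m_feat, s_feat)]
                for row in rows]

    return apply(train_features), apply(test_features), s_feat, m_feat


train = [[1.0, 10.0], [3.0, 30.0]]
test = [[2.0, 20.0]]
scaled_train, scaled_test, s_feat, m_feat = scale_columns(train, test)
```

Reusing the training-set mean and divisor for the test features keeps information from the test set out of the model.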

catlearn.learning_curve.feature_selection¶

Feature selection with lasso.

class catlearn.learning_curve.feature_selection.feature_selection(train_features, train_targets)¶

Bases: object

Class for selecting features.

Used with hierarchy cross-validation.

alpha_finder(feat_vec, alpha_vec, feat)¶

Find the alpha corresponding to the number of features.

Parameters:

feat_vec (list) – Features within the interval.
alpha_vec (list) – Alphas within the interval.
feat (int) – The group of features searched for.
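One way to read this lookup: each alpha in the interval corresponds to a number of features retained by the L1-regularized fit, and the search returns the alpha that retains the requested number. The sketch below is a hypothetical stand-in, not the CatLearn method. It assumes that `feat_vec` holds one feature count per entry of `alpha_vec`, that the first match is wanted, and that `None` signals no match.

```python
def find_alpha(feat_vec, alpha_vec, feat):
    """Hypothetical helper: first alpha whose fit keeps `feat` features."""
    for n_features, alpha in zip(feat_vec, alpha_vec):
        if n_features == feat:
            return alpha
    # No alpha in the interval gives the requested number of features.
    return None


alpha_vec = [0.0, 0.25, 0.5, 0.75, 1.0]
feat_vec = [8, 5, 3, 3, 1]          # features kept at each alpha (made up)
alpha = find_alpha(feat_vec, alpha_vec, feat=3)
missing = find_alpha(feat_vec, alpha_vec, feat=4)
```

In this sketch, a `None` result means the interval has no alpha for the requested count, so a finer or wider interval would have to be inspected.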

alpha_refinment(alpha, feat, splits=10, refsteps=1, upper=1.5)¶

Find a more stringent alpha for the number of features searched for.

Parameters:

alpha (int) – Initial alpha found for the number of features searched for. Used as the lower limit.
feat (int) – The number of features searched for.
splits (int) – Increase in the number of alphas under inspection within the interval.
refsteps (int) – Number of refinements.
upper – The upper limit, expressed as a multiple of alpha.

feature_inspection(lower=0, upper=1, interval=100, alpha_list=None)¶

Generate interval used to search for the alpha.

Parameters:

lower (int) – Lower bound for the interval search.
upper (int) – Upper bound for the interval search.
interval (int) – Number of alphas in the interval inspected.

interval_modifier(feat_vec, alpha_vec, feat, splits, int_expand)¶

Modify the interval under inspection by reduction or expansion.

Parameters:

feat_vec (list) – Features within the interval.
alpha_vec (list) – Alphas within the interval.
feat (int) – The group of features searched for.
splits (int) – Increase in the number of alphas under inspection within the interval.
int_expand (int) – Number of times the number of alphas in the interval is increased.

selection(select_limit)¶

Select the feature(s) that work best with L1.

catlearn.learning_curve.learning_curve¶

Generate the learning curve.

class catlearn.learning_curve.learning_curve.LearningCurve(nprocs=1)¶

Bases: object

Learning curve class. Test a model while varying the density of the training data.

run(model, train, target, test, test_target, step=1, min_data=2)¶

Evaluate a model versus training data size.

Parameters:

model (object) – A function that will train or load a regression model or classifier and make predictions for testing. model should accept the parameters train_features (array), test_features (array), train_targets (list) and test_targets (list). model should return either a float or a list of floats. The float, or the first value of the list, will be used as the fitness score.
train (array) – An n, d array of training examples.
target (list) – A list of the target values.
test (array) – An n, d array of test data.
test_target (list) – A list of the test target values.
step (int) – Increment the data set size by this many examples.
min_data (int) – Smallest number of training examples to test.

Returns: output – Each row is the output from the model object.
Return type: array
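To make the contract for `model` concrete, the sketch below defines a toy model function with the four documented parameters and evaluates it on growing slices of the training data. The loop in `learning_curve_sketch` is a stand-in written for this example and is not CatLearn's implementation. Taking the first `size` examples, storing `[size, score]` per row, and the names `mean_model` and `learning_curve_sketch` are all assumptions.

```python
def mean_model(train_features, test_features, train_targets, test_targets):
    """Toy model: predict the training mean, return the mean absolute error."""
    prediction = sum(train_targets) / len(train_targets)
    return sum(abs(prediction - t) for t in test_targets) / len(test_targets)


def learning_curve_sketch(model, train, target, test, test_target,
                          step=1, min_data=2):
    """Stand-in for the evaluation loop described above (assumed behaviour)."""
    output = []
    for size in range(min_data, len(train) + 1, step):
        # Keyword arguments avoid relying on a particular positional order.
        score = model(train_features=train[:size],
                      test_features=test,
                      train_targets=target[:size],
                      test_targets=test_target)
        output.append([size, score])
    return output


train = [[0.0], [1.0], [2.0], [3.0]]
target = [0.0, 2.0, 4.0, 6.0]
test = [[1.5]]
test_target = [3.0]
curve = learning_curve_sketch(mean_model, train, target, test, test_target)
```

Each row of `curve` pairs a training set size with the score returned by the model, which is the information needed to plot error against data size.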

catlearn.learning_curve.learning_curve.feature_frequency(cv, features, min_split, max_split, smallest=False, new_data=True, ridge=True, scale=True, globalscale=True, normalization=True, featselect_featvar=False, featselect_featconst=True, select_limit=None, feat_sub=15)¶

Function to extract raw data from the database.

Parameters:

features (int) – Number of features used for regression.
min_split (int) – Number of data splits in the smallest sub-set.
max_split (int) – Number of data splits in the largest sub-set.
new_data (string) – Use new data or the previous data.
ridge (string) – The ridge regularizer is the default. If False, lasso is used.
scale (string) – Whether the data should be scaled.
globalscale (string) – Whether to use global scaling.
normalization (string) – If scaled, whether the data is normalized or standardized. Normalized is the default.
feature_selection (string) – Use feature selection with ridge, or plain vanilla ridge.
select_limit (int) – Upper limit on the number of features used for feature selection.

catlearn.learning_curve.learning_curve.hierarchy(cv, features, min_split, max_split, new_data=True, ridge=True, scale=True, globalscale=True, normalization=True, featselect_featvar=False, featselect_featconst=True, select_limit=None, feat_sub=15)¶

Start the hierarchy.

Parameters:

features (int) – Number of features used for regression.
min_split (int) – Number of data splits in the smallest sub-set.
max_split (int) – Number of data splits in the largest sub-set.
new_data (string) – Use new data or the previous data.
ridge (string) – The ridge regularizer is the default. If False, lasso is used.
scale (string) – Whether the data should be scaled.
globalscale (string) – Whether to use global scaling.
normalization (string) – If scaled, whether the data is normalized or standardized. Normalized is the default.
feature_selection (string) – Use feature selection with ridge, or plain vanilla ridge.
select_limit (int) – Upper limit on the number of features used for feature selection.

catlearn.learning_curve.placeholder¶

Placeholder for now.

class catlearn.learning_curve.placeholder.placeholder(PC, index_split, hv, indicies, hier_level, featselect_featvar, featselect_featconst, s_feat, m_feat, feat_sub=15, s_tar=None, m_tar=None, select_limit=None, selected_features=None, glob_feat1=None, glob_tar1=None, new_training=True)¶

Bases: object

Used to make the hierarchy easier to follow.

Placeholder for now.

get_data_scale(split, set_size=None, p_error=None, result=None)¶

Get the data for each sub-set of data and scales it accordingly.

Parameters:

split (int) – Which sub-set of data within the hierarchy level.
result (list) – Contains all the coefficients and omega2 for all training data.
set_size (list) – Size of the sub-set of data/features on which the model is based.
p_error (list) – The prediction error for plain vanilla ridge.

getstats()¶

Used to get features for the frequency plots.

predict_subsets(result=None, set_size=None, p_error=None)¶

Run the prediction on each sub-set of data on the hierarchy level.

Parameters:

result (list) – Contains all the coefficients and omega2 for all training data.
set_size (list) – Size of the sub-set of data/features on which the model is based.
p_error (list) – The prediction error for plain vanilla ridge.

reg_data_var(train_features, train_targets, test_features, test_targets, ridge, set_size, p_error, result)¶

Ridge regression and calculation of prediction error.

Parameters:

train_features (array) – Independent data used to train the model.
train_targets (array) – Dependent data used to train the model.
test_features (array) – Independent data used to test the model.
test_targets (array) – Dependent data used to test the model.
ridge (object) – Generates the model based on the training data.
set_size (list) – Size of the sub-set of data/features on which the model is based.
p_error (list) – The prediction error for plain vanilla ridge.
result (list) – Contains all the coefficients and omega2 for all training data.

reg_feat_var(train_features, train_targets, test_features, test_targets, ridge, set_size, p_error, result)¶

Regression within a dataset with varying features.

Parameters:

train_features (array) – Independent data used to train the model.
train_targets (array) – Dependent data used to train the model.
test_features (array) – Independent data used to test the model.
test_targets (array) – Dependent data used to test the model.
ridge (object) – Generates the model based on the training data.
p_error (list) – The prediction error for feature selection corresponding to the different feature sets.
set_size (list) – Different data/feature sets used for feature selection.
result (list) – Contains all the coefficients and omega2 for all training data.
