catlearn.learning_curve
catlearn.learning_curve.data_process
Processing of data for HierarchyValidation.
-
class
catlearn.learning_curve.data_process.
data_process
(features, min_split, max_split, scale=True, normalization=True, ridge=True, loocv=True, batchfarm=False)¶ Bases:
object
Class to glue together the different functions used for HierarchyValidation.
This class picks up data from HierarchyValidation. If requested, the data is then modified with “feature_preprocess” and “predict”. The data is then fitted with a regression model, for example with “ridge_regression”, and the error of the fit is measured.
-
average_nested
(Y, X)¶ Calculate statistics for prediction.
Parameters: - data_size (list) – Data sizes for which the predictions were made.
- p_error (list) – Errors for the predictions that were made.
-
get_statistic
(data_size, p_error)¶ Generate statistics for prediction.
Parameters: - data_size (list) – Data sizes for which the predictions were made.
- p_error (list) – Errors for the predictions that were made.
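As a rough illustration of the kind of statistics that could be derived from these two lists, the sketch below groups the prediction errors by data size and reports the mean and standard deviation for each size. The helper name `summarize_errors` and the layout of the returned dictionary are assumptions made for this example; they are not taken from CatLearn.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_errors(data_size, p_error):
    """Group errors by data size and return {size: (mean, std)}.

    Hypothetical helper for illustration, not part of CatLearn.
    """
    grouped = defaultdict(list)
    for size, err in zip(data_size, p_error):
        grouped[size].append(err)
    # Population standard deviation, so a single entry gives 0.0.
    return {size: (mean(errs), pstdev(errs))
            for size, errs in sorted(grouped.items())}


stats = summarize_errors([10, 10, 20, 20], [0.5, 0.7, 0.3, 0.3])
```

Each key of `stats` is a data size, and each value holds the mean error and its spread at that size.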
-
globalscaling
(globalscaledata, train_features)¶ Scale all sub-groups of the training data in the same way.
Parameters: globalscaledata (string) – The data will be scaled globally if requested.
-
prediction_error
(test_features, test_targets, coef, s_tar, m_tar)¶ Calculate the error of the prediction with the model.
Parameters: - test_features (array) – Independent data for testing the model.
- test_targets (array) – Dependent data to test the model.
- coef (array) – The coefficients which make up the model.
- s_tar (array) – Standard deviation or (max-min), for the dependent train_targets.
- m_tar (array) – Mean for the dependent train_targets.
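A minimal sketch of how such an error might be computed is given below. It assumes a linear model and assumes that the targets were scaled as `(y - m_tar) / s_tar` during training, so predictions are mapped back to the original units before the root-mean-square error is taken. The function name `prediction_rmse`, the scalar `s_tar` and `m_tar`, and the choice of RMSE as the error measure are all assumptions for illustration.

```python
import math


def prediction_rmse(test_features, test_targets, coef, s_tar=1.0, m_tar=0.0):
    """RMSE of a linear model, with predictions mapped back to target units.

    Hypothetical helper; assumes targets were scaled as (y - m_tar) / s_tar.
    """
    sq_err = 0.0
    for row, target in zip(test_features, test_targets):
        # Linear prediction in the scaled target space.
        scaled_pred = sum(x * c for x, c in zip(row, coef))
        # Undo the target scaling before comparing with the raw target.
        pred = scaled_pred * s_tar + m_tar
        sq_err += (pred - target) ** 2
    return math.sqrt(sq_err / len(test_targets))


rmse = prediction_rmse(
    [[1.0, 0.0], [0.0, 1.0]], [3.0, 5.0],
    coef=[0.5, 1.0], s_tar=2.0, m_tar=2.0,
)
```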
-
scaling_data
(train_features, train_targets, test_features, s_tar, m_tar, s_feat, m_feat)¶ Scaling the data if requested.
Parameters: - train_features (array) – Independent data used to train the model.
- train_targets (array) – Dependent data used to train the model.
- test_features (array) – Independent data used to test the model.
- s_tar (array) – Standard deviation or (max-min), for the dependent train_targets.
- m_tar (array) – Mean for the dependent train_targets.
- s_feat (array) – Standard deviation or (max-min), for the independent train_features.
- m_feat (array) – Mean for the independent train_features.
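The parameter descriptions suggest that each column is centred on its training mean and divided by either the training range (max-min) or the training standard deviation. The sketch below follows that reading, under the assumption that the same training statistics are reused for the test data. The names `column_stats` and `apply_scaling` are hypothetical and are not CatLearn functions.

```python
from statistics import mean, pstdev


def column_stats(rows, normalization=True):
    """Return per-column (scale, mean) computed from the training rows.

    normalization=True uses (max - min) as the scale; otherwise the
    standard deviation is used. Hypothetical helper for illustration.
    """
    cols = list(zip(*rows))
    m = [mean(c) for c in cols]
    if normalization:
        s = [max(c) - min(c) for c in cols]
    else:
        s = [pstdev(c) for c in cols]
    # Guard against division by zero for constant columns.
    s = [v if v != 0 else 1.0 for v in s]
    return s, m


def apply_scaling(rows, s, m):
    """Centre and scale every row with the given column statistics."""
    return [[(x - mi) / si for x, si, mi in zip(row, s, m)] for row in rows]


train = [[1.0, 10.0], [3.0, 30.0]]
test = [[2.0, 20.0]]
s_feat, m_feat = column_stats(train)
scaled_train = apply_scaling(train, s_feat, m_feat)
scaled_test = apply_scaling(test, s_feat, m_feat)  # reuse training statistics
```

Computing the statistics on the training data only keeps information from the test set out of the fitted model.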
-
catlearn.learning_curve.feature_selection
Feature selection with lasso.
-
class
catlearn.learning_curve.feature_selection.
feature_selection
(train_features, train_targets)¶ Bases:
object
Class for selecting features.
Used with hierarchy cross-validation.
-
alpha_finder
(feat_vec, alpha_vec, feat)¶ Find the alpha corresponding to the number of features.
Parameters: - feat_vec (list) – Features within the interval.
- alpha_vec (list) – Alphas within the interval.
- feat (int) – The group of features searched for.
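One possible reading of this lookup is sketched below. It assumes that `feat_vec` holds the number of features retained at each alpha in `alpha_vec`, and that the first matching alpha is wanted. Both the interpretation and the function name `find_alpha` are assumptions for illustration.

```python
def find_alpha(feat_vec, alpha_vec, feat):
    """Return the first alpha whose feature count equals feat, else None.

    Hypothetical helper; assumes feat_vec[i] is the number of features
    retained when the regularization strength is alpha_vec[i].
    """
    for n_feat, alpha in zip(feat_vec, alpha_vec):
        if n_feat == feat:
            return alpha
    return None


alpha = find_alpha([5, 4, 4, 2, 1], [0.0, 0.25, 0.5, 0.75, 1.0], feat=4)
missing = find_alpha([5, 4, 4, 2, 1], [0.0, 0.25, 0.5, 0.75, 1.0], feat=3)
```

When no alpha in the interval yields the requested count, as for `missing` here, a finer or wider interval would have to be searched.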
-
alpha_refinment
(alpha, feat, splits=10, refsteps=1, upper=1.5)¶ Find a more stringent alpha for the number of features searched for.
Parameters: - alpha (int) – Initial alpha found for the number of features searched for. Will be used as a lower limit.
- feat (int) – The number of features searched for.
- splits (int) – Increase in the number of alphas under inspection within the interval.
- refsteps (int) – Number of refinements.
- upper – The upper limit, expressed as a multiple of alpha.
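To make the roles of `alpha`, `splits` and `upper` concrete, the sketch below builds one refined set of candidate alphas between the initial alpha (the lower limit) and `upper` times that alpha. It is only an illustration of the interval being described: the helper name `refined_alphas` and the even spacing are assumptions, and repeated refinement (`refsteps`) is not modelled.

```python
def refined_alphas(alpha, splits=10, upper=1.5):
    """Evenly spaced candidate alphas in [alpha, upper * alpha].

    Hypothetical helper; returns splits + 1 values including both ends.
    """
    high = upper * alpha
    step = (high - alpha) / splits
    return [alpha + i * step for i in range(splits + 1)]


candidates = refined_alphas(2.0, splits=4, upper=1.5)
```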
-
feature_inspection
(lower=0, upper=1, interval=100, alpha_list=None)¶ Generate interval used to search for the alpha.
Parameters: - lower (int) – Lower bound for the interval search.
- upper (int) – Upper bound for the interval search.
- interval (int) – Number of alphas in interval inspected.
-
interval_modifier
(feat_vec, alpha_vec, feat, splits, int_expand)¶ Modify the interval under inspection by reduction or expansion.
Parameters: - feat_vec (list) – Features within the interval.
- alpha_vec (list) – Alphas within the interval.
- feat (int) – The group of features searched for.
- splits (int) – Increase in the number of alphas under inspection within the interval.
- int_expand (int) – Number of times the number of alphas in the interval is increased.
-
selection
(select_limit)¶ Select the feature(s) that work best with L1.
-
catlearn.learning_curve.learning_curve
Generate the learning curve.
-
class
catlearn.learning_curve.learning_curve.
LearningCurve
(nprocs=1)¶ Bases:
object
Learning curve class. Test a model while varying the density of the training data.
-
run
(model, train, target, test, test_target, step=1, min_data=2)¶ Evaluate a model versus training data size.
Parameters: - model (object) –
A function that will train or load a regression model or classifier and make predictions for testing. model should accept the parameters:
train_features : array
test_features : array
train_targets : list
test_targets : list
model should return either a float or a list of floats. The float or the first value of the list will be used as the fitness score.
- train (array) – An n, d array of training examples.
- target (list) – A list of the training target values.
- test (array) – An n, d array of test data.
- test_target (list) – A list of the test target values.
- step (int) – Increment the data set size by this many examples.
- min_data (int) – Smallest number of training examples to test.
Returns: output – Each row is the output from the model object.
Return type: array
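The contract for model can be illustrated with a toy function and a plain serial loop over growing training sets. This is a sketch under stated assumptions, not CatLearn's implementation: `mean_model` and `learning_curve_sketch` are hypothetical names, the four model arguments are passed positionally in the order listed above, and each output row here holds the training set size followed by the score.

```python
def mean_model(train_features, test_features, train_targets, test_targets):
    """Toy model: predict the training mean, return the mean absolute error."""
    prediction = sum(train_targets) / len(train_targets)
    errors = [abs(prediction - t) for t in test_targets]
    return sum(errors) / len(errors)


def learning_curve_sketch(model, train, target, test, test_target,
                          step=1, min_data=2):
    """Evaluate model on growing slices of the training data.

    Hypothetical serial loop for illustration only.
    """
    output = []
    for size in range(min_data, len(train) + 1, step):
        result = model(train[:size], test, target[:size], test_target)
        # A list result contributes its first value as the fitness score.
        score = result[0] if isinstance(result, (list, tuple)) else result
        output.append([size, score])
    return output


train = [[0.0], [1.0], [2.0], [3.0]]
target = [0.0, 2.0, 4.0, 6.0]
test = [[1.5]]
test_target = [3.0]
curve = learning_curve_sketch(mean_model, train, target, test, test_target)
```

In this toy case the error shrinks as more training examples are included, which is the trend a learning curve is meant to expose.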
-
catlearn.learning_curve.learning_curve.
feature_frequency
(cv, features, min_split, max_split, smallest=False, new_data=True, ridge=True, scale=True, globalscale=True, normalization=True, featselect_featvar=False, featselect_featconst=True, select_limit=None, feat_sub=15)¶ Function to extract raw data from the database.
Parameters: - features (int) – Number of features used for regression.
- min_split (int) – Number of data splits in the smallest subset.
- max_split (int) – Number of data splits in the largest subset.
- new_data (bool) – Whether to use new data or the previous data.
- ridge (bool) – The ridge regularizer is the default. If False, lasso is used.
- scale (bool) – Whether the data should be scaled.
- globalscale (bool) – Whether to use global scaling.
- normalization (bool) – If scaled, whether the data is normalized or standardized. Normalized is the default.
- feature_selection (string) – Whether to use feature selection with ridge, or plain vanilla ridge.
- select_limit (int) – Upper limit on the number of features used for feature selection.
-
catlearn.learning_curve.learning_curve.
hierarchy
(cv, features, min_split, max_split, new_data=True, ridge=True, scale=True, globalscale=True, normalization=True, featselect_featvar=False, featselect_featconst=True, select_limit=None, feat_sub=15)¶ Start the hierarchy.
Parameters: - features (int) – Number of features used for regression.
- min_split (int) – Number of data splits in the smallest subset.
- max_split (int) – Number of data splits in the largest subset.
- new_data (bool) – Whether to use new data or the previous data.
- ridge (bool) – The ridge regularizer is the default. If False, lasso is used.
- scale (bool) – Whether the data should be scaled.
- globalscale (bool) – Whether to use global scaling.
- normalization (bool) – If scaled, whether the data is normalized or standardized. Normalized is the default.
- feature_selection (string) – Whether to use feature selection with ridge, or plain vanilla ridge.
- select_limit (int) – Upper limit on the number of features used for feature selection.
catlearn.learning_curve.placeholder
Placeholder for now.
-
class
catlearn.learning_curve.placeholder.
placeholder
(PC, index_split, hv, indicies, hier_level, featselect_featvar, featselect_featconst, s_feat, m_feat, feat_sub=15, s_tar=None, m_tar=None, select_limit=None, selected_features=None, glob_feat1=None, glob_tar1=None, new_training=True)¶ Bases:
object
Used to make the hierarchy easier to follow.
Placeholder for now.
-
get_data_scale
(split, set_size=None, p_error=None, result=None)¶ Get the data for each subset and scale it accordingly.
Parameters: - split (int) – Which subset of data within the hierarchy level.
- result (list) – Contains all the coefficients and omega2 for all training data.
- set_size (list) – Size of sub-set of data/features which the model is based on.
- p_error (list) – The prediction error for plain vanilla ridge.
-
getstats
()¶ Used to get features for the frequency plots.
-
predict_subsets
(result=None, set_size=None, p_error=None)¶ Run the prediction on each subset of data at the hierarchy level.
Parameters: - result (list) – Contains all the coefficients and omega2 for all training data.
- set_size (list) – Size of sub-set of data/features which the model is based on.
- p_error (list) – The prediction error for plain vanilla ridge.
-
reg_data_var
(train_features, train_targets, test_features, test_targets, ridge, set_size, p_error, result)¶ Ridge regression and calculation of prediction error.
Parameters: - train_features (array) – Independent data used to train the model.
- train_targets (array) – Dependent data used to train the model.
- test_features (array) – Independent data used to test the model.
- test_targets (array) – Dependent data used to test the model.
- ridge (object) – Generates the model based on the training data.
- set_size (list) – Size of sub-set of data/features which the model is based on.
- p_error (list) – The prediction error for plain vanilla ridge.
- result (list) – Contains all the coefficients and omega2 for all training data.
-
reg_feat_var
(train_features, train_targets, test_features, test_targets, ridge, set_size, p_error, result)¶ Regression within a dataset with varying features.
Parameters: - train_features (array) – Independent data used to train the model.
- train_targets (array) – Dependent data used to train the model.
- test_features (array) – Independent data used to test the model.
- test_targets (array) – Dependent data used to test the model.
- ridge (object) – Generates the model based on the training data.
- p_error (list) – The prediction error for feature selection corresponding to the different feature sets.
- set_size (list) – The different data/feature sets used for feature selection.
- result (list) – Contains all the coefficients and omega2 for all training data.
-