pipeline.modeling package
Submodules
pipeline.modeling.data_to_df module
-
class pipeline.modeling.data_to_df.LoadDF(config_path, feather_dir='./data/feathered_data')
Bases: object
Takes a dataset made up of multiple CSV files and converts it into a single DataFrame.
This DataFrame can be returned or saved as a feather file for fast load times.
The compressed dataset is named using a hash of the features it contains, so the identical dataset can be reloaded as long as the features haven’t changed.
- A dataset configuration file has three parts:
  - feature_sets: the types of features to be included
  - feature_files: the paths to the CSV files for each feature set
  - features: the feature names (and CSV column headers) for each feature set
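The naming scheme described above can be sketched roughly as follows. This is an illustration only: the config is shown as a plain Python dict, the feature set names, file paths, and the `feather_path_for` helper are hypothetical, and SHA-256 is an assumed choice of hash, not necessarily what `LoadDF` uses.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical config with the three documented parts; the real file
# format and the names below are assumptions for illustration.
config = {
    "feature_sets": ["demographics", "labs"],
    "feature_files": {
        "demographics": "./data/demographics.csv",
        "labs": "./data/labs.csv",
    },
    "features": {
        "demographics": ["age", "sex"],
        "labs": ["glucose", "sodium"],
    },
}


def feather_path_for(config, feather_dir="./data/feathered_data"):
    """Derive a cache filename from the selected feature names (sketch)."""
    # Sort so that reordering entries in the config does not change the hash.
    names = sorted(
        f"{feature_set}:{feature}"
        for feature_set in config["feature_sets"]
        for feature in config["features"][feature_set]
    )
    # Assumed hash: the first 16 hex characters of a SHA-256 digest.
    digest = hashlib.sha256(json.dumps(names).encode("utf-8")).hexdigest()[:16]
    return Path(feather_dir) / f"{digest}.feather"
```

With a scheme like this, an unchanged feature list always maps to the same file, while adding or removing a feature produces a new filename and therefore a fresh build of the dataset.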
-
load_all_dataframes(**kw)
pipeline.modeling.data_utils module
pipeline.modeling.datasets module
pipeline.modeling.model_defs module
pipeline.modeling.model_monitoring module
-
class pipeline.modeling.model_monitoring.EarlyStopping(name, mode='min', min_delta=0.001, patience=10, percentage=False)
Bases: object
-
step(metric, verbose=True)
Compare the metric against the last time step to determine whether training should stop.
- Parameters
metric (float) – A loss value or a model performance metric (e.g. accuracy)
verbose (bool) – Print a status message explaining why stopping or continuing is recommended
- Returns
Whether to stop now (True -> stop; False -> continue)
- Return type
bool
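As a rough illustration of how a `step` method with these parameters might behave, here is a minimal stand-in class. It is not the package's implementation: `EarlyStoppingSketch` is a hypothetical name, it compares against the best value seen so far (the docstring above says "last time step"), and it assumes that `percentage=True` means `min_delta` is a percentage of the best value.

```python
class EarlyStoppingSketch:
    """Illustrative stand-in for EarlyStopping; behavior is assumed."""

    def __init__(self, mode="min", min_delta=0.001, patience=10, percentage=False):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        self.min_delta = min_delta
        self.patience = patience
        self.percentage = percentage
        self.best = None      # best metric value seen so far
        self.bad_steps = 0    # consecutive steps without improvement

    def _improved(self, metric):
        # Assumption: with percentage=True, min_delta is a percentage of best.
        if self.percentage:
            delta = abs(self.best) * self.min_delta / 100
        else:
            delta = self.min_delta
        if self.mode == "min":
            return metric < self.best - delta
        return metric > self.best + delta

    def step(self, metric):
        """Return True when training should stop, False to continue."""
        if self.best is None or self._improved(metric):
            self.best = metric
            self.bad_steps = 0
            return False
        self.bad_steps += 1
        return self.bad_steps >= self.patience
```

In a training loop, one would typically call `step` once per epoch with the validation loss (or accuracy with `mode="max"`) and break out of the loop when it returns True.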
-
pipeline.modeling.model_performance module
-
class pipeline.modeling.model_performance.ModelMetrics(params)
Bases: object
-
calculate_metrics(labels, preds, probs=None, output_dict=True, summary_stat='macro avg', verbose=False)
-
graph_model_output(actual_labels, predicted_labels, probabilities=None, max_graph_size=1000, title='Graph Title')
-
listify_metrics(metrics_dict, loss=0)
Convert metrics from a dictionary to a list.
The dictionary of all metrics is converted to a list and the columns are saved in self.metrics_names. The loss is not always used for
- Parameters
metrics_dict (dict) – dictionary of all performance metrics
loss (int, optional) – cumulative loss for a given epoch. Defaults to 0.
- Returns
A list of the performance metric values, and a DataFrame
- Return type
list, DataFrame
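The dictionary-to-list conversion might look something like the sketch below. The `listify` function, the nested layout of the metrics dictionary, and the `group_stat` naming of columns are all assumptions made for illustration. The sketch returns the column names directly instead of storing them on `self.metrics_names`, and it omits the DataFrame that the documented method also returns.

```python
def listify(metrics_dict, loss=0):
    """Flatten a (possibly nested) metrics dict into values and column names."""
    names, values = ["loss"], [loss]
    for group, entry in metrics_dict.items():
        if isinstance(entry, dict):
            # Nested entry, e.g. {"macro avg": {"precision": 0.8, ...}}
            for stat, value in entry.items():
                names.append(f"{group}_{stat}")
                values.append(value)
        else:
            # Scalar entry, e.g. {"accuracy": 0.9}
            names.append(group)
            values.append(entry)
    return values, names


# Hypothetical metrics for one epoch.
metrics = {"accuracy": 0.9, "macro avg": {"precision": 0.8, "recall": 0.7}}
values, names = listify(metrics, loss=1.5)
```

Keeping names and values as parallel lists makes it straightforward to append one row per epoch to a table of training history.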
-
plot_metrics(metrics, metrics_names, verbose=False)
-
pipeline.modeling.model_training module
pipeline.modeling.select_features module
-
pipeline.modeling.select_features.calculateCorr(df, corr_method, threshold)
Supported values for corr_method include ‘pearson’, ‘kendall’, and ‘spearman’.
-
pipeline.modeling.select_features.get_args() → argparse.ArgumentParser
-
pipeline.modeling.select_features.intersection(feature_lists)
-
pipeline.modeling.select_features.main()
-
pipeline.modeling.select_features.select_by_correlation(feature_csv_path, correlation_method='pearson', threshold=0.7)
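Correlation-based selection with a threshold is commonly done along the following lines. This is a sketch under assumptions, not the module's code: `drop_correlated` is a hypothetical helper, the rule of dropping the later column of each highly correlated pair is one greedy choice among several, and the example data is made up. It relies on pandas `DataFrame.corr`, whose `method` argument accepts the three method names listed above.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, corr_method="pearson", threshold=0.7):
    """Return the column names kept after dropping highly correlated ones."""
    corr = df.corr(method=corr_method).abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # Greedy rule (assumed): drop a column if it correlates above the
    # threshold with any column that appears before it.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return [col for col in df.columns if col not in to_drop]


# Made-up data: "b" is an exact multiple of "a"; "c" is weakly related.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 1, 4, 2, 3],
})
kept = drop_correlated(df, corr_method="pearson", threshold=0.7)
```

Here `b` is dropped because its absolute correlation with `a` exceeds the threshold, while `c` is kept.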