pipeline.modeling package¶

Submodules¶

pipeline.modeling.data_to_df module¶

class pipeline.modeling.data_to_df.LoadDF(config_path, feather_dir='./data/feathered_data')¶

Bases: object

Loads a dataset made up of multiple CSV files and combines them into a single DataFrame.

The DataFrame can be returned directly or saved as a Feather file for fast load times.

The saved dataset is named using a hash of the features it contains. As long as the features haven’t changed, the same saved dataset is found and loaded again.

A dataset configuration file has three parts:

feature_sets – the types of features to be included
feature_files – the paths to the CSV files for each feature set
features – the feature names (and CSV column headers) for each feature set
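The hash-based naming described above can be sketched roughly as follows. This is a minimal illustration, not the LoadDF implementation: the configuration values, the shape of feature_files and features as per-set mappings, the helper name feather_path_for, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from pathlib import Path

# Hypothetical configuration with the three documented parts. The set names,
# file paths, and feature names are invented for illustration only.
config = {
    "feature_sets": ["set_a", "set_b"],
    "feature_files": {"set_a": "data/set_a.csv", "set_b": "data/set_b.csv"},
    "features": {"set_a": ["f1", "f2"], "set_b": ["f3"]},
}


def feather_path_for(config, feather_dir="./data/feathered_data"):
    """Derive a cache file path from a hash of the configured feature names.

    Assumption: only the feature names (qualified by feature set) feed the
    hash, so the path changes exactly when the selected features change.
    """
    names = sorted(
        f"{feature_set}:{feature}"
        for feature_set in config["feature_sets"]
        for feature in config["features"][feature_set]
    )
    digest = hashlib.sha256("\n".join(names).encode("utf-8")).hexdigest()[:16]
    return Path(feather_dir) / f"{digest}.feather"


print(feather_path_for(config))
```

Sorting the names before hashing makes the result independent of the order in which features are listed, which may or may not match the behaviour of the real class.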

load_all_dataframes(**kw)¶

pipeline.modeling.data_utils module¶

class pipeline.modeling.data_utils.TransformDF¶

Bases: object

apply_rolling_window(**kw)¶

normalize_dataset(**kw)¶

sub_sample(**kw)¶

pipeline.modeling.datasets module¶

pipeline.modeling.model_defs module¶

pipeline.modeling.model_monitoring module¶

class pipeline.modeling.model_monitoring.EarlyStopping(name, mode='min', min_delta=0.001, patience=10, percentage=False)¶

Bases: object

step(metric, verbose=True)¶

Compare metric against last time step to determine if training should stop.

Parameters

metric (float) – Metric can be a loss value or a model performance metric (e.g. accuracy)
verbose (bool) – Print a status message explaining why stopping or continuing is recommended

Returns

Whether training should stop now (True: stop; False: continue)

Return type

bool
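To illustrate how the constructor parameters might interact with step, here is a minimal sketch of a patience-based stopping rule. It is not the EarlyStopping source: the class name EarlyStoppingSketch is hypothetical, the comparison is made against the best value seen so far (an assumption), and percentage is interpreted as "min_delta is a percentage of the best value" (also an assumption).

```python
class EarlyStoppingSketch:
    """Patience-based early stopping (illustrative, not the library class)."""

    def __init__(self, mode="min", min_delta=0.001, patience=10, percentage=False):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        self.min_delta = min_delta
        self.patience = patience
        self.percentage = percentage
        self.best = None      # best metric value seen so far
        self.bad_steps = 0    # consecutive steps without improvement

    def _improved(self, metric):
        # Assumption: with percentage=True, min_delta is a percent of the best value.
        if self.percentage:
            margin = abs(self.best) * self.min_delta / 100.0
        else:
            margin = self.min_delta
        if self.mode == "min":
            return metric < self.best - margin
        return metric > self.best + margin

    def step(self, metric):
        """Return True when training should stop, False to continue."""
        if self.best is None or self._improved(metric):
            self.best = metric
            self.bad_steps = 0
            return False
        self.bad_steps += 1
        return self.bad_steps >= self.patience


# Usage: stop once the loss has failed to improve for `patience` steps in a row.
stopper = EarlyStoppingSketch(mode="min", patience=2)
for epoch, loss in enumerate([1.0, 0.9, 0.9, 0.9]):
    if stopper.step(loss):
        print(f"stopping at epoch {epoch}")
        break
```

With mode='max' the same logic applies to metrics such as accuracy, where larger values count as improvement.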

pipeline.modeling.model_performance module¶

class pipeline.modeling.model_performance.ModelMetrics(params)¶

Bases: object

calculate_metrics(labels, preds, probs=None, output_dict=True, summary_stat='macro avg', verbose=False)¶

graph_model_output(actual_labels, predicted_labels, probabilities=None, max_graph_size=1000, title='Graph Title')¶

listify_metrics(metrics_dict, loss=0)¶

Convert metrics from a dictionary to a list

The dictionary of all metrics is converted to a list and the columns are saved in self.metrics_names. The loss is not always used for

Parameters

metrics_dict (dict) – dictionary of all performance metrics
loss (int, optional) – cumulative loss for a given epoch. Defaults to 0.

Returns

a list of the performance metric values and a DataFrame

Return type

list, DataFrame
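The dictionary-to-list conversion can be sketched as below. This is an illustration under assumptions, not the listify_metrics source: the function name listify_metrics_sketch is hypothetical, the input is assumed to be a dictionary nested one level deep (for example per-class or per-average metric groups), the column names are returned rather than stored on the instance, and the DataFrame part of the return value is left out.

```python
def listify_metrics_sketch(metrics_dict, loss=0):
    """Flatten a one-level nested metrics dictionary into parallel lists.

    Returns (values, names); names[i] is the column label for values[i].
    Placing the loss first is an assumption made for this example.
    """
    names, values = ["loss"], [loss]
    for key, value in metrics_dict.items():
        if isinstance(value, dict):
            # Nested group, e.g. {"macro avg": {"precision": 0.8, ...}}
            for sub_key, sub_value in value.items():
                names.append(f"{key}_{sub_key}")
                values.append(sub_value)
        else:
            # Scalar entry, e.g. {"accuracy": 0.9}
            names.append(key)
            values.append(value)
    return values, names


# Hypothetical metrics for one epoch.
example = {"accuracy": 0.9, "macro avg": {"precision": 0.8, "recall": 0.7}}
values, names = listify_metrics_sketch(example, loss=0.25)
print(dict(zip(names, values)))
```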

plot_metrics(metrics, metrics_names, verbose=False)¶

pipeline.modeling.model_training module¶

pipeline.modeling.select_features module¶

pipeline.modeling.select_features.calculateCorr(df, corr_method, threshold)¶

Methods include ‘pearson’, ‘kendall’, ‘spearman’

pipeline.modeling.select_features.get_args() → argparse.ArgumentParser¶

pipeline.modeling.select_features.intersection(feature_lists)¶

pipeline.modeling.select_features.main()¶

pipeline.modeling.select_features.select_by_correlation(feature_csv_path, correlation_method='pearson', threshold=0.7)¶
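The reference does not say how the threshold is applied, so the following is only one plausible reading of correlation-based selection: for every pair of features whose absolute correlation exceeds the threshold, drop the later one. The function name drop_correlated is hypothetical, and it works on an in-memory DataFrame rather than a CSV path. The method names match those accepted by pandas DataFrame.corr.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, corr_method="pearson", threshold=0.7):
    """Drop the later feature of each pair with |correlation| > threshold.

    Assumption: this is an illustrative selection rule, not necessarily the
    one used by select_by_correlation.
    """
    corr = df.corr(method=corr_method).abs()
    # Keep only the strict upper triangle so each pair is considered once
    # and a column is never compared with itself.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Hypothetical data: "b" is an exact multiple of "a", "c" is only weakly related.
features = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 1, 4, 2, 3]}
)
selected = drop_correlated(features, corr_method="pearson", threshold=0.7)
print(list(selected.columns))
```

Note that in pandas the ‘kendall’ method relies on SciPy being installed, while ‘pearson’ and ‘spearman’ do not.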

Module contents¶