pipeline.annotation package
Submodules
pipeline.annotation.annotate_utterances module
-
class pipeline.annotation.annotate_utterances.PythonTurnAnnotator(args)
Bases: object
-
check_empty_spaces(i=0, min_gap=0.2) Checks the gaps between voice-activity segments for missed utterances and inserts an utterance into any gap where one is found.
- Parameters
i (int, optional) – Starting point in utterance list. Defaults to 0.
- Returns
list of utterances
- Return type
[utterances]
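The gap check described above can be sketched roughly as follows. This is a minimal illustration, not the class's actual implementation: the helper name `find_gaps` and the `start`/`end` keys (float seconds) are assumptions, and how the real method decides whether a gap contains speech is not documented here.

```python
from typing import Dict, List, Tuple


def find_gaps(utterances: List[Dict], min_gap: float = 0.2) -> List[Tuple[float, float]]:
    """Return (gap_start, gap_end) spans between consecutive utterances.

    Hypothetical helper: assumes each utterance is a dict with float
    'start' and 'end' times in seconds. Only gaps longer than min_gap
    are reported.
    """
    ordered = sorted(utterances, key=lambda u: u["start"])
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt["start"] - prev["end"] > min_gap:
            gaps.append((prev["end"], nxt["start"]))
    return gaps


example_utterances = [
    {"start": 0.0, "end": 1.0},
    {"start": 1.1, "end": 2.0},  # 0.1 s after the previous one: ignored
    {"start": 3.0, "end": 4.0},  # 1.0 s after the previous one: reported
]
example_gaps = find_gaps(example_utterances)
```

Each reported span is a candidate region in which a missed utterance could be inserted.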
-
check_times() Iterates through the utterance times and checks that they are correct: that the string and float times match, and that each end time comes after its start time. Prints any errors so they can be found and fixed by manually editing the JSON.
- Returns
Returns True if errors were found.
- Return type
bool
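As a rough sketch of the two checks described above, under stated assumptions: the field names `start`, `end`, `start_str`, and `end_str` are hypothetical, and the string times are assumed to be plain decimal seconds. The real utterance format may differ.

```python
def find_time_errors(utterances, tolerance=1e-6):
    """Collect (index, message) pairs for inconsistent times.

    Hypothetical layout: each utterance holds float 'start'/'end' times
    and string copies under 'start_str'/'end_str'.
    """
    errors = []
    for idx, utt in enumerate(utterances):
        for side in ("start", "end"):
            try:
                parsed = float(utt[side + "_str"])
            except ValueError:
                errors.append((idx, side + " string is not a number"))
                continue
            if abs(parsed - utt[side]) > tolerance:
                errors.append((idx, side + " string/float mismatch"))
        if utt["end"] <= utt["start"]:
            errors.append((idx, "end does not come after start"))
    return errors


example_times = [
    {"start": 0.0, "end": 1.0, "start_str": "0.0", "end_str": "1.0"},
    {"start": 2.0, "end": 1.5, "start_str": "2.5", "end_str": "1.5"},
]
time_errors = find_time_errors(example_times)
errors_found = bool(time_errors)  # mirrors the documented True-on-error return
```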
-
clean_times(i=0) Iterates through the utterances and updates each float time with the value of its string time.
- Parameters
i (int, optional) – Starting point. Defaults to 0.
- Returns
list of utterances
- Return type
[utterances]
-
fill_labels(i=0) Intended as a first pass through unlabeled utterances. Modifies the list of utterances in place and writes in-progress updates.
- Parameters
i (int, optional) – Starting point in utterance list. Defaults to 0.
- Returns
list of utterances
- Return type
[utterances]
-
fix_speaker_labels(i=0) Iterates through the labels and corrects those that have been marked ‘f’ for fix. Requests input from the user, then updates the utterance list and file.
- Parameters
i (int, optional) – Starting point in utterance list. Defaults to 0.
- Returns
list of utterances
- Return type
[utterances]
-
make_turns() Merges utterances into turns. Takes each series of continuous utterances by the same person and merges it into a single turn. Removes ‘chunk’ from each utterance.
- Returns
list of merged utterances
- Return type
[utterances]
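The merging step can be illustrated with a short sketch. It assumes each utterance is a dict with `speaker`, `start`, `end`, and `chunk` keys. Only ‘chunk’ is named in the description above, so the other keys and the helper name `merge_into_turns` are hypothetical, and the real method may merge additional fields.

```python
from typing import Dict, List


def merge_into_turns(utterances: List[Dict]) -> List[Dict]:
    """Merge consecutive utterances by the same speaker into one turn.

    Hypothetical helper: a turn keeps the first utterance's start time
    and takes the last utterance's end time. The 'chunk' key is dropped.
    """
    turns: List[Dict] = []
    for utt in utterances:
        # Copy without 'chunk' so the input list is left untouched.
        utt = {key: value for key, value in utt.items() if key != "chunk"}
        if turns and turns[-1]["speaker"] == utt["speaker"]:
            turns[-1]["end"] = utt["end"]  # extend the current turn
        else:
            turns.append(utt)  # speaker changed: start a new turn
    return turns


example_turn_input = [
    {"speaker": "a", "start": 0.0, "end": 1.0, "chunk": 0},
    {"speaker": "a", "start": 1.0, "end": 2.0, "chunk": 1},
    {"speaker": "b", "start": 2.0, "end": 3.0, "chunk": 2},
]
example_turns = merge_into_turns(example_turn_input)
```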
-
review_labels()
-
review_person(person)
-
pipeline.annotation.annotations_to_csv module
-
pipeline.annotation.annotations_to_csv.convert_annotations(annotation_path, output_path, frame_rate=30, json_key='utterances', label_key='speaker', feature_csv_path=None) Converts annotations stored as JSON to CSV.
The CSV is formatted such that each row is a frame and each column is a one-hot encoded label.
- Parameters
annotation_path ([type]) – Path to the annotation JSON. The JSON should be formatted as a list, with each annotation having a start time, a stop time, and a label.
output_path ([type]) – Path to the converted CSV.
json_key (str, optional) – Key to the list of annotations. Defaults to “utterances”.
label_key (str, optional) – Key to the labels that have been annotated. Defaults to “speaker”.
feature_csv_path ([type], optional) – Path to a feature CSV with “timestamps” that will be used to create labels that match the features. Defaults to None. If not supplied, the timestamps will be generated using the frame rate.
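The frame-by-frame one-hot layout described above might look like the following sketch for the case where no feature CSV is supplied. It is an illustration under assumptions, not the function's actual code: the helper `annotations_to_rows`, its `duration` argument, and the `start`/`end` key names are hypothetical.

```python
import csv
import io


def annotations_to_rows(annotations, duration, frame_rate=30, label_key="speaker"):
    """Build one row per frame with a 0/1 column for each label.

    Hypothetical helper: assumes each annotation is a dict with float
    'start' and 'end' times in seconds plus a label under label_key.
    Timestamps are generated from the frame rate.
    """
    labels = sorted({a[label_key] for a in annotations})
    n_frames = int(round(duration * frame_rate))
    rows = []
    for frame in range(n_frames):
        t = frame / frame_rate
        row = {"timestamps": t}
        for label in labels:
            active = any(
                a[label_key] == label and a["start"] <= t < a["end"]
                for a in annotations
            )
            row[label] = int(active)
        rows.append(row)
    return labels, rows


example_annotations = [
    {"speaker": "a", "start": 0.0, "end": 1.0},
    {"speaker": "b", "start": 1.0, "end": 2.0},
]
labels, rows = annotations_to_rows(example_annotations, duration=2.0, frame_rate=2)

# Write to an in-memory buffer here; a real run would open output_path instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["timestamps"] + labels)
writer.writeheader()
writer.writerows(rows)
```

With a frame rate of 2 and a 2-second duration, this yields four rows, the first two marking label "a" and the last two marking label "b".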
pipeline.annotation.combine_labels_and_features module
-
pipeline.annotation.combine_labels_and_features.main(label_file, feature_file, num_labels, output)
pipeline.annotation.intervals_to_timeseries module
-
pipeline.annotation.intervals_to_timeseries.main(interval_file, output, frequency, delimiter)