pipeline.annotation package

Submodules

pipeline.annotation.annotate_utterances module

class pipeline.annotation.annotate_utterances.PythonTurnAnnotator(args)

Bases: object

check_empty_spaces(i=0, min_gap=0.2)

Checks the gaps between voice activity for missed utterances. Inserts an utterance into a gap if one is found.

Parameters

  • i (int, optional) – Starting point in utterance list. Defaults to 0.

  • min_gap (float, optional) – Minimum gap between utterances to consider. Defaults to 0.2.

Returns

Return type

[utterances]
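
The gap check described above can be sketched roughly as follows. This is an illustration only, not the method's actual implementation: the function name find_gaps, the "start" and "end" keys, and the reading of min_gap as seconds are all assumptions, since the utterance format is not documented here.

```python
def find_gaps(utterances, min_gap=0.2, i=0):
    """Return (gap_start, gap_end) pairs for pauses of at least min_gap.

    Assumes each utterance is a dict with float "start" and "end" keys
    and that the list is sorted by start time (both are assumptions).
    """
    gaps = []
    # Pair each utterance with its successor, starting at index i.
    for previous, following in zip(utterances[i:], utterances[i + 1:]):
        if following["start"] - previous["end"] >= min_gap:
            gaps.append((previous["end"], following["start"]))
    return gaps


utterances = [
    {"start": 0.0, "end": 1.0},
    {"start": 1.1, "end": 2.0},
    {"start": 2.5, "end": 3.0},
]
print(find_gaps(utterances))  # [(2.0, 2.5)]
```

Each returned pair marks a span where a missed utterance could be inserted after manual review.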

check_times()

Iterates through the times and checks that they are correct: that the string and float times match, and that each end time comes after its start time. Prints any errors so they can be found and fixed by manual entry in the JSON.

Returns

Returns True if errors were found.

Return type

bool
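
A minimal sketch of the two checks, under stated assumptions: the keys "start", "end", "start_str" and "end_str", the colon-separated timestamp format, and the helper names parse_timestamp and find_time_errors are all hypothetical and may not match the real annotation JSON.

```python
def parse_timestamp(text):
    """Convert 'H:MM:SS.s', 'MM:SS.s' or 'SS.s' to seconds (assumed format)."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def find_time_errors(utterances, tolerance=1e-3):
    """Return (index, message) pairs for every inconsistency found."""
    errors = []
    for index, utt in enumerate(utterances):
        # Check 1: the string time and the float time must agree.
        for field in ("start", "end"):
            parsed = parse_timestamp(utt[field + "_str"])
            if abs(parsed - utt[field]) > tolerance:
                errors.append((index, f"{field} string and float disagree"))
        # Check 2: the end time must come after the start time.
        if utt["end"] <= utt["start"]:
            errors.append((index, "end does not come after start"))
    return errors


utterances = [
    {"start": 1.0, "end": 2.0, "start_str": "0:01.0", "end_str": "0:02.0"},
    {"start": 5.0, "end": 4.0, "start_str": "0:05.0", "end_str": "0:04.5"},
]
has_errors = bool(find_time_errors(utterances))
```

Returning the problems as a list keeps the sketch easy to test; a non-empty list corresponds to the documented return value of True.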

clean_times(i=0)

Iterates through utterances and updates float time with the value of the string time.

Parameters

i (int, optional) – Starting point. Defaults to 0.

Returns

list of utterances

Return type

[utterances]

fill_labels(i=0)

Intended as a first pass through unlabeled utterances. Modifies the list of utterances in place and writes in-progress updates.

Parameters

i (int, optional) – Starting point in utterance list. Defaults to 0.

Returns

Return type

[utterances]

fix_speaker_labels(i=0)

Iterates through the labels and corrects those that have been marked as ‘f’ for fix. Requests input from the user, then updates the utterance list and file.

Parameters

i (int, optional) – Starting point in utterance list. Defaults to 0.

Returns

list of utterances

Return type

[utterances]

make_turns()

Merges utterances into turns. Removes ‘chunk’ from utterance. Takes series of continuous utterances by the same person and merges them into a single turn.

Returns

list of merged utterances

Return type

[utterances]
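
The merging step can be illustrated with the sketch below. It is not the method's actual code: the function name merge_into_turns and the "speaker", "start" and "end" keys are assumptions; only the ‘chunk’ key is named in the description above.

```python
def merge_into_turns(utterances, label_key="speaker"):
    """Merge consecutive utterances by the same speaker into single turns.

    Assumes each utterance is a dict with "start", "end" and a speaker
    label (hypothetical layout). The input list is left unmodified.
    """
    turns = []
    for utt in utterances:
        # Copy the utterance without its 'chunk' entry.
        current = {key: value for key, value in utt.items() if key != "chunk"}
        if turns and turns[-1][label_key] == current[label_key]:
            # Same speaker as the previous turn: extend that turn.
            turns[-1]["end"] = current["end"]
        else:
            turns.append(current)
    return turns


utterances = [
    {"speaker": "A", "start": 0.0, "end": 1.0, "chunk": 0},
    {"speaker": "A", "start": 1.0, "end": 2.0, "chunk": 1},
    {"speaker": "B", "start": 2.0, "end": 3.0, "chunk": 2},
]
turns = merge_into_turns(utterances)
# turns holds two entries: speaker A from 0.0 to 2.0, speaker B from 2.0 to 3.0
```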

review_labels()
review_person(person)

pipeline.annotation.annotations_to_csv module

pipeline.annotation.annotations_to_csv.convert_annotations(annotation_path, output_path, frame_rate=30, json_key='utterances', label_key='speaker', feature_csv_path=None)

Converts annotations stored as JSON to CSV.

The CSV is formatted such that each row is a frame and each column is a one-hot encoded label.

Parameters
  • annotation_path ([type]) – Path to the annotation JSON. The JSON should be formatted as a list, with individual annotations having a start and stop time and a label.

  • output_path ([type]) – Path to the converted CSV.

  • json_key (str, optional) – Key to list of annotations. Defaults to “utterances”.

  • label_key (str, optional) – Key to labels that have been annotated. Defaults to “speaker”.

  • feature_csv_path ([type], optional) – Path to a feature CSV with “timestamps” that will be used to create labels that match the features. Defaults to None. If not supplied, the timestamps will be generated using the frame rate.
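
To make the output layout concrete, here is a rough sketch of the frame-by-frame one-hot encoding when timestamps are generated from the frame rate. It is not the module's implementation: the helper names annotations_to_rows and rows_to_csv_text, the "start" and "end" keys, and the column order are assumptions.

```python
import csv
import io


def annotations_to_rows(annotations, frame_rate=30, label_key="speaker"):
    """Build a header and one row per frame with one-hot label columns.

    Assumes each annotation is a dict with float "start" and "end" times
    in seconds plus a label (hypothetical layout).
    """
    labels = sorted({a[label_key] for a in annotations})
    duration = max(a["end"] for a in annotations)
    n_frames = int(round(duration * frame_rate))
    rows = []
    for frame in range(n_frames):
        t = frame / frame_rate
        # Labels whose interval covers this frame's timestamp.
        active = {a[label_key] for a in annotations if a["start"] <= t < a["end"]}
        rows.append([t] + [int(label in active) for label in labels])
    return ["timestamps"] + labels, rows


def rows_to_csv_text(header, rows):
    """Serialize the header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


annotations = [
    {"start": 0.0, "end": 1.0, "speaker": "A"},
    {"start": 1.0, "end": 2.0, "speaker": "B"},
]
header, rows = annotations_to_rows(annotations, frame_rate=2)
print(rows_to_csv_text(header, rows))
```

With a feature CSV supplied, the generated timestamps would presumably be replaced by the ones read from its “timestamps” column so that labels and features line up row for row.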

pipeline.annotation.combine_labels_and_features module

pipeline.annotation.combine_labels_and_features.main(label_file, feature_file, num_labels, output)

pipeline.annotation.intervals_to_timeseries module

pipeline.annotation.intervals_to_timeseries.main(interval_file, output, frequency, delimiter)

Module contents