pydrobert.kaldi.eval.util

Utilities for evaluation

pydrobert.kaldi.eval.util.edit_distance(ref, hyp, insertion_cost=1, deletion_cost=1, substitution_cost=1, return_tables=False)

Levenshtein (edit) distance

Parameters
  • ref (Sequence) – Sequence of tokens of reference text (source)

  • hyp (Sequence) – Sequence of tokens of hypothesis text (target)

  • insertion_cost (int) – Penalty for hyp inserting a token to ref

  • deletion_cost (int) – Penalty for hyp deleting a token from ref

  • substitution_cost (int) – Penalty for hyp swapping tokens in ref

  • return_tables (bool) – If True, also return per-token count tables along with the distance (see Returns)

Returns

distances (int or (int, dict, dict, dict, dict)) – The edit distance of hyp from ref. If return_tables is True, this is instead a tuple of the edit distance, a dict of insertion counts, a dict of deletion counts, a dict of substitution counts per ref token, and a dict of counts of ref tokens. Any tokens with count 0 are excluded from the dictionaries.
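
To make the role of the three cost parameters concrete, the following is a minimal sketch of a weighted Levenshtein distance computed by dynamic programming. It is an illustration only, not the library's implementation: the name edit_distance_sketch is hypothetical, it borrows the parameter names from the signature above, and it does not reproduce the return_tables behaviour.

```python
def edit_distance_sketch(ref, hyp, insertion_cost=1, deletion_cost=1,
                         substitution_cost=1):
    """Weighted Levenshtein distance of hyp from ref (illustrative sketch).

    Hypothetical helper, not part of pydrobert.kaldi. Parameter names are
    copied from the documented signature for readability.
    """
    # prev[j] = cost of turning an empty ref prefix into the first j hyp
    # tokens, which takes j insertions.
    prev = [j * insertion_cost for j in range(len(hyp) + 1)]
    for i, ref_tok in enumerate(ref, start=1):
        # Turning the first i ref tokens into an empty hyp takes i deletions.
        cur = [i * deletion_cost]
        for j, hyp_tok in enumerate(hyp, start=1):
            # Matching tokens cost nothing; mismatches cost a substitution.
            sub = prev[j - 1] + (0 if ref_tok == hyp_tok else substitution_cost)
            dele = prev[j] + deletion_cost      # hyp dropped ref_tok
            ins = cur[j - 1] + insertion_cost   # hyp added hyp_tok
            cur.append(min(sub, dele, ins))
        prev = cur
    return prev[-1]


ref = "the quick brown fox".split()
hyp = "the brown fax".split()
print(edit_distance_sketch(ref, hyp))  # 2: one deletion, one substitution

# Any sequence of hashable, comparable tokens works, including characters.
print(edit_distance_sketch("kitten", "sitting"))  # 3

# When a substitution costs more than a deletion plus an insertion,
# the cheaper path is to delete and insert instead.
print(edit_distance_sketch(["a", "b", "c"], ["a", "x", "c"],
                           substitution_cost=3))  # 2
```

The sketch keeps only two rows of the cost table at a time, which is enough for the distance alone. Recovering per-token insertion, deletion, and substitution counts, as return_tables=True does, would additionally require tracking which operation was chosen at each cell.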