pydrobert.kaldi.io.util

Kaldi I/O utilities

pydrobert.kaldi.io.util.infer_kaldi_data_type(obj)

Infer the appropriate Kaldi data type for this object.

The following mapping is checked in order:

=======================================  =================
Object                                   KaldiDataType
=======================================  =================
an int                                   Int32
a boolean                                Bool
a float*                                 Base
str                                      Token
2-dim numpy array float32                FloatMatrix
1-dim numpy array float32                FloatVector
2-dim numpy array float64                DoubleMatrix
1-dim numpy array float64                DoubleVector
1-dim numpy array of int32               Int32Vector
2-dim numpy array of int32*              Int32VectorVector
a pair (matrix-like, float or int)       WaveMatrix**
an empty container                       BaseMatrix
container of str                         TokenVector
1-dim py container of ints               Int32Vector
2-dim py container of ints*              Int32VectorVector
2-dim py container of pairs of floats    BasePairVector
matrix-like python container             DoubleMatrix
vector-like python container             DoubleVector
=======================================  =================

*A float could equally represent a Double, and a 2-dim container of ints (numpy or Python) could equally represent an Int32PairVector. Take care in these cases.

**The first element of the pair is the wave data, the second its sample frequency. The wave data can be a 2-dim numpy float array of the same precision as KaldiDataType.BaseMatrix, or a matrix-like Python container of floats and/or ints.

Returns

pydrobert.kaldi.io.enums.KaldiDataType or None (None if no appropriate type can be inferred)
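To illustrate how an ordered lookup like this behaves, the following is a minimal pure-Python sketch covering only the Python-scalar and Python-container rows of the table; numpy arrays are left out. The function toy_infer and its helpers are hypothetical and not part of pydrobert.kaldi: they return KaldiDataType member names as plain strings, and their definition of "matrix-like" (a non-empty, rectangular container of numbers) is an assumption. Because Python's bool is a subclass of int, the sketch tests bool before int; this says nothing about how the library itself orders that check.

```python
from collections.abc import Sequence


def _is_seq(x):
    # Lists and tuples count as containers; strings do not
    return isinstance(x, Sequence) and not isinstance(x, str)


def _is_int(x):
    return isinstance(x, int) and not isinstance(x, bool)


def _is_number(x):
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def _is_matrix(x):
    # Hypothetical "matrix-like": non-empty, rectangular rows of numbers
    return (
        _is_seq(x)
        and len(x) > 0
        and all(_is_seq(row) for row in x)
        and len({len(row) for row in x}) == 1
        and all(_is_number(v) for row in x for v in row)
    )


def toy_infer(obj):
    """Return a KaldiDataType member name (as a str) for obj, or None."""
    if isinstance(obj, bool):  # bool before int: bool subclasses int
        return "Bool"
    if isinstance(obj, int):
        return "Int32"
    if isinstance(obj, float):
        return "Base"
    if isinstance(obj, str):
        return "Token"
    if not _is_seq(obj):
        return None
    # A (wave data, sample frequency) pair
    if len(obj) == 2 and _is_matrix(obj[0]) and _is_number(obj[1]):
        return "WaveMatrix"
    if len(obj) == 0:
        return "BaseMatrix"
    if all(isinstance(x, str) for x in obj):
        return "TokenVector"
    if all(_is_int(x) for x in obj):
        return "Int32Vector"
    if all(_is_seq(row) for row in obj):
        flat = [v for row in obj for v in row]
        if all(_is_int(v) for v in flat):
            return "Int32VectorVector"
        if all(len(row) == 2 for row in obj) and all(
            isinstance(v, float) for v in flat
        ):
            return "BasePairVector"
        if _is_matrix(obj):
            return "DoubleMatrix"
        return None
    if all(_is_number(x) for x in obj):
        return "DoubleVector"
    return None
```

Since rows are tried in order, a list of two-float rows such as [[1.0, 2.0], [3.0, 4.0]] becomes BasePairVector rather than DoubleMatrix in this sketch, which is the kind of ambiguity the footnotes above warn about.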

pydrobert.kaldi.io.util.parse_kaldi_input_path(path)

Determine the characteristics of an input stream by its path.

Returns a 4-tuple of the following information:

  1. If path is not an rspecifier (TableType.NotATable):

    1. Classify path as an rxfilename

    2. Return a tuple of (TableType, path, RxfilenameType, dict())

  2. Otherwise (path is an rspecifier):

    1. Put all rspecifier options (once, sorted, called_sorted, permissive, background) into a dictionary

    2. Extract the embedded rxfilename and classify it

    3. Return a tuple of (TableType, rxfilename, RxfilenameType, options)

Parameters

path (str) – A string that would be passed to pydrobert.kaldi.io.open()
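The steps above can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions, not pydrobert.kaldi's parser: parse_input_sketch and classify_rxfilename are hypothetical names; table and rxfilename types are returned as plain strings standing in for the TableType and RxfilenameType enums (the names ArchiveTable, ScriptTable and the rxfilename type names are assumed); the option letters o/no, s/ns, cs/ncs, p/np and bg follow Kaldi's rspecifier syntax; the rxfilename rules ("-" or empty for standard input, a trailing "|" for a pipe, a trailing ":<digits>" for a byte offset) mirror Kaldi's conventions; and every option defaults to False.

```python
# Hypothetical sketch of rspecifier parsing; not pydrobert.kaldi's own code.
# Option letters follow Kaldi's rspecifier syntax (an assumption here).
_RSPEC_FLAGS = {
    "o": ("once", True), "no": ("once", False),
    "s": ("sorted", True), "ns": ("sorted", False),
    "cs": ("called_sorted", True), "ncs": ("called_sorted", False),
    "p": ("permissive", True), "np": ("permissive", False),
    "bg": ("background", True),
}
_TABLES = {"ark": "ArchiveTable", "scp": "ScriptTable"}


def classify_rxfilename(rx):
    """Return a string naming the kind of input stream rx refers to."""
    if rx in ("", "-"):
        return "StandardInput"
    if rx.startswith("|"):
        return "InvalidInput"  # input pipes end with '|' rather than start with it
    if rx.endswith("|"):
        return "PipeInput"  # e.g. "gunzip -c feats.ark.gz |"
    if rx != rx.strip():
        return "InvalidInput"  # leading/trailing whitespace is rejected
    _, colon, tail = rx.rpartition(":")
    if colon and tail and all(ch in "0123456789" for ch in tail):
        return "OffsetFileInput"  # e.g. "feats.ark:1234"
    return "FileInput"


def parse_input_sketch(path):
    """Return (table_type, rxfilename, rxfilename_type, options)."""
    prefix, colon, rest = path.partition(":")
    tokens = prefix.split(",") if colon else []
    kinds = [t for t in tokens if t in _TABLES]
    known = all(t in _RSPEC_FLAGS or t in _TABLES for t in tokens)
    if len(kinds) != 1 or not known:
        # Not an rspecifier: classify the whole path as an rxfilename
        return ("NotATable", path, classify_rxfilename(path), {})
    options = dict.fromkeys(
        ("once", "sorted", "called_sorted", "permissive", "background"), False
    )
    for tok in tokens:
        if tok in _RSPEC_FLAGS:
            key, value = _RSPEC_FLAGS[tok]
            options[key] = value
    return (_TABLES[kinds[0]], rest, classify_rxfilename(rest), options)
```

For example, "ark,s,cs:feats.ark" yields an archive table with sorted and called_sorted set, while "foo.ark:42" is not a table at all but an offset into a file.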

pydrobert.kaldi.io.util.parse_kaldi_output_path(path)

Determine the characteristics of an output stream by its path.

Returns a 4-tuple of the following information:

  1. If path is not a wspecifier (TableType.NotATable):

    1. Classify path as a wxfilename

    2. Return a tuple of (TableType, path, WxfilenameType, dict())

  2. If path is an archive or a script wspecifier:

    1. Put all wspecifier options (binary, flush, permissive) into a dictionary

    2. Extract the embedded wxfilename and classify it

    3. Return a tuple of (TableType, wxfilename, WxfilenameType, options)

  3. If path specifies both an archive and a script (TableType.BothTables):

    1. Put all wspecifier options (binary, flush, permissive) into a dictionary

    2. Extract both embedded wxfilenames and classify them

    3. Return a tuple of (TableType, (arch_wxfilename, script_wxfilename), (arch_WxfilenameType, script_WxfilenameType), options)

Parameters

path (str) – A string that would be passed to pydrobert.kaldi.io.open()
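A comparable pure-Python sketch of the output-side steps follows, under the same caveats: parse_output_sketch, classify_wxfilename and the returned strings are hypothetical stand-ins for pydrobert.kaldi's enums. The option letters (b/t for binary, f/nf for flush, p for permissive), the defaults (binary on, flush and permissive off), the requirement that "ark" precede "scp", and the wxfilename rules ("-" or empty for standard output, a leading "|" for a pipe, no byte offsets) are assumptions modeled on Kaldi's wspecifier conventions.

```python
# Hypothetical sketch of wspecifier parsing; not pydrobert.kaldi's own code.
# Option letters and defaults mirror Kaldi's wspecifier conventions (assumed).
_WSPEC_FLAGS = {
    "b": ("binary", True), "t": ("binary", False),
    "f": ("flush", True), "nf": ("flush", False),
    "p": ("permissive", True),
}


def classify_wxfilename(wx):
    """Return a string naming the kind of output stream wx refers to."""
    if wx in ("", "-"):
        return "StandardOutput"
    if wx.startswith("|"):
        return "PipeOutput"  # e.g. "| gzip -c > feats.ark.gz"
    if wx != wx.strip() or wx.endswith("|"):
        return "InvalidOutput"  # a trailing '|' would be an input pipe
    _, colon, tail = wx.rpartition(":")
    if colon and tail and all(ch in "0123456789" for ch in tail):
        return "InvalidOutput"  # byte offsets make sense only for reading
    return "FileOutput"


def _parse_wspec_prefix(prefix):
    """Return (table_type, options), or None if prefix is not a wspecifier."""
    table = None
    options = {"binary": True, "flush": False, "permissive": False}
    for tok in prefix.split(","):
        if tok in _WSPEC_FLAGS:
            key, value = _WSPEC_FLAGS[tok]
            options[key] = value
        elif tok == "ark" and table is None:
            table = "ArchiveTable"
        elif tok == "scp" and table is None:
            table = "ScriptTable"
        elif tok == "scp" and table == "ArchiveTable":
            table = "BothTables"  # only "ark,scp" is accepted, not "scp,ark"
        else:
            return None
    return None if table is None else (table, options)


def parse_output_sketch(path):
    """Return (table_type, wxfilename(s), wxfilename_type(s), options)."""
    prefix, colon, rest = path.partition(":")
    parsed = _parse_wspec_prefix(prefix) if colon else None
    if parsed is not None and parsed[0] == "BothTables":
        arch, comma, script = rest.partition(",")
        if comma:
            types = (classify_wxfilename(arch), classify_wxfilename(script))
            return ("BothTables", (arch, script), types, parsed[1])
        parsed = None  # "ark,scp" needs two comma-separated wxfilenames
    if parsed is None:
        # Not a wspecifier: classify the whole path as a wxfilename
        return ("NotATable", path, classify_wxfilename(path), {})
    table, options = parsed
    return (table, rest, classify_wxfilename(rest), options)
```

With these rules, "ark,scp,f:a.ark,a.scp" yields a BothTables result whose second and third elements are pairs, matching case 3 above, while "scp,ark:a.ark,a.scp" is rejected as a wspecifier.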