pydrobert.kaldi.io.util

Kaldi I/O utilities

pydrobert.kaldi.io.util.infer_kaldi_data_type(obj)

Infer the appropriate Kaldi data type for an object

The following map is used (in order):

Object	KaldiDataType
an int	Int32
a boolean	Bool
a float*	Base
str	Token
2-dim numpy array float32	FloatMatrix
1-dim numpy array float32	FloatVector
2-dim numpy array float64	DoubleMatrix
1-dim numpy array float64	DoubleVector
1-dim numpy array of int32	Int32Vector
2-dim numpy array of int32*	Int32VectorVector
(matrix-like, float or int)	WaveMatrix**
an empty container	BaseMatrix
container of str	TokenVector
1-dim py container of ints	Int32Vector
2-dim py container of ints*	Int32VectorVector
2-dim py container of pairs of floats	BasePairVector
matrix-like python container	DoubleMatrix
vector-like python container	DoubleVector

*These objects are ambiguous: a float could equally represent a Double, and a 2-dim array or container of ints could equally represent an Int32PairVector. Care should be taken in these cases.

**The first element is the wave data, the second its sample frequency. The wave data can be a 2d numpy float array of the same precision as KaldiDataType.BaseMatrix, or a matrix-like python container of floats and/or ints.

Returns: pydrobert.kaldi.io.enums.KaldiDataType or None
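Because the map is consulted in order, the first matching row wins. The following is a minimal pure-Python sketch of that first-match idea, not the library's implementation. It covers only a few rows of the table (no numpy arrays, wave data, pairs or matrices), returns type names as plain strings rather than KaldiDataType members, and the names RULES and infer_sketch are hypothetical. bool is deliberately left out, since Python's bool subclasses int and how the library tells the two apart is not stated here.

```python
def _is_seq(obj):
    # Only lists and tuples count as containers in this sketch.
    return isinstance(obj, (list, tuple))


def _all_ints(seq):
    # True ints only; bool is excluded (an assumption of this sketch).
    return all(isinstance(x, int) and not isinstance(x, bool) for x in seq)


# Ordered (name, predicate) pairs; earlier rows take precedence.
RULES = [
    ("Int32", lambda o: isinstance(o, int) and not isinstance(o, bool)),
    ("Base", lambda o: isinstance(o, float)),
    ("Token", lambda o: isinstance(o, str)),
    ("BaseMatrix", lambda o: _is_seq(o) and len(o) == 0),
    ("TokenVector", lambda o: _is_seq(o) and all(isinstance(x, str) for x in o)),
    ("Int32Vector", lambda o: _is_seq(o) and _all_ints(o)),
    ("Int32VectorVector",
     lambda o: _is_seq(o) and all(_is_seq(r) and _all_ints(r) for r in o)),
    ("DoubleVector",
     lambda o: _is_seq(o) and all(isinstance(x, (int, float)) for x in o)),
]


def infer_sketch(obj):
    """Return the name of the first matching rule, or None if none match."""
    for name, predicate in RULES:
        if predicate(obj):
            return name
    return None
```

Order matters here: an empty list would also satisfy the TokenVector predicate, but the BaseMatrix row comes first, so infer_sketch([]) gives "BaseMatrix". An unsupported object such as object() gives None, mirroring the "or None" in the return value above.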

pydrobert.kaldi.io.util.parse_kaldi_input_path(path)

Determine the characteristics of an input stream by its path

Returns a 4-tuple of the following information:

If path is not an rspecifier (TableType.NotATable):
1. Classify path as an rxfilename
2. return a tuple of (TableType, path, RxfilenameType, dict())
else:
1. Put all rspecifier options (once, sorted, called_sorted, permissive, background) into a dictionary
2. Extract the embedded rxfilename and classify it
3. return a tuple of (TableType, rxfilename, RxfilenameType, options)

Parameters: path (str) – A string that would be passed to pydrobert.kaldi.io.open()
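The branching above can be sketched in plain Python. This is a simplified stand-in, not the library's code. It assumes Kaldi's usual rspecifier syntax ("ark" or "scp", then comma-separated flags such as o, s, cs, p, bg and their "n"-prefixed negations, then a colon and the rxfilename), uses plain strings in place of TableType and RxfilenameType members, defaults every option to False, and only distinguishes standard input, pipes and files. The names parse_input_sketch and classify_rxfilename are hypothetical.

```python
# Flag abbreviations follow Kaldi's rspecifier convention (assumed here).
RSPEC_FLAGS = {
    "o": ("once", True), "no": ("once", False),
    "s": ("sorted", True), "ns": ("sorted", False),
    "cs": ("called_sorted", True), "ncs": ("called_sorted", False),
    "p": ("permissive", True), "np": ("permissive", False),
    "bg": ("background", True),
}


def classify_rxfilename(rxfilename):
    # Coarse classification; offsets and invalid names are not handled.
    if rxfilename in ("", "-"):
        return "standard input"
    if rxfilename.endswith("|"):
        return "pipe"
    return "file"


def parse_input_sketch(path):
    """Return (table type, rxfilename, rxfilename type, options)."""
    head, colon, rest = path.partition(":")
    tokens = head.split(",") if colon else []
    kinds = [tok for tok in tokens if tok in ("ark", "scp")]
    flags = [tok for tok in tokens if tok not in ("ark", "scp")]
    # Anything that is not exactly one of ark/scp plus known flags is
    # treated as a plain rxfilename with an empty options dictionary.
    if len(kinds) != 1 or any(flag not in RSPEC_FLAGS for flag in flags):
        return ("not a table", path, classify_rxfilename(path), {})
    # Assumed defaults: every option is off unless a flag turns it on.
    options = {name: False for name, _ in RSPEC_FLAGS.values()}
    for flag in flags:
        name, value = RSPEC_FLAGS[flag]
        options[name] = value
    table = "archive" if kinds[0] == "ark" else "script"
    return (table, rest, classify_rxfilename(rest), options)
```

With this sketch, "scp,s,cs:feats.scp" is split into a script table reading the file "feats.scp" with sorted and called_sorted set, while "-" is not a table at all and is classified as standard input.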

pydrobert.kaldi.io.util.parse_kaldi_output_path(path)

Determine the characteristics of an output stream by its path

Returns a 4-tuple of the following information:

If path is not a wspecifier (TableType.NotATable):
1. Classify path as a wxfilename
2. return a tuple of (TableType, path, WxfilenameType, dict())
If path is an archive or script:
1. Put all wspecifier options (binary, flush, permissive) into a dictionary
2. Extract the embedded wxfilename and classify it
3. return a tuple of (TableType, wxfilename, WxfilenameType, options)
If path contains both an archive and a script (TableType.BothTables):
1. Put all wspecifier options (binary, flush, permissive) into a dictionary
2. Extract both embedded wxfilenames and classify them
3. return a tuple of (TableType, (arch_wxfilename, script_wxfilename), (arch_WxfilenameType, script_WxfilenameType), options)

Parameters: path (str) – A string that would be passed to pydrobert.kaldi.io.open()
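The three output cases can be sketched the same way. Again this is a simplified stand-in under stated assumptions, not the library's code: it assumes Kaldi's usual wspecifier syntax ("ark", "scp" or "ark,scp", then flags b, t, f, nf, p, np, then a colon and the wxfilename or the comma-separated pair), uses plain strings in place of TableType and WxfilenameType members, and assumes the defaults binary=True, flush=False, permissive=False. The names parse_output_sketch and classify_wxfilename are hypothetical.

```python
# Flag abbreviations follow Kaldi's wspecifier convention (assumed here).
WSPEC_FLAGS = {
    "b": ("binary", True), "t": ("binary", False),
    "f": ("flush", True), "nf": ("flush", False),
    "p": ("permissive", True), "np": ("permissive", False),
}


def classify_wxfilename(wxfilename):
    # Coarse classification; invalid names are not detected.
    if wxfilename in ("", "-"):
        return "standard output"
    if wxfilename.startswith("|"):
        return "pipe"
    return "file"


def parse_output_sketch(path):
    """Return (table type, wxfilename(s), wxfilename type(s), options)."""
    not_a_table = ("not a table", path, classify_wxfilename(path), {})
    head, colon, rest = path.partition(":")
    tokens = head.split(",") if colon else []
    kinds = [tok for tok in tokens if tok in ("ark", "scp")]
    flags = [tok for tok in tokens if tok not in ("ark", "scp")]
    if kinds not in (["ark"], ["scp"], ["ark", "scp"]):
        return not_a_table
    if any(flag not in WSPEC_FLAGS for flag in flags):
        return not_a_table
    # Assumed defaults for this sketch.
    options = {"binary": True, "flush": False, "permissive": False}
    for flag in flags:
        name, value = WSPEC_FLAGS[flag]
        options[name] = value
    if len(kinds) == 2:
        # Both tables: the archive wxfilename comes first, then the script.
        arch, comma, script = rest.partition(",")
        if not comma:
            return not_a_table
        types = (classify_wxfilename(arch), classify_wxfilename(script))
        return ("both", (arch, script), types, options)
    table = "archive" if kinds[0] == "ark" else "script"
    return (table, rest, classify_wxfilename(rest), options)
```

Under these assumptions, "ark,scp,t:out.ark,out.scp" yields the "both" case with the pair ("out.ark", "out.scp") and binary switched off, while a bare "out.txt" falls through to the not-a-table case with an empty options dictionary.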