pydrobert.kaldi.io.enums

Kaldi enumerations, including data types and xspecifier types

class pydrobert.kaldi.io.enums.KaldiDataType(value)

Bases: Enum

Enumerates the data types stored and retrieved by Kaldi I/O

This enumeration lists the types of data that the various readers and writers read and write. It is used in the factory method pydrobert.kaldi.io.open() to dictate the subclass created.

Notes

The “base float” mentioned in this documentation is the same type as kaldi::BaseFloat, which was determined when Kaldi was built. The easiest way to determine whether this is a double (64-bit) or a float (32-bit) is to check the value of the property KaldiDataType.BaseVector.is_double.

Base = 'b'

Inputs/outputs are single base floats

BaseMatrix = 'bm'

Inputs/outputs are 2D numpy arrays of the base float

BasePairVector = 'bpv'

Inputs/outputs are tuples of pairs of the base float

BaseVector = 'bv'

Inputs/outputs are 1D numpy arrays of the base float

Bool = 'B'

Inputs/outputs are single booleans

Double = 'd'

Inputs/outputs are single 64-bit floats

DoubleMatrix = 'dm'

Inputs/outputs are 2D numpy arrays of 64-bit floats

DoubleVector = 'dv'

Inputs/outputs are 1D numpy arrays of 64-bit floats

FloatMatrix = 'fm'

Inputs/outputs are 2D numpy arrays of 32-bit floats

FloatVector = 'fv'

Inputs/outputs are 1D numpy arrays of 32-bit floats

Int32 = 'i'

Inputs/outputs are single 32-bit ints

Int32PairVector = 'ipv'

Inputs/outputs are tuples of pairs of 32-bit ints

Int32Vector = 'iv'

Inputs/outputs are tuples of 32-bit ints

Int32VectorVector = 'ivv'

Inputs/outputs are tuples of tuples of 32-bit ints

Token = 't'

Inputs/outputs are individual whitespace-free ASCII or Unicode words

TokenVector = 'tv'

Inputs/outputs are tuples of tokens

WaveMatrix = 'wm'

Inputs/outputs are wave file data, cast to base float 2D arrays

Wave matrices have the shape (n_channels, n_samples). Kaldi will read PCM wave files, but will always convert the samples to base floats.

Though Kaldi can read wave files of different types and sample rates, Kaldi will only write wave files as PCM16 sampled at 16 kHz.
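The (n_channels, n_samples) layout can be illustrated without Kaldi at all. The following is a minimal sketch, not the library's implementation: the helper name deinterleave_pcm16 is hypothetical, and it assumes interleaved little-endian PCM16 input whose samples are cast to floats without rescaling.

```python
import struct


def deinterleave_pcm16(raw: bytes, n_channels: int):
    """Split interleaved little-endian PCM16 bytes into per-channel float rows.

    Hypothetical helper for illustration only; not part of pydrobert.kaldi.
    """
    n_values = len(raw) // 2
    values = struct.unpack("<%dh" % n_values, raw[: n_values * 2])
    # Row c holds every n_channels-th sample, starting at offset c
    return [[float(v) for v in values[c::n_channels]] for c in range(n_channels)]


# Two channels, three frames: (L0, R0), (L1, R1), (L2, R2)
raw = struct.pack("<6h", 1, -1, 2, -2, 3, -3)
wave = deinterleave_pcm16(raw, n_channels=2)
shape = (len(wave), len(wave[0]))  # rows are channels, columns are samples
```

Each row of the result corresponds to one channel, so indexing the first axis selects a channel and indexing the second selects a sample.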

property is_basic

whether data are stored in Kaldi with ReadBasicType/WriteBasicType

Type

bool

property is_double

whether this data type is double precision (64-bit)

Type

bool

property is_floating_point

whether this type has a floating point representation

Type

bool

property is_matrix

whether this type is a numpy matrix type

Type

bool

property is_num_vector

whether this is a numpy vector

Type

bool
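Because KaldiDataType is an Enum whose values are short string codes, a member can be recovered from its code through the usual Enum value lookup. The sketch below uses a stand-in class holding only a subset of the documented members so that it runs without the library installed; the stand-in is an assumption for illustration and omits the properties listed above.

```python
from enum import Enum


class KaldiDataType(Enum):
    """Stand-in with a subset of the documented members (not the real class)."""

    BaseMatrix = "bm"
    BaseVector = "bv"
    Int32Vector = "iv"
    Token = "t"


# Standard Enum behaviour: calling the class with a value returns the member
dtype = KaldiDataType("bm")

# Every member exposes both its name and its short string code
codes = {member.name: member.value for member in KaldiDataType}
```

An unknown code such as KaldiDataType("zz") raises ValueError, which is standard Enum behaviour.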

class pydrobert.kaldi.io.enums.RxfilenameType(value)

Bases: Enum

The type of stream to read, based on an extended filename

FileInput = 1

Input is from a file on disk with no offset

InvalidInput = 0

An invalid stream

OffsetFileInput = 4

Input is from a file on disk, read from a specific offset

PipedInput = 3

Input is being piped from a command

StandardInput = 2

Input is being piped from stdin
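To make the categories above concrete, here is a simplified sketch of how an extended read filename might be sorted into them. It approximates Kaldi's conventions ("-" for stdin, a trailing "|" for a command, a trailing ":offset" for an offset into a file) and is not the library's own classifier; the function classify_rxfilename and the stand-in enum are hypothetical.

```python
import re
from enum import Enum


class RxfilenameType(Enum):
    """Stand-in mirroring the documented member values."""

    InvalidInput = 0
    FileInput = 1
    StandardInput = 2
    PipedInput = 3
    OffsetFileInput = 4


def classify_rxfilename(rxfilename: str) -> RxfilenameType:
    """Approximate classification; real Kaldi applies additional checks."""
    if rxfilename in ("", "-"):
        return RxfilenameType.StandardInput
    # Surrounding whitespace or a leading pipe (an output form) is rejected
    if rxfilename != rxfilename.strip() or rxfilename.startswith("|"):
        return RxfilenameType.InvalidInput
    if rxfilename.endswith("|"):
        return RxfilenameType.PipedInput
    # A trailing ":<digits>" is read as a byte offset into the file
    if re.search(r":\d+$", rxfilename):
        return RxfilenameType.OffsetFileInput
    return RxfilenameType.FileInput
```

For instance, under these assumptions "feats.ark:42" is classified as OffsetFileInput and "gunzip -c feats.ark.gz |" as PipedInput.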

class pydrobert.kaldi.io.enums.TableType(value)

Bases: Enum

The type of table a stream points to

ArchiveTable = 1

The stream points to an archive (keys and values)

BothTables = 3

The stream points simultaneously to a script and archive

This is a special pattern for writing. The archive stores keys and values; the script stores keys and points to the locations in the archive

NotATable = 0

The stream is not a table

ScriptTable = 2

The stream points to a script (keys and extended file names)
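The table types above correspond to the "ark" and "scp" options that precede the first colon of a Kaldi table specifier. The following sketch shows one way a write specifier could be mapped onto them. It is a simplification under that assumption: the function classify_wspecifier and the stand-in enum are hypothetical, and other options (such as "t" for text mode) are simply ignored.

```python
from enum import Enum


class TableType(Enum):
    """Stand-in mirroring the documented member values."""

    NotATable = 0
    ArchiveTable = 1
    ScriptTable = 2
    BothTables = 3


def classify_wspecifier(wspecifier: str) -> TableType:
    """Approximate classification based only on the ark/scp options."""
    options, colon, _ = wspecifier.partition(":")
    if not colon:
        # No colon means there are no table options at all
        return TableType.NotATable
    flags = options.split(",")
    has_ark, has_scp = "ark" in flags, "scp" in flags
    if has_ark and has_scp:
        return TableType.BothTables
    if has_ark:
        return TableType.ArchiveTable
    if has_scp:
        return TableType.ScriptTable
    return TableType.NotATable
```

With this sketch, "ark,scp:feats.ark,feats.scp" maps to BothTables, matching the special writing pattern in which the script points into the archive.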

class pydrobert.kaldi.io.enums.WxfilenameType(value)

Bases: Enum

The type of stream to write, based on an extended filename

FileOutput = 1

Output to a file on disk

InvalidOutput = 0

An invalid stream

PipedOutput = 3

Output is being piped to some command

StandardOutput = 2

Output is being piped to stdout
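The write-side categories can be sketched in the same spirit. This approximates Kaldi's conventions ("-" for stdout, a leading "|" for a command) and is not the library's own classifier; the function classify_wxfilename and the stand-in enum are hypothetical, and real Kaldi rejects further cases that this sketch accepts as files.

```python
from enum import Enum


class WxfilenameType(Enum):
    """Stand-in mirroring the documented member values."""

    InvalidOutput = 0
    FileOutput = 1
    StandardOutput = 2
    PipedOutput = 3


def classify_wxfilename(wxfilename: str) -> WxfilenameType:
    """Approximate classification of an extended write filename."""
    if wxfilename in ("", "-"):
        return WxfilenameType.StandardOutput
    # Surrounding whitespace or a trailing pipe (an input form) is rejected
    if wxfilename != wxfilename.strip() or wxfilename.endswith("|"):
        return WxfilenameType.InvalidOutput
    if wxfilename.startswith("|"):
        return WxfilenameType.PipedOutput
    return WxfilenameType.FileOutput
```

Note the asymmetry with reading: a pipe character at the start denotes a command to write to, whereas at the end it denotes a command to read from.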