pydrobert.kaldi.io
Interfaces for Kaldi’s readers and writers
This subpackage contains a factory function, open(), which is intended to behave similarly to Python’s built-in open() factory. The documentation for open() gives the specifics behind Kaldi’s different read/write styles; here, they are described in a general way.
Kaldi’s streams can be very exotic, including regular files, file offsets, stdin/stdout, and pipes.
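The kinds of streams listed above are selected by the shape of the extended file name. The sketch below is a simplified classifier written for illustration; the names XType, classify_rxfilename, and classify_wxfilename are hypothetical and are not part of pydrobert.kaldi, and Kaldi's own classification handles more edge cases. It follows the conventions described in Kaldi's I/O documentation: "-" (or an empty name) means stdin/stdout, a trailing "|" means reading from a command, a leading "|" means writing to a command, and a trailing ":<digits>" means a byte offset into a file.

```python
from enum import Enum


class XType(Enum):
    """Hypothetical labels for kinds of extended file names (not the library's enum)."""

    STANDARD = "standard"  # "-" or "": stdin when reading, stdout when writing
    PIPE = "pipe"  # "cmd |" when reading, "| cmd" when writing
    OFFSET = "offset"  # "foo.ark:1234": open foo.ark and seek to byte 1234 (reading only)
    FILE = "file"  # anything else is treated as a regular file


def classify_rxfilename(rxfilename: str) -> XType:
    """Classify a readable extended file name (simplified)."""
    if rxfilename in ("", "-"):
        return XType.STANDARD
    if rxfilename.endswith("|"):
        return XType.PIPE
    # A trailing ":<digits>" marks a byte offset into a file
    head, sep, tail = rxfilename.rpartition(":")
    if sep and head and tail.isdigit():
        return XType.OFFSET
    return XType.FILE


def classify_wxfilename(wxfilename: str) -> XType:
    """Classify a writable extended file name (simplified; offsets are not writable)."""
    if wxfilename in ("", "-"):
        return XType.STANDARD
    if wxfilename.startswith("|"):
        return XType.PIPE
    return XType.FILE
```

For example, "gunzip -c feats.ark.gz |" classifies as a pipe when reading, while "feats.ark:1234" classifies as an offset into feats.ark.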
Data can be read/written from a binary or text stream in the usual way: specific data types have specific encodings, and data are packed/unpacked in that fashion. This style is appropriate for a fixed sequence of data; variable sequences of data are instead encoded using the table analogy.
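To illustrate packing and unpacking a fixed sequence of data, here is a toy encoding of floats in binary and text form. This is not Kaldi's actual on-disk format (which adds its own headers and size markers); encode_floats and decode_floats are hypothetical helpers. The point is that the reader must already know the type and count of what it is decoding, since nothing in the bytes describes them.

```python
import struct
from typing import List


def encode_floats(values: List[float], binary: bool) -> bytes:
    """Encode a sequence of floats in a toy binary or text format (not Kaldi's)."""
    if binary:
        # Each value is packed as 4 little-endian bytes (IEEE 754 single precision)
        return struct.pack("<%df" % len(values), *values)
    # Text mode: whitespace-separated decimal literals
    return " ".join(repr(v) for v in values).encode("ascii")


def decode_floats(data: bytes, count: int, binary: bool) -> List[float]:
    """Decode `count` floats; the caller must know the type and count in advance."""
    if binary:
        return list(struct.unpack_from("<%df" % count, data))
    return [float(token) for token in data.decode("ascii").split()[:count]]
```

Because the decoder has to be told both the type and the count, this style works well for a known, fixed layout but not for an open-ended collection of items.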
Kaldi uses the table analogy to store and retrieve indexed data. In a nutshell, Kaldi uses archive (“ark”) files to store binary or text data, and script (“scp”) files to point into archives. Both use whitespace-free strings as keys. Scripts and archives do not have any built-in type checking, so it is necessary to specify the input/output type when the files are opened.
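The relationship between archives and scripts can be sketched with in-memory data. In this toy version (write_archive, write_script, and read_via_script are hypothetical helpers, and values are plain strings rather than typed Kaldi objects), each archive entry is a whitespace-free key, a space, and a value; each script line maps a key to "<archive>:<byte offset>", mirroring how a Kaldi script points at the start of a value inside an archive.

```python
import io
from typing import Dict, List, Tuple


def write_archive(entries: List[Tuple[str, str]]) -> Tuple[bytes, Dict[str, int]]:
    """Write toy entries as one "<key> <value>" line each.

    Returns the archive bytes and, per key, the byte offset where its value starts.
    """
    buf = io.BytesIO()
    offsets: Dict[str, int] = {}
    for key, value in entries:
        if not key or any(ch.isspace() for ch in key):
            raise ValueError(f"keys must be non-empty and whitespace-free: {key!r}")
        buf.write(key.encode("utf-8") + b" ")
        offsets[key] = buf.tell()  # the value begins right after "<key> "
        buf.write(value.encode("utf-8") + b"\n")
    return buf.getvalue(), offsets


def write_script(ark_path: str, offsets: Dict[str, int]) -> str:
    """Each script line maps a key to "<archive>:<byte offset>"."""
    return "".join(f"{key} {ark_path}:{offset}\n" for key, offset in offsets.items())


def read_via_script(script: str, archives: Dict[str, bytes]) -> Dict[str, str]:
    """Resolve every script entry by seeking into an in-memory archive."""
    values: Dict[str, str] = {}
    for line in script.splitlines():
        key, xfilename = line.split(maxsplit=1)
        path, _, offset = xfilename.rpartition(":")
        data = archives[path]
        start = int(offset)
        end = data.index(b"\n", start)  # toy values end at the newline
        values[key] = data[start:end].decode("utf-8")
    return values
```

Note that the script only records where each value starts; nothing in either file says what type the value is, which is why the type must be supplied when a table is opened.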
A full account of Kaldi IO can be found on Kaldi’s website under Kaldi I/O Mechanisms.
See also
pydrobert.kaldi.io.enums.KaldiDataType
For more information on the types of streams that can be read or written
- class pydrobert.kaldi.io.KaldiIOBase(path)
Bases: object
IOBase for Kaldi readers and writers
Similar to io.IOBase, but without much of the assumed functionality.
Parameters
path (str) – The path passed to pydrobert.kaldi.io.open(). One of an rspecifier, wspecifier, rxfilename, or wxfilename
Attributes
- path
The opened path
- table_type
The type of table that’s being read/written (or NotATable)
- xfilenames
The extended file names being read/written. For tables, this excludes the 'ark:' and 'scp:' prefixes from path. Usually there will be only one extended file name, unless the path uses the special 'ark,scp:' format to write both an archive and script at the same time
- xtypes
The types of the extended file names opened. Usually there will be only one extended file name, unless the path uses the special 'ark,scp:' format to write both an archive and script at the same time
- binary
Whether this stream encodes binary data (or text)
- closed
Whether this stream is closed
- permissive
Whether invalid values will be treated as non-existent (tables only)
- once
Whether each entry will only be read once (readable tables only)
- sorted
Whether keys are sorted (readable tables only)
- called_sorted
Whether entries will be read in sorted order (readable tables only)
- background
Whether reading is performed on a background thread rather than the main thread (readable tables only)
- flush
Whether the stream is flushed after each write operation (writable tables only)
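Several of the attributes above correspond to option letters that Kaldi accepts in rspecifiers and wspecifiers (for example "ark,t:" for text output or "scp,p:" for permissive reading). The sketch below is a hypothetical parser, parse_specifier, that splits a table specifier into its table types, extended file names, and attribute settings. The option tables are based on the options described in Kaldi's I/O documentation and are not exhaustive; this is not how pydrobert.kaldi parses paths internally.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup tables: specifier option -> (attribute name, value)
READ_OPTIONS = {
    "o": ("once", True), "no": ("once", False),
    "p": ("permissive", True), "np": ("permissive", False),
    "s": ("sorted", True), "ns": ("sorted", False),
    "cs": ("called_sorted", True), "ncs": ("called_sorted", False),
    "bg": ("background", True),
}
WRITE_OPTIONS = {
    "b": ("binary", True), "t": ("binary", False),
    "f": ("flush", True), "nf": ("flush", False),
    "p": ("permissive", True),
}


def parse_specifier(spec: str, writing: bool) -> Tuple[List[str], List[str], Dict[str, bool]]:
    """Split e.g. "ark,scp,t:a.ark,a.scp" into table types, xfilenames, and attributes."""
    prefix, sep, rest = spec.partition(":")
    if not sep:
        raise ValueError(f"not a table specifier: {spec!r}")
    options = WRITE_OPTIONS if writing else READ_OPTIONS
    table_types: List[str] = []
    attrs: Dict[str, bool] = {}
    for token in prefix.split(","):
        if token in ("ark", "scp"):
            table_types.append(token)
        elif token in options:
            name, value = options[token]
            attrs[name] = value
        else:
            raise ValueError(f"unknown option {token!r} in {spec!r}")
    if not table_types:
        raise ValueError(f"no 'ark' or 'scp' in {spec!r}")
    # "ark,scp:" names one file per table type, comma-separated, in the order given
    xfilenames = rest.split(",") if len(table_types) > 1 else [rest]
    return table_types, xfilenames, attrs
```

For instance, "ark,scp,t,f:feats.ark,feats.scp" yields two extended file names (hence two xtypes) and sets binary to False and flush to True.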
- pydrobert.kaldi.io.open(path, kaldi_dtype=None, mode='r', error_on_str=True, utt2spk='', value_style='b', header=True, cache=False)
Factory function for initializing and opening Kaldi streams
This function provides a general interface for opening Kaldi streams. Kaldi streams are either simple input/output of Kaldi objects (the basic/duck stream) or key-value readers and writers (tables).
When path starts with 'ark:' or 'scp:' (possibly with modifiers before the colon), a table is opened. Otherwise, a basic stream is opened.
See also
pydrobert.kaldi.io.table_streams.open_table_stream
For information on opening tables
pydrobert.kaldi.io.duck_streams.open_duck_stream
For information on opening basic streams
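The table-versus-basic-stream rule for open() can be sketched as a small predicate. opens_table is a hypothetical helper written for illustration: it checks whether the first comma-separated token before the first colon is 'ark' or 'scp', which is a simplification and not necessarily the exact check pydrobert.kaldi.io.open() performs.

```python
def opens_table(path: str) -> bool:
    """Return True if `path` looks like a table specifier ("ark:...", "scp,p:...", etc.)."""
    prefix, sep, _ = path.partition(":")
    if not sep:
        return False  # no colon at all, so it cannot be a table specifier
    # Modifiers such as "t" or "p" may follow the table type, before the colon
    return prefix.split(",")[0] in ("ark", "scp")
```

Under this rule, "ark,t:-" opens a table, while "feats.ark:1234" (a file offset) and "gunzip -c feats.gz |" (a pipe) open basic streams.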