pydrobert.kaldi.io

Interfaces for Kaldi’s readers and writers

This subpackage contains a factory function, open(), which is intended to behave similarly to Python’s built-in open() factory. The documentation for open() gives the specifics behind Kaldi’s different read/write styles; here they are described in general terms.

Kaldi’s streams take many forms: regular files, offsets into files, stdin/stdout, and pipes.

Data can be read/written from a binary or text stream in the usual way: specific data types have specific encodings, and data are packed/unpacked in that fashion. This style suits a fixed sequence of data; variable-length sequences of data are instead encoded using the table analogy.
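As a rough illustration of packing and unpacking a fixed sequence of typed values, the sketch below uses Python's standard struct module. The record layout (one 32-bit integer followed by two 32-bit floats) and the function names are assumptions made for this example; this is not Kaldi's actual encoding.

```python
import struct

# Hypothetical fixed layout, chosen for illustration only:
# one little-endian 32-bit int followed by two 32-bit floats.
RECORD = struct.Struct("<iff")


def pack_record(count, mean, var):
    """Encode the three fields in a fixed, agreed-upon order."""
    return RECORD.pack(count, mean, var)


def unpack_record(buf):
    """Decode the fields in the same order they were written."""
    return RECORD.unpack(buf)


encoded = pack_record(3, 0.5, 2.0)
decoded = unpack_record(encoded)
```

Because reader and writer agree on the order and type of every field, no keys or lengths need to be stored alongside the data.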

Kaldi uses the table analogy to store and retrieve indexed data. In a nutshell, Kaldi uses archive (“ark”) files to store binary or text data, and script files (“scp”) to point into archives. Both use whitespace-free strings as keys. Scripts and archives do not have any built-in type checking, so it is necessary to specify the input/output type when the files are opened.
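The relationship between an archive and a script can be sketched with a toy, in-memory example. This is a simplified stand-in written for illustration, not Kaldi's real file handling: the function names are hypothetical, values are plain single-line strings, and the "script" is a dict from key to byte offset rather than a file on disk.

```python
import io


def write_text_archive(entries):
    """Build a toy text archive from (key, value) string pairs.

    Returns (archive_bytes, script), where script maps each key to the
    byte offset at which its value begins. Values must not contain
    newlines in this simplified sketch.
    """
    archive = io.BytesIO()
    script = {}
    for key, value in entries:
        if not key or any(ch.isspace() for ch in key):
            raise ValueError("keys must be non-empty and whitespace-free")
        archive.write(key.encode("utf-8") + b" ")
        script[key] = archive.tell()  # the value starts right after "key "
        archive.write(value.encode("utf-8") + b"\n")
    return archive.getvalue(), script


def read_value(archive_bytes, script, key):
    """Random access: seek to the recorded offset and read one line."""
    stream = io.BytesIO(archive_bytes)
    stream.seek(script[key])
    return stream.readline().rstrip(b"\n").decode("utf-8")


ark, scp = write_text_archive([("utt1", "1 2 3"), ("utt2", "4 5")])
```

Note that nothing in the archive records what type each value has; the caller of read_value must already know how to interpret the bytes, which is why the type has to be given when a table is opened.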

A full account of Kaldi IO can be found on Kaldi’s website under Kaldi I/O Mechanisms.

See also

pydrobert.kaldi.io.enums.KaldiDataType

For more information on the types of streams that can be read or written

class pydrobert.kaldi.io.KaldiIOBase(path)

Bases: object

IOBase for Kaldi readers and writers

Similar to io.IOBase, but without a lot of the assumed functionality.

Parameters

path (str) – The path passed to pydrobert.kaldi.io.open(). One of an rspecifier, wspecifier, rxfilename, or wxfilename

path

The opened path

table_type

The type of table that’s being read/written (or NotATable)

xfilenames

The extended file names being read/written. For tables, this excludes the 'ark:' and 'scp:' prefixes from path. Usually there will be only one extended file name, unless the path uses the special 'ark,scp:' format to write both an archive and script at the same time

xtypes

The type of each extended file name opened. Usually there will be only one extended file name, unless the path uses the special 'ark,scp:' format to write both an archive and script at the same time

binary

Whether this stream encodes binary data (or text)

closed

Whether this stream is closed

permissive

Whether invalid values will be treated as non-existent (tables only)

once

Whether each entry will only be read once (readable tables only)

sorted

Whether keys are sorted (readable tables only)

called_sorted

Whether entries will be read in sorted order (readable tables only)

background

Whether reading is not being performed on the main thread (readable tables only)

flush

Whether the stream is flushed after each write operation (writable tables only)
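To make the link between a table path and several of the attributes above more concrete, here is a minimal sketch of splitting a specifier into its options and extended file names. The option letters ('p', 'o', 's', 'cs', 'bg') are those described in Kaldi's own I/O documentation; the function name parse_specifier and the exact mapping onto attribute names are assumptions for illustration, not this library's implementation.

```python
def parse_specifier(path):
    """Split a table specifier into reader flags and extended file names.

    Illustrative only. Raises ValueError if the part before the first
    colon names neither 'ark' nor 'scp'.
    """
    prefix, sep, rest = path.partition(":")
    options = prefix.split(",") if sep else []
    kinds = [opt for opt in options if opt in ("ark", "scp")]
    if not kinds:
        raise ValueError("not a table specifier: %r" % (path,))
    # 'ark,scp:a.ark,b.scp' names one file per table kind, so split the
    # remainder into at most len(kinds) pieces.
    xfilenames = rest.split(",", len(kinds) - 1)
    return {
        "xfilenames": xfilenames,
        "permissive": "p" in options,
        "once": "o" in options,
        "sorted": "s" in options,
        "called_sorted": "cs" in options,
        "background": "bg" in options,
    }
```

For example, parse_specifier('ark,scp:a.ark,b.scp') yields two extended file names, while parse_specifier('scp,p:feats.scp') yields one and sets the permissive flag.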

abstract close()

Close and flush the underlying IO object

This method has no effect if the file is already closed

abstract readable()

Return whether this object was opened for reading

abstract writable()

Return whether this object was opened for writing
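The contract of the three abstract methods can be sketched with a stand-in base class and one concrete subclass. Both classes below (StreamBase and MemoryWriter) are hypothetical and written only for this example; they do not subclass or reproduce KaldiIOBase itself.

```python
import abc


class StreamBase(abc.ABC):
    """Hypothetical stand-in with the same three abstract methods."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    @abc.abstractmethod
    def close(self):
        """Close and flush; a no-op if already closed."""

    @abc.abstractmethod
    def readable(self):
        """Return whether the object was opened for reading."""

    @abc.abstractmethod
    def writable(self):
        """Return whether the object was opened for writing."""


class MemoryWriter(StreamBase):
    """Hypothetical write-only stream that collects values in a list."""

    def __init__(self, path):
        super().__init__(path)
        self.values = []
        self.flush_count = 0

    def close(self):
        if self.closed:
            return  # already closed: calling close() again has no effect
        self.flush_count += 1  # stand-in for flushing the underlying object
        self.closed = True

    def readable(self):
        return False

    def writable(self):
        return True
```

Calling close() twice on a MemoryWriter flushes only once, matching the documented behaviour that closing an already-closed stream has no effect.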

pydrobert.kaldi.io.open(path, kaldi_dtype=None, mode='r', error_on_str=True, utt2spk='', value_style='b', header=True, cache=False)

Factory function for initializing and opening Kaldi streams

This function provides a general interface for opening Kaldi streams. Kaldi streams are either simple input/output of Kaldi objects (the basic/duck stream) or key-value readers and writers (tables).

When path starts with 'ark:' or 'scp:' (possibly with modifiers before the colon), a table is opened. Otherwise, a basic stream is opened.
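The dispatch rule just described can be approximated in a few lines. This is a sketch under the assumption that modifiers are comma-separated tokens before the first colon; the function name stream_kind is hypothetical and the library's real check may differ.

```python
def stream_kind(path):
    """Return 'table' if path looks like a table specifier, else 'basic'."""
    prefix, sep, _ = path.partition(":")
    if not sep:
        return "basic"  # no colon at all, e.g. '-' or 'feats.bin'
    options = prefix.split(",")
    # 'ark' or 'scp' must appear as a whole token, so an offset such as
    # 'foo.ark:1234' is still treated as a basic stream.
    if "ark" in options or "scp" in options:
        return "table"
    return "basic"
```

Under this rule, 'ark:foo.ark' and 'scp,p:feats.scp' are tables, while '-', 'foo.ark:1234', and a pipe such as 'gunzip -c foo.gz |' are basic streams.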

See also

pydrobert.kaldi.io.table_streams.open_table_stream

For information on opening tables

pydrobert.kaldi.io.duck_streams.open_duck_stream

For information on opening basic streams