pydrobert.kaldi.io.duck_streams
Submodule for reading and writing objects one at a time, like (un)packing C structs
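The struct analogy can be made concrete with Python's standard `struct` module. This sketch does not use pydrobert-kaldi at all. It only illustrates, under that assumption, why a stream that stores no type information has to be read back in the order it was written.

```python
import io
import struct

# An in-memory byte stream stands in for a file on disk.
buf = io.BytesIO()

# Write three values back to back: an int32, a float64, then another int32.
buf.write(struct.pack("<i", 7))
buf.write(struct.pack("<d", 2.5))
buf.write(struct.pack("<i", -1))

# Nothing in the stream records the types, so the reader must request them
# in the same order (and with the same sizes) they were written.
buf.seek(0)
first = struct.unpack("<i", buf.read(4))[0]
second = struct.unpack("<d", buf.read(8))[0]
third = struct.unpack("<i", buf.read(4))[0]
values = (first, second, third)
```

Reading with a different order of types would not raise an error here; it would silently produce wrong values, which is the main hazard of this style of I/O.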
- class pydrobert.kaldi.io.duck_streams.KaldiInput(path, header=True)[source]
Bases: KaldiIOBase
A kaldi input stream from which objects can be read one at a time
- Parameters
- close()[source]
Close and flush the underlying IO object
This method has no effect if the file is already closed
- read(kaldi_dtype, value_style='b', read_binary=None)[source]
Read in one object from the stream
- Parameters
kaldi_dtype (KaldiDataType) – The type of object to read
value_style (Literal['b', 's', 'd']) – 'wm' readers can provide not only the audio buffer ('b') of a wave file, but its sampling rate ('s'), and/or duration (in sec, 'd'). Setting value_style to some combination of 'b', 's', and/or 'd' will cause the reader to return a tuple of that information. If value_style is only one character, the result will not be contained in a tuple
read_binary (bool, optional) – If set, the object will be read as either binary (True) or text (False). The default behaviour is to read according to the binary attribute. Ignored if there’s only one way to read the data
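The return shape described for value_style can be sketched in plain Python. The helper `select_wave_values` below is hypothetical and is not part of pydrobert-kaldi. It assumes a single-channel list of samples and only mirrors the documented rule that one character yields a bare value while several characters yield a tuple in the requested order.

```python
def select_wave_values(samples, rate, value_style="b"):
    """Hypothetical helper mirroring the documented value_style behaviour.

    samples: a single channel of audio samples (assumption for this sketch)
    rate: the sampling rate in Hz
    """
    if not value_style or any(c not in "bsd" for c in value_style):
        raise ValueError("value_style must combine 'b', 's', and/or 'd'")
    lookup = {
        "b": samples,               # the audio buffer
        "s": rate,                  # the sampling rate
        "d": len(samples) / rate,   # the duration in seconds
    }
    selected = tuple(lookup[c] for c in value_style)
    # A single character returns the bare value rather than a 1-tuple.
    return selected[0] if len(selected) == 1 else selected


samples = [0] * 32000
only_buffer = select_wave_values(samples, 16000.0, "b")
rate_and_duration = select_wave_values(samples, 16000.0, "sd")
```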
- class pydrobert.kaldi.io.duck_streams.KaldiOutput(path, header=True)[source]
Bases: KaldiIOBase
A kaldi output stream to which objects can be written one at a time
- Parameters
- write(obj, kaldi_dtype=None, error_on_str=True, write_binary=True)[source]
Write one object to the stream
- Parameters
obj (Any) – The object to write
kaldi_dtype (Optional[KaldiDataType]) – The type of object to write
error_on_str (bool) – Token vectors ('tv') accept sequences of whitespace-free ASCII/UTF strings. A str is also a sequence of characters, which may satisfy the token requirements. If error_on_str is True, a ValueError is raised when writing a str as a token vector. Otherwise a str can be written
write_binary (bool) – The object will be written as binary (True) or text (False)
- Raises
ValueError – If unable to determine a proper data type
See also
pydrobert.kaldi.io.util.infer_kaldi_data_type
Illustrates how different inputs are mapped to data types
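The error_on_str rule guards against a common slip: passing a single string where a list of tokens was intended. The function `check_token_vector` below is a hypothetical illustration in plain Python, not the library's own validation code. It assumes only what the parameter description states: tokens must be whitespace-free, and a bare str is rejected unless error_on_str is False.

```python
def check_token_vector(obj, error_on_str=True):
    """Hypothetical check mirroring the documented error_on_str rule."""
    if isinstance(obj, str) and error_on_str:
        raise ValueError("refusing to treat a str as a token vector")
    tokens = list(obj)  # a str becomes a list of single characters
    for token in tokens:
        if any(ch.isspace() for ch in token):
            raise ValueError(f"token contains whitespace: {token!r}")
    return tokens


# A list of tokens passes unchanged.
words = check_token_vector(["hello", "world"])

# With the guard disabled, a str is split into one token per character.
chars = check_token_vector("abc", error_on_str=False)
```

With the default error_on_str=True, calling `check_token_vector("abc")` raises ValueError, which is usually the safer behaviour.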
- pydrobert.kaldi.io.duck_streams.open_duck_stream(path, mode='r', header=True)[source]
Open a “duck” stream
“Duck” streams provide an interface for reading or writing kaldi objects, one at a time. Essentially: remember the order things go in, then pull them out in the same order.
Duck streams can read/write binary or text data. It is mostly up to the user how to read or write data, though the following rules establish the default:
1. An input stream that does not look for a ‘binary header’ is binary
2. An input stream that looks for and finds a binary header when opening is binary
3. An input stream that looks for but does not find a binary header when opening is a text stream
4. An output stream is always binary. However, the user may choose not to write a binary header. The resulting input stream will be considered a text stream when rule 3 is satisfied
- Parameters
path (str) – The extended file name to be opened. This can be quite exotic. More details can be found on the Kaldi website.
mode (Literal['r', 'r+', 'w']) – Whether to open the stream for input ('r') or output ('w'). 'r+' is equivalent to 'r'
header (bool) – Setting this to True will either check for a ‘binary header’ in an input stream, or write a binary header for an output stream. If False, no check/write is performed
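The default rules for input streams described above can be sketched as a small decision function. This is an illustration in plain Python, not the library's implementation. It assumes the standard Kaldi binary marker (a null byte followed by `B`), and the name `input_defaults_to_binary` and the rewind on a failed check are choices made for this sketch only.

```python
import io

# Kaldi marks binary data with a null byte followed by 'B'.
KALDI_BINARY_HEADER = b"\0B"


def input_defaults_to_binary(stream, header=True):
    """Return True if an input stream would default to binary mode."""
    if not header:
        # No header check is requested: the stream is treated as binary.
        return True
    start = stream.tell()
    marker = stream.read(len(KALDI_BINARY_HEADER))
    if marker == KALDI_BINARY_HEADER:
        # Header looked for and found: binary.
        return True
    # Header looked for but not found: text. Rewind so no data is lost
    # (an assumption of this sketch).
    stream.seek(start)
    return False


with_header = input_defaults_to_binary(io.BytesIO(b"\0B\x04\x07\x00\x00\x00"))
without_header = input_defaults_to_binary(io.BytesIO(b"1 2 3\n"))
unchecked = input_defaults_to_binary(io.BytesIO(b"1 2 3\n"), header=False)
```

The last case shows why header=False needs care on input: text data opened without a header check still defaults to binary, so the caller has to choose text reading explicitly, for example through read_binary=False.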