pydrobert.kaldi.io.duck_streams
Submodule for reading and writing objects one at a time, like (un)packing C structs
- class pydrobert.kaldi.io.duck_streams.KaldiInput(path, header=True)[source]
Bases: KaldiIOBase
A Kaldi input stream from which objects can be read one at a time
- Parameters:
- close()[source]
Close and flush the underlying IO object
This method has no effect if the file is already closed
- read(kaldi_dtype, value_style='b', read_binary=None)[source]
Read in one object from the stream
- Parameters:
kaldi_dtype (KaldiDataType) – The type of object to read
value_style (Literal['b','s','d']) – 'wm' readers can provide not only the audio buffer ('b') of a wave file, but also its sampling rate ('s') and/or duration (in seconds, 'd'). Setting value_style to some combination of 'b', 's', and/or 'd' will cause the reader to return a tuple of that information. If value_style is only one character, the result will not be contained in a tuple
read_binary (bool, optional) – If set, the object will be read as either binary (True) or text (False). The default behaviour is to read according to the binary attribute. Ignored if there's only one way to read the data
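As a rough illustration of the value_style rule described above, the sketch below selects the requested items with a plain Python function. The helper name `select_wave_values` and its arguments are hypothetical and not part of pydrobert-kaldi. It also assumes the tuple follows the order of the characters in value_style, which the description above does not state.

```python
def select_wave_values(buffer, rate, duration, value_style="b"):
    """Hypothetical helper: pick wave information according to value_style."""
    available = {"b": buffer, "s": rate, "d": duration}
    for char in value_style:
        if char not in available:
            raise ValueError(f"unknown value_style character: {char!r}")
    # Assumption: items are returned in the order they appear in value_style
    picked = tuple(available[char] for char in value_style)
    # A single character yields the bare value instead of a 1-tuple
    return picked[0] if len(picked) == 1 else picked


samples = [0, 1, -1, 2]
only_buffer = select_wave_values(samples, 16000.0, 0.00025, "b")
rate_and_duration = select_wave_values(samples, 16000.0, 0.00025, "sd")
```

Here `only_buffer` is the bare sample list, while `rate_and_duration` is a two-element tuple.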
- class pydrobert.kaldi.io.duck_streams.KaldiOutput(path, header=True)[source]
Bases: KaldiIOBase
A Kaldi output stream to which objects can be written one at a time
- Parameters:
- write(obj, kaldi_dtype=None, error_on_str=True, write_binary=True)[source]
Write one object to the stream
- Parameters:
obj (Any) – The object to write
kaldi_dtype (Optional[KaldiDataType]) – The type of object to write
error_on_str (bool) – Token vectors ('tv') accept sequences of whitespace-free ASCII/UTF strings. A str is also a sequence of characters, which may satisfy the token requirements. If error_on_str is True, a ValueError is raised when writing a str as a token vector. Otherwise a str can be written
write_binary (bool) – The object will be written as binary (True) or text (False)
- Raises:
ValueError – If unable to determine a proper data type
See also
pydrobert.kaldi.io.util.infer_kaldi_data_type
Illustrates how different inputs are mapped to data types
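The error_on_str behaviour of write() can be pictured with a small stand-alone check. This is only a sketch of the rule as described: `check_token_vector` is a hypothetical name, and the library's actual validation may differ in its details.

```python
def check_token_vector(obj, error_on_str=True):
    """Hypothetical check: return obj as a list of whitespace-free tokens."""
    if isinstance(obj, str) and error_on_str:
        # A bare str would otherwise be split into one token per character
        raise ValueError("refusing to treat a str as a token vector")
    tokens = list(obj)
    for token in tokens:
        if not isinstance(token, str) or any(char.isspace() for char in token):
            raise ValueError(f"not a whitespace-free string: {token!r}")
    return tokens


from_list = check_token_vector(["hello", "world"])
from_str = check_token_vector("abc", error_on_str=False)
```

With error_on_str disabled, the string "abc" is accepted and treated as three single-character tokens, which is rarely what the caller intended. That is the motivation for raising by default.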
- pydrobert.kaldi.io.duck_streams.open_duck_stream(path, mode='r', header=True)[source]
Open a “duck” stream
“Duck” streams provide an interface for reading or writing kaldi objects, one at a time. Essentially: remember the order things go in, then pull them out in the same order.
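The same "order in, order out" idea can be seen with Python's standard struct module. The sketch below is an analogy only and does not use pydrobert-kaldi: three values are packed into a buffer and must be unpacked in the same order, with the same types, to be recovered.

```python
import io
import struct

stream = io.BytesIO()
# Write three values back to back; the stream itself records no type information
stream.write(struct.pack("<i", 3))        # a 32-bit integer
stream.write(struct.pack("<f", 0.5))      # a 32-bit float
stream.write(struct.pack("<3s", b"abc"))  # three raw bytes

# Read them back in exactly the order (and with the formats) they were written
stream.seek(0)
count = struct.unpack("<i", stream.read(4))[0]
scale = struct.unpack("<f", stream.read(4))[0]
tag = struct.unpack("<3s", stream.read(3))[0]
```

Reading with the formats in a different order would still succeed, but would produce meaningless values. This is why the caller of a duck stream has to remember what was written and in which order.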
Duck streams can read/write binary or text data. It is mostly up to the user how to read or write data, though the following rules establish the default:
1. An input stream that does not look for a 'binary header' is binary
2. An input stream that looks for and finds a binary header when opening is binary
3. An input stream that looks for but does not find a binary header when opening is a text stream
4. An output stream is always binary. However, the user may choose not to write a binary header. When the resulting file is later opened as an input stream, it will be considered a text stream if rule 3 applies
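The first three rules above amount to a small decision procedure. The sketch below expresses it for a seekable byte stream, using the two-byte marker `\0B` that Kaldi writes before binary data. `input_is_binary` is a hypothetical helper, not a function of this module, and rewinding when no header is found is an assumption made for the illustration.

```python
import io

KALDI_BINARY_HEADER = b"\0B"  # the two bytes Kaldi writes before binary data


def input_is_binary(stream, header=True):
    """Hypothetical helper applying the first three rules to a byte stream."""
    if not header:
        return True  # first rule: no header check, so assume binary
    start = stream.tell()
    found = stream.read(len(KALDI_BINARY_HEADER)) == KALDI_BINARY_HEADER
    if not found:
        # Assumption: leave text data unconsumed so it can be read afterwards
        stream.seek(start)
    return found  # second rule when True, third rule when False


with_header = input_is_binary(io.BytesIO(b"\0B\x04\x01\x00\x00\x00"))
without_header = input_is_binary(io.BytesIO(b"1 2 3\n"))
unchecked = input_is_binary(io.BytesIO(b"\x04\x01\x00\x00\x00"), header=False)
```

Note that the last case is reported as binary without inspecting the data at all, so passing header=False for a file that is actually text would lead to a misread.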
- Parameters:
path (str) – The extended file name to be opened. This can be quite exotic. More details can be found on the Kaldi website.
mode (Literal['r','r+','w']) – Whether to open the stream for input ('r') or output ('w'). 'r+' is equivalent to 'r'
header (bool) – Setting this to True will either check for a 'binary header' in an input stream, or write a binary header for an output stream. If False, no check/write is performed