pydrobert.kaldi.io.table_streams
Submodule containing table readers and writers
- class pydrobert.kaldi.io.table_streams.KaldiRandomAccessReader(path, kaldi_dtype, utt2spk='')[source]
Bases: KaldiTable, Container
Read-only access to values of table by key
KaldiRandomAccessReader objects can access values of a table through either the get() method or square bracket access (e.g. a[key]). The presence of a key can be checked with “in” syntax (e.g. key in a). Unlike a dict, the extent of a KaldiRandomAccessReader is not known beforehand, so neither iterators nor length methods are implemented.
- Parameters:
path (str) – An rspecifier to read tables from
kaldi_dtype (KaldiDataType) – The data type to read
utt2spk (str) – If set, the reader uses utt2spk as a map from utterance ids to speaker ids. The data in path, which are assumed to be referenced by speaker ids, can then be referenced by utterance. If utt2spk is unspecified, the keys in path are used to query for data.
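The access pattern and the utt2spk lookup described above can be modelled with a small stand-in class. This is a sketch only, not pydrobert's implementation: `ToyRandomAccessReader` is a hypothetical name, the table is a plain dict rather than data behind an rspecifier, and utt2spk is a dict here even though the real parameter is a str.

```python
from collections.abc import Container


class ToyRandomAccessReader(Container):
    """Hypothetical stand-in for a random-access table (not the real class)."""

    # Block iteration explicitly: the extent of the table is treated as unknown.
    __iter__ = None

    def __init__(self, table, utt2spk=None):
        # `table` is a dict keyed by speaker id (or by utterance id when no
        # map is given). `utt2spk` is a dict in this sketch (assumption).
        self._table = table
        self._utt2spk = utt2spk

    def _resolve(self, key):
        # Without a map, the key queries the table directly.
        if self._utt2spk is None:
            return key
        # With a map, an utterance id is translated to its speaker id.
        return self._utt2spk.get(key)

    def __contains__(self, key):
        return self._resolve(key) in self._table

    def __getitem__(self, key):
        resolved = self._resolve(key)
        if resolved not in self._table:
            raise KeyError(key)
        return self._table[resolved]

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default


# Per-speaker data, queried by utterance id through the map.
stats_by_speaker = {"spk1": [0.1, 0.2], "spk2": [0.3, 0.4]}
utt2spk = {"utt1": "spk1", "utt2": "spk1", "utt3": "spk2"}
reader = ToyRandomAccessReader(stats_by_speaker, utt2spk)

print(reader["utt2"])      # square bracket access
print("utt3" in reader)    # membership test
print(reader.get("utt9"))  # get() with a missing key
```

Neither `len(reader)` nor `iter(reader)` works on the stand-in, mirroring the statement that length and iteration are not implemented.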
- class pydrobert.kaldi.io.table_streams.KaldiSequentialReader(path, kaldi_dtype)[source]
Bases: KaldiTable, Iterator
Abstract class for iterating over table entries
KaldiSequentialReader iterates over key-value pairs. The default behaviour (i.e. that in a for-loop) is to iterate over the values in order of access. Similar to dict instances, items(), values(), and keys() return iterators over their respective domains. Alternatively, the move() method moves to the next pair, at which point the key() and value() methods can be queried.
Though it is possible to mix and match access patterns, all methods refer to the same underlying iterator (the KaldiSequentialReader).
- Parameters:
path (str) – An rspecifier to read the table from
kaldi_dtype (KaldiDataType) – The data type to read
- Yields:
object or (str, object) – Values or key, value pairs
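The point that every access pattern shares one underlying iterator can be illustrated with a toy model. `ToySequentialReader` is a hypothetical stand-in, not the library class; in particular, having move() return a bool is an assumption of this sketch, since the text above does not state its return value.

```python
from collections.abc import Iterator


class ToySequentialReader(Iterator):
    """Hypothetical stand-in for a sequential table reader (not the real class)."""

    def __init__(self, pairs):
        self._pairs = iter(pairs)  # the single underlying iterator
        self._current = None

    def move(self):
        # Advance to the next pair. The bool return value is an assumption.
        try:
            self._current = next(self._pairs)
        except StopIteration:
            self._current = None
            return False
        return True

    def key(self):
        return self._current[0]

    def value(self):
        return self._current[1]

    def __next__(self):
        # Default iteration (e.g. in a for-loop) yields values only.
        if not self.move():
            raise StopIteration
        return self.value()

    def items(self):
        while self.move():
            yield self.key(), self.value()

    def keys(self):
        while self.move():
            yield self.key()

    def values(self):
        while self.move():
            yield self.value()


reader = ToySequentialReader(
    [("utt1", 1.0), ("utt2", 2.0), ("utt3", 3.0), ("utt4", 4.0)]
)

reader.move()                          # explicit step to the first pair
first = (reader.key(), reader.value())
second = next(reader)                  # default iteration continues from there
remaining = list(reader.items())       # items() picks up after the second pair
leftover = list(reader)                # nothing is left to iterate over
```

Because the three access styles consume the same iterator, each pair is seen exactly once no matter how the styles are mixed.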
- class pydrobert.kaldi.io.table_streams.KaldiTable(path, kaldi_dtype)[source]
Bases: KaldiIOBase
Base class for interacting with tables
All table readers and writers are subclasses of KaldiTable. Tables must specify the type of data being read ahead of time.
- Parameters:
path (str) – An rspecifier or wspecifier
kaldi_dtype (KaldiDataType) – The type of data this table contains
- kaldi_dtype
The table’s data type
- Type:
KaldiDataType
- Raises:
IOError – If unable to open table
- class pydrobert.kaldi.io.table_streams.KaldiWriter(path, kaldi_dtype)[source]
Bases: KaldiTable
Write key-value pairs to tables
- Parameters:
path (str) – A wspecifier to write the table to
kaldi_dtype (pydrobert.kaldi.io.enums.KaldiDataType) – The data type to write
- pydrobert.kaldi.io.table_streams.open_table_stream(path, kaldi_dtype, mode='r', error_on_str=True, utt2spk='', value_style='b', cache=False)[source]
Factory function to open a kaldi table
This function finds the correct KaldiTable according to the args kaldi_dtype and mode. Specific combinations allow for optional parameters, as outlined in the table below.

mode    kaldi_dtype    additional kwargs
'r'     'wm'           value_style='b'
'r+'    (any)          utt2spk=''
'r+'    'wm'           value_style='b'
'w'     'tv'           error_on_str=True

- Parameters:
path (str) – The specifier used by kaldi to open the table. Generally these will take the form '{ark|scp}:<path_to_file>', though they can take much more interesting forms (like pipes). More information can be found on the Kaldi website
kaldi_dtype (KaldiDataType) – The type of data the table is expected to handle
mode (Literal['r', 'r+', 'w']) – Specifies the type of access to be performed: read sequential, read random, or write. They are implemented by subclasses of KaldiSequentialReader, KaldiRandomAccessReader, or KaldiWriter, respectively.
error_on_str (bool) – Token vectors ('tv') accept sequences of whitespace-free ASCII/UTF strings. A str is also a sequence of characters, which may satisfy the token requirements. If error_on_str is True, a ValueError is raised when writing a str as a token vector. Otherwise a str can be written
utt2spk (str) – If set, the reader uses utt2spk as a map from utterance ids to speaker ids. The data in path, which are assumed to be referenced by speaker ids, can then be referenced by utterance. If utt2spk is unspecified, the keys in path are used to query for data
value_style (str) – Wave readers can provide not only the audio buffer ('b') of a wave file, but its sampling rate ('s'), and/or duration (in sec, 'd'). Setting value_style to some combination of 'b', 's', and/or 'd' will cause the reader to return a tuple of that information. If value_style is only one character, the result will not be contained in a tuple.
cache (bool) – Whether to cache all values in a dict as they are retrieved. Only applicable to random access readers. This can be very expensive for large tables and redundant if reading from an archive directly (as opposed to a script).
- Returns:
table (KaldiTable) – A table, opened.
- Raises:
IOError – On failure to open
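The semantics of value_style and error_on_str can be sketched in plain Python. Both helpers below are hypothetical models written for illustration and are not part of pydrobert. The sketch assumes a single-channel buffer held in a flat list, that the tuple follows the order of the characters in value_style, and that a str written with error_on_str=False is split into one-character tokens.

```python
def style_wave_value(buffer, sample_rate, value_style="b"):
    """Model of value_style: buffer ('b'), sampling rate ('s'), duration ('d')."""
    parts = {
        "b": buffer,
        "s": sample_rate,
        "d": len(buffer) / sample_rate,  # duration in seconds
    }
    # Tuple order follows the characters of value_style (assumption).
    picked = tuple(parts[char] for char in value_style)
    # A single character yields the bare value rather than a 1-tuple.
    return picked[0] if len(picked) == 1 else picked


def check_token_vector(value, error_on_str=True):
    """Model of error_on_str: guard against writing a str as a token vector."""
    if isinstance(value, str) and error_on_str:
        raise ValueError("refusing to write a str as a token vector")
    tokens = list(value)  # a str decomposes into one-character tokens
    for token in tokens:
        # Tokens must be non-empty and free of whitespace.
        if not token or any(char.isspace() for char in token):
            raise ValueError("invalid token: {!r}".format(token))
    return tokens


samples = [0.0] * 16000  # one second of silence at 16 kHz

only_buffer = style_wave_value(samples, 16000, "b")
buffer_and_rate = style_wave_value(samples, 16000, "bs")
only_duration = style_wave_value(samples, 16000, "d")

word_tokens = check_token_vector(["hello", "world"])
char_tokens = check_token_vector("hi", error_on_str=False)
```

The default of error_on_str=True guards against the likely mistake of passing "hello world" where ["hello", "world"] was intended.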