pydrobert.kaldi.io.table_streams
Submodule containing table readers and writers
- class pydrobert.kaldi.io.table_streams.KaldiRandomAccessReader(path, kaldi_dtype, utt2spk='')
Bases: KaldiTable, Container
Read-only access to the values of a table by key
KaldiRandomAccessReader objects can access values of a table through either the get() method or square-bracket access (e.g. a[key]). The presence of a key can be checked with “in” syntax (e.g. key in a). Unlike a dict, the extent of a KaldiRandomAccessReader is not known beforehand, so neither iterators nor length methods are implemented.
- Parameters
path (str) – An rspecifier to read the table from
kaldi_dtype (KaldiDataType) – The data type to read
utt2spk (str) – If set, the reader uses utt2spk as a map from utterance ids to speaker ids. The data in path, which are assumed to be keyed by speaker id, can then be referenced by utterance id. If utt2spk is unspecified, the keys in path are used to query for data.
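The lookup semantics described above can be illustrated with a small in-memory stand-in. This is a minimal sketch, not pydrobert's implementation: ToyRandomAccessReader is hypothetical, a Python dict plays the role of the table named by the rspecifier, and the dict-style get(key, default) signature and the optional cache (mirroring the cache option of open_table_stream) are assumptions made for illustration.

```python
from typing import Any, Dict, Mapping, Optional


class ToyRandomAccessReader:
    """Hypothetical in-memory stand-in for KaldiRandomAccessReader.

    A plain dict plays the role of the table named by the rspecifier.
    """

    def __init__(
        self,
        table: Mapping[str, Any],
        utt2spk: Optional[Mapping[str, str]] = None,
        cache: bool = False,
    ) -> None:
        self._table = table
        self._utt2spk = utt2spk  # None plays the role of utt2spk=''
        # Assumed cache: values memoised by table key as they are retrieved.
        self._cache: Optional[Dict[str, Any]] = {} if cache else None

    def _table_key(self, key: str) -> Optional[str]:
        # With utt2spk set, queries are utterance ids mapped to speaker ids.
        if self._utt2spk is None:
            return key
        return self._utt2spk.get(key)

    def __contains__(self, key: object) -> bool:
        if not isinstance(key, str):
            return False
        table_key = self._table_key(key)
        return table_key is not None and table_key in self._table

    def __getitem__(self, key: str) -> Any:
        if key not in self:
            raise KeyError(key)
        table_key = self._table_key(key)
        if self._cache is not None and table_key in self._cache:
            return self._cache[table_key]
        value = self._table[table_key]
        if self._cache is not None:
            self._cache[table_key] = value
        return value

    def get(self, key: str, default: Any = None) -> Any:
        # Assumed dict-style signature, for illustration only.
        return self[key] if key in self else default

    # No __iter__ or __len__: the extent of the table is not known beforehand.


# Speaker-level data queried by utterance id via utt2spk.
reader = ToyRandomAccessReader(
    {"spk1": [0.1, 0.2], "spk2": [0.3]},
    utt2spk={"utt1": "spk1", "utt2": "spk1", "utt3": "spk2"},
)
```

Once utt2spk is set, both utt1 and utt2 resolve to the same speaker-level entry, and speaker ids themselves are no longer valid query keys in this sketch.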
- class pydrobert.kaldi.io.table_streams.KaldiSequentialReader(path, kaldi_dtype)
Bases: KaldiTable, Iterator
Abstract class for iterating over table entries
KaldiSequentialReader iterates over key-value pairs. The default behaviour (i.e. that in a for-loop) is to iterate over the values in order of access. Similar to dict instances, items(), values(), and keys() return iterators over their respective domains. Alternatively, the move() method moves to the next pair, at which point the key() and value() methods can be queried.
Though it is possible to mix and match access patterns, all methods refer to the same underlying iterator (the KaldiSequentialReader).
- Parameters
path (str) – An rspecifier to read the table from
kaldi_dtype (KaldiDataType) – The data type to read
- Yields
object or (str, object) – Values, or (key, value) pairs
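To make the shared-iterator behaviour concrete, here is a minimal sketch using a hypothetical ToySequentialReader over an in-memory list of pairs. It is not pydrobert's implementation, and it assumes move() returns False once the table is exhausted; the real class may signal this differently.

```python
from typing import Any, Iterable, Iterator, Optional, Tuple


class ToySequentialReader:
    """Hypothetical stand-in for KaldiSequentialReader.

    Every access pattern advances the same underlying iterator.
    """

    def __init__(self, pairs: Iterable[Tuple[str, Any]]) -> None:
        self._pairs = iter(pairs)
        self._current: Optional[Tuple[str, Any]] = None

    def move(self) -> bool:
        # Assumption: returns False once the table is exhausted.
        self._current = next(self._pairs, None)
        return self._current is not None

    def key(self) -> str:
        if self._current is None:
            raise ValueError("no current pair; call move() first")
        return self._current[0]

    def value(self) -> Any:
        if self._current is None:
            raise ValueError("no current pair; call move() first")
        return self._current[1]

    def items(self) -> Iterator[Tuple[str, Any]]:
        while self.move():
            yield self.key(), self.value()

    def keys(self) -> Iterator[str]:
        return (k for k, _ in self.items())

    def values(self) -> Iterator[Any]:
        return (v for _, v in self.items())

    def __iter__(self) -> "ToySequentialReader":
        return self

    def __next__(self) -> Any:
        # Default (for-loop) behaviour: produce values in table order.
        if not self.move():
            raise StopIteration
        return self.value()


reader = ToySequentialReader([("utt1", [1, 2]), ("utt2", [3]), ("utt3", [4])])
first = next(reader)        # [1, 2]: consumes utt1
rest = list(reader.keys())  # ['utt2', 'utt3']: continues where next() stopped
```

Because keys() draws from the same iterator that next() advanced, utt1 never appears in rest, and a subsequent for-loop over reader yields nothing.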
- class pydrobert.kaldi.io.table_streams.KaldiTable(path, kaldi_dtype)
Bases: KaldiIOBase
Base class for interacting with tables
All table readers and writers are subclasses of KaldiTable. Tables must specify the type of data being read or written ahead of time.
- Parameters
path (str) – An rspecifier or wspecifier
kaldi_dtype (KaldiDataType) – The type of data this table contains
- kaldi_dtype
The table’s data type
- Type
KaldiDataType
- Raises
IOError – If unable to open table
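KaldiTable accepts either an rspecifier or a wspecifier. As a rough illustration of their shape (a type such as 'ark' or 'scp', optional comma-separated flags such as 't', then a colon and the target), the hypothetical helper split_specifier below splits a specifier apart. It is a simplification for explanation only; Kaldi's own parser handles more cases, and raising IOError here merely mirrors the error this class documents.

```python
from typing import FrozenSet, NamedTuple


class Specifier(NamedTuple):
    kind: str                # 'ark', 'scp', or 'ark,scp'
    options: FrozenSet[str]  # remaining comma-separated flags, e.g. {'t'}
    target: str              # file name, '-' for stdin/stdout, or a pipe


def split_specifier(spec: str) -> Specifier:
    """Hypothetical helper: split a Kaldi rspecifier/wspecifier at its first ':'."""
    prefix, sep, target = spec.partition(":")
    if not sep or not target:
        raise IOError(f"not a table specifier: {spec!r}")
    tokens = [t for t in prefix.split(",") if t]
    kinds = [t for t in tokens if t in ("ark", "scp")]
    if not kinds:
        raise IOError(f"specifier must name 'ark' and/or 'scp': {spec!r}")
    options = frozenset(t for t in tokens if t not in ("ark", "scp"))
    return Specifier(",".join(kinds), options, target)


text_ark = split_specifier("ark,t:feats.txt")
piped = split_specifier("ark:gunzip -c feats.ark.gz |")
```

Splitting only at the first colon keeps pipe commands (and anything else after the prefix) intact as the target.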
- class pydrobert.kaldi.io.table_streams.KaldiWriter(path, kaldi_dtype)
Bases: KaldiTable
Write key-value pairs to tables
- Parameters
path (str) – A wspecifier to write the table to
kaldi_dtype (pydrobert.kaldi.io.enums.KaldiDataType) – The data type to write
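A writer fixes its data type up front and then accepts key-value pairs. The hypothetical ToyTokenVectorWriter below sketches this for token vectors ('tv'), including the error_on_str check that open_table_stream documents for that combination. It writes a simplified 'key tok1 tok2' line per entry, which is an assumption for illustration and not guaranteed to match Kaldi's archive format byte-for-byte.

```python
import io
from typing import Sequence, TextIO


class ToyTokenVectorWriter:
    """Hypothetical sketch of a token-vector ('tv') table writer.

    Writes one 'key tok1 tok2 ...' line per entry; a simplified format, not
    necessarily Kaldi's exact archive layout.
    """

    def __init__(self, stream: TextIO, error_on_str: bool = True) -> None:
        self._stream = stream
        self._error_on_str = error_on_str

    def write(self, key: str, value: Sequence[str]) -> None:
        if isinstance(value, str) and self._error_on_str:
            raise ValueError(f"refusing to write str {value!r} as a token vector")
        if not key or any(c.isspace() for c in key):
            raise ValueError(f"invalid key {key!r}")
        tokens = list(value)  # a str becomes a list of one-character tokens
        for tok in tokens:
            if not tok or any(c.isspace() for c in tok):
                raise ValueError(f"token {tok!r} is empty or contains whitespace")
        self._stream.write(" ".join([key, *tokens]) + "\n")


buf = io.StringIO()
writer = ToyTokenVectorWriter(buf)
writer.write("utt1", ["hello", "world"])
lenient = ToyTokenVectorWriter(buf, error_on_str=False)
lenient.write("utt2", "ab")  # allowed here: written as tokens 'a' and 'b'
```

With error_on_str left at True, the same write("utt2", "ab") call raises ValueError instead, which guards against accidentally splitting a word into characters.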
- pydrobert.kaldi.io.table_streams.open_table_stream(path, kaldi_dtype, mode='r', error_on_str=True, utt2spk='', value_style='b', cache=False)
Factory function to open a Kaldi table
This function finds the correct KaldiTable according to the arguments kaldi_dtype and mode. Specific combinations allow for optional parameters, as outlined in the following table:
mode   kaldi_dtype   additional kwargs
'r'    'wm'          value_style='b'
'r+'   (any)         utt2spk=''
'r+'   'wm'          value_style='b'
'w'    'tv'          error_on_str=True
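The dispatch described by the table above can be sketched as a lookup keyed on (mode, kaldi_dtype). In this sketch, choose_table and the returned class-name strings are hypothetical; it assumes optional arguments that do not apply to a combination are simply ignored (they all have defaults in the signature), and it also routes cache to random access readers, as the cache parameter description states.

```python
from typing import Any, Dict, FrozenSet, Tuple

# Hypothetical encoding of the mode/kaldi_dtype table; '*' means "any dtype".
_CLASSES = {
    "r": "KaldiSequentialReader",
    "r+": "KaldiRandomAccessReader",
    "w": "KaldiWriter",
}
_EXTRA_KWARGS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("r", "wm"): frozenset({"value_style"}),
    ("r+", "*"): frozenset({"utt2spk", "cache"}),
    ("r+", "wm"): frozenset({"value_style"}),
    ("w", "tv"): frozenset({"error_on_str"}),
}


def choose_table(mode: str, kaldi_dtype: str, **kwargs: Any) -> Tuple[str, Dict[str, Any]]:
    """Return the class name to build and the optional kwargs that apply."""
    if mode not in _CLASSES:
        raise ValueError(f"mode must be 'r', 'r+' or 'w', got {mode!r}")
    allowed = _EXTRA_KWARGS.get((mode, "*"), frozenset()) | _EXTRA_KWARGS.get(
        (mode, kaldi_dtype), frozenset()
    )
    # Arguments that do not apply to this combination are dropped.
    return _CLASSES[mode], {k: v for k, v in kwargs.items() if k in allowed}


cls_name, extra = choose_table("r+", "wm", utt2spk="u2s", value_style="bs", error_on_str=False)
```

Here a random access wave reader keeps both utt2spk (valid for any dtype) and value_style (valid for 'wm'), while error_on_str is discarded.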
- Parameters
path (str) – The specifier used by Kaldi to open the table. Generally it takes the form '{ark|scp}:<path_to_file>', though it can take much more interesting forms (like pipes). More information can be found on the Kaldi website.
kaldi_dtype (KaldiDataType) – The type of data the table is expected to handle
mode (Literal['r', 'r+', 'w']) – Specifies the type of access to be performed: read sequential, read random, or write. These are implemented by subclasses of KaldiSequentialReader, KaldiRandomAccessReader, or KaldiWriter, respectively.
error_on_str (bool) – Token vectors ('tv') accept sequences of whitespace-free ASCII/UTF strings. A str is also a sequence of characters, which may satisfy the token requirements. If error_on_str is True, a ValueError is raised when writing a str as a token vector. Otherwise, a str can be written, being treated as a sequence of characters.
utt2spk (str) – If set, the reader uses utt2spk as a map from utterance ids to speaker ids. The data in path, which are assumed to be keyed by speaker id, can then be referenced by utterance id. If utt2spk is unspecified, the keys in path are used to query for data.
value_style (str) – Wave readers can provide not only the audio buffer ('b') of a wave file, but also its sampling rate ('s') and/or duration (in seconds, 'd'). Setting value_style to some combination of 'b', 's', and/or 'd' will cause the reader to return a tuple of that information. If value_style is only one character, the result will not be contained in a tuple.
cache (bool) – Whether to cache all values in a dict as they are retrieved. Only applicable to random access readers. This can be very expensive for large tables and redundant if reading from an archive directly (as opposed to a script).
- Returns
table (KaldiTable) – A table, opened.
- Raises
IOError – On failure to open
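The effect of value_style on a wave reader's result can be sketched as follows. pack_wave_value is a hypothetical helper, not part of pydrobert; it assumes the audio buffer is laid out as (channels, samples), computes the duration as samples divided by the sampling rate, and assumes the tuple follows the order of the characters in value_style.

```python
from typing import Any, Sequence, Tuple, Union


def pack_wave_value(
    buffer: Sequence[Sequence[float]], samp_rate: float, value_style: str = "b"
) -> Union[Any, Tuple[Any, ...]]:
    """Hypothetical sketch of how value_style shapes a wave reader's result."""
    if (
        not value_style
        or any(c not in "bsd" for c in value_style)
        or len(set(value_style)) != len(value_style)
    ):
        raise ValueError(f"value_style must combine 'b', 's', 'd': {value_style!r}")
    # Assumed layout: buffer is (channels, samples), so duration = samples / rate.
    num_samples = len(buffer[0]) if buffer else 0
    fields = {"b": buffer, "s": samp_rate, "d": num_samples / samp_rate}
    packed = tuple(fields[c] for c in value_style)
    # A single character yields the bare value rather than a 1-tuple.
    return packed[0] if len(packed) == 1 else packed


audio = [[0.0] * 16000]  # one channel, 16000 samples
rate_and_duration = pack_wave_value(audio, 16000.0, "sd")  # (16000.0, 1.0)
```

With the default value_style='b', only the buffer itself comes back, untupled.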