pydrobert.kaldi.io.table_streams

Submodule containing table readers and writers

class pydrobert.kaldi.io.table_streams.KaldiRandomAccessReader(path, kaldi_dtype, utt2spk='')

Bases: KaldiTable, Container

Read-only access to values of table by key

KaldiRandomAccessReader objects can access values of a table through either the get() method or square bracket access (e.g. a[key]). The presence of a key can be checked with “in” syntax (e.g. key in a). Unlike a dict, the extent of a KaldiRandomAccessReader is not known beforehand, so neither iterators nor length methods are implemented.
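The dict-like access pattern described above can be illustrated with a minimal in-memory sketch. DictRandomAccessReader is a hypothetical stand-in, not part of pydrobert.kaldi. We assume, as with a dict, that square-bracket access raises KeyError for a missing key and that get() returns default instead.

```python
from collections.abc import Container


class DictRandomAccessReader(Container):
    """Hypothetical in-memory stand-in for KaldiRandomAccessReader's interface.

    Like the real reader, it supports `in`, square brackets and get(), and it
    defines neither __iter__ nor __len__.
    """

    def __init__(self, table):
        self._table = dict(table)

    def __contains__(self, key):
        return key in self._table

    def __getitem__(self, key):
        # Assumption: a missing key raises KeyError, as with a dict
        return self._table[key]

    def get(self, key, default=None):
        # Assumption: a missing key yields `default` rather than an error
        return self._table.get(key, default)


reader = DictRandomAccessReader({"utt1": [1.0, 2.0]})
print("utt1" in reader)        # True
print(reader["utt1"])          # [1.0, 2.0]
print(reader.get("utt2", -1))  # -1
```

Subclassing collections.abc.Container only requires __contains__, which matches the documented base classes while leaving length and iteration undefined.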

Parameters
  • path (str) – An rspecifier to read tables from

  • kaldi_dtype (KaldiDataType) – The data type to read

  • utt2spk (str) – If set, the reader uses utt2spk as a map from utterance ids to speaker ids. The data in path, which are assumed to be referenced by speaker ids, can then be referenced by utterance id. If utt2spk is unspecified, the keys in path are used to query for data.
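The utt2spk indirection can be sketched in plain Python. Here parse_utt2spk and SpeakerKeyedReader are hypothetical helpers, not pydrobert.kaldi API; the real reader takes utt2spk as a path, whereas this sketch parses Kaldi-style "<utterance-id> <speaker-id>" lines from memory. Per-speaker statistics queried by utterance are used as the illustrative case.

```python
def parse_utt2spk(lines):
    """Parse Kaldi-style utt2spk lines ("<utt-id> <spk-id>") into a dict."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


class SpeakerKeyedReader:
    """Hypothetical stand-in: data keyed by speaker, queried by utterance."""

    def __init__(self, spk_table, utt2spk=None):
        self._table = dict(spk_table)
        # With no map, query keys are used to look up the table directly
        self._utt2spk = utt2spk

    def _table_key(self, key):
        if self._utt2spk is None:
            return key
        return self._utt2spk.get(key)  # None for unknown utterances

    def __contains__(self, key):
        return self._table_key(key) in self._table

    def __getitem__(self, key):
        table_key = self._table_key(key)
        if table_key not in self._table:
            raise KeyError(key)
        return self._table[table_key]


utt2spk = parse_utt2spk(["utt_a spk1", "utt_b spk1", "utt_c spk2"])
cmvn = SpeakerKeyedReader({"spk1": "stats1", "spk2": "stats2"}, utt2spk)
print(cmvn["utt_b"])    # stats1
print("utt_d" in cmvn)  # False
```

Several utterances can resolve to the same speaker entry, which is why the map goes from utterance to speaker and not the other way round.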

utt2spk

The path to the map from utterance ids to speaker ids, if set

Type

str or None

get(key, default=None)
Raises

IOError – If closed

readable()

Return whether this object was opened for reading

writable()

Return whether this object was opened for writing

class pydrobert.kaldi.io.table_streams.KaldiSequentialReader(path, kaldi_dtype)

Bases: KaldiTable, Iterator

Abstract class for iterating over table entries

KaldiSequentialReader iterates over key-value pairs. The default behaviour (i.e. that in a for-loop) is to iterate over the values in the order they are read from the table. Similar to dict instances, items(), values(), and keys() return iterators over their respective domains. Alternatively, the move() method moves to the next pair, at which point the key() and value() methods can be queried.

Though it is possible to mix and match access patterns, all methods refer to the same underlying iterator (the KaldiSequentialReader), so advancing through one access pattern also advances the others.
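The shared-position behaviour described above can be illustrated with a minimal in-memory sketch. ListSequentialReader is a hypothetical class, not part of pydrobert.kaldi, and it assumes, as Kaldi's own SequentialTableReader does, that a freshly opened reader already points at its first pair.

```python
from collections.abc import Iterator


class ListSequentialReader(Iterator):
    """Hypothetical in-memory stand-in for KaldiSequentialReader's interface.

    Every access pattern shares one position, so consuming pairs through
    one method also consumes them for all the others.
    """

    def __init__(self, pairs):
        self._pairs = list(pairs)
        # Assumption: a freshly opened reader already points at the first pair
        self._pos = 0

    def done(self):
        return self._pos >= len(self._pairs)

    def key(self):
        return None if self.done() else self._pairs[self._pos][0]

    def value(self):
        return None if self.done() else self._pairs[self._pos][1]

    def move(self):
        # True only if the reader now points at a new pair
        if self.done():
            return False
        self._pos += 1
        return not self.done()

    def __next__(self):
        # Default (for-loop) iteration yields values
        if self.done():
            raise StopIteration
        value = self.value()
        self.move()
        return value

    def items(self):
        while not self.done():
            pair = (self.key(), self.value())
            self.move()
            yield pair

    def keys(self):
        for key, _ in self.items():
            yield key

    def values(self):
        for _, value in self.items():
            yield value


reader = ListSequentialReader([("a", 1), ("b", 2), ("c", 3)])
print(next(reader))          # 1
print(reader.key())          # b
print(list(reader.items()))  # [('b', 2), ('c', 3)]
print(reader.done())         # True
```

Because items() advances the same position that next() used, the first pair is not revisited, and the reader reports done() once items() is exhausted.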

Parameters
  • path (str) – An rspecifier to read the table from

  • kaldi_dtype (KaldiDataType) – The data type to read

Yields

object or (str, object) – Values or key, value pairs

abstract done()

Return True if closed or if pairs are exhausted, False otherwise

items()

Returns iterator over key, value pairs

abstract key()

Return the current pair’s key, or None if done

Raises

IOError – If closed

keys()

Returns iterator over keys

abstract move()

Move iterator forward

Returns

moved (bool) – True if moved to a new pair, False if done

Raises

IOError – If closed

readable()

Return whether this object was opened for reading

abstract value()

Return the current pair’s value, or None if done

Raises

IOError – If closed

values()

Returns iterator over values

writable()

Return whether this object was opened for writing

class pydrobert.kaldi.io.table_streams.KaldiTable(path, kaldi_dtype)

Bases: KaldiIOBase

Base class for interacting with tables

All table readers and writers are subclasses of KaldiTable. Tables must specify the type of data being read or written ahead of time.

Parameters
  • path (str) – An rspecifier or wspecifier

  • kaldi_dtype (KaldiDataType) – The type of data this table contains

kaldi_dtype

The table’s data type

Type

KaldiDataType

Raises

IOError – If unable to open table

class pydrobert.kaldi.io.table_streams.KaldiWriter(path, kaldi_dtype)

Bases: KaldiTable

Write key-value pairs to tables

Parameters
  • path (str) – A wspecifier to write the table to

  • kaldi_dtype (KaldiDataType) – The data type to write

readable()

Return whether this object was opened for reading

writable()

Return whether this object was opened for writing

abstract write(key, value)

Write a key-value pair

Parameters
  • key (str) – The key of the pair

  • value (Any) – The value of the pair

Notes

For Kaldi’s table writers, pairs are written in order without backtracking. Uniqueness of keys is not checked.
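Since the writer does not check that keys are unique, a caller that needs uniqueness has to enforce it itself. The following is a minimal sketch of one way to do so; ListWriter and UniqueKeyWriter are hypothetical names, and the wrapper only assumes that the wrapped object has a write(key, value) method. A wrapper around a real KaldiWriter would also need to forward closing.

```python
class ListWriter:
    """Hypothetical in-memory writer: appends pairs in call order."""

    def __init__(self):
        self.pairs = []

    def write(self, key, value):
        # No backtracking and no uniqueness check, mirroring the note above
        self.pairs.append((key, value))


class UniqueKeyWriter:
    """Hypothetical wrapper that rejects duplicate keys before delegating."""

    def __init__(self, writer):
        self._writer = writer
        self._seen = set()

    def write(self, key, value):
        if key in self._seen:
            raise ValueError(f"duplicate key: {key!r}")
        self._seen.add(key)
        self._writer.write(key, value)


inner = ListWriter()
writer = UniqueKeyWriter(inner)
writer.write("utt1", [0.5])
writer.write("utt2", [1.5])
try:
    writer.write("utt1", [2.5])
except ValueError as err:
    print(err)  # duplicate key: 'utt1'
print(inner.pairs)  # [('utt1', [0.5]), ('utt2', [1.5])]
```

Tracking keys in a set costs memory proportional to the number of pairs written, which is the trade-off a streaming writer avoids by not checking.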

pydrobert.kaldi.io.table_streams.open_table_stream(path, kaldi_dtype, mode='r', error_on_str=True, utt2spk='', value_style='b', cache=False)

Factory function to open a Kaldi table

This function finds the correct KaldiTable according to the arguments kaldi_dtype and mode. Specific combinations allow for the optional parameters outlined in the table below.

mode    kaldi_dtype    additional kwargs
'r'     'wm'           value_style='b'
'r+'    (any)          utt2spk=''
'r+'    'wm'           value_style='b'
'w'     'tv'           error_on_str=True
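The table above amounts to a small lookup from (mode, kaldi_dtype) to extra keyword arguments. The sketch below encodes it; optional_kwargs is a hypothetical helper, not part of pydrobert.kaldi, it takes kaldi_dtype as its string value, and it leaves out cache, which the parameter list describes separately.

```python
def optional_kwargs(mode, kaldi_dtype):
    """Hypothetical helper: optional kwargs the table lists for mode and dtype."""
    if mode not in ("r", "r+", "w"):
        raise ValueError(f"unsupported mode: {mode!r}")
    allowed = set()
    if mode == "r+":
        allowed.add("utt2spk")  # random access readers, any data type
    if mode in ("r", "r+") and kaldi_dtype == "wm":
        allowed.add("value_style")  # wave readers
    if mode == "w" and kaldi_dtype == "tv":
        allowed.add("error_on_str")  # token vector writers
    return allowed


print(sorted(optional_kwargs("r+", "wm")))  # ['utt2spk', 'value_style']
print(optional_kwargs("w", "tv"))           # {'error_on_str'}
```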

Parameters
  • path (str) – The specifier used by Kaldi to open the table. Generally these take the form '{ark|scp}:<path_to_file>', though they can take much more interesting forms (like pipes). More information can be found on the Kaldi website.

  • kaldi_dtype (KaldiDataType) – The type of data the table is expected to handle

  • mode (Literal['r', 'r+', 'w']) – Specifies the type of access to be performed: sequential read ('r'), random-access read ('r+'), or write ('w'). These are implemented by subclasses of KaldiSequentialReader, KaldiRandomAccessReader, and KaldiWriter, respectively.

  • error_on_str (bool) – Token vectors ('tv') accept sequences of whitespace-free ASCII/UTF strings. Since a str is itself a sequence of characters, it may satisfy the token requirements. If error_on_str is True, a ValueError is raised when writing a str as a token vector; otherwise, a str can be written.

  • utt2spk (str) – If set, the reader uses utt2spk as a map from utterance ids to speaker ids. The data in path, which are assumed to be referenced by speaker ids, can then be referenced by utterance id. If utt2spk is unspecified, the keys in path are used to query for data.

  • value_style (str) – Wave readers can provide not only the audio buffer ('b') of a wave file, but also its sampling rate ('s') and/or duration (in seconds, 'd'). Setting value_style to some combination of 'b', 's', and/or 'd' will cause the reader to return a tuple of that information. If value_style is only one character, the result will not be contained in a tuple.

  • cache (bool) – Whether to cache all values in a dict as they are retrieved. Only applicable to random access readers. This can be very expensive for large tables and redundant if reading from an archive directly (as opposed to a script).
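The error_on_str and value_style rules above can be sketched in plain Python. check_token_vector and style_wave are hypothetical helpers, not pydrobert.kaldi functions. Two further assumptions are made for illustration only: tokens must be non-empty, and the wave buffer is a list of channels, each a list of samples, with tuple entries following the order of the characters in value_style.

```python
def check_token_vector(value, error_on_str=True):
    """Hypothetical validator mirroring the error_on_str rule for 'tv' tables."""
    if isinstance(value, str) and error_on_str:
        raise ValueError("refusing to write a str as a token vector")
    tokens = list(value)  # a str becomes a list of its characters
    for token in tokens:
        # Assumption: tokens must be non-empty as well as whitespace-free
        if not token or any(c.isspace() for c in token):
            raise ValueError(f"invalid token: {token!r}")
    return tokens


def style_wave(buffer, samp_rate, value_style="b"):
    """Hypothetical: package wave data as value_style describes.

    buffer is a list of channels, each a list of samples.
    """
    fields = {
        "b": buffer,
        "s": samp_rate,
        "d": len(buffer[0]) / samp_rate,  # samples per channel / rate
    }
    if not value_style or any(c not in fields for c in value_style):
        raise ValueError(f"invalid value_style: {value_style!r}")
    # Assumption: tuple entries follow the order of the characters
    parts = tuple(fields[c] for c in value_style)
    return parts[0] if len(parts) == 1 else parts


print(check_token_vector(["hello", "world"]))         # ['hello', 'world']
print(check_token_vector("abc", error_on_str=False))  # ['a', 'b', 'c']
buffer = [[0.0] * 8000]  # one channel, 8000 samples
print(style_wave(buffer, 16000.0, "sd"))  # (16000.0, 0.5)
```

The second call shows why error_on_str defaults to True: a plain str silently passes as a vector of one-character tokens.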

Returns

table (KaldiTable) – The opened table

Raises

IOError – On failure to open