matchzoo.preprocessors.units.character_index

Module Contents

class matchzoo.preprocessors.units.character_index.CharacterIndex(char_index:dict)

Bases: matchzoo.preprocessors.units.unit.Unit

CharacterIndexUnit for DIIN model.

The input of :class:`CharacterIndexUnit` should be a list of per-word character lists extracted from a text. The output is the character index representation of this text, in which each character is replaced by its index in char_index.

NgramLetterUnit and VocabularyUnit are two essential prerequisites of CharacterIndexUnit.
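To make the role of these prerequisites concrete, the following is a minimal plain-Python sketch of the two things they need to supply: boundary-marked character lists and a character-to-index dictionary. The helpers `to_char_lists` and `build_char_index` are hypothetical names introduced for illustration only; they are not part of matchzoo and may not match what NgramLetterUnit and VocabularyUnit actually produce. The '#' boundary marker and the reserved '<PAD>' and '<OOV>' entries are assumptions taken from the example on this page.

```python
def to_char_lists(words):
    """Split each word into characters wrapped in '#' boundary markers."""
    return [['#'] + list(word) + ['#'] for word in words]


def build_char_index(char_lists, pad='<PAD>', oov='<OOV>'):
    """Map characters to indices in first-seen order, after the reserved tokens."""
    char_index = {pad: 0, oov: 1}  # assumed reserved entries
    for chars in char_lists:
        for ch in chars:
            if ch not in char_index:
                char_index[ch] = len(char_index)
    return char_index


char_lists = to_char_lists(['a', 'one'])
char_index = build_char_index(char_lists)
```

Here `char_lists` has the same shape as the `input_` value used in the example, and `char_index` has the shape expected by the `char_index` argument, although the concrete index values depend on the order in which characters are first seen.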

Examples

>>> from matchzoo.preprocessors.units.character_index import CharacterIndex
>>> input_ = [['#', 'a', '#'], ['#', 'o', 'n', 'e', '#']]
>>> character_index = CharacterIndex(
...     char_index={
...         '<PAD>': 0, '<OOV>': 1, 'a': 2, 'n': 3, 'e': 4, '#': 5})
>>> index = character_index.transform(input_)
>>> index
[[5, 2, 5], [5, 1, 3, 4, 5]]
transform(self, input_:list)

Transform list of characters to corresponding indices.

Parameters: input_ – list of per-word character lists generated by :class:`NgramLetterUnit`.
Returns: character index representation of the text.
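
The lookup that transform performs can be sketched without matchzoo as a nested dictionary lookup. This is an illustrative stand-in, not the library's implementation: the function name `transform_sketch` and the `oov_index` parameter are hypothetical, and the fallback to index 1 for unknown characters is an assumption inferred from the example above, where 'o' is absent from char_index and maps to 1.

```python
def transform_sketch(input_, char_index, oov_index=1):
    """Replace every character with its index, using oov_index for unknown ones."""
    return [
        [char_index.get(ch, oov_index) for ch in chars]
        for chars in input_
    ]


char_index = {'<PAD>': 0, '<OOV>': 1, 'a': 2, 'n': 3, 'e': 4, '#': 5}
result = transform_sketch(
    [['#', 'a', '#'], ['#', 'o', 'n', 'e', '#']], char_index)
```

The nesting of the input is preserved, so `result` contains one inner list per word, each with the same length as the corresponding character list.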