matchzoo.preprocessors.units.character_index
Module Contents
Classes
CharacterIndex | CharacterIndexUnit for DIIN model.
class matchzoo.preprocessors.units.character_index.CharacterIndex(char_index: dict)
Bases: matchzoo.preprocessors.units.unit.Unit
CharacterIndexUnit for DIIN model.
The input of CharacterIndexUnit should be a list of per-word character lists extracted from a text. The output is the character index representation of this text, in which each character is replaced by its integer index.
NgramLetterUnit and VocabularyUnit are two essential prerequisites of CharacterIndexUnit.
Examples
>>> input_ = [['#', 'a', '#'], ['#', 'o', 'n', 'e', '#']]
>>> character_index = CharacterIndex(
...     char_index={
...         '<PAD>': 0, '<OOV>': 1, 'a': 2, 'n': 3, 'e': 4, '#': 5})
>>> index = character_index.transform(input_)
>>> index
[[5, 2, 5], [5, 1, 3, 4, 5]]
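The two prerequisites named above can be imitated in plain Python to see where the example's input and `char_index` might come from. This is a minimal sketch, not MatchZoo code: `word_to_chars` and `build_char_index` are hypothetical helpers, and the `'#'` boundary marker and the reserved indices 0 and 1 are assumptions read off the example.

```python
def word_to_chars(word, boundary="#"):
    """Wrap a word in boundary markers and split it into characters.

    Hypothetical stand-in for the character-splitting step; the '#'
    marker is an assumption taken from the documented example.
    """
    return [boundary, *word, boundary]


def build_char_index(char_lists, pad="<PAD>", oov="<OOV>"):
    """Assign indices to characters in first-seen order.

    Hypothetical stand-in for the vocabulary step; reserving 0 for
    padding and 1 for out-of-vocabulary characters is an assumption.
    """
    index = {pad: 0, oov: 1}
    for chars in char_lists:
        for ch in chars:
            # setdefault only assigns a new index to unseen characters
            index.setdefault(ch, len(index))
    return index


words = ["a", "an"]
char_lists = [word_to_chars(w) for w in words]
char_index = build_char_index(char_lists)
```

Here `char_lists` has the same nested-list shape as `input_` in the example, and `char_index` is a plain `dict` of the kind passed to the `CharacterIndex` constructor.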
transform(self, input_: list) → list
Transform list of characters to corresponding indices.
Parameters
input_ – list of character lists generated by NgramLetterUnit.
Returns
character index representation of a text.
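The lookup that transform performs can be sketched without MatchZoo as a nested dictionary lookup. This is an illustration of the documented behavior, not the library's implementation: `chars_to_indices` is a hypothetical function, and the fallback to index 1 for unknown characters is an assumption inferred from the example, where `'o'` is absent from `char_index` and maps to 1.

```python
def chars_to_indices(input_, char_index, oov_index=1):
    """Map each character of each word to its index.

    Hypothetical sketch: characters missing from char_index fall back
    to oov_index (assumed to be 1, matching '<OOV>' in the example).
    """
    return [
        [char_index.get(ch, oov_index) for ch in word]
        for word in input_
    ]


char_index = {"<PAD>": 0, "<OOV>": 1, "a": 2, "n": 3, "e": 4, "#": 5}
input_ = [["#", "a", "#"], ["#", "o", "n", "e", "#"]]
result = chars_to_indices(input_, char_index)
# result == [[5, 2, 5], [5, 1, 3, 4, 5]]
```

The output keeps the nesting of the input: one inner list per word, one integer per character.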