matchzoo.preprocessors.units.word_hashing

Module Contents

class matchzoo.preprocessors.units.word_hashing.WordHashing(term_index:dict)

Bases: matchzoo.preprocessors.units.unit.Unit

Word-hashing layer for DSSM-based models.

The input of WordHashing should be a list of per-word sub-letter (letter n-gram) lists extracted from one document. The output is the word-hashing representation of this document.

NgramLetterUnit and VocabularyUnit are two essential prerequisites of WordHashingUnit.
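As a rough illustration of what these two prerequisite steps produce, the sketch below builds tri-letters and a term index in plain Python. The helper names letter_ngrams and build_term_index are hypothetical and are not part of MatchZoo. The '#' boundary marker and the reserved '_PAD' and 'OOV' entries are assumptions taken from the example that follows.

```python
def letter_ngrams(word, n=3):
    """Split one word into letter n-grams, padding it with '#' on both sides."""
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_term_index(ngram_lists):
    """Assign an integer index to every distinct n-gram, reserving 0 and 1."""
    # Assumption: index 0 is padding and index 1 is the out-of-vocabulary slot.
    term_index = {"_PAD": 0, "OOV": 1}
    for ngrams in ngram_lists:
        for ngram in ngrams:
            # setdefault keeps the first index assigned to each n-gram.
            term_index.setdefault(ngram, len(term_index))
    return term_index


letters = [letter_ngrams("test")]
term_index = build_term_index(letters)
print(letters)
print(term_index)
```

The indices assigned here follow first-seen order, so they need not match the ordering a real vocabulary unit would produce. Only the mapping from tri-letter to a unique integer matters for the hashing step.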

Examples

>>> from matchzoo.preprocessors.units.word_hashing import WordHashing
>>> letters = [['#te', 'tes', 'est', 'st#'], ['oov']]
>>> word_hashing = WordHashing(
...     term_index={
...         '_PAD': 0, 'OOV': 1, 'st#': 2, '#te': 3, 'est': 4, 'tes': 5
...     })
>>> hashing = word_hashing.transform(letters)
>>> hashing[0]
[0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
>>> hashing[1]
[0.0, 1.0, 0.0, 0.0, 0.0, 0.0]

transform(self, input_:list)

Transform a list of letters into its word-hashing representation.

Parameters: input_ – list of tri-letters generated by NgramLetterUnit.
Returns: Word-hashing representation of the tri-letters.
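
The behaviour shown in the class example can be approximated in plain Python as below. This is a minimal sketch, not MatchZoo's implementation: word_hashing_sketch is a hypothetical name, and sending unknown tri-letters to index 1 is an assumption inferred from the class example, where 'oov' lands in the 'OOV' slot. The sketch counts occurrences, which agrees with that example because every tri-letter there appears once.

```python
from collections import Counter


def word_hashing_sketch(words, term_index, oov_index=1):
    """Return one count vector per word; oov_index is an assumed fallback."""
    vectors = []
    for ngrams in words:
        vector = [0.0] * len(term_index)
        for ngram, count in Counter(ngrams).items():
            # Unknown n-grams fall back to the assumed OOV slot.
            vector[term_index.get(ngram, oov_index)] += float(count)
        vectors.append(vector)
    return vectors


term_index = {'_PAD': 0, 'OOV': 1, 'st#': 2, '#te': 3, 'est': 4, 'tes': 5}
letters = [['#te', 'tes', 'est', 'st#'], ['oov']]
hashing = word_hashing_sketch(letters, term_index)
print(hashing[0])  # [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
print(hashing[1])  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Each output vector has one slot per entry in term_index, so its length grows with the tri-letter vocabulary rather than with the word vocabulary.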