matchzoo.preprocessors.units.ngram_letter

Module Contents

class matchzoo.preprocessors.units.ngram_letter.NgramLetter(ngram:int=3, reduce_dim:bool=True)

Bases: matchzoo.preprocessors.units.unit.Unit

Process unit for n-letter generation.

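Tri-letters (n-letters with the default ngram=3) are used in DSSMModel. This unit is expected to run before the Vocab is created, so that the vocabulary is built over n-letters rather than whole tokens.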

Examples

>>> from matchzoo.preprocessors.units.ngram_letter import NgramLetter
>>> triletter = NgramLetter()
>>> rv = triletter.transform(['hello', 'word'])
>>> len(rv)
9
>>> rv
['#he', 'hel', 'ell', 'llo', 'lo#', '#wo', 'wor', 'ord', 'rd#']
>>> triletter = NgramLetter(reduce_dim=False)
>>> rv = triletter.transform(['hello', 'word'])
>>> len(rv)
2
>>> rv
[['#he', 'hel', 'ell', 'llo', 'lo#'], ['#wo', 'wor', 'ord', 'rd#']]

transform(self, input_:list)

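Transform each token into its n-letters (tri-letters when ngram=3).

For example, with the default ngram=3, the token word is represented as #wo, wor, ord and rd#, where # marks the start and end of the token.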

Parameters: input_ – list of tokens to be transformed.
Returns: n_letters – the generated n-letters: a single flat list when reduce_dim=True, or one list per token when reduce_dim=False.
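
The behaviour shown in the examples above can be approximated by a short standalone function. This is a minimal sketch, not MatchZoo's source: the name ngram_letters is hypothetical, and it only assumes what the doctests show, namely that each token is padded with # on both sides and then split by a sliding window of width ngram. Edge cases such as empty input may differ from the library.

```python
def ngram_letters(tokens, ngram=3, reduce_dim=True):
    """Hypothetical illustration of n-letter generation (not the library code)."""
    result = []
    for token in tokens:
        # Mark the token boundaries, e.g. 'word' -> '#word#'.
        padded = '#' + token + '#'
        # Slide a window of width `ngram` over the padded token.
        grams = [padded[i:i + ngram] for i in range(len(padded) - ngram + 1)]
        if reduce_dim:
            # One flat list shared by all tokens.
            result.extend(grams)
        else:
            # One sub-list per token.
            result.append(grams)
    return result


flat = ngram_letters(['hello', 'word'])
nested = ngram_letters(['hello', 'word'], reduce_dim=False)
print(flat)
print(nested)
```

With reduce_dim=True the token boundaries are lost in the output, which suits a bag-of-n-letters representation; reduce_dim=False keeps one group per token for cases where per-token structure is still needed.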