matchzoo.dataloader.callbacks.ngram

Module Contents

class matchzoo.dataloader.callbacks.ngram.Ngram(preprocessor: mz.preprocessors.BasicPreprocessor, mode: str = 'index')

Bases: matchzoo.engine.base_callback.BaseCallback

Generate character n-gram features for the data.

Parameters:
  • preprocessor – The fitted BasicPreprocessor object, which contains the n-gram unit information.
  • mode – It can be one of ‘index’, ‘onehot’, ‘sum’ or ‘aggregate’.
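The four modes differ in how the n-gram indices of a single word are encoded. The sketch below uses a hypothetical helper, encode_word_ngrams, which is not part of MatchZoo. It assumes that 'index' keeps the raw list of n-gram indices, that 'onehot' builds one one-hot row per n-gram, and that 'sum' and 'aggregate' both collapse those rows into a single bag-of-n-grams vector. Check the library source before relying on these shapes.

```python
import numpy as np


def encode_word_ngrams(ngram_ids, ngram_vocab_size, mode="index"):
    """Encode one word's n-gram indices according to `mode` (hypothetical sketch)."""
    if mode == "index":
        # Keep the raw list of n-gram indices.
        return list(ngram_ids)
    # One row per n-gram, with a 1 in the column of that n-gram's index.
    onehot = np.zeros((len(ngram_ids), ngram_vocab_size))
    onehot[np.arange(len(ngram_ids)), ngram_ids] = 1
    if mode == "onehot":
        return onehot
    if mode in ("sum", "aggregate"):
        # Assumption: both modes sum the one-hot rows into one vector.
        return onehot.sum(axis=0)
    raise ValueError("mode must be one of 'index', 'onehot', 'sum' or 'aggregate'")


ids = [2, 5, 2]
print(encode_word_ngrams(ids, 6, "index"))         # [2, 5, 2]
print(encode_word_ngrams(ids, 6, "onehot").shape)  # (3, 6)
print(encode_word_ngrams(ids, 6, "sum"))           # [0. 0. 2. 0. 0. 1.]
```

Under these assumptions, 'onehot' yields a matrix whose row count depends on the word length, while 'sum' yields a fixed-length vector per word, which is easier to batch.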

Example

>>> import matchzoo as mz
>>> from matchzoo.dataloader.callbacks import Ngram
>>> data = mz.datasets.toy.load_data()
>>> preprocessor = mz.preprocessors.BasicPreprocessor(ngram_size=3)
>>> data = preprocessor.fit_transform(data)
>>> callback = Ngram(preprocessor=preprocessor, mode='index')
>>> dataset = mz.dataloader.Dataset(
...     data, callbacks=[callback])
>>> _ = dataset[0]

on_batch_unpacked(self, x, y)

Insert ngram_left and ngram_right into x.
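The effect of this hook can be illustrated with a plain-Python sketch. It uses a hypothetical function, insert_ngrams, and assumes that x["text_left"] and x["text_right"] hold lists of word-index sequences and that a precomputed word_to_ngram dict maps each word index to its n-gram encoding. The real callback keeps that mapping internally and also receives y, which this sketch leaves out.

```python
def insert_ngrams(x, word_to_ngram):
    """Add ngram_left / ngram_right to the batch dict `x` (hypothetical sketch).

    Assumes x["text_left"] and x["text_right"] are lists of word-index
    sequences and word_to_ngram maps each word index to its n-gram encoding.
    """
    for side in ("left", "right"):
        texts = x[f"text_{side}"]
        # Replace every word index with its n-gram representation.
        x[f"ngram_{side}"] = [[word_to_ngram[w] for w in text] for text in texts]
    return x


word_to_ngram = {2: [3, 4], 3: [5], 4: [1]}
batch = {"text_left": [[2, 3]], "text_right": [[4, 2, 2]]}
insert_ngrams(batch, word_to_ngram)
print(batch["ngram_left"])   # [[[3, 4], [5]]]
print(batch["ngram_right"])  # [[[1], [3, 4], [3, 4]]]
```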

matchzoo.dataloader.callbacks.ngram._build_word_ngram_map(ngram_process_unit: mz.preprocessors.units.NgramLetter, ngram_vocab_unit: mz.preprocessors.units.Vocabulary, index_term: dict, mode: str = 'index') → dict

Generate the mapping from each word to its n-gram vector.

Parameters:
  • ngram_process_unit – The fitted NgramLetter object.
  • ngram_vocab_unit – The fitted Vocabulary object.
  • index_term – The index to term mapping dict.
  • mode – It can be one of ‘index’, ‘onehot’, ‘sum’ or ‘aggregate’.
Returns:
  The word-to-n-gram vector mapping.
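For the 'index' mode, building the mapping amounts to splitting each word into letter n-grams and looking each n-gram up in a vocabulary. The sketch below is hypothetical and does not call MatchZoo: letter_ngrams stands in for the fitted NgramLetter unit (assumed here to pad words with '#' boundary markers), a plain dict stands in for the fitted Vocabulary, and the choices to skip index 0 (padding) and map unknown n-grams to index 1 (OOV) are assumptions.

```python
def letter_ngrams(word, n=3):
    """Letter n-grams with '#' boundary markers (assumed NgramLetter-style)."""
    token = f"#{word}#"
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def build_word_ngram_map(index_term, ngram_vocab, n=3, oov_ngram=1):
    """Map each word index to the list of its n-gram indices ('index' mode)."""
    word_to_ngram = {}
    for idx, word in index_term.items():
        if idx == 0:
            # Assumption: index 0 is padding and gets no entry.
            continue
        grams = letter_ngrams(word, n)
        # Unknown n-grams fall back to the hypothetical OOV index.
        word_to_ngram[idx] = [ngram_vocab.get(g, oov_ngram) for g in grams]
    return word_to_ngram


index_term = {0: "<PAD>", 2: "cat", 3: "cab"}
ngram_vocab = {"#ca": 2, "cat": 3, "at#": 4}
print(letter_ngrams("cat"))                           # ['#ca', 'cat', 'at#']
print(build_word_ngram_map(index_term, ngram_vocab))  # {2: [2, 3, 4], 3: [2, 1, 1]}
```

The other modes would then re-encode each index list, for example as one-hot rows or their sum.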