matchzoo.preprocessors.units.tokenize

Module Contents

Classes

Tokenize

Process unit for text tokenization.

class matchzoo.preprocessors.units.tokenize.Tokenize

Bases: matchzoo.preprocessors.units.unit.Unit

Process unit for text tokenization.

transform(self, input_: str) → list

Tokenize raw input text into a list of tokens.

Parameters

input_ – raw textual input.

Return tokens

the tokens extracted from the input, as a list.
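
Example

The following is a minimal sketch of the transform contract described above: a str goes in and a list of tokens comes out. It does not import MatchZoo. The Unit base class and the SketchTokenize class are hypothetical stand-ins, and the regular expression is an assumed splitting rule chosen for illustration; this page does not state which tokenizer the real Tokenize uses, so its output may differ.

```python
import re
from abc import ABC, abstractmethod


class Unit(ABC):
    """Hypothetical stand-in for matchzoo.preprocessors.units.unit.Unit."""

    @abstractmethod
    def transform(self, input_):
        """Transform one input value and return the result."""


class SketchTokenize(Unit):
    """Illustrative tokenizer with the same transform signature as Tokenize."""

    # Assumed rule: a token is a run of word characters or a single
    # non-space punctuation character. Not taken from MatchZoo.
    _TOKEN = re.compile(r"\w+|[^\w\s]")

    def transform(self, input_: str) -> list:
        # Return every non-overlapping match, in order of appearance.
        return self._TOKEN.findall(input_)


tokens = SketchTokenize().transform("Hello, world!")
print(tokens)  # ['Hello', ',', 'world', '!']
```

With MatchZoo installed, the real unit would presumably be used the same way: instantiate Tokenize and call transform on a string.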