`matchzoo.embedding.embedding`¶

MatchZoo toolkit for token embedding.

Module Contents¶

class matchzoo.embedding.embedding.Embedding(data:dict, output_dim:int)¶

Bases: object

Embedding class.

Examples::

>>> import matchzoo as mz
>>> train_raw = mz.datasets.toy.load_data()
>>> pp = mz.preprocessors.NaivePreprocessor()
>>> train = pp.fit_transform(train_raw, verbose=0)
>>> vocab_unit = mz.build_vocab_unit(train, verbose=0)
>>> term_index = vocab_unit.state['term_index']
>>> embed_path = mz.datasets.embeddings.EMBED_RANK

To load from a file:

>>> embedding = mz.embedding.load_from_file(embed_path)
>>> matrix = embedding.build_matrix(term_index)
>>> matrix.shape[0] == len(term_index)
True

To build your own:

>>> data = {'A':[0, 1], 'B':[2, 3]}
>>> embedding = mz.Embedding(data, 2)
>>> matrix = embedding.build_matrix({'A': 2, 'B': 1, '_PAD': 0})
>>> matrix.shape == (3, 2)
True

build_matrix(self, term_index:typing.Union[dict, mz.preprocessors.units.Vocabulary.TermIndex], initializer=lambda: np.random.uniform(-0.2, 0.2))¶

Build a matrix using term_index.

Parameters:	term_index – A dict or TermIndex to build with.
	initializer – A callable that returns a default value for terms missing from data (default: a draw from a uniform distribution over the range (-0.2, 0.2)).
Returns:	A matrix.
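
The behaviour described above can be illustrated with a minimal sketch. This is not MatchZoo's implementation: `build_matrix_sketch` is a hypothetical stand-alone function, and it assumes that the indices in term_index are exactly 0 to len(term_index) - 1 and that the initializer is called once per matrix entry.

```python
import numpy as np


def build_matrix_sketch(data, output_dim, term_index,
                        initializer=lambda: np.random.uniform(-0.2, 0.2)):
    """Hypothetical sketch: one row per term, placed at that term's index."""
    matrix = np.empty((len(term_index), output_dim))
    # Assumption: every entry starts from the initializer, so rows for terms
    # that are missing from `data` keep these default values.
    for position in np.ndindex(*matrix.shape):
        matrix[position] = initializer()
    # Overwrite the rows of terms that do have a vector in `data`.
    for term, index in term_index.items():
        if term in data:
            matrix[index] = data[term]
    return matrix


data = {'A': [0, 1], 'B': [2, 3]}
term_index = {'A': 2, 'B': 1, '_PAD': 0}
matrix = build_matrix_sketch(data, 2, term_index)
```

Here rows 2 and 1 hold the vectors for 'A' and 'B', while row 0 ('_PAD', absent from data) keeps values drawn by the initializer.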

matchzoo.embedding.embedding.load_from_file(file_path:str, mode:str='word2vec') → Embedding¶

Load embedding from file_path.

Parameters:	file_path – Path to file.
	mode – Embedding file format mode, one of ‘word2vec’, ‘fasttext’ or ‘glove’ (default: ‘word2vec’).
Returns:	A `matchzoo.embedding.Embedding` instance.
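
To show why a format mode may be needed, here is a hedged sketch of parsing plain-text embedding files. It assumes the commonly used layouts in which word2vec and fastText text files begin with a "<vocab_size> <dim>" header line and GloVe files do not. This page does not describe how load_from_file handles each mode, so `parse_text_embedding` and its `has_header` flag are hypothetical and not part of MatchZoo.

```python
from typing import Dict, Iterable, List, Tuple


def parse_text_embedding(lines: Iterable[str],
                         has_header: bool) -> Tuple[Dict[str, List[float]], int]:
    """Hypothetical parser: returns (term -> vector, vector dimension)."""
    rows = iter(lines)
    if has_header:
        # Assumption: the first line is "<vocab_size> <dim>" and holds no vector.
        next(rows, None)
    data: Dict[str, List[float]] = {}
    dim = 0
    for line in rows:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        data[parts[0]] = [float(value) for value in parts[1:]]
        dim = len(parts) - 1
    return data, dim


# Toy in-memory "files"; real files would be read line by line from disk.
word2vec_text = ["2 2", "A 0.0 1.0", "B 2.0 3.0"]
glove_text = ["A 0.0 1.0", "B 2.0 3.0"]

w2v_data, w2v_dim = parse_text_embedding(word2vec_text, has_header=True)
glove_data, glove_dim = parse_text_embedding(glove_text, has_header=False)
```

The resulting dict and dimension have the same shape as the data and output_dim arguments of the Embedding constructor shown in the class examples.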
