matchzoo.models.cdssm

An implementation of CDSSM (CLSM) model.

Module Contents

Classes

CDSSM

CDSSM Model implementation.

Squeeze

Squeeze.

class matchzoo.models.cdssm.CDSSM(params: typing.Optional[ParamTable] = None)

Bases: matchzoo.engine.base_model.BaseModel

CDSSM Model implementation.

Learning Semantic Representations Using Convolutional Neural Networks for Web Search. (2014a) A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval. (2014b)

Examples

>>> import matchzoo as mz
>>> model = CDSSM()
>>> model.params['task'] = mz.tasks.Ranking()
>>> model.params['vocab_size'] = 4
>>> model.params['filters'] =  32
>>> model.params['kernel_size'] = 3
>>> model.params['conv_activation_func'] = 'relu'
>>> model.build()
classmethod get_default_params(cls) → ParamTable
Returns

model default parameters.

classmethod get_default_preprocessor(cls, truncated_mode: str = 'pre', truncated_length_left: typing.Optional[int] = None, truncated_length_right: typing.Optional[int] = None, filter_mode: str = 'df', filter_low_freq: float = 1, filter_high_freq: float = float('inf'), remove_stop_words: bool = False, ngram_size: typing.Optional[int] = 3) → BasePreprocessor

Model default preprocessor.

The preprocessor’s transform should produce a correctly shaped data pack that can be used for training.

Returns

Default preprocessor.

classmethod get_default_padding_callback(cls, fixed_length_left: int = None, fixed_length_right: int = None, pad_word_value: typing.Union[int, str] = 0, pad_word_mode: str = 'pre', with_ngram: bool = True, fixed_ngram_length: int = None, pad_ngram_value: typing.Union[int, str] = 0, pad_ngram_mode: str = 'pre') → BaseCallback

Model default padding callback.

The padding callback’s on_batch_unpacked would pad a batch of data to a fixed length.

Returns

Default padding callback.

_create_base_network(self) → nn.Module

Apply conv and maxpooling operation towards to each letter-ngram.

The input shape is fixed_text_length`*`number of letter-ngram, as described in the paper, n is 3, number of letter-trigram is about 30,000 according to their observation.

Returns

A nn.Module of CDSSM network, tensor in tensor out.

build(self)

Build model structure.

CDSSM use Siamese architecture.

forward(self, inputs)

Forward.

guess_and_fill_missing_params(self, verbose: int = 1)

Guess and fill missing parameters in params.

Use this method to automatically fill-in hyper parameters. This involves some guessing so the parameter it fills could be wrong. For example, the default task is Ranking, and if we do not set it to Classification manually for data packs prepared for classification, then the shape of the model output and the data will mismatch.

Parameters

verbose – Verbosity.

class matchzoo.models.cdssm.Squeeze

Bases: torch.nn.Module

Squeeze.

forward(self, x)

Forward.