`matchzoo.dataloader.dataset`

A basic class representing a Dataset.

Module Contents

class matchzoo.dataloader.dataset.Dataset(data_pack: mz.DataPack, mode='point', num_dup: int = 1, num_neg: int = 1, callbacks: typing.List[BaseCallback] = None)

Bases: torch.utils.data.Dataset

Dataset that is built from a data pack.

Parameters:

data_pack – DataPack to build the dataset.
mode – One of “point”, “pair”, and “list”. (default: “point”)
num_dup – Number of duplications per instance, only effective when mode is “pair”. (default: 1)
num_neg – Number of negative samples per instance, only effective when mode is “pair”. (default: 1)
callbacks – Callbacks applied to the data. See matchzoo.dataloader.callbacks for more details.

Examples

>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data(stage='train')
>>> preprocessor = mz.preprocessors.CDSSMPreprocessor()
>>> data_processed = preprocessor.fit_transform(data_pack)
>>> dataset_point = mz.dataloader.Dataset(data_processed, mode='point')
>>> len(dataset_point)
100
>>> dataset_pair = mz.dataloader.Dataset(
...     data_processed, mode='pair', num_neg=2)
>>> len(dataset_pair)
5

data_pack: data_pack getter.

callbacks: callbacks getter.

num_neg: num_neg getter.

num_dup: num_dup getter.

mode: mode getter.

index_pool: index_pool getter.

__len__(self): Get the total number of instances.

__getitem__(self, item: int)

Get a set of instances at index item.

Parameters: item – the index of the instance.

_handle_callbacks_on_batch_data_pack(self, batch_data_pack)

_handle_callbacks_on_batch_unpacked(self, x, y)

get_index_pool(self)

Set the _index_pool attribute.

Here the _index_pool records the index of all the instances.
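One way to picture an index pool is sketched below. It is an illustration only, not MatchZoo's implementation: build_index_pool is a hypothetical helper, and it assumes that "point" mode keeps one index per row while "pair" mode groups each positive row with the num_neg negative rows that follow it.

```python
from typing import List, Union


def build_index_pool(
    num_rows: int, mode: str = "point", num_neg: int = 1
) -> List[Union[int, List[int]]]:
    """Hypothetical helper (not part of MatchZoo) that builds an index pool."""
    if mode == "point":
        # Assumption: every row is its own instance.
        return list(range(num_rows))
    if mode == "pair":
        # Assumption: rows are laid out as [positive, neg_1, ..., neg_k] blocks,
        # so one instance spans num_neg + 1 consecutive rows.
        step = num_neg + 1
        return [
            list(range(start, start + step))
            for start in range(0, num_rows - step + 1, step)
        ]
    raise ValueError(f"unsupported mode: {mode!r}")


point_pool = build_index_pool(4)
pair_pool = build_index_pool(6, mode="pair", num_neg=2)
```

Under these assumptions, the length of the pool is the number of instances, which is why a pair-mode dataset can report far fewer instances than rows.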

sample(self): Resample the instances from data pack.

shuffle(self): Shuffle the instances.

sort(self): Sort the instances by length_right.
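As a rough sketch of what shuffling and sorting instances can look like, the snippet below reorders a list of indices instead of the underlying rows. This is an assumption for illustration, not the library's code: shuffle_pool, sort_pool, and the length_right list are hypothetical names, and only the point-mode case (one index per instance) is covered.

```python
import random
from typing import List, Optional


def shuffle_pool(index_pool: List[int], seed: Optional[int] = None) -> List[int]:
    """Hypothetical helper: return a shuffled copy of the index pool."""
    shuffled = list(index_pool)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def sort_pool(index_pool: List[int], length_right: List[int]) -> List[int]:
    """Hypothetical helper: order indices by the length of the right text."""
    return sorted(index_pool, key=lambda i: length_right[i])


pool = [0, 1, 2, 3]
lengths = [5, 2, 9, 4]  # length_right for rows 0..3 (made-up values)
sorted_pool = sort_pool(pool, lengths)
shuffled_pool = shuffle_pool(pool, seed=0)
```

Sorting by length tends to place similarly sized right texts next to each other, which can reduce padding when instances are later batched.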

classmethod _reorganize_pair_wise(cls, relation: pd.DataFrame, num_dup: int = 1, num_neg: int = 1): Re-organize the data pack as pair-wise format.
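The general idea of a pair-wise layout can be sketched with plain Python lists in place of a pd.DataFrame. This is a simplified illustration under stated assumptions, not the method's actual body: reorganize_pair_wise is a hypothetical function, the column names id_left, id_right, and label are assumed, and negatives are drawn with replacement from rows of the same id_left that carry a lower label.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional

Row = Dict[str, object]


def reorganize_pair_wise(
    relation: List[Row],
    num_dup: int = 1,
    num_neg: int = 1,
    rng: Optional[random.Random] = None,
) -> List[Row]:
    """Hypothetical sketch: emit [positive, neg_1, ..., neg_k] blocks."""
    rng = rng or random.Random()
    # Group rows by the left-hand (query) id.
    groups = defaultdict(list)
    for row in relation:
        groups[row["id_left"]].append(row)

    pairs: List[Row] = []
    for rows in groups.values():
        labels = sorted({row["label"] for row in rows}, reverse=True)
        # The lowest label has nothing ranked below it, so it is skipped.
        for label in labels[:-1]:
            positives = [r for r in rows if r["label"] == label]
            negatives = [r for r in rows if r["label"] < label]
            # Repeat each positive num_dup times, each with fresh negatives.
            for positive in positives * num_dup:
                pairs.append(positive)
                pairs.extend(rng.choices(negatives, k=num_neg))
    return pairs


relation = [
    {"id_left": "q1", "id_right": "d1", "label": 1},
    {"id_left": "q1", "id_right": "d2", "label": 0},
    {"id_left": "q1", "id_right": "d3", "label": 0},
    {"id_left": "q2", "id_right": "d4", "label": 0},
]
pairs = reorganize_pair_wise(relation, num_dup=2, num_neg=2, rng=random.Random(0))
```

In this sketch, a query whose rows all share one label (q2 above) contributes no pairs, and each positive yields num_dup blocks of num_neg + 1 rows.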
