matchzoo.dataloader.dataset
A basic class representing a Dataset.
Module Contents
class matchzoo.dataloader.dataset.Dataset(data_pack: mz.DataPack, mode='point', num_dup: int = 1, num_neg: int = 1, callbacks: typing.List[BaseCallback] = None)
Bases: torch.utils.data.Dataset
Dataset that is built from a data pack.
Parameters:
- data_pack – DataPack to build the dataset.
- mode – One of “point”, “pair”, and “list”. (default: “point”)
- num_dup – Number of duplications per instance, only effective when mode is “pair”. (default: 1)
- num_neg – Number of negative samples per instance, only effective when mode is “pair”. (default: 1)
- callbacks – List of callbacks (default: None). See matchzoo.data_generator.callbacks for more details.
Examples
>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data(stage='train')
>>> preprocessor = mz.preprocessors.CDSSMPreprocessor()
>>> data_processed = preprocessor.fit_transform(data_pack)
>>> dataset_point = mz.dataloader.Dataset(data_processed, mode='point')
>>> len(dataset_point)
100
>>> dataset_pair = mz.dataloader.Dataset(
...     data_processed, mode='pair', num_neg=2)
>>> len(dataset_pair)
5
- data_pack – data_pack getter.
- callbacks – callbacks getter.
- num_neg – num_neg getter.
- num_dup – num_dup getter.
- mode – mode getter.
- index_pool – index_pool getter.
- __len__(self) – Get the total number of instances.
- __getitem__(self, item: int) – Get a set of instances at index item.
  Parameters: item – the index of the instance.
- _handle_callbacks_on_batch_data_pack(self, batch_data_pack)
- _handle_callbacks_on_batch_unpacked(self, x, y)
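The per-item flow suggested by __getitem__ and the two _handle_callbacks_* hooks can be sketched without MatchZoo or PyTorch. This is a minimal illustration under stated assumptions, not the library's implementation: an instance is taken to be a list of relation rows picked out by the index pool, callbacks are assumed to see the selected rows first and then the unpacked (x, y) pair, and plain lists and dicts stand in for DataPack. The names TinyDataset, BatchCallback and ClipRight are hypothetical; only the hook names on_batch_data_pack and on_batch_unpacked are meant to mirror MatchZoo's BaseCallback.

```python
from typing import Dict, List, Optional, Tuple


class BatchCallback:
    """Hypothetical base class; hook names assumed to mirror BaseCallback."""

    def on_batch_data_pack(self, rows: List[Dict]) -> None:
        """Called with the selected relation rows before unpacking."""

    def on_batch_unpacked(self, x: Dict[str, list], y: List[float]) -> None:
        """Called with the unpacked features and labels; may mutate x in place."""


class ClipRight(BatchCallback):
    """Hypothetical callback that truncates every right text to max_len tokens."""

    def __init__(self, max_len: int):
        self.max_len = max_len

    def on_batch_unpacked(self, x, y):
        x["text_right"] = [t[: self.max_len] for t in x["text_right"]]


class TinyDataset:
    """Hypothetical stand-in for a Dataset built from an index pool."""

    def __init__(self, rows: List[Dict], index_pool: List[List[int]],
                 callbacks: Optional[List[BatchCallback]] = None):
        self.rows = rows
        self.index_pool = index_pool
        self.callbacks = callbacks or []

    def __len__(self) -> int:
        # One entry in the index pool is one instance.
        return len(self.index_pool)

    def __getitem__(self, item: int) -> Tuple[Dict[str, list], List[float]]:
        # Select the relation rows that make up this instance.
        batch = [self.rows[i] for i in self.index_pool[item]]
        for cb in self.callbacks:
            cb.on_batch_data_pack(batch)
        # "Unpack" the rows into a feature dict x and a label list y.
        x = {"text_left": [r["text_left"] for r in batch],
             "text_right": [r["text_right"] for r in batch]}
        y = [r["label"] for r in batch]
        for cb in self.callbacks:
            cb.on_batch_unpacked(x, y)
        return x, y


rows = [
    {"text_left": [1, 2], "text_right": [3, 4, 5, 6], "label": 1.0},
    {"text_left": [1, 2], "text_right": [7, 8], "label": 0.0},
]
ds = TinyDataset(rows, index_pool=[[0, 1]], callbacks=[ClipRight(max_len=2)])
x, y = ds[0]
print(x["text_right"], y)  # [[3, 4], [7, 8]] [1.0, 0.0]
```

Letting callbacks mutate x in place keeps the hook signatures simple; a callback that pads or truncates texts fits naturally into the unpacked stage.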
- get_index_pool(self) – Set the _index_pool attribute. Here, _index_pool records the indices of all the instances.
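One way to picture the index pool is as a list of instances, each of which is a list of relation-row indices. The helper build_index_pool below is a hypothetical sketch, not MatchZoo code: it assumes point mode maps each row to its own instance and that, in pair mode, the relation has already been reorganized into consecutive blocks of one positive row followed by num_neg negatives. List mode is deliberately left out of this sketch.

```python
from typing import List


def build_index_pool(num_rows: int, mode: str = "point",
                     num_neg: int = 1) -> List[List[int]]:
    """Hypothetical helper: group relation-row indices into instances."""
    if mode == "point":
        # One row per instance.
        return [[i] for i in range(num_rows)]
    if mode == "pair":
        # Assumed layout: blocks of 1 positive + num_neg negatives.
        step = num_neg + 1
        # Any incomplete trailing block is dropped.
        return [list(range(start, start + step))
                for start in range(0, num_rows - step + 1, step)]
    raise NotImplementedError(f"{mode!r} mode is not covered by this sketch")


print(build_index_pool(3))                          # [[0], [1], [2]]
print(build_index_pool(7, mode="pair", num_neg=2))  # [[0, 1, 2], [3, 4, 5]]
```

Under these assumptions, __len__ is simply the length of the pool, so in pair mode it counts blocks rather than relation rows.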
- sample(self) – Resample the instances from the data pack.
- shuffle(self) – Shuffle the instances.
- sort(self) – Sort the instances by length_right.
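Both shuffle and sort reorder whole instances rather than individual rows. The sketch below operates on an index pool shaped as a list of row-index lists; the function names shuffle_pool and sort_pool, and the choice to sort in ascending order by the longest right text within each instance, are assumptions made for illustration rather than documented MatchZoo behaviour.

```python
import random
from typing import List, Sequence


def shuffle_pool(index_pool: List[List[int]], seed: int = 0) -> List[List[int]]:
    """Return a shuffled copy of the pool; rows inside an instance stay together."""
    pool = list(index_pool)
    random.Random(seed).shuffle(pool)
    return pool


def sort_pool(index_pool: List[List[int]],
              length_right: Sequence[int]) -> List[List[int]]:
    """Order instances by the longest right text among their rows (ascending).

    length_right[i] is assumed to hold the right-text length of relation row i.
    """
    return sorted(index_pool, key=lambda rows: max(length_right[i] for i in rows))


pool = [[0, 1], [2, 3], [4, 5]]
lengths = [9, 4, 2, 3, 7, 1]
print(sort_pool(pool, lengths))  # [[2, 3], [4, 5], [0, 1]]
```

Sorting by length tends to place instances of similar size next to each other, which reduces the padding needed when consecutive instances are batched together.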
- classmethod _reorganize_pair_wise(cls, relation: pd.DataFrame, num_dup: int = 1, num_neg: int = 1) – Re-organize the data pack into pair-wise format.
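The pair-wise reorganization can be illustrated in plain Python. This is a sketch of the idea under stated assumptions, not the MatchZoo implementation (which takes a pd.DataFrame): the relation is given as a list of dicts with id_left, id_right and label keys, every label within an id_left group except the lowest is treated as positive against rows with strictly lower labels, each positive is repeated num_dup times, and num_neg negatives are sampled with replacement. The function name reorganize_pair_wise and the seed parameter are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def reorganize_pair_wise(relation: List[Dict], num_dup: int = 1,
                         num_neg: int = 1, seed: int = 0) -> List[Dict]:
    """Emit each positive row followed by num_neg sampled negative rows."""
    rng = random.Random(seed)
    groups: Dict[str, List[Dict]] = defaultdict(list)
    for row in relation:
        groups[row["id_left"]].append(row)

    pairs: List[Dict] = []
    for rows in groups.values():
        labels = sorted({r["label"] for r in rows}, reverse=True)
        for label in labels[:-1]:  # the lowest label has no lower negatives
            positives = [r for r in rows if r["label"] == label] * num_dup
            negatives = [r for r in rows if r["label"] < label]
            for pos in positives:
                pairs.append(pos)
                # Sampling with replacement, so num_neg may exceed len(negatives).
                pairs.extend(rng.choices(negatives, k=num_neg))
    return pairs


relation = [
    {"id_left": "q1", "id_right": "d1", "label": 1},
    {"id_left": "q1", "id_right": "d2", "label": 0},
    {"id_left": "q1", "id_right": "d3", "label": 0},
    {"id_left": "q2", "id_right": "d4", "label": 0},
]
out = reorganize_pair_wise(relation, num_dup=2, num_neg=2)
print(len(out))  # 6: one positive of q1, duplicated twice, each with 2 negatives
```

Because each positive is immediately followed by its negatives, the output comes in consecutive blocks of num_neg + 1 rows, and a query whose rows all share a single label (q2 above) contributes nothing.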