matchzoo.dataloader.dataloader
Basic data loader.
Module Contents
class matchzoo.dataloader.dataloader.DataLoader(dataset: data.Dataset, batch_size: int = 32, device: typing.Optional[torch.device] = None, stage='train', resample: bool = True, shuffle: bool = False, sort: bool = True, callback: BaseCallback = None, pin_memory: bool = False, timeout: int = 0, num_workers: int = 0, worker_init_fn=None)
Bases: object
DataLoader that loads batches of data from a Dataset.
Parameters:
- dataset – The Dataset object to load data from.
- batch_size – Number of samples per batch. (default: 32)
- device – A torch.device specifying the device on which the batch tensors are created. (default: None)
- stage – One of “train”, “dev”, and “test”. (default: “train”)
- resample – Whether to resample the data between epochs. Only effective when the dataset's mode is “pair”. (default: True)
- shuffle – Whether to shuffle data between epochs. (default: False)
- sort – Whether to sort data according to length_right. (default: True)
- callback – A BaseCallback instance applied to each batch. See matchzoo.engine.base_callback.BaseCallback for more details. (default: None)
- pin_memory – If set to True, tensors will be copied into pinned memory. (default: False)
- timeout – The timeout value for collecting a batch from workers. (default: 0)
- num_workers – The number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
- worker_init_fn – If not None, this is called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)
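The worker_init_fn contract above (one call per worker subprocess, receiving an int id in [0, num_workers - 1]) can be illustrated with a small seeding function. This is a minimal sketch, not part of MatchZoo: the name seed_worker and the constant BASE_SEED are hypothetical, and deriving per-worker seeds for Python's random module and NumPy is one common convention rather than something the loader requires.

```python
import random

import numpy as np

BASE_SEED = 2024  # hypothetical base seed, chosen only for illustration


def seed_worker(worker_id: int) -> None:
    """Hypothetical worker_init_fn: give each loader worker its own seed.

    worker_id is expected to be an int in [0, num_workers - 1].
    """
    if worker_id < 0:
        raise ValueError("worker_id must be non-negative")
    # Offset the base seed by the worker id so workers draw different streams.
    seed = BASE_SEED + worker_id
    random.seed(seed)
    np.random.seed(seed)
```

It would be passed as DataLoader(dataset, num_workers=2, worker_init_fn=seed_worker). With the default num_workers=0, data is loaded in the main process and no worker subprocesses exist to call it.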
Examples
>>> import matchzoo as mz
>>> data_pack = mz.datasets.toy.load_data(stage='train')
>>> preprocessor = mz.preprocessors.CDSSMPreprocessor()
>>> data_processed = preprocessor.fit_transform(data_pack)
>>> dataset = mz.dataloader.Dataset(data_processed, mode='point')
>>> padding_callback = mz.dataloader.callbacks.CDSSMPadding()
>>> dataloader = mz.dataloader.DataLoader(
...     dataset, stage='train', callback=padding_callback)
>>> len(dataloader)
4
id_left
Getter for id_left.
label
Getter for label.
__len__(self)
Get the total number of batches.
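__len__ reports the number of batches per epoch. Assuming the final, possibly partial batch is kept rather than dropped (an assumption, not confirmed by this page), the count is a ceiling division of the sample count by batch_size. The helper num_batches below is hypothetical, not MatchZoo code.

```python
def num_batches(num_samples: int, batch_size: int = 32) -> int:
    """Hypothetical helper: batches per epoch, keeping a partial last batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    # Integer ceiling division: (n + b - 1) // b == ceil(n / b) for n >= 0, b > 0.
    return (num_samples + batch_size - 1) // batch_size
```

For example, 100 samples with the default batch_size of 32 give three full batches plus one batch of 4, so num_batches(100) is 4.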
init_epoch(self)
Resample, shuffle or sort the dataset for a new epoch.
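One way to picture what init_epoch does with the shuffle and sort flags is the index-ordering sketch below. It is an illustration under stated assumptions, not MatchZoo's implementation: the function epoch_order is hypothetical, it shuffles before sorting, and it relies on Python's stable sort so that shuffling still varies the order among samples with equal length_right. Pair-mode resampling is left out because it depends on how the Dataset regenerates pairs.

```python
import random
from typing import List, Optional, Sequence


def epoch_order(length_right: Sequence[int], shuffle: bool = False,
                sort: bool = True, seed: Optional[int] = None) -> List[int]:
    """Hypothetical sketch: the order in which sample indices are visited.

    Assumes shuffling happens first and sorting by length_right second.
    """
    order = list(range(len(length_right)))
    if shuffle:
        # A dedicated Random instance keeps the shuffle reproducible per seed.
        random.Random(seed).shuffle(order)
    if sort:
        # list.sort is stable, so ties keep their (possibly shuffled) order.
        order.sort(key=lambda i: length_right[i])
    return order
```

With the defaults (shuffle=False, sort=True), samples are simply ordered by length_right, which is a common way to keep texts of similar length in the same batch and reduce padding.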
__iter__(self)
Iterate over the data one batch at a time.
_handle_callbacks_on_batch_unpacked(self, x, y)
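The per-batch callback hook can be sketched as follows: each batch is unpacked into inputs x (a dict of fields) and labels y, and every callback may modify them in place before the batch is yielded. This is a simplified, framework-free sketch; iterate_batches and LengthCallback are hypothetical, and naming the hook on_batch_unpacked(x, y) is an assumption modeled on the method name above.

```python
from typing import Dict, Iterator, Sequence, Tuple


class LengthCallback:
    """Hypothetical callback; the on_batch_unpacked(x, y) hook name is assumed."""

    def on_batch_unpacked(self, x: Dict[str, list], y: list) -> None:
        # Modify the batch in place: record the length of each left text.
        x["length_left"] = [len(text) for text in x["text_left"]]


def iterate_batches(samples: Sequence[Tuple[str, int]], batch_size: int,
                    callbacks: Sequence = ()) -> Iterator[Tuple[Dict[str, list], list]]:
    """Hypothetical iteration: slice, unpack into (x, y), run callbacks, yield."""
    for start in range(0, len(samples), batch_size):
        chunk = samples[start:start + batch_size]
        x = {"text_left": [text for text, _ in chunk]}
        y = [label for _, label in chunk]
        for callback in callbacks:
            callback.on_batch_unpacked(x, y)
        yield x, y
```

Because callbacks run after unpacking, they see whole batches, which is where batch-level steps such as padding to a common length naturally fit.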
matchzoo.dataloader.dataloader.mz_collate(batch)
Put each data field into an array with outer dimension batch size.
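The mz_collate docstring describes stacking: given a list of per-sample items, each field becomes one array whose first axis indexes the samples in the batch. Below is a minimal NumPy sketch, assuming (hypothetically) that each sample is an (x, y) pair where x is a dict of arrays; the name collate and this input layout are illustrative, not mz_collate's actual code.

```python
from typing import Dict, List, Tuple

import numpy as np


def collate(batch: List[Tuple[Dict[str, np.ndarray], np.ndarray]]
            ) -> Tuple[Dict[str, np.ndarray], np.ndarray]:
    """Hypothetical collate: stack each field so axis 0 is the batch dimension."""
    xs, ys = zip(*batch)
    # One array per field, built from that field's value in every sample.
    x = {key: np.stack([sample[key] for sample in xs]) for key in xs[0]}
    y = np.stack(ys)
    return x, y
```

np.stack requires every sample's value for a field to have the same shape, so variable-length fields would have to be padded to a common length first.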