matchzoo.utils.get_file
Download file.
Module Contents
class matchzoo.utils.get_file.Progbar(target, width=30, verbose=1, interval=0.05)
Bases: object
Displays a progress bar.
Parameters: - target – Total number of steps expected, None if unknown.
- width – Progress bar width on screen.
- verbose – Verbosity mode: 0 (silent), 1 (verbose), or 2 (semi-verbose).
- stateful_metrics – Iterable of string names of metrics that should not be averaged over time. Metrics in this list will be displayed as-is. All others will be averaged by the progbar before display.
- interval – Minimum visual progress update interval (in seconds).
update(self, current)
Updates the progress bar.
matchzoo.utils.get_file._extract_archive(file_path, path='.', archive_format='auto')
Extracts an archive if it matches the tar, tar.gz, tar.bz, or zip format.
Parameters: - file_path – Path to the archive file.
- path – Directory to extract the archive into.
- archive_format – Archive format to try for extracting the file. Options are ‘auto’, ‘tar’, ‘zip’, and None. ‘tar’ includes tar, tar.gz, and tar.bz files. The default ‘auto’ is equivalent to [‘tar’, ‘zip’]. None or an empty list means no format is tried, so no match is found.
Returns: True if a match was found and an archive extraction was completed, False otherwise.
matchzoo.utils.get_file.get_file(fname: str = None, origin: str = None, untar: bool = False, extract: bool = False, md5_hash: typing.Any = None, file_hash: typing.Any = None, hash_algorithm: str = 'auto', archive_format: str = 'auto', cache_subdir: typing.Union[Path, str] = 'data', cache_dir: typing.Union[Path, str] = matchzoo.USER_DATA_DIR, verbose: int = 1) → str
Downloads a file from a URL if it is not already in the cache.
By default, the file at the URL origin is downloaded to the cache_dir ~/.matchzoo/datasets, placed in the cache_subdir data, and given the filename fname. The final location of a file example.txt would therefore be ~/.matchzoo/datasets/data/example.txt.
Files in tar, tar.gz, tar.bz, and zip formats can also be extracted. Passing a hash will verify the file after download. The command line programs shasum and sha256sum can compute the hash.
Parameters: - fname – Name of the file. If an absolute path /path/to/file.txt is specified the file will be saved at that location.
- origin – Original URL of the file.
- untar – Deprecated in favor of ‘extract’. Boolean, whether the file should be decompressed.
- md5_hash – Deprecated in favor of ‘file_hash’. MD5 hash of the file for verification.
- file_hash – The expected hash string of the file after download. The sha256 and md5 hash algorithms are both supported.
- cache_subdir – Subdirectory under the cache dir where the file is saved. If an absolute path /path/to/folder is specified the file will be saved at that location.
- hash_algorithm – Hash algorithm used to verify the file. Options are ‘md5’, ‘sha256’, and ‘auto’. The default ‘auto’ detects the hash algorithm in use.
- archive_format – Archive format to try for extracting the file. Options are ‘auto’, ‘tar’, ‘zip’, and None. ‘tar’ includes tar, tar.gz, and tar.bz files. The default ‘auto’ is equivalent to [‘tar’, ‘zip’]. None or an empty list means no format is tried, so no match is found.
- cache_dir – Location to store cached files; when None, it defaults to matchzoo.USER_DATA_DIR (~/.matchzoo/datasets).
- verbose – Verbosity mode: 0 (silent), 1 (verbose), or 2 (semi-verbose).
- extract – If True, tries extracting the file as an archive, such as tar or zip.
Returns: Path to the downloaded file.
matchzoo.utils.get_file.validate_file(fpath, file_hash, algorithm='auto', chunk_size=65535)
Validates a file against a SHA-256 or MD5 hash.
Parameters: - fpath – Path to the file being validated.
- file_hash – The expected hash string of the file. The sha256 and md5 hash algorithms are both supported.
- algorithm – Hash algorithm, one of ‘auto’, ‘sha256’, or ‘md5’. The default ‘auto’ detects the hash algorithm in use.
- chunk_size – Bytes to read at a time, important for large files.
Returns: Whether the file is valid.
matchzoo.utils.get_file._hash_file(fpath, algorithm='sha256', chunk_size=65535)
Calculates the SHA-256 or MD5 hash of a file.
Parameters: - fpath – Path to the file being hashed.
- algorithm – Hash algorithm, one of ‘auto’, ‘sha256’, or ‘md5’. The default is ‘sha256’.
- chunk_size – Bytes to read at a time, important for large files.
Returns: The file hash.