matchzoo.utils.get_file
Download file.
Module Contents
Functions
_extract_archive | Extracts an archive if it matches tar, tar.gz, tar.bz, or zip formats.
get_file | Downloads a file from a URL if it is not already in the cache.
validate_file | Validates a file against a sha256 or md5 hash.
_hash_file | Calculates the sha256 or md5 hash of a file.
-
class matchzoo.utils.get_file.Progbar(target, width=30, verbose=1, interval=0.05)
Bases: object
Displays a progress bar.
- Parameters
target – Total number of steps expected, None if unknown.
width – Progress bar width on screen.
verbose – Verbosity mode: 0 (silent), 1 (verbose), 2 (semi-verbose).
stateful_metrics – Iterable of string names of metrics that should not be averaged over time. Metrics in this list will be displayed as-is. All others will be averaged by the progbar before display.
interval – Minimum visual progress update interval (in seconds).
-
update(self, current)
Updates the progress bar.
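The parameters above (target, width, interval) can be illustrated with a minimal sketch. MiniProgbar is a hypothetical stand-in written for this example and is not the MatchZoo implementation; the bar characters, the "Unknown" label, and the return value of update are assumptions.

```python
import sys
import time


class MiniProgbar:
    """Hypothetical stand-in for illustration; not the MatchZoo class."""

    def __init__(self, target, width=30, interval=0.05, stream=None):
        self.target = target        # total steps expected, None if unknown
        self.width = width          # bar width in characters
        self.interval = interval    # minimum seconds between redraws
        self.stream = stream if stream is not None else sys.stdout
        self._last_draw = None      # time of the last redraw, None before the first

    def render(self, current):
        """Return the bar text for `current` completed steps."""
        if self.target is None:
            # Unknown total: only the step count can be shown.
            return f"{current}/Unknown"
        # Clamp so that overshooting the target cannot overflow the bar.
        filled = min(self.width, int(self.width * current / self.target))
        bar = "=" * filled + "." * (self.width - filled)
        return f"{current}/{self.target} [{bar}]"

    def update(self, current):
        """Redraw if `interval` seconds have passed or the target is reached.

        Returns True when the bar was redrawn, False when the call was throttled.
        """
        now = time.monotonic()
        finished = self.target is not None and current >= self.target
        throttled = (
            self._last_draw is not None
            and now - self._last_draw < self.interval
        )
        if throttled and not finished:
            return False
        self._last_draw = now
        self.stream.write("\r" + self.render(current))
        if finished:
            self.stream.write("\n")
        self.stream.flush()
        return True
```

Throttling on interval keeps a fast loop from flooding the terminal, while the final step is always drawn so the bar ends in its completed state.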
-
matchzoo.utils.get_file._extract_archive(file_path, path='.', archive_format='auto')
Extracts an archive if it matches tar, tar.gz, tar.bz, or zip formats.
- Parameters
file_path – Path to the archive file.
path – Directory into which the archive is extracted.
archive_format – Archive format to try for extracting the file. Options are ‘auto’, ‘tar’, ‘zip’, and None. ‘tar’ includes tar, tar.gz, and tar.bz files. The default ‘auto’ is [‘tar’, ‘zip’]. None or an empty list will return no matches found.
- Returns
True if a match was found and an archive extraction was completed, False otherwise.
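The behavior described above can be sketched with the standard library's tarfile and zipfile modules. The function name extract_archive_sketch is hypothetical, and the way format names are matched is an assumption; the actual MatchZoo code may differ.

```python
import tarfile
import zipfile


def extract_archive_sketch(file_path, path=".", archive_format="auto"):
    """Return True if `file_path` was recognised and extracted into `path`."""
    if archive_format is None:
        return False  # None means "try no formats"
    if archive_format == "auto":
        archive_format = ["tar", "zip"]
    elif isinstance(archive_format, str):
        archive_format = [archive_format]

    file_path = str(file_path)
    for fmt in archive_format:
        if fmt == "tar":
            # tarfile.open detects gzip/bzip2 compression on its own.
            opener, matches = tarfile.open, tarfile.is_tarfile
        elif fmt == "zip":
            opener, matches = zipfile.ZipFile, zipfile.is_zipfile
        else:
            continue  # unknown format names are ignored in this sketch
        if matches(file_path):
            with opener(file_path) as archive:
                # Only extract archives from trusted sources: member
                # names may point outside `path`.
                archive.extractall(path)
            return True
    return False  # an empty list also ends up here
```

Checking the file contents with is_tarfile and is_zipfile, rather than the file extension, lets 'auto' work for files whose names carry no suffix.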
-
matchzoo.utils.get_file.get_file(fname: str = None, origin: str = None, untar: bool = False, extract: bool = False, md5_hash: typing.Any = None, file_hash: typing.Any = None, hash_algorithm: str = 'auto', archive_format: str = 'auto', cache_subdir: typing.Union[Path, str] = 'data', cache_dir: typing.Union[Path, str] = matchzoo.USER_DATA_DIR, verbose: int = 1) → str
Downloads a file from a URL if it is not already in the cache.
By default the file at the url origin is downloaded to the cache_dir ~/.matchzoo/datasets, placed in the cache_subdir data, and given the filename fname. The final location of a file example.txt would therefore be ~/.matchzoo/datasets/data/example.txt.
Files in tar, tar.gz, tar.bz, and zip formats can also be extracted. Passing a hash will verify the file after download. The command line programs shasum and sha256sum can compute the hash.
- Parameters
fname – Name of the file. If an absolute path /path/to/file.txt is specified the file will be saved at that location.
origin – Original URL of the file.
untar – Deprecated in favor of ‘extract’. Boolean, whether the file should be decompressed.
md5_hash – Deprecated in favor of ‘file_hash’. md5 hash of the file for verification.
file_hash – The expected hash string of the file after download. The sha256 and md5 hash algorithms are both supported.
cache_subdir – Subdirectory under the cache dir where the file is saved. If an absolute path /path/to/folder is specified the file will be saved at that location.
hash_algorithm – Hash algorithm used to verify the file. Options are ‘md5’, ‘sha256’, and ‘auto’. The default ‘auto’ detects the hash algorithm in use.
archive_format – Archive format to try for extracting the file. Options are ‘auto’, ‘tar’, ‘zip’, and None. ‘tar’ includes tar, tar.gz, and tar.bz files. The default ‘auto’ is [‘tar’, ‘zip’]. None or an empty list will return no matches found.
cache_dir – Location to store cached files. When None, it defaults to matchzoo.USER_DATA_DIR (~/.matchzoo/datasets).
verbose – Verbosity mode: 0 (silent), 1 (verbose), 2 (semi-verbose).
extract – If True, tries to extract the file as an archive, such as tar or zip.
- Returns
Path to the downloaded file.
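The cache-path rule and the "download only if missing" behavior described above can be sketched as follows. cached_download is a hypothetical helper built on urllib.request.urlretrieve; it leaves out hash verification, extraction, and progress reporting, and it is not the MatchZoo implementation.

```python
import urllib.request
from pathlib import Path


def cached_download(fname, origin, cache_dir, cache_subdir="data"):
    """Download `origin` unless the target file already exists; return its path."""
    fname = Path(fname)
    if fname.is_absolute():
        # An absolute fname is used as the destination directly.
        target = fname
    else:
        # pathlib discards the left operand when the right one is absolute,
        # so an absolute cache_subdir replaces cache_dir here.
        target = Path(cache_dir).expanduser() / cache_subdir / fname
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        # Assumption: a plain existence check decides whether to download.
        urllib.request.urlretrieve(origin, str(target))
    return str(target)
```

With cache_dir set to ~/.matchzoo/datasets and fname set to example.txt, this resolves to ~/.matchzoo/datasets/data/example.txt, matching the location given in the description. A bare existence check cannot detect a truncated or stale file, which is why passing a hash for verification is useful.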
-
matchzoo.utils.get_file.validate_file(fpath, file_hash, algorithm='auto', chunk_size=65535)
Validates a file against a sha256 or md5 hash.
- Parameters
fpath – path to the file being validated
file_hash – The expected hash string of the file. The sha256 and md5 hash algorithms are both supported.
algorithm – Hash algorithm, one of ‘auto’, ‘sha256’, or ‘md5’. The default ‘auto’ detects the hash algorithm in use.
chunk_size – Bytes to read at a time, important for large files.
- Returns
Whether the file is valid.
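Chunked hashing and the ‘auto’ detection can be sketched with hashlib. The function names below are hypothetical, and the detection rule (a 64-character hex string is treated as sha256, anything else as md5) is an assumption about how ‘auto’ might work, not a statement about the MatchZoo source.

```python
import hashlib


def hash_file_sketch(fpath, algorithm="sha256", chunk_size=65535):
    """Return the hex digest of a file, reading `chunk_size` bytes at a time."""
    hasher = hashlib.sha256() if algorithm == "sha256" else hashlib.md5()
    with open(fpath, "rb") as f:
        # Reading in chunks keeps memory use flat for large files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def validate_file_sketch(fpath, file_hash, algorithm="auto", chunk_size=65535):
    """Return True if the file's digest equals `file_hash`."""
    if algorithm == "auto":
        # Assumption: sha256 hex digests are 64 characters, md5 are 32.
        algorithm = "sha256" if len(file_hash) == 64 else "md5"
    # hexdigest() is lowercase, so normalise the expected value too.
    return hash_file_sketch(fpath, algorithm, chunk_size) == str(file_hash).lower()
```

The expected value passed as file_hash can be produced on the command line with sha256sum or shasum, as the get_file description notes.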
-
matchzoo.utils.get_file._hash_file(fpath, algorithm='sha256', chunk_size=65535)
Calculates the sha256 or md5 hash of a file.
- Parameters
fpath – Path to the file being hashed.
algorithm – Hash algorithm, one of ‘auto’, ‘sha256’, or ‘md5’. The default in the signature is ‘sha256’.
chunk_size – Bytes to read at a time, important for large files.
- Returns
The file hash.