External Data Downloader

Fetching binary and large-volume data from Google Drive.

This module provides functionality to download files and directories from Google Drive using URLs. It includes classes for different download scenarios and utility functions to validate files and create directory structures.

Classes:
  • GDriveDirDownloader: Downloads directories from Google Drive.

  • GDriveFileDownloader: Downloads files from Google Drive.

  • GDriveCachedFileDownloader: Downloads cached files from Google Drive.

Functions:
  • file_valid: Validates a file against an MD5 hash value.

  • create_directory_tree: Ensures the directory structure is present.

  • download_from_url: Downloads data from a URL.

Usage Example:

import pathlib

from reemission.download import GDriveFileDownloader, download_from_url

url = "https://drive.google.com/..."
output_path = pathlib.Path("/path/to/save/file")
downloader = GDriveFileDownloader()
download_from_url(url, output_path, downloader)

class reemission.download.GDriveCachedFileDownloader(quiet: bool = False, md5: Any | None = None, extract: bool = False)

URL downloader for cached files using gdown’s cached_download functionality.

__call__(url: str, output_path: Path | str) -> None

Download cached file from Google Drive using a URL to output destination (path).

Parameters:
  • url (str) – URL of the Google Drive file to download.

  • output_path (Union[pathlib.Path, str]) – Path to save the downloaded file.

extract: bool = False
md5: Any | None = None
quiet: bool = False
class reemission.download.GDriveDirDownloader(quiet: bool = False, remaining_ok: bool = True)

URL downloader using gdown’s download_folder functionality. Automatically overwrites any content already present at the destination.

Attention

Does not check checksums during download.

__call__(url: str, output_path: Path | str) -> None

Download directory from Google Drive using a URL (shared link).

Parameters:
  • url (str) – URL of the Google Drive folder to download.

  • output_path (Union[pathlib.Path, str]) – Path to save the downloaded folder.

quiet: bool = False
remaining_ok: bool = True
class reemission.download.GDriveFileDownloader(quiet: bool = False, fuzzy: bool = True, resume: bool = False)

URL downloader for files using gdown’s download functionality.

__call__(url: str, output_path: Path | str) -> None

Download file from Google Drive using a URL to output destination (path).

Parameters:
  • url (str) – URL of the Google Drive file to download.

  • output_path (Union[pathlib.Path, str]) – Path to save the downloaded file.

fuzzy: bool = True
quiet: bool = False
resume: bool = False
class reemission.download.URL_Downloader(*args, **kwargs)

Protocol for URL downloader.

__call__(url: str, output_path: Path | str) -> None

Download data from URL to output destination (path).

Parameters:
  • url (str) – URL to download data from.

  • output_path (Union[pathlib.Path, str]) – Path to save the downloaded data.
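
Because the downloader is described as a protocol, any callable object with a matching `__call__` signature should be usable in its place. The following is a minimal sketch of that idea using `typing.Protocol`; the names `URLDownloaderSketch`, `LocalCopyDownloader`, and `run_demo` are hypothetical and are not part of `reemission.download`, and the "URL" is treated as a local file path so that the example runs without network access.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Protocol, Union, runtime_checkable


@runtime_checkable
class URLDownloaderSketch(Protocol):
    """Hypothetical stand-in mirroring the documented __call__ signature."""

    def __call__(self, url: str, output_path: Union[Path, str]) -> None:
        ...


class LocalCopyDownloader:
    """Hypothetical downloader that treats the URL as a local file path."""

    def __call__(self, url: str, output_path: Union[Path, str]) -> None:
        destination = Path(output_path)
        # Create the parent directories before copying the file.
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(url, destination)


def run_demo() -> Path:
    """Copy a temporary file through the protocol-typed downloader."""
    workdir = Path(tempfile.mkdtemp())
    source = workdir / "source.bin"
    source.write_bytes(b"example payload")
    target = workdir / "nested" / "copy.bin"
    downloader: URLDownloaderSketch = LocalCopyDownloader()
    downloader(str(source), target)
    return target
```

Structural typing means `LocalCopyDownloader` does not need to inherit from the protocol class; matching the call signature is enough for a static type checker.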

reemission.download.create_directory_tree(path: Path, verbose: bool = True) -> None

Ensures the directory structure is present. If it is not, creates the missing directories, starting from the topmost directory in the path that is absent.

Parameters:
  • path (pathlib.Path) – Path of the directory to create.

  • verbose (bool) – If True, logs the directory creation process. Default is True.
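
The behaviour described above can be approximated with `pathlib` alone. This is a sketch under assumptions, not the library's implementation: the function name `ensure_directory_tree` is hypothetical, and logging through the standard `logging` module is assumed as the reporting mechanism.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def ensure_directory_tree(path: Path, verbose: bool = True) -> None:
    """Create `path` and any missing ancestors (hypothetical helper)."""
    if path.is_dir():
        return  # Nothing to do: the full tree already exists.
    # Walk upwards to find the topmost directory that is absent.
    first_missing = path
    for parent in path.parents:
        if parent.exists():
            break
        first_missing = parent
    if verbose:
        logger.info("Creating directories from %s down to %s", first_missing, path)
    # parents=True creates every missing ancestor in one call.
    path.mkdir(parents=True, exist_ok=True)
```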

reemission.download.download_from_url(url: str, output_path: Path, downloader: URL_Downloader, update: bool = True, relative_path: bool = False, checksum: Any | None = None, verbose: bool = False, post_checksum_check: bool = False) -> None

Download data from a URL.

Parameters:
  • url (str) – URL pointing to data, e.g., share link from Google Drive.

  • output_path (pathlib.Path) – Output directory or file. Interpreted relative to the package root directory when relative_path is True.

  • downloader (URL_Downloader) – Downloader instance to use for downloading.

  • update (bool) – If True, replaces existing data when its checksum does not match. Default is True.

  • relative_path (bool) – If True, the path provided will be relative to the package root folder. Default is False.

  • checksum (Optional[Any]) – MD5 sum to validate the file against. If None, no validation is performed. Default is None.

  • verbose (bool) – If True, print more detailed output. Default is False.

  • post_checksum_check (bool) – If True and a checksum is given, checks the checksum once again after the download. Default is False.
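
The interaction between `update`, `checksum`, and `post_checksum_check` is not spelled out above. The sketch below shows one plausible reading, offered as an assumption rather than as the library's actual logic: existing data is kept when no checksum is given or when it matches, stale data is replaced only if `update` is True, and the optional post-check raises an error on mismatch. The names `fetch_sketch`, `md5_of`, and `FakeDownloader` are hypothetical, and the fake downloader writes fixed bytes instead of contacting Google Drive.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a (small) file."""
    return hashlib.md5(path.read_bytes()).hexdigest()


class FakeDownloader:
    """Hypothetical downloader that writes a fixed payload; no network."""

    def __init__(self, payload: bytes) -> None:
        self.payload = payload
        self.calls = 0

    def __call__(self, url: str, output_path: Path) -> None:
        Path(output_path).write_bytes(self.payload)
        self.calls += 1


def fetch_sketch(
    url: str,
    output_path: Path,
    downloader: Callable[[str, Path], None],
    update: bool = True,
    checksum: Optional[str] = None,
    post_checksum_check: bool = False,
) -> bool:
    """Return True if the downloader was invoked (assumed semantics)."""
    if output_path.exists():
        # Assumption: existing data is kept if unchecked or valid.
        if checksum is None or md5_of(output_path) == checksum:
            return False
        # Assumption: stale data is only replaced when update is True.
        if not update:
            return False
    output_path.parent.mkdir(parents=True, exist_ok=True)
    downloader(url, output_path)
    if post_checksum_check and checksum is not None:
        if md5_of(output_path) != checksum:
            raise ValueError(f"Checksum mismatch after download: {output_path}")
    return True
```

Returning a flag that reports whether a download happened is a choice made for this sketch so the behaviour is easy to observe; the documented function returns None.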

reemission.download.file_valid(file_path: str | Path, valid_hash: str, chunk_size: int = 4) -> bool

Validates a file against an MD5 hash value.

Parameters:
  • file_path (Union[str, pathlib.Path]) – Path to the file for hash validation.

  • valid_hash (str) – MD5 sum to validate the file against.

  • chunk_size (int) – Size of the chunks in which the file is read. Default is 4.

Returns:

True if the file’s hash matches the valid_hash, False otherwise.

Return type:

bool
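
Chunked MD5 validation of this kind can be sketched with the standard `hashlib` module. The function name `md5_matches` is hypothetical, and the chunk size here is assumed to be in bytes with a default of 4096; the units of the documented default of 4 are not stated above, so this should not be read as the library's behaviour.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5_matches(
    file_path: Union[str, Path],
    valid_hash: str,
    chunk_size: int = 4096,  # Assumption: bytes per read.
) -> bool:
    """Compare a file's MD5 digest with an expected hex string."""
    digest = hashlib.md5()
    with open(file_path, "rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # hexdigest() is lowercase, so normalise the expected value too.
    return digest.hexdigest() == valid_hash.lower()
```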