Skip to content

Add the implementation for downloading mnist#1383

Open
eyumboo wants to merge 3 commits into
apache:dev-postgresqlfrom
eyumboo:dev-postgresql
Open

Add the implementation for downloading mnist#1383
eyumboo wants to merge 3 commits into
apache:dev-postgresqlfrom
eyumboo:dev-postgresql

Conversation

@eyumboo

@eyumboo eyumboo commented Jul 2, 2026

Copy link
Copy Markdown

This PR adds download_mnist.py to examples/singa_peft/examples/data.

Add the implementation for downloading mnist
@eyumboo eyumboo closed this Jul 2, 2026
@eyumboo eyumboo deleted the dev-postgresql branch July 2, 2026 03:37
@eyumboo eyumboo restored the dev-postgresql branch July 2, 2026 03:37
@eyumboo

eyumboo commented Jul 2, 2026

Copy link
Copy Markdown
Author

Add the implementation for downloading mnist

@eyumboo eyumboo reopened this Jul 2, 2026
@eyumboo

eyumboo commented Jul 2, 2026

Copy link
Copy Markdown
Author

Add the implementation for downloading mnist

Comment thread examples/singa_peft/examples/data/download_mnist.py Outdated

def check_exist_or_download(url):

download_dir = '/tmp/' # downloaded to the /tmp/ folder

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

download_dir is currently hard-coded to /tmp/, but the PEFT MNIST loader defaults to /tmp/mnist and expects the dataset files under that directory:

In addition, the example runner explicitly passes -dir /tmp/mnist:
https://github.com/eyumboo/singa/blob/dev-postgresql/examples/singa_peft/examples/run.sh#L21

Because of this mismatch, even after running the new downloader, training fails since it looks for files like /tmp/mnist/train-images-idx3-ubyte.gz, which are not present.

It would be better if the downloader:

  • Writes directly to /tmp/mnist (and creates the directory if it doesn’t exist)
  • Ideally supports the same --dir-path argument used by the training pipeline for consistency

print("Downloading %s" % url)
urllib.request.urlretrieve(url, filename)
else:
print("Already Downloaded: %s" % url)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: improve download logging clarity

The current download log is helpful, but it would be even better if we also print the resolved local path (or target directory) where files are stored. This makes it easier to debug storage issues and verify where artifacts end up, especially since the download location may depend on defaults or environment variables (e.g. /tmp/mnist).

In addition, it would improve clarity to include the local filename/path in both download states.

Proposed changes:

if not os.path.isfile(filename):
    print("Downloading %s -> %s" % (url, filename))
    urllib.request.urlretrieve(url, filename)
else:
    print("Already Downloaded: %s -> %s" % (url, filename))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants