Add the implementation for downloading mnist#1383
Conversation
Add the implementation for downloading mnist
|
Add the implementation for downloading mnist |
|
Add the implementation for downloading mnist |
|
|
||
| def check_exist_or_download(url): | ||
|
|
||
| download_dir = '/tmp/' # downloaded to the /tmp/ folder |
There was a problem hiding this comment.
download_dir is currently hard-coded to /tmp/, but the PEFT MNIST loader defaults to /tmp/mnist and expects the dataset files under that directory:
- Default dataset path: https://github.com/eyumboo/singa/blob/dev-postgresql/examples/singa_peft/examples/data/mnist.py#L85
- File lookup logic assumes
/tmp/mnist: https://github.com/eyumboo/singa/blob/dev-postgresql/examples/singa_peft/examples/data/mnist.py#L38
In addition, the example runner explicitly passes -dir /tmp/mnist:
https://github.com/eyumboo/singa/blob/dev-postgresql/examples/singa_peft/examples/run.sh#L21
Because of this mismatch, even after running the new downloader, training fails since it looks for files like /tmp/mnist/train-images-idx3-ubyte.gz, which are not present.
It would be better if the downloader:
- Writes directly to
/tmp/mnist(and creates the directory if it doesn’t exist) - Ideally supports the same
--dir-pathargument used by the training pipeline for consistency
| print("Downloading %s" % url) | ||
| urllib.request.urlretrieve(url, filename) | ||
| else: | ||
| print("Already Downloaded: %s" % url) |
There was a problem hiding this comment.
Suggestion: improve download logging clarity
The current download log is helpful, but it would be even better if we also print the resolved local path (or target directory) where files are stored. This makes it easier to debug storage issues and verify where artifacts end up, especially since the download location may depend on defaults or environment variables (e.g. /tmp/mnist).
In addition, it would improve clarity to include the local filename/path in both download states.
Proposed changes:
if not os.path.isfile(filename):
print("Downloading %s -> %s" % (url, filename))
urllib.request.urlretrieve(url, filename)
else:
print("Already Downloaded: %s -> %s" % (url, filename))
This PR adds download_mnist.py to examples/singa_peft/examples/data.