Problem: 100,000 small files equate to 100,000 PUT requests.
A person could zip them all up and PUT a single archive into SAFE storage instead, paying far fewer PUTs. But if that produces a ZIP file over 1 GB in size, extracting one file from the archive likely means pulling the entire 1 GB file down from the internet just to reach the "file index" (the central directory) saved at the end of the ZIP file.
I propose making a copy of the content index as a separate file. This external file would duplicate the ZIP's end-of-central-directory record along with its central directory records (which carry each entry's CRC-32, usable for validation). Maybe a file extension of *.netzip would be good for this index file. The pair of files would usually have the same filename and be kept together (when using a logical file path - good for traditional servers, or the pseudo-drive that a MAID disk presents to a computer).
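As a sketch of how such an index file could be produced, the snippet below copies the ZIP's central directory records plus the end-of-central-directory (EOCD) record into a side-car file. The *.netzip layout shown here (central directory followed by the EOCD record) is just my assumption for discussion, not a settled format:

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory record signature

def write_netzip_index(zip_path, index_path):
    """Copy the ZIP's central directory + EOCD record into a small side-car file."""
    with open(zip_path, "rb") as f:
        data = f.read()
    # Locate the EOCD record (assumes no archive comment containing the signature).
    eocd_pos = data.rfind(EOCD_SIG)
    if eocd_pos == -1:
        raise ValueError("end-of-central-directory record not found; not a ZIP?")
    # EOCD layout: sig(4) disk(2) cd_disk(2) n_here(2) n_total(2) cd_size(4) cd_offset(4) ...
    cd_size, cd_offset = struct.unpack_from("<II", data, eocd_pos + 12)
    with open(index_path, "wb") as out:
        out.write(data[cd_offset:cd_offset + cd_size])  # the central directory records
        out.write(data[eocd_pos:])                      # the EOCD record itself
```

For a real implementation the 64-bit (ZIP64) variants of these records would also need handling; this sketch covers only the classic 32-bit layout.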
- A small index-only file could be downloaded quickly, and it would hold references to the byte locations, within the archive, of the files a user actually wants.
- If the SAFE Network is implemented so that in-file read pointers can be managed - or, in the fashion of resumable download requests that ask for 300 KB starting at byte 85,000 of a 1 GB file - we could cut down unneeded data transmission over the Network.
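To show why the index alone is enough, here is a minimal sketch that parses the duplicated central directory records and fetches a single member with ranged reads. The `read_range(start, length)` callable is a placeholder for whatever partial-read primitive ends up existing (an HTTP Range request, a SAFE partial GET) - its availability is an assumption:

```python
import struct
import zlib

CDH_SIG = b"PK\x01\x02"  # central directory file header signature

def parse_index(cd):
    """Map filename -> (local_header_offset, compressed_size, method) from the records."""
    entries, pos = {}, 0
    while cd[pos:pos + 4] == CDH_SIG:
        method, = struct.unpack_from("<H", cd, pos + 10)
        csize, = struct.unpack_from("<I", cd, pos + 20)
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", cd, pos + 28)
        lho, = struct.unpack_from("<I", cd, pos + 42)
        name = cd[pos + 46:pos + 46 + name_len].decode("utf-8")
        entries[name] = (lho, csize, method)
        pos += 46 + name_len + extra_len + comment_len
    return entries

def read_member(read_range, entry):
    """Fetch one archive member via two small ranged reads of the big archive."""
    lho, csize, method = entry
    header = read_range(lho, 30)                      # fixed part of the local file header
    name_len, extra_len = struct.unpack_from("<HH", header, 26)
    data = read_range(lho + 30 + name_len + extra_len, csize)
    return zlib.decompress(data, -15) if method == 8 else data  # 8 = deflate
```

Two ranged reads totalling a few hundred bytes plus the member's compressed size replace a 1 GB download.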
This two-file archive can also reduce the number of PUTs a user would need to pay for when storing archived data (which may ease wide adoption of the SAFE Network as a storage platform for everyday users). Each 1 MB chunk is a PUT.
The ZIP may total fewer MB (fewer PUTs), and the slack space of files smaller than 1 MB is reduced (similar to a file's size-on-disk being affected by sector size).
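A back-of-the-envelope comparison, assuming one PUT per 1 MB chunk and a hypothetical workload of 100,000 files of 10 KB each (both the chunk size and the pricing model are assumptions):

```python
import math

CHUNK = 1_000_000  # assumption: the network charges one PUT per 1 MB chunk

def puts_needed(file_sizes):
    """PUTs if every file is stored on its own (each file costs at least one PUT)."""
    return sum(max(1, math.ceil(size / CHUNK)) for size in file_sizes)

sizes = [10_000] * 100_000               # hypothetical: 100,000 files of 10 KB each
separate = puts_needed(sizes)            # one PUT per file -> 100,000 PUTs
total = sum(sizes)                       # 1 GB of payload
archived = math.ceil(total / CHUNK) + 1  # ~1,000 chunks, plus 1 for the small index file
print(separate, archived)                # roughly a 100x reduction in PUTs
```

The exact ratio depends on file sizes and compression, but any workload dominated by sub-chunk files benefits in the same way.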
- ZIP is a good candidate format to start with because changed files are traditionally appended to the end of the archive without purging the old contents - only the index is rewritten to point to the new start location, much like multi-session writable CDs.
- If downloaded, the full ZIP file (without its external index file) would still function as a traditional ZIP file.
- This type of archive file would be best for large quantities of small files that are read but not changed.
- If the ZIP contents were changed, the external index would need to be updated accordingly.
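The append-without-purge behaviour from the first bullet can be demonstrated with Python's standard `zipfile` module - rewriting an entry in append mode leaves the old payload bytes in place, and only the index at the end changes (so only the *.netzip side-car and the tail chunks would need re-PUTting):

```python
import io
import zipfile

buf = io.BytesIO()
# Write the first version of the entry.
with zipfile.ZipFile(buf, "a") as z:
    z.writestr("a.txt", "version 1")
size_v1 = len(buf.getvalue())

# "Change" the file: append a new entry under the same name.
with zipfile.ZipFile(buf, "a") as z:
    z.writestr("a.txt", "version 2")  # emits a duplicate-name warning; harmless here
data = buf.getvalue()

assert len(data) > size_v1            # the archive grew; nothing was purged
assert b"version 1" in data           # old payload still present (stored uncompressed)
with zipfile.ZipFile(io.BytesIO(data)) as z:
    assert z.read("a.txt") == b"version 2"  # the index resolves to the newest copy
```

This also hints at the update cost: the chunks holding the old payload are untouched, while the rewritten central directory (and its external copy) must be re-uploaded.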
Please note, this should be considered a draft. I’m hoping to work out the details along with all of you in the community. Thanks!