PHP Archive Libs Comparison: Phar vs ZipArchive vs Archive_Tar

Hi guys, after this discussion we decided to run some tests on archive libraries.

Here are the results:

14.9M TGZ using PharData:
Time: 13.44s
Peak memory usage: 60.5M

14.9M TGZ using Archive_Tar:
Time: 19.31s
Peak memory usage: 4.75M

16.1M ZIP using ZipArchive:
Time: 12.13s
Peak memory usage: 0.5M

44.1M files to TGZ using PharData:
Time: 8/10/15s
Peak memory usage: 56.75M

44.1M files to ZIP using PharData:
Time: 8/10/11s
Peak memory usage: 3.5M

44.1M files to ZIP using ZipArchive:
Time: 7.5/5/6s
Peak memory usage: 1.5M

44.1M files to TGZ using Archive_Tar:
Time: 18.05s
Peak memory usage: 1M

- For zip archives, ZipArchive is the best option: it was the fastest and used the least memory of the zip tools tested.
- For tar.gz archives, Archive_Tar is slower than PharData but needs far less memory (1M to 4.75M peak vs 56.75M to 60.5M).
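The post doesn't say how these timings were collected. As a rough sketch of one way to reproduce the "files to ZIP using ZipArchive" case (an assumption on our part, not the script behind the numbers above), the PHP below zips a directory and prints wall-clock time and peak memory in the same format. The function name `zipDirectory` and the two command-line arguments are hypothetical. One caveat, as far as we understand it: `memory_get_peak_usage()` only counts memory allocated through PHP's own memory manager, so memory that libzip allocates internally may not show up in that figure.

```php
<?php
// Hypothetical benchmark sketch: zip a directory with ZipArchive and report
// wall-clock time and peak memory. Not the script used for the results above.

function zipDirectory(string $sourceDir, string $zipPath): int
{
    $root = realpath($sourceDir);
    if ($root === false || !is_dir($root)) {
        throw new InvalidArgumentException("Not a directory: $sourceDir");
    }

    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot open $zipPath");
    }

    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    $count = 0;
    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        // Entry names are relative to $sourceDir and always use '/'.
        $relative = substr($file->getPathname(), strlen($root) + 1);
        $zip->addFile($file->getPathname(), str_replace(DIRECTORY_SEPARATOR, '/', $relative));
        $count++;
    }

    // Queued files are read and compressed when the archive is closed,
    // so close() is where most of the time is spent.
    if (!$zip->close()) {
        throw new RuntimeException("Failed to write $zipPath");
    }
    return $count;
}

// Usage (hypothetical file name): php zip_bench.php /path/to/site /tmp/site.zip
if (PHP_SAPI === 'cli' && isset($argv[1], $argv[2])) {
    $start = microtime(true);
    $n = zipDirectory($argv[1], $argv[2]);
    printf(
        "%d files to ZIP using ZipArchive:\nTime: %.2fs\nPeak memory usage: %.2fM\n",
        $n,
        microtime(true) - $start,
        memory_get_peak_usage(true) / 1048576
    );
}
```

Pointing the same harness at a multi-gigabyte directory, and swapping in PharData or Archive_Tar, would give directly comparable numbers for larger data sets.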
How this will affect CS-Cart
In 4.3.5 we plan to add ZipArchive as the default library for working with zip archives.
Zip will be the default format in Backup/Restore, so we recommend using it.

I suggest you run the same tests with a larger data set, e.g. around 5GB of files (like a real site with graphics and thumbnails). Please also run them on a 32-bit server.

I think you'll find that Phar fails completely, which is my whole point about the choice of archivers. 44MB of data is not representative of a real site: a standard installation is on the order of 350MB, and even that is really a skeletal environment.

A 5GB site would be much more representative of a 1,000-product store with high-resolution images and all thumbnails generated. Lots of images will also test compression more realistically, since the archiver will be trying to compress data that is already compressed (such as JPEG and PNG files).

But my real point was that if you had just stuck with Archive_Tar in the first place, you wouldn't have all of these upgrade failures. Phar didn't give you any functionality you actually used, or any performance improvement, so it was yet another change for change's sake. Sometimes leaving things alone is the best approach, and for archives and backups, stability is of the utmost importance.