First, let me say that this quick intro covers tar and a couple of compression options in general terms. LVM or disk snapshots, rsync, librsync and other sometimes similar choices are not compared here, although in many situations you will want to incorporate them into a bigger strategy. This information should shed some light on a couple of opportunities, direct or indirect, that may be worth thinking about when you use tar, depending on your situation.
Generally, when you just need a quick archive, the standard tar command is simply:
tar cf /path/to/archive.tar /path/to/source
That works, but many people will compress the archive with either the gzip (z) or the bzip2 (j) option:
tar czf /path/to/archive.tar.gz /path/to/source
tar cjf /path/to/archive.tar.bz2 /path/to/source
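A minimal POSIX shell sketch of the three commands above, run against a throwaway directory so it is safe to try. The temporary paths are stand-ins for /path/to/source and /path/to/archive, and the j option assumes the bzip2 program is installed. Listing each archive with t is a quick way to confirm it reads back.

```shell
#!/bin/sh
# Sketch only: the temp directory and file names are hypothetical stand-ins
# for /path/to/source and /path/to/archive.
set -e
work=$(mktemp -d)
mkdir "$work/source"
printf 'example data\n' > "$work/source/a.txt"
cd "$work"

tar cf  archive.tar     source   # plain archive, no compression
tar czf archive.tar.gz  source   # z: compress with gzip
tar cjf archive.tar.bz2 source   # j: compress with bzip2 (needs bzip2 installed)

# t lists the contents; use the same z or j letter that created the archive
tar tf  archive.tar
tar tzf archive.tar.gz
tar tjf archive.tar.bz2
```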
Compression normally reduces the space the archive needs, unless the files are already compressed or are of an inherently compressed type such as AVIs or MP3s. That's often better, but there are issues when dealing with larger files or directories:
- compressing inside tar extends the time the source file(s) are held open
- compressing inside tar extends the time required to complete the archive
- compressing inside tar tends to create files slightly larger than compressing the finished tar file afterwards
- compressing inside tar is limited to a single processor
pbzip2 is a parallel implementation of bzip2 that "achieves near-linear speedup on SMP machines". With pbzip2, all of the system's processors can (optionally) be put to use at the same time. An archive that requires 2 hours to bzip2 can take as little as 30 minutes on a single idle quad-core system.
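The two-step approach can be sketched as a small POSIX shell function: write an uncompressed tar first, so the source files are read and closed as quickly as the disk allows, then compress the finished archive. The function name archive_then_compress is hypothetical. It falls back to plain bzip2 when pbzip2 is not installed; both write NAME.bz2 and, by default, delete the uncompressed tar once they succeed.

```shell
#!/bin/sh
# archive_then_compress SOURCE ARCHIVE (hypothetical helper name)
archive_then_compress() {
  src=$1
  archive=$2
  # Step 1: no compression, so tar finishes and releases the source files sooner
  tar cf "$archive" "$src" || return 1
  # Step 2: compress the finished archive; the source files are no longer open
  if command -v pbzip2 >/dev/null 2>&1; then
    pbzip2 "$archive"   # autodetects and uses all processors by default
  else
    bzip2 "$archive"    # single-threaded fallback
  fi
}
```

For example, archive_then_compress /path/to/source /path/to/archive.tar leaves /path/to/archive.tar.bz2 behind. If you want to leave some CPUs free for other work, pbzip2's -p option (for example -p2) limits the number of processors; check pbzip2 --help on your system.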
Naturally, the trade-off is an increase in the free disk space required to complete the process. As a general minimum, you will need free space of at least 1.5 times the size of the files to be archived.
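The 1.5x rule of thumb can be checked before starting. This is a minimal POSIX shell sketch; the helper names needed_kb, free_kb and has_room are hypothetical, and all sizes are in KiB as reported by du -sk and df -Pk.

```shell
#!/bin/sh
# needed_kb SOURCE: 1.5 times the disk usage of SOURCE, in KiB
needed_kb() {
  used=$(du -sk "$1" | awk '{print $1}')
  echo $(( used * 3 / 2 ))
}

# free_kb DIR: available KiB on the filesystem holding DIR
# (-P gives one unwrapped data line; column 4 is "Available")
free_kb() {
  df -Pk "$1" | awk 'NR == 2 {print $4}'
}

# has_room SOURCE DEST_DIR: succeed if DEST_DIR has at least 1.5x SOURCE free
has_room() {
  [ "$(free_kb "$2")" -ge "$(needed_kb "$1")" ]
}
```

Usage might look like: has_room /path/to/source /path/to || echo "not enough free space".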
This is a small-scale example with a modest 1.5GB directory of database SQL unload files. The system has two older quad-core CPUs, a fairly fast disk subsystem and 16GB of RAM.
testdir: 1603076072 bytes (about 1.5GB)
time tar cf test.tar testdir
size: 1603164160 bytes (about 1.5GB)
time tar cjf test.tar.bz2 testdir
size: 216820944 bytes (207M)
time tar czf test.tar.gz testdir
size: 282025065 bytes (269M)
time pbzip2 test.tar
size: 217235869 bytes (208M)
time bzip2 test.tar
size: 216820491 bytes (207M)
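To repeat the comparison on your own data, a POSIX shell sketch like this one runs the same commands, times each one and prints the resulting sizes. The helper names and output file names are hypothetical. pbzip2 runs only if it is installed, and the -c option (write to stdout) keeps test.tar around for each step and avoids overwriting the one-step test.tar.bz2. Timing uses whole seconds from date +%s, which is coarse but fine for long-running archives.

```shell
#!/bin/sh
# timed CMD...: run CMD, then report elapsed whole seconds on stderr
timed() {
  start=$(date +%s)
  "$@"
  status=$?
  echo "[$*] $(( $(date +%s) - start ))s" >&2
  return $status
}

# report FILE: print the file name and its size in bytes
report() {
  printf '%-20s %12s bytes\n' "$1" "$(wc -c < "$1")"
}

# compare_archivers DIR: write every archive into the current directory
compare_archivers() {
  dir=$1
  timed tar cf test.tar "$dir" && report test.tar
  timed tar cjf test.tar.bz2 "$dir" && report test.tar.bz2
  timed tar czf test.tar.gz "$dir" && report test.tar.gz
  if command -v pbzip2 >/dev/null 2>&1; then
    timed pbzip2 -c test.tar > test.pbzip2.bz2 && report test.pbzip2.bz2
  fi
  timed bzip2 -c test.tar > test.bzip2.bz2 && report test.bzip2.bz2
}
```

Run it from a scratch directory with enough free space, for example compare_archivers /path/to/testdir.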
Combining a normal tar file with pbzip2 produced an archive about 23% smaller than gzip did, and in less time, in this test. For some situations, tar + pbzip2 is a great combination. I just wanted to share ;)
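The size comparison can be checked directly from the byte counts measured in the test. This small awk sketch (wrapped in a hypothetical ratios function) uses only those numbers; with them, the gzip output is 17.6% of the plain tar, the pbzip2 output is 13.6%, and the pbzip2 archive is roughly 23% smaller than the gzip one.

```shell
#!/bin/sh
# ratios: recompute the compression ratios from the measured byte counts
ratios() {
  awk 'BEGIN {
    tar = 1603164160   # test.tar
    gz  = 282025065    # test.tar.gz
    pbz = 217235869    # pbzip2 test.tar
    printf "gzip output:   %.1f%% of the tar\n", 100 * gz / tar
    printf "pbzip2 output: %.1f%% of the tar\n", 100 * pbz / tar
    printf "pbzip2 output is %.1f%% smaller than gzip\n", 100 * (gz - pbz) / gz
  }'
}
ratios
```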