Thursday 16 October 2008

Tar files

It's quite likely that if you've been using Linux for any great length of time, you'll have discovered tar files. If you go to a website and download something which is offered as the Linux version of an application, the chances are it will be a tar archive (although some offer separate RPM and deb packages for different distros). Tar files are widely used in Linux, as well as in other Unix-like operating systems.
So what are they? Well, if you've come from a Windows background, you should be familiar with .zip files. These don't usually contain an application, but instead contain a collection of files. The .zip format is simply a convenient way to store them as it rolls them into one file and compresses it, making it ideal for distributing these files across the Internet or storing them.
Well, tar files are Linux and Unix's equivalent. Tar stands for Tape Archive, and the term comes from backing up to a magnetic tape. Like .zip files, tar files are usually used as a convenient and easy way to store or distribute a collection of files. However, it's far more common for tar files to be used to distribute applications than it is for .zip files. If you download a Linux binary that isn't distro-specific, it will usually be packaged as a tar file. Also, if you download source code for an application, you can generally expect it to be packaged as a tar file.
Although it's best to use a deb package in Ubuntu where possible, there are times when you can only get something as a tar package. So being able to deal with tar packages is a necessary Linux skill. In addition, they're ideal for backing up files in case something goes wrong.
Tar files aren't compressed by default. Two utilities are commonly used to compress them: bzip2 and gzip. Normally you'll be able to tell which one has been used from the file extension. This example is compressed using bzip2:
package.tar.bz2

While this one is compressed using gzip:
package.tar.gz

That's pretty simple to follow, but remember that, unlike Windows, Linux doesn't rely on file extensions to work out what a file is in quite the same way, and someone can easily give a package a completely different extension. So you may see variations such as .tgz, which is a common shorthand for .tar.gz.
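To see that the extension really is just a label, here is a minimal sketch that builds a tiny gzipped archive, copies it under two other names, and lists the contents of each copy. The folder and file names (demo, hello.txt, demo.backup) are made up for illustration, and the t option (list contents without extracting) is not covered elsewhere in this post.

```shell
set -eu

# Work in a throwaway directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small sample folder (hypothetical names, for illustration only).
mkdir demo
echo "hello" > demo/hello.txt

# Create a gzipped archive, then copy it under two other names.
tar -czf demo.tar.gz demo
cp demo.tar.gz demo.tgz
cp demo.tar.gz demo.backup

# t lists the contents instead of extracting; z selects gzip.
# The archive is read the same way whatever the file is called.
listing_tgz=$(tar -tzf demo.tgz)
listing_odd=$(tar -tzf demo.backup)
echo "$listing_tgz"
```

Both listings should be identical, because tar is told how to read the file by the options you give it, not by the name.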

There are plenty of graphical applications available to deal with these packages, but as usual we're going to go the command-line route! For this example I'm going to use the Wordpress software, as this is available as a gzipped tar package, and is a mere 1MB in size.

Click on this link to download the current version of Wordpress. If the package winds up on your desktop, move it to your home directory to make it more convenient to work with.

Now, to extract the contents of an uncompressed tar archive, you would enter the following:
tar -xf packagename.tar

The x tells tar that you want to extract the files, while the f indicates that the filename for the package follows.
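If you'd like to try this without downloading anything, the following is a small sketch that runs in a temporary directory. It first has to make an uncompressed archive to practise on, so it uses the c option, which is explained further down. The names (sample, readme.txt) are placeholders, not part of any real package.

```shell
set -eu

# Use a scratch directory so the experiment is easy to throw away.
workdir=$(mktemp -d)
cd "$workdir"

# Make a sample folder and roll it into an uncompressed tar archive.
mkdir sample
echo "just a test" > sample/readme.txt
tar -cf sample.tar sample

# Remove the folder so we can tell that extraction really recreates it.
rm -rf sample

# x = extract, f = the archive filename follows.
tar -xf sample.tar

cat sample/readme.txt
```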

But this won't deal with the compression. To make it work in this case, you need to tell tar to uncompress the archive as well. In order to do so, tar needs to know whether the package uses bzip2 or gzip compression.


In our example, the package uses gzip. To handle this, add a z to the options. So, to uncompress and extract Wordpress, we need to enter the following:
tar -xzf wordpress-2.6.2.tar.gz

This will create a new folder called wordpress, which contains all the files you need to get Wordpress working on your system. But we're not going to do that (at least, not until after the end of the lesson!). Instead, we're going to become familiar with the tar command by using it to both create and extract archives.

If you run ls, you'll notice the original archive is still there. Move it to another directory so it's out of the way, but still there in case you need it. Now, we'll use the wordpress folder you just extracted to create a bzipped tar file.

Now, because you're creating an archive rather than extracting one, you don't use the x option. Instead you use the c option. This is easy to remember as it's c to create, x to extract. You still need to include the f option, as again you need to specify the filename. The difference is that you also need to specify the path to the folder you want to create an archive from. The basic form of the command looks something like this:
tar -cf packagename.tar /folder

But as with extracting the archive, you need to add options to compress it. Fortunately, you can use the same option for each type of compression whether you're creating or extracting, so with gzip you would always put a z. For bzip2, you use j. So, to create an archive of the wordpress folder and compress it with bzip2, you'd enter the following:
tar -cjf wordpress.tar.bz2 wordpress

Notice that again, the original folder is still there. Remove it with the following command to get it out of the way:
rm -rf wordpress

Now, let's just extract this again, then create another archive using gzip, and you'll be familiar with extracting and creating both gzipped and bzipped tar files. Enter the following to extract your bzipped tar file:
tar -xjf wordpress.tar.bz2

That should extract the archive once more as wordpress. Finally, let's turn it into a gzipped tar file:
tar -czf wordpress.tar.gz wordpress
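The whole round trip can also be rehearsed on a throwaway folder, which is handy if you'd rather not experiment on a real download. This is only a sketch: the folder name sample and its contents are invented, and the bzip2 step is skipped if bzip2 doesn't happen to be installed.

```shell
set -eu

# Scratch directory for the experiment.
workdir=$(mktemp -d)
cd "$workdir"

# A stand-in for the extracted folder (hypothetical contents).
mkdir sample
echo "placeholder content" > sample/index.txt

# Create a gzipped archive (z), remove the folder, and extract it again.
tar -czf sample.tar.gz sample
rm -rf sample
tar -xzf sample.tar.gz

# Repeat with bzip2 (j), but only if bzip2 is available on this system.
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf sample.tar.bz2 sample
    rm -rf sample
    tar -xjf sample.tar.bz2
fi

# The folder and the archives now sit side by side.
ls "$workdir"
```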

That's it! Now you should be sufficiently savvy with tar files to be able to extract and create them as you wish!
There are many more options available for tar, such as v (verbose), which lists each file as it is extracted or added. If you want to know more, I suggest you study the man page for tar, using the following command:
man tar
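As a quick illustration of the v option, here is a sketch that extracts a small made-up archive verbosely and keeps the list of names that tar prints. The names (notes, a.txt, b.txt) are placeholders. Some tar implementations may print the verbose list on standard error rather than standard output, so the example captures both.

```shell
set -eu

# Scratch directory for the experiment.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small gzipped archive to practise on (hypothetical files).
mkdir notes
echo "one" > notes/a.txt
echo "two" > notes/b.txt
tar -czf notes.tar.gz notes
rm -rf notes

# v makes tar print each entry as it is extracted.
# 2>&1 captures the list whichever stream tar writes it to.
extracted=$(tar -xzvf notes.tar.gz 2>&1)
echo "$extracted"
```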

One final note. Ubuntu does include the unzip utility, which enables you to extract Windows .zip files using the following command:
unzip file.zip

So if you need to transfer something across from Windows to Linux, it's fine to zip it up in Windows and extract it in Linux.
Dealing with tar files is often necessary for installing software that isn't specifically packaged for your distro. If you download a tar file for an application, to install it you need to extract the archive and look in the resulting folder for a text file called something like INSTALL or README. This will usually give instructions on how to install it. For source code, there will usually be instructions on how to compile the application from source.
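As a rough sketch of that last step, the following fakes a downloaded package, extracts it, and picks out any files whose names start with README or INSTALL. Everything here (exampleapp, its files and their contents) is invented for illustration; a real package will have its own layout and its own instructions.

```shell
set -eu

# Scratch directory for the experiment.
workdir=$(mktemp -d)
cd "$workdir"

# Fake a downloaded source package (hypothetical name and contents).
mkdir exampleapp
echo "Run ./configure, then make." > exampleapp/INSTALL
echo "About exampleapp." > exampleapp/README
echo "int main(void) { return 0; }" > exampleapp/main.c
tar -czf exampleapp.tar.gz exampleapp
rm -rf exampleapp

# Extract it, then look for the usual instruction files.
tar -xzf exampleapp.tar.gz
docs=$(ls exampleapp | grep -E '^(README|INSTALL)' | sort)
echo "$docs"

# Read whichever one you find before doing anything else.
cat exampleapp/INSTALL
```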
