LFCS filesystem & storage part 1

Welcome to post 20 of my 100-day challenge. Check out my introduction for some background.

This is post five of my LFCS series and the first part of filesystem and storage. In it I will discuss how to archive data with tar and compress it with gzip and bzip2.

You can go back to the overview post for a brief introduction or take a look at post one for instructions on setting up the exam practice system which I will be using throughout this series. For the posts regarding the Linux Command Line see posts 2, 3 and 4.

Archiving and compressing data

Archiving

tar is an archiving utility whose name stands for tape archive. It was originally used to archive files and folders to tape. It is useful because it bundles a directory, along with all of its files and subdirectories, into a single file. The result is known as a tarball.

Example

Create a directory called archive_test in your home directory and create the following files in it:

archive_test/
├── file1.txt
├── file2.txt
├── file3.txt
└── test.sh

test.sh contains a simple echo command:

#!/bin/bash
echo "Hello World!"
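
If you would rather script the setup than create the files by hand, something like the following should work. It is only a sketch: the scratch directory from mktemp is my assumption (the post uses the home directory), and the three text files are left empty.

```shell
# Work in a scratch directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create the directory and the three (empty) text files.
mkdir archive_test
touch archive_test/file1.txt archive_test/file2.txt archive_test/file3.txt

# Write test.sh with the echo command shown above and make it executable.
printf '#!/bin/bash\necho "Hello World!"\n' > archive_test/test.sh
chmod +x archive_test/test.sh

# List the result and run the script.
ls archive_test
./archive_test/test.sh
```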

Use tar to create an archive of archive_test:

[root@centospractice ~]# tar -cf archive_test.tar archive_test

Then use the file utility to see what file type archive_test.tar is:

[root@centospractice ~]# file archive_test.tar
archive_test.tar: POSIX tar archive (GNU)

You can see the contents of a tar archive with the -t or --list option. This will output a list of the archive's contents:

[root@centospractice ~]# tar -tf archive_test.tar
archive_test/
archive_test/file1.txt
archive_test/file3.txt
archive_test/test.sh
archive_test/file2.txt
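
Adding -v to the listing shows permissions, owner, size and timestamp for each entry, which is handy when checking an archive before extracting it. The sketch below builds a throwaway archive in a scratch directory (my assumption, so it does not depend on the files above) and lists it both ways.

```shell
# Build a small archive in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir archive_test
touch archive_test/file1.txt archive_test/file2.txt archive_test/file3.txt
tar -cf archive_test.tar archive_test

# -v adds permissions, owner, size and timestamp to each listed entry.
tar -tvf archive_test.tar

# Count the entries: the directory itself plus its three files.
entries=$(tar -tf archive_test.tar | wc -l | tr -d ' ')
echo "entries: $entries"
```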

To test that the archive works, rename the original directory (or move it somewhere else), extract archive_test.tar, and confirm that the contents of the extracted directory match the original:

[root@centospractice ~]# mv archive_test old_archive_test
[root@centospractice ~]# tar -xf archive_test.tar
[root@centospractice ~]# tree archive_test
archive_test
├── file1.txt
├── file2.txt
├── file3.txt
└── test.sh
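
Comparing two tree listings by eye only checks file names. As a rough sketch, diff -r can compare the contents of the renamed original against the extracted copy as well. The scratch directory and the sample file contents here are my own placeholders.

```shell
# Recreate a small test directory in a scratch location.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir archive_test
echo "one" > archive_test/file1.txt
printf '#!/bin/bash\necho "Hello World!"\n' > archive_test/test.sh

# Archive, move the original aside, then extract.
tar -cf archive_test.tar archive_test
mv archive_test old_archive_test
tar -xf archive_test.tar

# diff -r compares recursively; no output and exit status 0 means no differences.
if diff -r old_archive_test archive_test; then
    result="identical"
else
    result="different"
fi
echo "$result"
```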

Compressing

You can also compress the tar archive to reduce its size. The gzip utility reduces file size using Lempel-Ziv coding (LZ77). gzip compresses a single file and does not archive by itself, so it is applied to the tar archive after it has been created. Running gzip on archive_test.tar replaces it with archive_test.tar.gz:

[root@centospractice ~]# gzip archive_test.tar
[root@centospractice ~]# ls -l |grep archive_test.tar.gz
-rw-r--r--. 1 root root 259 Apr 25 22:34 archive_test.tar.gz

You can use gunzip to uncompress the file, which restores the original .tar archive:

[root@centospractice ~]# gunzip archive_test.tar.gz
[root@centospractice ~]# ls -l |grep archive_test.tar
-rw-r--r--. 1 root root 10240 Apr 25 22:34 archive_test.tar
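
By default gzip and gunzip replace their input file. If you want to keep the original alongside the compressed copy, one option is the -c switch, which writes to standard output so you can redirect it yourself. This is a sketch in a scratch directory; roundtrip.tar is just a name I made up for the check.

```shell
# Build a small archive in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir archive_test
echo "some sample text" > archive_test/file1.txt
tar -cf archive_test.tar archive_test

# -c writes the compressed data to stdout, so archive_test.tar is left untouched.
gzip -c archive_test.tar > archive_test.tar.gz
ls -l archive_test.tar archive_test.tar.gz

# Round trip: decompress to a new name and compare byte for byte.
gunzip -c archive_test.tar.gz > roundtrip.tar
cmp archive_test.tar roundtrip.tar && echo "round trip OK"
```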

bzip2 is an alternative compression utility. It generally produces smaller files than gzip at the cost of speed, although on a very small input such as this archive the difference is negligible:

[root@centospractice ~]# bzip2 archive_test.tar
[root@centospractice ~]# ls -l |grep archive_test.tar
-rw-r--r--. 1 root root 261 Apr 25 22:34 archive_test.tar.bz2

Decompress with bunzip2:

[root@centospractice ~]# bunzip2 archive_test.tar.bz2
[root@centospractice ~]# ls -l|grep archive_test.tar
-rw-r--r--. 1 root root 10240 Apr 25 22:34 archive_test.tar

Rather than archiving and compressing in two steps, tar can do both in one step if you pass in the correct switch.

To use gzip, pass in the z switch:

[root@centospractice ~]# tar -czf archive_test.tar.gz archive_test
[root@centospractice ~]# ls -l|grep archive_test.tar
-rw-r--r--. 1 root root 242 Apr 25 23:32 archive_test.tar.gz
[root@centospractice ~]# file archive_test.tar.gz
archive_test.tar.gz: gzip compressed data, from Unix, last modified: Sat Apr 25 23:32:34 2015

To use bzip2, pass in the j switch:

[root@centospractice ~]# tar -cjf archive_test.tar.bz2 archive_test/
[root@centospractice ~]# ls -l|grep archive_test.tar
-rw-r--r--. 1 root root 261 Apr 25 23:34 archive_test.tar.bz2
[root@centospractice ~]# file archive_test.tar.bz2
archive_test.tar.bz2: bzip2 compressed data, block size = 900k
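
The same switches work for extraction: swap c for x. The sketch below also uses -C to extract into a different directory, which avoids having to rename the original first. The from_gz and from_bz2 directory names are my own placeholders, and the whole thing runs in a scratch directory.

```shell
# Build both compressed archives in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir archive_test
echo "hello" > archive_test/file1.txt
tar -czf archive_test.tar.gz archive_test
tar -cjf archive_test.tar.bz2 archive_test

# -C extracts into the named directory, which must already exist.
mkdir from_gz from_bz2
tar -xzf archive_test.tar.gz -C from_gz
tar -xjf archive_test.tar.bz2 -C from_bz2

# Both extractions should contain the same files.
diff -r from_gz/archive_test from_bz2/archive_test && echo "both extractions match"
```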

If you need speed more than size reduction, gzip is the better choice, as bzip2 can take several times longer to compress. gzip is also light on system resources, and it is faster at decompression too.

You can compare the size of the archive_test directory under each compression method below. With an input this small, gzip happens to produce the smaller file:

[root@centospractice ~]# ls -l|grep archive_test.tar
-rw-r--r--. 1 root root 10240 Apr 25 23:37 archive_test.tar
-rw-r--r--. 1 root root 261 Apr 25 23:34 archive_test.tar.bz2
-rw-r--r--. 1 root root 242 Apr 25 23:37 archive_test.tar.gz
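
A four-file test directory is too small to show much difference in either speed or size. To try the comparison on something larger, a rough sketch like this generates a sample file (sample.txt is my own placeholder, roughly 7 MB of numbers) and times both tools on it. It assumes bash is installed, since time is a bash keyword. Results will vary with the data and the machine, so treat the numbers as indicative only.

```shell
# Generate sample input in a scratch directory: one number per line.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 1000000 > sample.txt

# time is a bash keyword, so run each command through bash explicitly.
# -c writes to stdout, leaving sample.txt in place for the next run.
bash -c 'time gzip -c sample.txt > sample.txt.gz'
bash -c 'time bzip2 -c sample.txt > sample.txt.bz2'

# Print the size in bytes of each file for comparison.
wc -c sample.txt sample.txt.gz sample.txt.bz2
```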

Tune in tomorrow for the second part of my revision article on LFCS filesystem & storage, where we will discuss what Logical Volume Management is and how to use it.

Can you improve on any of the tips I’ve discussed here? If you can, let me know in the comments.

Jason Edwards