How to Use the Gzip Command in Linux

Gzip is a popular file compression and decompression utility. It ships with almost every Linux distribution and is available for most UNIX-like systems.

The gzip file format is based on the Deflate algorithm, which combines LZ77 with Huffman coding.

Deflate works on blocks of data. It finds repeated strings and replaces each repeat with a short reference to the previous occurrence of that string (the LZ77 step), then encodes the result with Huffman coding so that frequent symbols take fewer bits.
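To see why repeated strings matter, here is a minimal sketch that compresses two files of the same size: one made of a single repeated line and one made of random bytes. The file names and the scratch directory are made up for illustration, and the exact sizes you see will vary.

```shell
set -eu
workdir=$(mktemp -d)          # hypothetical scratch directory
cd "$workdir"

# 100,000 copies of the same line: full of repeated strings
awk 'BEGIN { for (i = 0; i < 100000; i++) print "the quick brown fox jumps over the lazy dog" }' > repetitive.txt

# The same number of bytes of random data: almost nothing repeats
head -c "$(wc -c < repetitive.txt)" /dev/urandom > random.bin

# -c writes the compressed data to standard output, so the inputs are kept
gzip -c repetitive.txt > repetitive.txt.gz
gzip -c random.bin > random.bin.gz

# Compare the byte counts of inputs and outputs
wc -c repetitive.txt repetitive.txt.gz random.bin random.bin.gz
```

The repetitive file should shrink to a tiny fraction of its size, while the random file stays about the same size because there are no repeats to reference.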

Objectives

In this tutorial we’ll cover the basics of how to use the gzip command.

Prerequisites

  • Access to a Linux machine (I’ll use an Ubuntu 20.04 Virtual Machine)
  • Acting as a non-root sudo user to ensure a secure environment

What is Compression?

Compression is a method of reducing the size of one or more files. Compression is always performed using an algorithm, which takes in a source file and generates an output file of a certain type.

What is Gzip?

Gzip is short for GNU zip, a compression tool. Many of us are more familiar with Windows or macOS. On Linux, Gzip fills a role similar to WinZip, WinRAR, or 7-Zip on Windows, or the Archive Utility on macOS. All of these tools compress and decompress files.

Each of these tools uses a particular file extension. For example, WinZip uses .zip as the default extension. In the same fashion, Gzip uses the .gz extension.

We can use Gzip to compress different types of files, such as documents, webpages, tar files, text files, and many more.

Certain file types, such as mp3 audio files and many PDFs, are already stored in a compressed format. Running Gzip on them still creates a .gz file, but it is barely smaller than the original and can even be slightly larger.

Gzip Help (gzip -h)

Just like any other Linux tool or command, we can get help on Gzip by running the following command:

gzip -h

The output displays a series of parameters or switches that can be used with Gzip.

Output
Usage: gzip [OPTION]... [FILE]...
Compress or uncompress FILEs (by default, compress FILES in-place).

Mandatory arguments to long options are mandatory for short options too.

  -c,  --stdout      write on standard output, keep original files unchanged
  -d,  --decompress  decompress
  -f,  --force       force overwrite of output file and compress links
  -h,  --help        give this help
  -k,  --keep        keep (don't delete) input files
  -l,  --list        list compressed file contents
  -L,  --license     display software license
  -n,  --no-name     do not save or restore the original name and timestamp
  -N,  --name        save or restore the original name and timestamp
  -q,  --quiet       suppress all warnings
  -r,  --recursive   operate recursively on directories
       --rsyncable   make rsync-friendly archive
  -S,  --suffix=SUF  use suffix SUF on compressed files
       --synchronous synchronous output (safer if system crashes, but slower)
  -t,  --test        test compressed file integrity
  -v,  --verbose     verbose mode
  -V,  --version     display version number
  -1,  --fast        compress faster
  -9,  --best        compress better

With no FILE, or when FILE is -, read standard input.

Report bugs to <bug-gzip@gnu.org>.
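The -1 (--fast) and -9 (--best) switches at the end of the list trade speed for size; when no level is given, gzip uses level 6. As a rough sketch, the following compares the two extremes on a generated file. The file names are hypothetical and the sizes depend on your data.

```shell
set -eu
workdir=$(mktemp -d)          # hypothetical scratch directory
cd "$workdir"

# Generate a sample text file containing the numbers 1 to 300000
seq 1 300000 > numbers.txt

# Compress the same input at the fastest and at the best level
gzip -1 -c numbers.txt > fast.gz
gzip -9 -c numbers.txt > best.gz

# Show the resulting sizes in bytes
wc -c numbers.txt fast.gz best.gz
```

Both files decompress to exactly the same content; only the compressed size and the time spent differ.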

Check Gzip Version (gzip -V or gzip --version)

We will use some of the parameters as we advance, but before that, let’s execute a command to see the version of Gzip:

gzip -V

Note the capital V; a lowercase -v enables verbose mode instead. You can also use the long form:

gzip --version

With either of the commands, we get the same output.

gzip 1.10
Copyright (C) 2018 Free Software Foundation, Inc.
Copyright (C) 1993 Jean-loup Gailly.
This is free software.  You may redistribute copies of it under the terms of
the GNU General Public License <https://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by Jean-loup Gailly.

Compressing a Single File

Compress and Delete Input (Original) File

By default, when Gzip compresses a file, it will delete the input (original) file.

Compressing a file with Gzip is quite simple. We need to provide the source file name.

gzip <source_filename>

When we run the command, a new file is created with the .gz extension.

In my example I have downloaded a large text file, The Project Gutenberg EBook of The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle, hosted at https://norvig.com/big.txt

ls -l  --block-size=MB

The file is about 7 megabytes in its uncompressed state.

Output
total 7MB
-rw-r--r-- 1 edxd edxd 7MB Apr 22 2019 adventures_of_sherlock_holmes.txt

Now let’s gzip it:

gzip adventures_of_sherlock_holmes.txt

Let’s quickly do a directory listing and see what we’ve got:

ls -l  --block-size=MB

Notice that a file with the same name is created but with the .gz extension. So, the file that we now have is adventures_of_sherlock_holmes.txt.gz, and the original .txt file is gone.

Additionally, the file is now compressed and has a size of about 3 megabytes.

Output
total 3MB
-rw-r--r-- 1 edxd edxd 3MB Apr 22 2019 adventures_of_sherlock_holmes.txt.gz
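The default behavior can be reproduced without downloading anything. This is a small sketch that uses a generated file called sample.txt (a made-up name) in a temporary directory:

```shell
set -eu
workdir=$(mktemp -d)          # hypothetical scratch directory
cd "$workdir"

# Create a sample text file to compress
seq 1 100000 > sample.txt
ls -l

# Compress it; gzip replaces sample.txt with sample.txt.gz
gzip sample.txt
ls -l
```

The second listing should show only sample.txt.gz.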

Compress and Keep Input (Original) File (gzip -k)

You can keep the original file when creating gzip files by using the -k parameter:

gzip -k adventures_of_sherlock_holmes.txt

When we list the files, we’ll see that both the original .txt file and the .gz file are present. The source file is not deleted this time.
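Here is a short sketch of two ways to keep the input file. The first uses -k as above; the second uses -c from the help output, which writes the compressed data to standard output so it can be redirected to any file name. The names sample.txt and copy.gz are made up for this example.

```shell
set -eu
workdir=$(mktemp -d)          # hypothetical scratch directory
cd "$workdir"

seq 1 100000 > sample.txt

# Option 1: -k keeps sample.txt and creates sample.txt.gz next to it
gzip -k sample.txt

# Option 2: -c sends the compressed data to stdout; we choose the output name
gzip -c sample.txt > copy.gz

ls -l
```

The -c form is also handy on older gzip builds that may not support -k.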

Compressing Multiple Files (Into Multiple .gz Files)

Let’s assume that we have several files, which can be text files, documents, or any other type of files that can be compressed. Rather than compressing one file at a time, we can compress multiple files at once. For each file that is compressed, an individual Gzip file is created. For example, if we compress three files, we will have three Gzip files as the output. Gzip does not compress them into a single file.

Let’s quickly list the files with their respective details using the following command:

ls -l  --block-size=MB

Notice that we have files with the names 1.txt, 2.txt, and 3.txt.

Output
total 48MB
-rw-r--r-- 1 edxd edxd 26MB Aug  8 22:24 1.txt
-rw-r--r-- 1 edxd edxd 13MB Aug  8 22:24 2.txt
-rw-r--r-- 1 edxd edxd 10MB Aug  8 22:24 3.txt

We will compress all of them at once with the following command:

gzip 1.txt 2.txt 3.txt

As usual, the gzip command does not print any output. We once again run the command to list the files:

ls -l  --block-size=MB

Notice that we have files with the names 1.txt.gz, 2.txt.gz, and 3.txt.gz.

Output
total 1MB
-rw-r--r-- 1 edxd edxd 1MB Aug  8 22:24 1.txt.gz
-rw-r--r-- 1 edxd edxd 1MB Aug  8 22:24 2.txt.gz
-rw-r--r-- 1 edxd edxd 1MB Aug  8 22:24 3.txt.gz
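Instead of typing every file name, we can let the shell expand a pattern. The sketch below creates three small files (the contents are arbitrary) and compresses them with a glob; -v makes gzip report each file as it goes.

```shell
set -eu
workdir=$(mktemp -d)          # hypothetical scratch directory
cd "$workdir"

# Create 1.txt, 2.txt and 3.txt with different amounts of data
for n in 1 2 3; do
    seq 1 $((n * 10000)) > "$n.txt"
done

# The shell expands *.txt to the three names before gzip runs
gzip -v *.txt

ls -l
```

Each input still becomes its own .gz file, exactly as when the names are listed by hand.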

Recursively Compress All Files in the Current Directory (-r *)

We can also compress every file in the current directory, including files in subdirectories, without naming each one. To do this, we use * (asterisk) together with the -r parameter. Before we do that, let’s check out the files in the current directory:

ls -lr *  --block-size=MB
Output
-rw-r--r-- 1 edxd edxd 10MB Aug  8 22:55 3.txt
-rw-r--r-- 1 edxd edxd 13MB Aug  8 22:55 2.txt
-rw-r--r-- 1 edxd edxd 13MB Aug  9 03:35 1.txt

folder:
total 43MB
-rw-r--r-- 1 edxd edxd 25MB Aug  8 22:55 5.txt
-rw-r--r-- 1 edxd edxd 19MB Aug  8 22:55 4.txt

Notice that we have the files 1.txt, 2.txt, and 3.txt, and a directory called folder which contains 4.txt and 5.txt.

Let’s now check out how to gzip recursively:

gzip -r *

The -r parameter is used for recursive compression on all files and subdirectories within the current directory. Gzip will compress each .txt file into a separate .gz file, even the ones inside the folder directory.

After we run this command, it is time again to list the files in the current directory:

ls -lr *  --block-size=MB

As expected, we now have 1.txt.gz, 2.txt.gz, and 3.txt.gz, and the directory called folder contains 4.txt.gz and 5.txt.gz.

Output
-rw-r--r-- 1 edxd edxd 1MB Aug  8 22:24 3.txt.gz
-rw-r--r-- 1 edxd edxd 1MB Aug  8 22:24 2.txt.gz
-rw-r--r-- 1 edxd edxd 1MB Aug  8 22:24 1.txt.gz

folder:
total 1MB
-rw-r--r-- 1 edxd edxd 1MB Aug  8 22:31 5.txt.gz
-rw-r--r-- 1 edxd edxd 1MB Aug  8 22:31 4.txt.gz
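The same layout can be recreated in a scratch directory to try this safely. This is only a sketch with generated files; the names mirror the ones used above.

```shell
set -eu
workdir=$(mktemp -d)          # hypothetical scratch directory
cd "$workdir"

# Two files at the top level and one inside a subdirectory
mkdir folder
seq 1 10000 > 1.txt
seq 1 20000 > 2.txt
seq 1 30000 > folder/4.txt

# The shell expands * to "1.txt 2.txt folder"; -r makes gzip descend into folder
gzip -r *

# List every regular file that is left
find . -type f | sort
```

The directory itself is left in place; only the files inside it are replaced by .gz files.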

Add Multiple Files Into a Single Gzip File (Without Compression)

What if we want to put the contents of several files into a single .gz file? Gzip cannot do this by itself, because it always writes one output per input file.

As a first step, we can join the files with the cat command and redirect the result into one file.

Let’s look at how this can be done:

cat 1.txt 2.txt 3.txt > 123.gz

We are sending the output of these files to the 123.gz file. After we execute the command, we can list the files:

ls -lr *  --block-size=MB

There are three things worth noticing here.

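  1. First, we have retained the source files. They were only read by the cat command and, therefore, were not removed.
  2. Second, the output file is created with the 123.gz name.
  3. Third, the 123.gz file isn’t compressed. The cat command only joins the files, so despite its .gz name the result is plain data that gzip -d will reject as not being in gzip format.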
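To get a combined file that really is compressed, one option is to pipe the output of cat through gzip. The sketch below shows that, plus a second variant that appends separately compressed pieces to one file; gzip treats such a file as one stream when decompressing. File names are made up, and in both cases the boundaries between the original files are lost.

```shell
set -eu
workdir=$(mktemp -d)          # hypothetical scratch directory
cd "$workdir"

printf 'one\n' > 1.txt
printf 'two\n' > 2.txt
printf 'three\n' > 3.txt

# Variant 1: join the files, then compress the joined stream
cat 1.txt 2.txt 3.txt | gzip > 123.gz
gzip -t 123.gz                # integrity test passes: this is real gzip data
gzip -dc 123.gz               # decompress to stdout

# Variant 2: compress each file and append the pieces to one .gz file
gzip -c 1.txt > members.gz
gzip -c 2.txt >> members.gz
gzip -c 3.txt >> members.gz
gzip -dc members.gz           # prints the same three lines
```

If the individual files need to be recoverable later, an archive format such as tar is the better fit.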

Recursively Compress Multiple Files Into a Single tar.gz Archive (tar -czvf)

Gzip compresses each file individually and cannot bundle several files into one archive. To pack multiple files into a single compressed archive, we’ll use the tar command, which can compress its output with gzip.

To do this we can run:

tar -czvf allfiles.tar.gz *
Output (because -v means verbose)
1.txt
2.txt
3.txt
folder/
folder/5.txt
folder/4.txt

The -v option tells tar to list each file as it is processed, which is why we see output.

Let’s check out the files:

ls -lr *  --block-size=MB
Output
-rw-r--r-- 1 edxd edxd  1MB Aug  8 22:57 allfiles.tar.gz
-rw-r--r-- 1 edxd edxd 10MB Aug  8 22:55 3.txt
-rw-r--r-- 1 edxd edxd 13MB Aug  8 22:55 2.txt
-rw-r--r-- 1 edxd edxd 36MB Aug  8 22:54 1.txt

folder:
total 43MB
-rw-r--r-- 1 edxd edxd 25MB Aug  8 22:55 5.txt
-rw-r--r-- 1 edxd edxd 19MB Aug  8 22:55 4.txt

As you can see, allfiles.tar.gz is only about 1MB, so the archive was compressed. The original files are still in place, because tar does not remove its inputs.

Let’s explain the -czvf options:

  • c – create new archive. Recursion is enabled by default, as per the man pages:

    -c,  --create
        Create a new archive. Arguments supply the names of the files to be archived. Directories are archived recursively, unless the --no-recursion option is given.
  • z – gzip
  • v – verbose (optional; describes the files added to the archive)
  • f – output file (and the next argument will be the output file, in our case allfiles.tar.gz)
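As a sketch of the full round trip, the example below creates an archive, lists its contents, and extracts it into a separate directory. It goes slightly beyond the options explained above: -t lists an archive, -x extracts it, and -C selects the target directory. The file and directory names are made up.

```shell
set -eu
workdir=$(mktemp -d)          # hypothetical scratch directory
cd "$workdir"

mkdir folder
seq 1 10000 > 1.txt
seq 1 20000 > folder/4.txt

# Create a gzip-compressed archive; inputs are named explicitly so the
# archive does not try to include itself
tar -czf allfiles.tar.gz 1.txt folder

# List the archive contents without extracting anything
tar -tzf allfiles.tar.gz

# Extract into a separate directory and compare with the originals
mkdir restored
tar -xzf allfiles.tar.gz -C restored
cmp 1.txt restored/1.txt
cmp folder/4.txt restored/folder/4.txt
```

Unlike a plain .gz file, the archive keeps file names and the directory structure.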

Viewing the Compression Information (gzip -l)

After we compress a file, we can view details of the compressed file with the following command:

gzip -l allfiles.tar.gz

The output has four columns: the compressed size and the uncompressed size (both in bytes), the compression ratio, and the name the file will have once it is uncompressed.

Output
compressed        uncompressed  ratio uncompressed_name
     98118           100669440  99.9% allfiles.tar
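The ratio column can be checked by hand. As an approximation (gzip also accounts for a few header bytes), the ratio is the share of the uncompressed size that was saved, 1 - compressed / uncompressed. A quick sketch with the numbers from the output above:

```shell
# Sizes in bytes, copied from the gzip -l output above
compressed=98118
uncompressed=100669440

# Fraction of the original size that compression saved, as a percentage
ratio=$(awk -v c="$compressed" -v u="$uncompressed" \
    'BEGIN { printf "%.1f%%", (1 - c / u) * 100 }')

echo "$ratio"   # prints 99.9%
```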

Decompressing Files (gzip -d)

I’ll be using a .gz file I have already, called afile.txt.gz. Here’s how it looks when I list my files:

-rw-r--r-- 1 edxd edxd 1MB Aug  8 22:54 afile.txt.gz

To decompress the file, we can use two different commands. The first one is the gzip command with the -d parameter:

gzip -d afile.txt.gz

The -d parameter is used for decompressing the specified file. After we execute the command, it does not return any output. Let’s now list the files in the current directory:

ls -l --block-size=MB
Output
-rw-r--r-- 1 edxd edxd 36MB Aug  8 22:54 afile.txt

As we can see, decompressing deletes the .gz file and restores the original, uncompressed file.
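To confirm that a compress and decompress cycle returns exactly the original data, we can compare checksums before and after. This is a minimal sketch using a generated afile.txt rather than the file from the listing above:

```shell
set -eu
workdir=$(mktemp -d)          # hypothetical scratch directory
cd "$workdir"

seq 1 100000 > afile.txt
before=$(cksum < afile.txt)   # checksum of the original content

gzip afile.txt                # afile.txt -> afile.txt.gz
gzip -d afile.txt.gz          # afile.txt.gz -> afile.txt

after=$(cksum < afile.txt)    # checksum of the restored content

echo "before: $before"
echo "after:  $after"
```

The two lines should show identical values.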

Decompressing Using Gunzip

We can also use the gunzip command to decompress the files. The gunzip command does not require any parameter but the compressed filename, and it has the same result as gzip -d:

gunzip afile.txt.gz
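Before removing the .gz file, it can be worth checking it first. The sketch below uses two options from the help output: -t tests the integrity of the compressed file, and -c writes the decompressed data to standard output so the archive itself stays untouched. The file is generated for the example.

```shell
set -eu
workdir=$(mktemp -d)          # hypothetical scratch directory
cd "$workdir"

seq 1 100000 > afile.txt
gzip afile.txt

# Verify the compressed file; gzip -t is silent on success
gzip -t afile.txt.gz && echo "afile.txt.gz is intact"

# Count the lines inside the archive without deleting it
lines=$(gunzip -c afile.txt.gz | wc -l)
echo "lines in archive: $lines"

# Finally decompress for real; this removes afile.txt.gz
gunzip afile.txt.gz
ls -l
```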

Conclusion

Well done. Hopefully, this tutorial helped you understand the basics of the gzip command. If you encountered any issues, please feel free to leave a comment or contact us, and we’ll get back to you as soon as we can.
