Gzip is a popular file compression and decompression utility. It is included in almost all Linux distributions and is available for most UNIX and UNIX-like systems.
Gzip compresses data with the Deflate algorithm. Deflate processes data in blocks: it finds repeated strings and replaces each repeat with a reference (a distance and a length) back to an earlier occurrence of the same string, then encodes the result with Huffman coding to shrink it further.
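Replacing repeated strings with back-references has an effect you can observe directly. The following is a minimal sketch, assuming a Linux shell with GNU coreutils and gzip installed; the file names repetitive.txt and random.bin are hypothetical. It compresses 1 MiB of highly repetitive text and 1 MiB of random bytes, then prints all four file sizes.

```shell
#!/bin/sh
# Sketch: compare how well gzip (Deflate) compresses repetitive vs. random data.
set -e

# Work in a throwaway temporary directory (left in place so you can inspect it).
workdir=$(mktemp -d)
cd "$workdir"

# 1 MiB of highly repetitive text: the same line over and over.
yes "the quick brown fox" | head -c 1048576 > repetitive.txt

# 1 MiB of random bytes: almost no repeated strings for Deflate to reference.
head -c 1048576 /dev/urandom > random.bin

# -k keeps the originals so the sizes can be compared afterwards.
gzip -k repetitive.txt random.bin

# wc -c prints the size of each file in bytes.
wc -c repetitive.txt repetitive.txt.gz random.bin random.bin.gz
```

On a typical system the repetitive file shrinks to a tiny fraction of its size, while the random file does not shrink at all (it may even grow slightly). Exact sizes vary between runs and gzip versions.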
Objectives
In this tutorial, we’ll cover the basics of how to use the gzip command.
Table of Contents
- Objectives
- Prerequisites
- What is Compression?
- What is Gzip?
- Gzip Help (gzip -h)
- Check Gzip Version (gzip -V or gzip --version)
- Compressing a Single File
- Compressing Multiple Files (Into Multiple .gz Files)
- Recursively Compress All Files in the Current Directory (-r *)
- Add Multiple Files Into a Single Gzip File (Without Compression)
- Recursively Compress Multiple Files Into a Single tar.gz Archive (tar -czvf)
- Viewing the Compression Information (gzip -l)
- Decompressing Files (gzip -d)
- Conclusion
Prerequisites
- Access to a Linux machine (I’ll use an Ubuntu 20.04 Virtual Machine)
- Acting as a non-root sudo user to ensure a secure environment
What is Compression?
Compression is a method of reducing the size of one or more files. It is always performed by an algorithm that reads a source file and writes an output file in a specific compressed format; with lossless tools such as gzip, decompressing that output restores the original data exactly.
What is Gzip?
Gzip is short for GNU zip, a compression tool or utility. Most of us are used to the Windows or macOS operating systems. In the Linux environment, Gzip fills a role similar to popular tools such as WinZip, WinRAR, and 7-Zip on Windows, or the Archive Utility on macOS. All these tools are used for compressing and decompressing files.
Each of these tools uses a particular file extension. For example, WinZip uses .zip as its default extension. In the same fashion, Gzip uses the .gz extension in the Linux environment.
We can use Gzip to compress different types of files, such as documents, webpages, tar files, text files, and many more.
Gzip Help (gzip -h)
Just like any other Linux tool or command, we can get help on Gzip by running the following command:
gzip -h
The output displays a series of parameters or switches that can be used with Gzip.
Usage: gzip [OPTION]... [FILE]...
Compress or uncompress FILEs (by default, compress FILES in-place).

Mandatory arguments to long options are mandatory for short options too.

  -c, --stdout      write on standard output, keep original files unchanged
  -d, --decompress  decompress
  -f, --force       force overwrite of output file and compress links
  -h, --help        give this help
  -k, --keep        keep (don't delete) input files
  -l, --list        list compressed file contents
  -L, --license     display software license
  -n, --no-name     do not save or restore the original name and timestamp
  -N, --name        save or restore the original name and timestamp
  -q, --quiet       suppress all warnings
  -r, --recursive   operate recursively on directories
      --rsyncable   make rsync-friendly archive
  -S, --suffix=SUF  use suffix SUF on compressed files
      --synchronous synchronous output (safer if system crashes, but slower)
  -t, --test        test compressed file integrity
  -v, --verbose     verbose mode
  -V, --version     display version number
  -1, --fast        compress faster
  -9, --best        compress better

With no FILE, or when FILE is -, read standard input.

Report bugs to <bug-gzip@gnu.org>.
Check Gzip Version (gzip -V or gzip --version)
We will use some of the parameters as we advance, but before that, let’s execute a command to see the version of Gzip:
gzip -V
Note that this is a capital V; lowercase -v turns on verbose mode instead. You can also use the long form of the option:
gzip --version
With either of the commands, we get the same output.
gzip 1.10
Copyright (C) 2018 Free Software Foundation, Inc.
Copyright (C) 1993 Jean-loup Gailly.
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License <https://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by Jean-loup Gailly.
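If you only need the version number, for example inside a script, one approach is to keep the first line of the output and take its second field. This is a minimal sketch that assumes GNU gzip, whose first line has the form shown above (gzip 1.10); other gzip implementations may format it differently.

```shell
#!/bin/sh
# Sketch: extract just the version number from `gzip --version`.
# Assumes GNU gzip, whose first output line looks like "gzip 1.10".
version=$(gzip --version | head -n 1 | awk '{print $2}')
echo "gzip version: $version"
```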
Compressing a Single File
Compress and Delete Input (Original) File
By default, when Gzip compresses a file, it will delete the input (original) file.
Compressing a file with Gzip is simple: we only need to provide the source file name.
gzip <source_filename>
When we run the command, a new file is created with the .gz extension.
In my example, I downloaded a large text file, The Project Gutenberg EBook of The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle, hosted at https://norvig.com/big.txt, and saved it as adventures_of_sherlock_holmes.txt.
ls -l --block-size=MB
The file is about 7 MB in its uncompressed state:
total 7MB
-rw-r--r-- 1 edxd edxd 7MB Apr 22 2019 adventures_of_sherlock_holmes.txt
Now let’s gzip it:
gzip adventures_of_sherlock_holmes.txt
Let’s quickly do a directory listing and see what we’ve got:
ls -l --block-size=MB
Notice that a file with the same name is created but with the .gz extension. So, the file that we now have is adventures_of_sherlock_holmes.txt.gz, and the original .txt file is gone.
The compressed file is also much smaller, about 3 MB, which is less than half its original size:
total 3MB
-rw-r--r-- 1 edxd edxd 3MB Apr 22 2019 adventures_of_sherlock_holmes.txt.gz
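To reproduce this default behavior without downloading anything, the following minimal sketch works in a throwaway directory, assuming a Linux shell with gzip installed. Instead of the Sherlock Holmes text, it generates a hypothetical sample file, notes.txt, containing the numbers 1 to 100000.

```shell
#!/bin/sh
# Sketch: compress one file and confirm gzip replaced it with a .gz file.
set -e

# Work in a throwaway temporary directory (left in place so you can inspect it).
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample file: the numbers 1 to 100000, one per line.
seq 1 100000 > notes.txt

# Default behavior: writes notes.txt.gz and deletes notes.txt.
gzip notes.txt

# Only notes.txt.gz should remain.
ls -l
```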
Compress and Keep Input (Original) File (gzip -k)
To keep the original file when compressing, use the -k (--keep) parameter:
gzip -k adventures_of_sherlock_holmes.txt
When we list the files again, we see that both the original .txt file and the new .gz file are present. The source file is not deleted this time.
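Here is the same experiment with -k, again as a minimal sketch that assumes a Linux shell with gzip installed and uses a generated, hypothetical notes.txt file. This time both files are left on disk.

```shell
#!/bin/sh
# Sketch: compress a file with -k and confirm the original is kept.
set -e

# Work in a throwaway temporary directory (left in place so you can inspect it).
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample file: the numbers 1 to 100000, one per line.
seq 1 100000 > notes.txt

# -k (--keep): write notes.txt.gz but leave notes.txt in place.
gzip -k notes.txt

# Both notes.txt and notes.txt.gz should be listed.
ls -l
```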
Compressing Multiple Files (Into Multiple .gz Files)
Let’s assume that we have several files, which can be text files, documents, or any other type of files that can be compressed. Rather than compressing one file at a time, we can compress multiple files at once. For each file that is compressed, an individual Gzip file is created. For example, if we compress three files, we will have three Gzip files as the output. Gzip does not compress them into a single file.
Let’s quickly list the files with their respective details using the following command:
ls -l --block-size=MB
Notice that we have files with the names 1.txt, 2.txt, and 3.txt.
total 48MB
-rw-r--r-- 1 edxd edxd 26MB Aug 8 22:24 1.txt
-rw-r--r-- 1 edxd edxd 13MB Aug 8 22:24 2.txt
-rw-r--r-- 1 edxd edxd 10MB Aug 8 22:24 3.txt
We will compress all of them at once with the following command:
gzip 1.txt 2.txt 3.txt
As usual, gzip prints no output when it succeeds. Let’s list the files once more:
ls -l --block-size=MB
Notice that we have files with the names 1.txt.gz, 2.txt.gz, and 3.txt.gz.
total 1MB
-rw-r--r-- 1 edxd edxd 1MB Aug 8 22:24 1.txt.gz
-rw-r--r-- 1 edxd edxd 1MB Aug 8 22:24 2.txt.gz
-rw-r--r-- 1 edxd edxd 1MB Aug 8 22:24 3.txt.gz
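The following minimal sketch repeats these steps in a temporary directory, assuming a Linux shell with gzip installed. The sample files 1.txt, 2.txt, and 3.txt are generated with seq here; they are stand-ins, not the files from the listing above.

```shell
#!/bin/sh
# Sketch: compress several files in one command; gzip writes one .gz per input.
set -e

# Work in a throwaway temporary directory (left in place so you can inspect it).
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample files with 10000, 20000, and 30000 lines.
for n in 1 2 3; do
  seq 1 "${n}0000" > "$n.txt"
done

# Produces 1.txt.gz, 2.txt.gz, and 3.txt.gz, and removes the .txt files.
gzip 1.txt 2.txt 3.txt

ls -l
```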
Recursively Compress All Files in the Current Directory (-r *)
We can also compress all files in the current directory at once. This is similar to compressing multiple files, except that we do not have to name each file: we use the * (asterisk) wildcard, which the shell expands to every name in the current directory, together with the -r parameter. Before we do that, let’s check out the files in the current directory:
ls -lr * --block-size=MB
-rw-r--r-- 1 edxd edxd 10MB Aug 8 22:55 3.txt
-rw-r--r-- 1 edxd edxd 13MB Aug 8 22:55 2.txt
-rw-r--r-- 1 edxd edxd 13MB Aug 9 03:35 1.txt

folder:
total 43MB
-rw-r--r-- 1 edxd edxd 25MB Aug 8 22:55 5.txt
-rw-r--r-- 1 edxd edxd 19MB Aug 8 22:55 4.txt
Notice that we have the files 1.txt, 2.txt, and 3.txt, and a directory called folder, which contains 4.txt and 5.txt. (In this ls command, -r reverses the sort order rather than recursing; the contents of folder are shown because * expands to include folder itself.)
Let’s now check out how to gzip recursively:
gzip -r *
The -r parameter tells gzip to operate recursively on directories. Since * expands to 1.txt, 2.txt, 3.txt, and folder, gzip compresses each .txt file into a separate .gz file, including the ones inside the folder directory.
After running this command, let’s list the files in the current directory again:
ls -lr * --block-size=MB
As expected, we now have 1.txt.gz, 2.txt.gz, and 3.txt.gz, and the folder directory contains 4.txt.gz and 5.txt.gz.
-rw-r--r-- 1 edxd edxd 1MB Aug 8 22:24 3.txt.gz
-rw-r--r-- 1 edxd edxd 1MB Aug 8 22:24 2.txt.gz
-rw-r--r-- 1 edxd edxd 1MB Aug 8 22:24 1.txt.gz

folder:
total 1MB
-rw-r--r-- 1 edxd edxd 1MB Aug 8 22:31 5.txt.gz
-rw-r--r-- 1 edxd edxd 1MB Aug 8 22:31 4.txt.gz
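To try recursive compression without touching real data, the following sketch recreates the same layout in a temporary directory (hypothetical generated files 1.txt to 3.txt plus folder/4.txt and folder/5.txt) and runs gzip -r * on it, assuming a Linux shell with gzip installed. Remember that the shell, not gzip, expands * into the list of names.

```shell
#!/bin/sh
# Sketch: recursively compress every file under the current directory.
set -e

# Work in a throwaway temporary directory (left in place so you can inspect it).
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical layout: three files at the top level, two inside "folder".
mkdir folder
for n in 1 2 3; do seq 1 10000 > "$n.txt"; done
for n in 4 5; do seq 1 10000 > "folder/$n.txt"; done

# The shell expands * to "1.txt 2.txt 3.txt folder";
# -r makes gzip descend into folder as well.
gzip -r *

# ls -R really is recursive (unlike ls -r, which reverses the sort order).
ls -lR
```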
Add Multiple Files Into a Single Gzip File (Without Compression)
What if we want to put the contents of several files into a single file? Gzip has no option for merging files, so we need help from another command.
One way is to redirect the output of the cat command, which concatenates files, into a new file.
Let’s look at how this can be done:
cat 1.txt 2.txt 3.txt > 123.gz
This writes the combined contents of the three files to 123.gz. After running the command, we can list the files:
ls -lr * --block-size=MB
There are three things worth noticing here.
- First, the source files are still there: cat only reads them, so nothing is removed.
- Second, the output file is created with the 123.gz name.
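- Third, 123.gz isn’t compressed at all: it is plain text with a .gz name, and gunzip rejects it because it is not in gzip format. To create a genuinely compressed single file, pipe the output through gzip instead: cat 1.txt 2.txt 3.txt | gzip > 123.gz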
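The difference between a file that merely has a .gz name and a real gzip file is easy to check. The following sketch, assuming a Linux shell with gzip installed and generated sample files, builds both the plain concatenation and a piped, genuinely compressed version (named 123.txt.gz here, an arbitrary choice so that gunzip would restore it as 123.txt), then tests each one with gzip -t from the help output.

```shell
#!/bin/sh
# Sketch: plain concatenation vs. concatenation piped through gzip.
set -e

# Work in a throwaway temporary directory (left in place so you can inspect it).
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample files.
for n in 1 2 3; do seq 1 10000 > "$n.txt"; done

# Plain concatenated text that only has a .gz name.
cat 1.txt 2.txt 3.txt > 123.gz

# A real gzip file: the concatenated text is compressed on the way out.
cat 1.txt 2.txt 3.txt | gzip > 123.txt.gz

# gzip -t tests integrity: it fails on the plain file and succeeds on the real one.
if gzip -t 123.gz 2>/dev/null; then
  echo "123.gz: valid gzip"
else
  echo "123.gz: not a gzip file"
fi
gzip -t 123.txt.gz && echo "123.txt.gz: valid gzip"
```

Keep in mind that decompressing 123.txt.gz gives back a single combined file; gzip does not record where 1.txt ends and 2.txt begins. That is why multi-file archives are usually built with tar, covered in the next section.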
Recursively Compress Multiple Files Into a Single tar.gz Archive (tar -czvf)
Gzip compresses each file on its own; it cannot bundle multiple files into a single archive. To pack multiple files into one compressed archive, we use the tar command, which can pass the archive through gzip.
To do this, we can run:
tar -czvf allfiles.tar.gz *
1.txt
2.txt
3.txt
folder/
folder/5.txt
folder/4.txt
The -v option tells tar to list each file as it processes it, which is why we see this output.
Let’s check out the files:
ls -lr * --block-size=MB
-rw-r--r-- 1 edxd edxd 1MB Aug 8 22:57 allfiles.tar.gz
-rw-r--r-- 1 edxd edxd 10MB Aug 8 22:55 3.txt
-rw-r--r-- 1 edxd edxd 13MB Aug 8 22:55 2.txt
-rw-r--r-- 1 edxd edxd 36MB Aug 8 22:54 1.txt

folder:
total 43MB
-rw-r--r-- 1 edxd edxd 25MB Aug 8 22:55 5.txt
-rw-r--r-- 1 edxd edxd 19MB Aug 8 22:55 4.txt
As you can see, allfiles.tar.gz is only about 1 MB, so the archive was compressed. Unlike gzip, tar also leaves the source files in place.
Let’s explain the -czvf options:
- c – create a new archive. Directories are archived recursively by default, according to the man page:
-c, --create Create a new archive. Arguments supply the names of the files to be archived. Directories are archived recursively, unless the --no-recursion option is given.
- z – filter the archive through gzip
- v – verbose (optional; describes the files added to the archive)
- f – use the archive file given by the next argument (in our case, allfiles.tar.gz); because of this, f should be the last letter in the option group
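The following sketch, assuming GNU tar and gzip on Linux with generated sample files, creates an archive the same way and then uses two further standard tar operations that this tutorial does not otherwise cover: -t to list the archive contents, and -x with -C to extract into a separate directory, which confirms the round trip.

```shell
#!/bin/sh
# Sketch: build a .tar.gz archive, list it, and extract it elsewhere.
set -e

# Work in a throwaway temporary directory (left in place so you can inspect it).
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical layout matching the tutorial: three files plus "folder".
mkdir folder
for n in 1 2 3; do seq 1 10000 > "$n.txt"; done
for n in 4 5; do seq 1 10000 > "folder/$n.txt"; done

# c = create, z = gzip, v = verbose, f = archive file name (last in the group).
tar -czvf allfiles.tar.gz *

# t = list the archive contents without extracting them.
tar -tzf allfiles.tar.gz

# x = extract; -C switches to the target directory first.
mkdir restored
tar -xzf allfiles.tar.gz -C restored
```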
Viewing the Compression Information (gzip -l)
After we compress a file, we can view details of the compressed file with the following command:
gzip -l allfiles.tar.gz
The output shows the compressed size and the uncompressed size (both in bytes), the compression ratio, and the uncompressed file name:
         compressed        uncompressed  ratio uncompressed_name
              98118           100669440  99.9% allfiles.tar
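As a sanity check, the uncompressed size reported by gzip -l should match the original file size. This sketch, assuming GNU gzip and a generated, hypothetical sample.txt file, compares the two. One caveat: the gzip format stores the uncompressed size modulo 2^32, so gzip -l may report incorrect sizes for files of 4 GiB or more.

```shell
#!/bin/sh
# Sketch: compare the size reported by `gzip -l` with the real file size.
set -e

# Work in a throwaway temporary directory (left in place so you can inspect it).
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample file; -k keeps it so its size can be measured.
seq 1 100000 > sample.txt
gzip -k sample.txt

gzip -l sample.txt.gz

# Line 2, column 2 of `gzip -l` output is the uncompressed size in bytes.
listed=$(gzip -l sample.txt.gz | awk 'NR == 2 {print $2}')
actual=$(wc -c < sample.txt)
echo "gzip -l reports $listed bytes; wc -c reports $actual bytes"
```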
Decompressing Files (gzip -d)
For this example, I’ll use an existing .gz file called afile.txt.gz. Here’s how it appears when I list my files:
-rw-r--r-- 1 edxd edxd 1MB Aug 8 22:54 afile.txt.gz
To decompress the file, we can use two different commands. The first one is the gzip command with the -d parameter:
gzip -d afile.txt.gz
The -d parameter decompresses the specified file. Like compression, it prints no output when it succeeds. Let’s now list the files in the current directory:
ls -l --block-size=MB
-rw-r--r-- 1 edxd edxd 36MB Aug 8 22:54 afile.txt
As we can see, decompressing deletes the .gz file and restores the original, uncompressed file.
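A full round trip can be checked with a short sketch, assuming a Linux shell with gzip installed. It compresses a generated afile.txt, decompresses it with gzip -d, and uses cmp to confirm that the restored file is byte-for-byte identical to an untouched copy (reference.txt, a hypothetical name).

```shell
#!/bin/sh
# Sketch: compress, decompress with gzip -d, and verify nothing changed.
set -e

# Work in a throwaway temporary directory (left in place so you can inspect it).
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample file plus an untouched copy to compare against later.
seq 1 100000 > afile.txt
cp afile.txt reference.txt

gzip afile.txt        # afile.txt -> afile.txt.gz
gzip -d afile.txt.gz  # afile.txt.gz -> afile.txt

# cmp exits with status 0 when the two files are identical.
cmp afile.txt reference.txt && echo "round trip OK"
```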
Decompressing Using Gunzip
We can also use the gunzip command to decompress files. gunzip needs no parameters other than the compressed file name, and it has the same result as gzip -d:
gunzip afile.txt.gz
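The sketch below, assuming gunzip is installed alongside gzip (as it is on most Linux distributions), shows plain gunzip next to gunzip -c. The -c (--stdout) option, listed in the help output earlier, writes the decompressed data to standard output and keeps the .gz file, which is handy for peeking at the contents. The file afile.txt is generated here as a hypothetical sample.

```shell
#!/bin/sh
# Sketch: peek at a .gz file with gunzip -c, then decompress it with gunzip.
set -e

# Work in a throwaway temporary directory (left in place so you can inspect it).
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample file, compressed to afile.txt.gz.
seq 1 1000 > afile.txt
gzip afile.txt

# -c writes decompressed data to standard output; afile.txt.gz stays in place.
gunzip -c afile.txt.gz | head -n 3

# Without options, gunzip behaves like gzip -d: restores afile.txt, removes the .gz.
gunzip afile.txt.gz
```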
Conclusion
Well done. Hopefully, this tutorial helped you understand the basics of the gzip command. If you encountered any issues, please feel free to leave a comment or contact us, and we’ll get back to you as soon as we can.