Wget Command in Linux with Examples

In this tutorial, you will learn about the wget command, its options, and how you can use these options in different scenarios.

Normally, the File Transfer Protocol (FTP) utility is used to transfer files between client and server computers on a network. However, this requires that a computer be set up as an FTP server and that you log in to it manually with a username and password.

In contrast, wget is a command-line utility available on Linux that allows you to download files non-interactively.

The wget utility is commonly used to download files such as tar archives, zip files, or rpm packages from websites. It works over several protocols, including HTTP, HTTPS, and FTP.

The wget utility is installed by default in most modern Linux distributions. If it is not present, you can install it manually; on Debian or Ubuntu, for example:

sudo apt install wget

Wget Command Syntax

The wget command has the following syntax.

wget [OPTIONS] [URL]

An example of how to use a wget command is shown below.

wget https://tldp.org/LDP/intro-linux/intro-linux.pdf
Output
--2022-02-10 11:53:05--  https://tldp.org/LDP/intro-linux/intro-linux.pdf
Resolving tldp.org (tldp.org)... 152.19.134.152, 152.19.134.151
Connecting to tldp.org (tldp.org)|152.19.134.152|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1600364 (1.5M) [application/pdf]
Saving to: ‘intro-linux.pdf’

intro-linux.pdf               100%[=================================================>]   1.53M  2.01MB/s    in 0.8s

2022-02-10 11:53:07 (2.01 MB/s) - ‘intro-linux.pdf’ saved [1600364/1600364]

In the above example, the Introduction to Linux guide (a .pdf file) is downloaded by passing its URL to the wget command.

Useful options of the wget command

There is a large number of options you can use with the wget command. You can consult the wget man page or use the --help option for their details. Some of the most useful options are explained below.

To save a file with another name

Sometimes the file you want to download has a very long name, or you simply don't want to keep the original name. In such cases you can use the -O option to save the file under a name you specify.

wget -O introduction_to_linux.pdf https://tldp.org/LDP/intro-linux/intro-linux.pdf

Here the file will be saved as introduction_to_linux.pdf, as shown below.

ls
Output
introduction_to_linux.pdf
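As an aside, wget's default filename is just the last path segment of the URL, so you can preview it with basename before deciding whether you need -O at all. A quick offline check, using the URL from the example above:

```shell
url=https://tldp.org/LDP/intro-linux/intro-linux.pdf

# wget's default output name is the last path segment of the URL:
basename "$url"    # prints: intro-linux.pdf
```

If that default name is acceptable, plain wget is enough; otherwise use -O as shown above.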

To save the file in another directory

By default, the wget command saves the downloaded file in the current directory. To save the file in another folder, use the -P option followed by the name of the destination folder.

wget -P ~/document https://tldp.org/LDP/intro-linux/intro-linux.pdf

Now the file will be downloaded to the ‘document’ folder in your home directory.

.
└── document
    └── intro-linux.pdf

To set limits on your rate of download

Suppose you are working on a task that constantly needs high Internet bandwidth. In that case, you would not want a download to occupy all of the bandwidth and affect your work.

You can set a specific limit (say 500 KB/s) on your download rate, allocating only 500 kilobytes per second to the download and leaving the rest of the bandwidth for other tasks. You can do so with the following option.

wget --limit-rate=500k https://tldp.org/LDP/intro-linux/intro-linux.pdf
Output
--2022-02-10 12:17:13--  https://tldp.org/LDP/intro-linux/intro-linux.pdf
Resolving tldp.org (tldp.org)... 152.19.134.151, 152.19.134.152
Connecting to tldp.org (tldp.org)|152.19.134.151|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1600364 (1.5M) [application/pdf]
Saving to: ‘intro-linux.pdf’

intro-linux.pdf               100%[=================================================>]   1.53M   508KB/s    in 3.1s

2022-02-10 12:17:17 (508 KB/s) - ‘intro-linux.pdf’ saved [1600364/1600364]

Here --limit-rate=500k caps wget's bandwidth usage at 500 KB/s, so the download will never use more than that. This is especially useful when you are downloading a very large file and don't want it to consume all of your bandwidth.

To continue the download once it is stopped

Consider downloading a large file when you lose your Internet connection; the download will be left incomplete. There is no need to worry, because you can use the -c option to continue the download from the point where it stopped.

wget https://tldp.org/LDP/intro-linux/intro-linux.pdf
Output
--2022-02-10 12:20:54--  https://tldp.org/LDP/intro-linux/intro-linux.pdf
Resolving tldp.org (tldp.org)... 152.19.134.151, 152.19.134.152
Connecting to tldp.org (tldp.org)|152.19.134.151|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1600364 (1.5M) [application/pdf]
Saving to: ‘intro-linux.pdf’

intro-linux.pdf                71%[==================================>               ]   1.08M  1.28MB/s               ^C

Here you can see that the download was interrupted at 71%. When wget is re-run with the -c option, the download resumes from 71% instead of starting over. You can observe this from the + signs in the progress bar in the output of the second command.

wget -c https://tldp.org/LDP/intro-linux/intro-linux.pdf
Output
--2022-02-10 12:21:09--  https://tldp.org/LDP/intro-linux/intro-linux.pdf
Resolving tldp.org (tldp.org)... 152.19.134.151, 152.19.134.152
Connecting to tldp.org (tldp.org)|152.19.134.151|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 1600364 (1.5M), 372037 (363K) remaining [application/pdf]
Saving to: ‘intro-linux.pdf’

intro-linux.pdf               100%[++++++++++++++++++++++++++++++++++++++===========>]   1.53M   605KB/s    in 0.6s

2022-02-10 12:21:10 (605 KB/s) - ‘intro-linux.pdf’ saved [1600364/1600364]
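On an unstable connection, -c pairs well with wget's retry and timeout options, so an interrupted download can be restarted without babysitting. A minimal sketch; the function name and the -t/-T values are arbitrary choices made for this example, not wget defaults:

```shell
# resume_download: fetch a URL, resuming any partially downloaded file.
#   -c     continue a partial download instead of starting over
#   -t 3   give up after 3 tries (wget's own default is 20)
#   -T 30  apply a 30-second network timeout per attempt
resume_download() {
  wget -c -t 3 -T 30 "$1"
}
```

Call it as `resume_download https://tldp.org/LDP/intro-linux/intro-linux.pdf`; re-running the same command after a failure picks up where the last attempt stopped.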

To download a file in the background

You can also put the download of a large file in the background, so that you get your prompt back immediately and can work on other tasks. For this purpose, use the -b option.

wget -b https://tldp.org/LDP/intro-linux/intro-linux.pdf
Output
Continuing in background, pid 432.
Output will be written to ‘wget-log’.

As you can see, the prompt is immediately available to be used for other tasks. If you want to check the progress or output of the wget command, it is available in the wget-log file.

.
├── intro-linux.pdf
└── wget-log

The contents of the wget-log file look something like this:

--2022-02-10 12:23:35--  https://tldp.org/LDP/intro-linux/intro-linux.pdf
Resolving tldp.org (tldp.org)... 152.19.134.151, 152.19.134.152
Connecting to tldp.org (tldp.org)|152.19.134.151|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1600364 (1.5M) [application/pdf]
Saving to: ‘intro-linux.pdf’

     0K .......... .......... .......... .......... ..........  3%  208K 7s
    50K .......... .......... .......... .......... ..........  6%  444K 5s
   100K .......... .......... .......... .......... ..........  9%  636K 4s
    ...
  1450K .......... .......... .......... .......... .......... 95% 9.26M 0s
  1500K .......... .......... .......... .......... .......... 99% 7.71M 0s
  1550K .......... ..                                         100% 11.6M=0.9s

2022-02-10 12:23:37 (1.69 MB/s) - ‘intro-linux.pdf’ saved [1600364/1600364]

This option is very useful when you are downloading large files (say, a 20 GB .iso image).

To download multiple files using a single wget command

If you have to download multiple files, it would be tiresome to issue the wget command again and again. Instead, you can write all the URLs you want to download into one text file and pass that file to wget in the following way.

wget -i file.txt

Here -i specifies an input file (file.txt, the text file that contains all the URLs). wget will then download every file listed in it.

The file should contain one URL per line. For example, here is what my file.txt looks like:

https://tldp.org/LDP/intro-linux/intro-linux.pdf
https://alex.smola.org/drafts/thebook.pdf
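Both steps, building the list and starting the batch download, fit in a short script. A sketch; the -nc (no-clobber) flag is an optional addition that skips files which already exist:

```shell
# Write the URL list, one URL per line.
cat > file.txt <<'EOF'
https://tldp.org/LDP/intro-linux/intro-linux.pdf
https://alex.smola.org/drafts/thebook.pdf
EOF

wc -l file.txt    # 2 URLs queued

# Then download them all; -nc skips files that are already present:
#   wget -nc -i file.txt
```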

To save a web page

You can save a web page by specifying its URL, like this:

wget example.com

However, this downloads only the first page (index.html) of the website. You can use the -r option to recursively download all pages linked from it.

wget -r example.com

To clone a website using wget

You can also make a local copy of a remote website using the wget command. This is possible because wget can follow links in HTML, XHTML, and CSS pages.

In order to download a complete website, you can use the -m option.

wget -m example.com

Now wget will recreate the full directory structure of the site. After downloading the website, you can navigate among its pages locally as if you were accessing it over the Internet.
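If the goal is an offline-browsable copy rather than an exact mirror, -m is often combined with a few more flags. A hedged sketch; the function name is made up for illustration:

```shell
# mirror_site: make a local, offline-browsable copy of a website.
#   -m  mirror: shorthand for -r -N -l inf --no-remove-listing
#   -k  convert links in the saved pages so they work locally
#   -p  also fetch page requisites (images, CSS) needed to render each page
#   -E  save files with matching .html/.css extensions
mirror_site() {
  wget -m -k -p -E "$1"
}
```

Run `mirror_site example.com`, then open the saved index.html in a browser to browse the copy offline.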

To skip SSL certificate check

wget usually checks the SSL certificate of the website from which you want to download. You can skip this check with the following option, for example when you trust a site despite a certificate problem (such as a self-signed certificate).

wget --no-check-certificate example.com

Now wget will not check the SSL certificate of this website.

How to crawl a website using wget

You can also use the wget command as a spider to crawl web pages. It is a great way to discover broken links on your website. To do so, you have to combine several wget options, as shown below.

wget -r --spider -nd -nv -l 1 -w 1 -o wget.log https://example.com

These options are explained below.

  1. -r (recursive) downloads pages recursively, following the links on the page you start from.
  2. --spider makes wget check pages without downloading anything (you only want to find the broken links).
  3. -nd (no directories) prevents wget from creating any directories locally.
  4. -nv (no verbose) turns off verbose output, so only essential information, such as each link's status, is printed.
  5. -l (level) defines how many levels deep the recursion goes. By default, wget goes up to five levels; -l 1 in the command above restricts it to one level.
  6. -w (wait) sets the delay in seconds between retrievals. wget will check one link, wait for the specified number of seconds (one second here, specified as -w 1), and then move on to the next link.
  7. -o (output log file) writes wget's log messages to a file (wget.log in the example). Note that this lowercase -o is different from the uppercase -O option described earlier, which names the downloaded file.

When you enter the above command, you will see nothing on the screen; the crawl starts, but its output goes to the wget.log file. To watch the output, open the wget.log file from another terminal as follows.

tail -f wget.log

Now you will see the output of wget working as a crawler: the status of every link of the website that wget retrieves.

Working links have the HTTP status 200 at the end. Links followed by 404 or 500 are faulty or broken and need to be repaired.
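Because --spider prints an explicit "broken link" message for each dead URL, you can pull the failures out of the log with grep. The log below is a simplified sample in the shape wget produces, written to a file here only for offline demonstration:

```shell
# A simplified sample of what a --spider crawl writes to its log.
cat > sample-wget.log <<'EOF'
2022-02-10 12:30:01 URL:https://example.com/ [1256] -> "index.html" [1] 200 OK
https://example.com/missing.html:
Remote file does not exist -- broken link!!!
EOF

# List only the lines that flag broken links, with their line numbers.
grep -n 'broken link' sample-wget.log
```

The same grep run against your real wget.log gives you a quick list of every dead link the crawl found.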

Conclusion

In this tutorial, you have learned about the wget command, a non-interactive download utility available on Linux and macOS. With it, you can download individual files, web pages, or even a complete website. Since it is a command-line utility with no graphical element involved, it is fast and easy to use in scripts.

If you encounter any issues or have any questions, feel free to let us know in the comments and we'll get back to you as soon as we can.

 
