Learning to use the AWK utility in Linux is a skill that most Linux users yearn to have. It can save you time and energy, as well as help you better understand the inner workings of your computer.
While it may seem hard at first, you will become well-versed with this command-line utility with the right guide and frequent practice.
Once you understand the AWK utility well, you will find it a necessary tool when working on your Linux Terminal.
What is AWK?
AWK is a programming language designed for text processing and data extraction. It’s often used in conjunction with other programs, such as grep or sed, to extract information from large text files.
An AWK program reads the input line by line, splits each line into fields, checks the line against the patterns in the program, and runs the matching actions (such as printing selected fields). If you’re new to AWK or just need a refresher, this beginner’s guide will show you how to use it in Linux. Let’s dive in!
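As a rough sketch of that read, split, and act cycle, the snippet below pipes two made-up lines into AWK. The sample text and the shell variable name out are assumptions for illustration only.

```shell
# Two sample lines (hypothetical data) are piped to awk.
# awk splits each line into whitespace-separated fields and, for lines
# that match /error/, prints the first and third fields.
out=$(printf 'disk error sda1\nnet ok eth0\n' | awk '/error/ {print $1, $3}')
echo "$out"
```

Only the first line matches the pattern, so only its fields are printed.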
AWK Vs. GAWK Vs. NAWK
Before going further, we need to distinguish AWK, GAWK, and NAWK. They are all implementations of the same programming language. AWK was the original, and its name comes from the initials of its creators: Alfred Aho (egrep author), Peter J. Weinberger (who worked on small relational databases), and Brian Kernighan.
- AWK: the original language.
- NAWK: stands for New AWK, mainly used by AT&T.
- GAWK: stands for GNU AWK, included in most Linux distributions.
Tip: On many Linux systems, awk is just a symlink to gawk, so running the awk command on the Terminal actually invokes gawk. Some Debian-based systems point awk at mawk instead.
How to install AWK on Linux
AWK is pre-installed on most Linux distributions. However, if that’s not the case for you, execute the commands below depending on your operating system.
Execute the command below if you use any Debian-based system like Ubuntu or any other distro that uses the APT package manager.
sudo apt-get install gawk
For RHEL/CentOS and Fedora users, use the matching command below (yum on RHEL/CentOS, dnf on Fedora).
sudo yum install gawk
sudo dnf install gawk
If you are on Arch Linux, execute the command below.
sudo pacman -S gawk
If your distribution doesn’t support gawk, try using nawk instead.
AWK Syntax
The basic AWK syntax is as follows:
awk {options} 'program' {filename}
The AWK command can take the following options:
- -f filename: AWK reads the program from the given file instead of from the first command-line argument.
- -F fs: Specifies the field separator.
- -v var=value: Declares a variable and assigns it a value before the program runs.
Don’t panic if you haven’t understood all about AWK options. We will utilize these options in our examples below, and you will have a solid understanding.
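As a quick, hedged illustration of two of these options together, the sketch below uses -F to split on commas and -v to pass in a variable. The sample record and the variable name label are made up for this example.

```shell
# -F, makes the comma the field separator.
# -v label="User" defines the awk variable "label" before the program runs.
# The input record "alice,42" is hypothetical sample data.
out=$(printf 'alice,42\n' | awk -F, -v label="User" '{print label ": " $1}')
echo "$out"
```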
1. Read AWK Script
As shown below, you can quickly execute a simple AWK script on the Terminal to print any information passed.
awk '{print "Hello John Doe, welcome to Linux"}'
Because no input file is given, AWK reads from standard input: every time you press the Enter key, it prints our welcome message on the Terminal. To end the program, press Ctrl + D (end of input) or Ctrl + C.
2. Execute AWK on an External File
To better understand the AWK print command, we will create a file called employees.txt and enter the details below.
John Manager Branch 1
Stacy CEO Branch 2
Duke Manager Branch 3
Kate CEO Branch 5
Sunil Manager Branch 14
Duke CEO Branch 3
Kate Manager Branch 5
Sunil CEO Branch 14
When we run the command below, AWK will print all the details on the Terminal.
awk '{print}' employees.txt
3. Print Lines with a Given Pattern
To print all the lines with a matching pattern, we can use the syntax below. For example, let’s print all the lines with the word Manager.
awk '/<unique-pattern>/ {print}' employees.txt
For example:
awk '/Manager/ {print}' employees.txt
AWK prints every line that contains the word Manager.
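A pattern does not have to be a regular expression over the whole line. As a small sketch, the example below compares one field exactly. The three rows are copied from the shape of employees.txt, and the shell variable names data and out are assumptions.

```shell
# Sample rows in the same shape as employees.txt.
data='John Manager Branch 1
Stacy CEO Branch 2
Duke Manager Branch 3'

# $2 == "CEO" is true only when the second field is exactly CEO,
# whereas /CEO/ would match that text anywhere on the line.
out=$(printf '%s\n' "$data" | awk '$2 == "CEO" {print $1}')
echo "$out"
```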
4. AWK Variables
AWK is a powerful language that you can use to process text files. It assigns variables to every data field. For example:
- $0: This variable represents the whole line.
- $1: This variable represents the first data field.
- $n: This variable represents the nth data field.
5. Print Lines with AWK Variables
In point 2 above, Execute AWK on an External File, we saw that we could print all the contents of a file by executing the command below:
awk '{print}' employees.txt
We can still achieve the same result by running the command below.
awk '{print $0}' employees.txt
That’s because print with no arguments prints $0, the variable that holds the whole current line.
Let’s now try executing AWK with the $1 variable and see what happens.
awk '{print $1}' employees.txt
AWK printed the first word of every line in the file. If you are a Python developer, you might have noticed that AWK has a straightforward syntax, much like Python: simple to read and write.
If the $1 variable represents the first word in every line, then $2 must represent the second word. Let’s execute the command below. We expect AWK to print the first and second words in every line.
awk '{print $1, $2}' employees.txt
The output is exactly what we expected.
The file we are using, ‘employees.txt,’ is just an example. Let’s look at one real-life scenario. Suppose we want to see all the users present in the system using the AWK command. We know that users are listed in the /etc/passwd file and that the user name is the first field in every line. Therefore, we can use the $1 variable to print the first field of every line.
However, there is one more thing we need to understand – the separator value. In the ‘employees.txt‘ file above, values were separated using spaces. In a file where data fields/ values are separated with spaces or tabs, you don’t necessarily need to pass any options in the AWK command.
However, values are separated using a colon (:) in the /etc/passwd file. We need to let the AWK command understand this separator using the -F option. With that in mind, we can simply execute the command below.
awk -F: '{print $1}' /etc/passwd
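To try the colon separator without touching the real file, the sketch below uses a single made-up line in /etc/passwd format. The user demo and its values are hypothetical.

```shell
# A made-up line in /etc/passwd format (seven colon-separated fields).
entry='demo:x:1001:1001:Demo User:/home/demo:/bin/bash'

# With -F: the colon becomes the field separator, so $1 is the user name
# and $7 is the login shell.
out=$(printf '%s\n' "$entry" | awk -F: '{print $1, $7}')
echo "$out"
```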
6. Execute Multiple Commands
You can efficiently execute multiple commands on the Terminal by separating them using a semicolon. Take a look at the example below.
echo "Welcome JohnDoe" | awk '{$2="Sunil"; print $0}'
Let’s understand what’s happening above. The echo command prints whatever is passed to it. Here, however, we use the pipe ( | ) operator to send the echo output to the AWK command as input. AWK splits this input into fields, so ‘Welcome’ becomes variable $1 and ‘JohnDoe’ becomes variable $2. Next, AWK reassigns the variable $2 to “Sunil” and prints the whole line.
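One side effect is worth knowing: when a field is reassigned, AWK rebuilds $0 using the output field separator, which is a single space by default. The sketch below assumes an input with three spaces between the words to make that visible.

```shell
# The input has three spaces between the two words.
# Assigning to $2 makes awk rebuild $0, joining the fields with a
# single space (the default output field separator).
out=$(echo "Welcome   JohnDoe" | awk '{$2 = "Sunil"; print $0}')
echo "$out"
```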
7. Read a Script From a File
You can also use a script to control how AWK prints the output. For example, suppose we want AWK to print all users and their home directories. We already know the user name is the first field in every line and is assigned to variable $1. The home directory is the 6th field and is assigned to variable $6.
We will start by creating a simple script, as shown below. We will name it ‘sampleScript.’
{
    userPath = $1 " my home directory is " $6
    print userPath
}
Now we will run the AWK command as shown below. Previously we looked at how to use the -F option. Here, we will also include the -f option to specify our ‘sampleScript’ file.
awk -F: -f sampleScript /etc/passwd
The output contains lines such as: root my home directory is /root.
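The same idea can be tried end to end in a temporary directory. This is only a sketch: the script name homes.awk and the sample passwd-style line are assumptions, not files from the guide.

```shell
# Work in a temporary directory so nothing on the system is touched.
tmpdir=$(mktemp -d)

# Write a small awk script to a file (the name homes.awk is hypothetical).
cat > "$tmpdir/homes.awk" <<'EOF'
{
    print $1 " my home directory is " $6
}
EOF

# A made-up passwd-style line stands in for /etc/passwd.
out=$(printf 'demo:x:1001:1001:Demo:/home/demo:/bin/sh\n' | awk -F: -f "$tmpdir/homes.awk")
echo "$out"

rm -rf "$tmpdir"
```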
8. AWK Preprocessing and Postprocessing (BEGIN & END Keywords)
BEGIN is an AWK preprocessing keyword that you can use to add header information to AWK output. For example, suppose you are processing a text file and want the result to have a title/header like Employees from Branch A.
The BEGIN keyword will help you achieve that with ease.
On the other hand, END is an AWK postprocessing keyword that you can use to assign footers to your file. The best way to implement these keywords is using a simple script, and let’s name it as ‘FooterHeaderScript’.
BEGIN {
    # Set the field separator here so it already applies to the first line.
    FS = ":"
    print "System Users and their Home Directories"
    print " UserName \t HomeDirectory"
    print "___________ \t __________"
}
{
    print $1 " \t \t" $6
}
END {
    print "You have reached the end of the File"
}
Let’s run the AWK command using the script above.
awk -f FooterHeaderScript /etc/passwd
The output starts with the header and ends with the footer we specified in the script.
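BEGIN and END are also handy for setup and summary work, not only for headers and footers. As a minimal sketch with made-up input, the example below counts lines and reports the total in the END block. The variable name count is an assumption.

```shell
# BEGIN runs once before any input is read, the middle block runs once
# per line, and END runs once after the last line.
out=$(printf 'a\nb\nc\n' | awk 'BEGIN {print "start"} {count++} END {print "lines:", count}')
echo "$out"
```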
AWK Built-In Variables
Up to this point, we have used several AWK variables like $0, $1, $2, and $n, and options like -F for the field separator and -f for specifying a script file. However, there are more AWK built-in variables:
- FIELDWIDTHS: Used to specify the field width (available in gawk).
- RS: Specifies the record separator
- FS: This variable specifies the field separator, and we used it in the previous example of AWK postprocessing and preprocessing.
- OFS: Specifies the output field separator.
- ORS: Specifies the output record separator.
- NR: This variable specifies the number of processed records.
- NF: Holds the number of fields in the current record.
- IGNORECASE: Tells AWK to ignore character case (available in gawk).
- ARGC: Specifies the total number of parameters passed to the AWK script on the Terminal. This variable always has a value of one or more since the program name is counted as the first argument.
- ARGV: This variable is an array that stores all arguments passed to AWK. Its indexes start from zero (0).
- ENVIRON: This is an array containing the values of environment variables for the current process
- FILENAME: This is the name of the file being processed.
- FNR: Specifies the total number of records we have read in the current input file.
Now let’s look at how you can work with these numerous built-in variables using examples.
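Before the individual examples, here is a small sketch of NR and NF on two made-up lines. The input text is an assumption chosen so the lines have different field counts.

```shell
# NR is the number of the record being processed;
# NF is the number of fields in that record.
out=$(printf 'one two\nthree four five\n' | awk '{print NR ": " NF " fields"}')
echo "$out"
```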
1. OFS Variable (Output Field Separator)
By default, the Output Field Separator (OFS) variable in AWK is a space. However, you can choose a different output separator by setting the OFS variable. Let’s look at the example below.
awk 'BEGIN{FS=":"; OFS="---"} {print $1, $6}' /etc/passwd
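Note that OFS is only inserted where print receives comma-separated arguments. The sketch below, using a made-up input line, contrasts that with plain concatenation.

```shell
# print $1, $2  -> the comma tells awk to join the values with OFS.
# print $1 $2   -> no comma, so the values are simply concatenated.
out=$(echo 'a:b:c' | awk 'BEGIN {FS=":"; OFS="-"} {print $1, $2; print $1 $2}')
echo "$out"
```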
2. The RS Variable (Record Separator)
Let’s look at a sample file that contains the following records about company employees.
John Doe
Branch A
CEO
Phone No: 23445678

Jane Doe
Branch B
Manager
Phone No: 78564423
If we process this file the way we have so far, we won’t get the correct output, because here a newline separates the values and a blank line separates the records. Therefore, we will set the FS variable to \n, which specifies that the field separator is a newline, and the RS variable to the empty string "", which specifies that the record separator is a blank line.
awk 'BEGIN{FS="\n"; RS=""}{print $1, $2, $3}' testFile
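The same settings can be tried on inline data. In this sketch the two short records are made up, and each record is a block of lines separated from the next by a blank line.

```shell
# RS="" puts awk in paragraph mode: blank lines separate records.
# FS="\n" makes every line inside a record its own field.
out=$(printf 'John Doe\nBranch A\n\nJane Doe\nBranch B\n' |
      awk 'BEGIN {FS="\n"; RS=""} {print $1 " works at " $2}')
echo "$out"
```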
3. The ARGC and ARGV Variables (Argument Count and Argument Vector)
The ARGC variable specifies the total number of parameters passed to the AWK script, while the ARGV variable is an array that stores all arguments passed to AWK. Let’s use the example below to find the number of parameters passed to AWK.
awk 'BEGIN{print ARGC}' testFile
Now that we know we passed two arguments to the AWK command, let’s use the ARGV variable. Since ARGV is an array, we will use indexes [ ] to retrieve the arguments. Remember, the ARGV array starts from index zero.
awk 'BEGIN{print ARGV[0],ARGV[1]}' testFile
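To list every argument rather than naming each index, a loop over ARGV can be used. This is a sketch: first.txt and second.txt are hypothetical names, and because the program has only a BEGIN block, AWK never tries to open them.

```shell
# ARGV[0] is the program name, so the loop starts at 1.
# With only a BEGIN block, awk exits without reading the named files.
out=$(awk 'BEGIN {for (i = 1; i < ARGC; i++) print i, ARGV[i]}' first.txt second.txt)
echo "$out"
```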
4. ENVIRON Variable
The ENVIRON variable is an array containing the values of environment variables for the current process. To retrieve the value of a single environment variable, such as PATH, we can use the command below.
awk 'BEGIN{print ENVIRON["PATH"]}'
We can also pass shell variables to AWK with the -v option instead of reading them through ENVIRON. Quote the value so that paths containing spaces are passed intact.
echo | awk -v homeDir="$HOME" '{print "My home Directory is " homeDir}'
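ENVIRON only sees variables that are in the environment of the awk process. As a small sketch, the example below sets a made-up variable named GREETING for a single awk invocation and reads it back.

```shell
# GREETING is a hypothetical variable, set only for this one command.
# Because it is in awk's environment, ENVIRON["GREETING"] can read it.
out=$(GREETING="hello" awk 'BEGIN {print ENVIRON["GREETING"]}')
echo "$out"
```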
5. The NF Variable (Number of Fields)
We can use the NF (Number of Fields) variable to print the last value in a line. Look at the example below. Here we want to print the user (first word of every line) and the user shell (last word in every line) from the /etc/passwd file.
awk 'BEGIN{FS=":"; OFS=":"} {print $1,$NF}' /etc/passwd
Additionally, we can use the NR variable to print a range of lines. Let’s modify the employees.txt file to look as follows.
1. John Manager Branch 1
2. Stacy CEO Branch 2
3. Duke Manager Branch 3
4. Kate CEO Branch 5
5. Sunil Manager Branch 14
6. Duke CEO Branch 3
7. Kate Manager Branch 5
8. Sunil CEO Branch 14
To print lines 3 to 6, we can use the command below.
awk 'NR==3, NR==6 {print $0}' employees.txt
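An equivalent way to select a block of lines is a plain comparison on NR. The sketch below uses five made-up lines and prints lines 2 to 4 with their line numbers.

```shell
# The condition is true for records 2, 3, and 4 only.
out=$(printf 'l1\nl2\nl3\nl4\nl5\n' | awk 'NR >= 2 && NR <= 4 {print NR ": " $0}')
echo "$out"
```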
6. User-Defined Variables
AWK allows one to use user-defined variables. However, like any other programming language, there are rules to declaring a variable. For example, it shouldn’t start with a number. See the example below.
awk 'BEGIN { name = "My name is John Doe"; age = "I am 30"; print name, age }'
7. IF Statement
Like any other programming language, AWK also uses If-Else statements and Loops. Let’s take a sample text file below called numbers.txt.
23
34
45
56
78
89
To print all numbers greater than 70, we will use the command below.
awk '{if ($1 > 70) print $0}' numbers.txt
Quite simple, right? If you want the body of the if statement to contain more than one statement, you will need curly braces, as shown below.
awk '{if ($1 > 70){ num1 = $1 *10; print num1} }' numbers.txt
8. IF – ELSE statement
If you have worked with if-else statements in other programming languages, I believe you have already joined the dots on how we can write the if-else statement in AWK. See the example below.
awk '{
    if ($1 > 30) {
        x = $1 * 3
        print x
    } else {
        x = $1 / 2
        print x
    }
}' numbers.txt
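As a further sketch, an if-else can label each value instead of transforming it. The two input numbers and the labels high and low are made up for illustration.

```shell
# Each number is printed with a label chosen by the if-else branch.
out=$(printf '23\n78\n' | awk '{ if ($1 > 70) { print $1, "high" } else { print $1, "low" } }')
echo "$out"
```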
9. While Loop
We can also write loops with AWK. Look at the example below.
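awk '{
    sum = 0
    i = 1
    # Add up every field on the current line.
    while (i <= NF) {
        sum += $i
        i++
    }
    # Divide by the number of fields; skip empty lines to avoid dividing by zero.
    if (NF > 0) {
        average = sum / NF
        print "Average:", average
    }
}' numbers.txt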
10. For Loop
To write a For loop in AWK, you can use the syntax in the command below.
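awk '{
    total = 0
    # Add up every field on the current line.
    for (var = 1; var <= NF; var++) {
        total += $var
    }
    # Divide by the number of fields; skip empty lines to avoid dividing by zero.
    if (NF > 0) {
        avg = total / NF
        print "Average:", avg
    }
}' numbers.txt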
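Loops over fields are easiest to see on a line with several columns. The sketch below assumes a single made-up row of three numbers and averages them.

```shell
# The loop visits fields 1 through NF and accumulates their sum,
# then the sum is divided by the number of fields.
out=$(echo '4 8 12' | awk '{ total = 0; for (i = 1; i <= NF; i++) total += $i; print "Average:", total / NF }')
echo "$out"
```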
Conclusion
This post has given you a comprehensive beginner’s guide to using the AWK command. If you encountered any errors or difficulties executing any of the commands above, feel free to let us know in the comments and we’ll get back to you as soon as we can.