AWK Command Examples for Beginners / AWK Linux Tutorial

Learning to use the AWK utility in Linux is a skill that most Linux users yearn to have. It can save you time and energy, as well as help you better understand the inner workings of your computer.

While it may seem hard at first, you will become well-versed with this command-line utility with the right guide and frequent practice.

Once you understand the AWK utility well, you will find it a necessary tool when working on your Linux Terminal.

What is AWK?

AWK is a programming language designed for text processing and data extraction. It’s often used in conjunction with other programs, such as grep or sed, to extract information from large text files.

An AWK program reads its input line by line, splits each line into fields, and runs the action of every rule whose pattern matches that line; actions typically print selected fields or computed values. If you’re new to AWK or just need a refresher, this beginner’s guide will show you how to use it in Linux. Let’s dive in!
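
To make the pattern and action idea concrete, here is a minimal sketch. The names and ages are made-up sample data, not taken from any file in this guide. The pattern $2 > 28 selects lines, and the action in curly braces runs only for those lines.

```shell
# Made-up sample data: a name and an age on each line.
out=$(printf 'alice 30\nbob 25\ncarol 41\n' |
  awk '$2 > 28 { print $1, "is", $2 }')

# Only the lines whose second field is greater than 28 are printed.
printf '%s\n' "$out"
```

The line for bob is skipped because its second field does not satisfy the pattern.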

AWK Vs. GAWK Vs. NAWK

Before diving much deeper into the post, we need to understand AWK, GAWK, and NAWK. They are all different implementations of the same programming language. AWK was the original language, and its name comes from the initials of its creators: Alfred Aho (the author of egrep), Peter J. Weinberger (who worked on small relational databases), and Brian Kernighan.

  • AWK: the original language.
  • NAWK stands for New AWK, mainly used by AT&T.
  • GAWK stands for GNU AWK, commonly included in most Linux distributions.

Tip: On many Linux systems, awk is just a symlink to gawk, so running the awk command on the Terminal invokes gawk. Some distributions, such as Debian and Ubuntu, ship mawk as the default awk instead.

How to install AWK on Linux

AWK is pre-installed on most Linux distributions. However, if that’s not the case for you, execute the commands below depending on your operating system.

Execute the command below if you use any Debian-based system like Ubuntu or any other distro that uses the APT package manager.

sudo apt-get install gawk

For RHEL/CentOS users, use the command below.

sudo yum install gawk

For Fedora users, use the command below.

sudo dnf install gawk

If you are on Arch Linux, execute the command below.

sudo pacman -S gawk

If your distribution doesn’t support gawk, try using nawk instead.

AWK Syntax

The basic AWK syntax is as follows:

awk {options} 'program' {filename}

The AWK command can take the following options:

  • -f filename: AWK reads the program from the given script file instead of from the first command-line argument.
  • -F fs: This option specifies the field separator.
  • -v var=value: Used to declare a variable.

Don’t panic if you haven’t understood all about AWK options. We will utilize these options in our examples below, and you will have a solid understanding.
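
As a quick preview, the sketch below combines two of these options. The colon-separated lines and the variable name role are made up for illustration. -F: tells AWK to split fields on colons, and -v passes a value into the program.

```shell
# Made-up colon-separated data: name:role:branch
out=$(printf 'john:manager:branch1\nstacy:ceo:branch2\n' |
  awk -F: -v role="ceo" '$2 == role { print $1 }')

# Prints the name of every line whose second field equals the value of role.
printf '%s\n' "$out"
```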

1. Read AWK Script

As shown below, you can quickly execute a simple AWK script on the Terminal to print any information passed.

awk '{print "Hello John Doe, welcome to Linux"}'

Because no input file is given, AWK reads from standard input. Press Enter, and AWK prints the welcome message once for each line you enter. To end the program, press Ctrl + D (end of input) or Ctrl + C.


2. Execute AWK on an External File

To better understand the AWK print command, we will create a file called employees.txt and enter the details below.

John Manager Branch 1
Stacy CEO Branch 2
Duke Manager Branch 3
Kate CEO Branch 5
Sunil Manager Branch 14
Duke CEO Branch 3
Kate Manager Branch 5
Sunil CEO Branch 14

When we run the command below, AWK will print all the details on the Terminal.

awk '{print}' employees.txt


3. Print Lines with a Given Pattern

To print all the lines with a matching pattern, we can use the syntax below. For example, let’s print all the lines with the word Manager.

awk '/<unique-pattern>/ {print}' employees.txt

For example:

awk '/Manager/ {print}' employees.txt

AWK prints every line that contains the word Manager.

4. AWK Variables

AWK is a powerful language that you can use to process text files. It assigns variables to every data field. For example:

  • $0: This variable represents the whole line
  • $1: This variable represents the first data field
  • $n: This variable represents the nth data field.
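
The field variables above can be sketched on a single line of text. The line below is copied from the employees.txt sample; the shell variable names are only for illustration.

```shell
line='Kate CEO Branch 5'

whole=$(printf '%s\n' "$line" | awk '{ print $0 }')   # the whole line
first=$(printf '%s\n' "$line" | awk '{ print $1 }')   # the first field
fourth=$(printf '%s\n' "$line" | awk '{ print $4 }')  # the fourth field

printf '%s\n%s\n%s\n' "$whole" "$first" "$fourth"
```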

5. Print Lines with AWK Variables

In point 2 above, Execute AWK on an External File, we saw that we could print all the contents of a file by executing the command below:

awk '{print}' employees.txt

We can still achieve the same result by running the command below.

awk '{print $0}' employees.txt

That’s because $0 holds the entire current line, and print with no arguments prints $0 by default.

Let’s now try executing AWK with the $1 variable and see what happens.

awk '{print $1}' employees.txt

AWK prints the first word of every line in the file. If you are a Python developer, you may notice that AWK syntax is similarly simple to read and write.

If the $1 variable represents the first word in every line, then $2 must represent the second word. Let’s execute the command below. We expect AWK to print the first and second words of every line.

awk '{print $1, $2}' employees.txt

The output shows the first two words of every line, just as we expected.

The file we are using, ‘employees.txt,’ is just an example. Let’s look at one real-life scenario. Suppose we want to see all the users present in the system using the AWK command. We know that users are listed in the /etc/passwd file and that the user name is the first field of every line. Therefore, we can use the $1 variable to print the first field of every line.

However, there is one more thing we need to understand – the separator value. In the ‘employees.txt‘ file above, values were separated using spaces. In a file where data fields/ values are separated with spaces or tabs, you don’t necessarily need to pass any options in the AWK command.

However, values are separated using a colon (:) in the /etc/passwd file. We need to let the AWK command understand this separator using the -F option. With that in mind, we can simply execute the command below.

awk -F: '{print $1}' /etc/passwd

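To see what the -F option changes, the sketch below runs the same program with and without it. The two lines are made up in /etc/passwd format so that the example does not depend on the accounts on your system.

```shell
# Made-up lines in /etc/passwd format.
sample='root:x:0:0:root:/root:/bin/bash
jdoe:x:1000:1000:John Doe:/home/jdoe:/bin/bash'

# With -F: the first field is the user name.
users=$(printf '%s\n' "$sample" | awk -F: '{ print $1 }')

# Without -F, AWK splits on spaces and tabs, so $1 runs up to the first space.
no_sep=$(printf '%s\n' "$sample" | awk '{ print $1 }')

printf '%s\n---\n%s\n' "$users" "$no_sep"
```

Without the option, the first sample line contains no spaces, so it is treated as a single field and printed whole.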

6. Execute Multiple Commands

You can run multiple statements inside a single AWK action by separating them with a semicolon. Take a look at the example below.

echo "Welcome JohnDoe" | awk '{$2="Sunil"; print $0}'

Let’s understand what’s happening above. The echo command prints whatever is passed to it. Here, the pipe ( | ) operator sends the echo output to the AWK command as its input. AWK splits this line into fields, so ‘Welcome’ becomes $1 and ‘JohnDoe’ becomes $2. Next, AWK assigns “Sunil” to $2 and prints the whole line, which is now Welcome Sunil.
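
As a small variation on the same command, the sketch below puts three statements in one action. The variable name count is only for illustration.

```shell
out=$(echo "Welcome JohnDoe" |
  awk '{ $2 = "Sunil"; count = NF; print $0; print "fields:", count }')

# First the modified line, then the number of fields it contains.
printf '%s\n' "$out"
```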

7. Read a Script From a File

You can also put the AWK program in a script file. For example, we want AWK to print all users and their home directories. We already know that the user name is the first field in every line and is available as $1. The home directory is the sixth field and is available as $6.

We will start by creating a simple script, as shown below. We will name it ‘sampleScript.’

{
    userPath = $1 " my home directory is " $6
    print userPath
}

Now we will run the AWK command as shown below. Previously we looked at how to use the -F option. Here, we will also include the -f option to specify our ‘sampleScript’ file.

awk -F: -f sampleScript /etc/passwd

The output contains one line per user, for example: root my home directory is /root.
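
If you would like to try the -f option without touching /etc/passwd, here is a self-contained sketch. It writes the same script to a temporary file and feeds it one made-up line in /etc/passwd format; the user jdoe is hypothetical.

```shell
# Write the AWK program to a temporary script file.
script=$(mktemp)
cat > "$script" <<'EOF'
{
    userPath = $1 " my home directory is " $6
    print userPath
}
EOF

# One made-up line in /etc/passwd format.
out=$(printf 'jdoe:x:1000:1000:John Doe:/home/jdoe:/bin/bash\n' |
  awk -F: -f "$script")
rm -f "$script"

printf '%s\n' "$out"
```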

8. AWK Preprocessing and Postprocessing (BEGIN & END Keywords)

BEGIN is the AWK preprocessing keyword: its block runs once, before any input is read, so you can use it to print header information. For example, suppose you are processing a text file and want the result to have a title/header like Employees from Branch A. The BEGIN keyword will help you achieve that with ease.

On the other hand, END is the AWK postprocessing keyword: its block runs once, after all input has been read, so you can use it to print footers. The best way to try these keywords is with a simple script, which we will name ‘FooterHeaderScript’.

BEGIN {
FS=":"
print "System Users and their Home Directories"
print " UserName \t HomeDirectory"
print "___________ \t __________"
}

{
print $1 " \t \t" $6
}

END {
print "You have reached the end of the File"
}

Let’s run the AWK command using the script above.

awk -f FooterHeaderScript /etc/passwd

The output starts with the header we specified in the script and ends with the footer.
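
The same idea fits in a one-off command. The sketch below uses two made-up lines in /etc/passwd format. FS is assigned inside BEGIN so that it takes effect before the first line is split, and END uses NR to report how many lines were read.

```shell
out=$(printf 'root:x:0:0:root:/root:/bin/bash\njdoe:x:1000:1000:John Doe:/home/jdoe:/bin/bash\n' |
  awk 'BEGIN { FS = ":"; print "UserName -> HomeDirectory" }
       { print $1 " -> " $6 }
       END { print NR " users listed" }')

# Header first, one line per user, footer last.
printf '%s\n' "$out"
```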

AWK Built-In Variables

Up to this point, we have used several AWK variables like $0, $1, $2, and $n, and options like -F for the field separator and -f for specifying a script file. However, there are more AWK built-in variables:

  • FIELDWIDTHS: Used to specify the field widths for fixed-width input (GAWK only).
  • RS: Specifies the record separator
  • FS: This variable specifies the field separator, and we used it in the previous example of AWK postprocessing and preprocessing.
  • OFS: Specifies the output field separator.
  • ORS: Specifies the output record separator.
  • NR: This variable specifies the number of processed records.
  • NF: Holds the number of fields in the current record.
  • IGNORECASE: Tells AWK to ignore character case (GAWK only).
  • ARGC: Specifies the total number of parameters passed to the AWK script on the Terminal. This variable always has a value of one or more since the program name is counted as the first argument.
  • ARGV: This variable is an array that stores all arguments passed to AWK. Like any other array, it counts from index zero (0).
  • ENVIRON: This is an array containing the values of environment variables for the current process
  • FILENAME: This is the name of the file being processed.
  • FNR: Holds the number of records read so far from the current input file.

Now let’s look at how you can work with these numerous built-in variables using examples.
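
As a first taste, the sketch below prints NR, NF, and the last field ($NF) for two made-up lines of different lengths.

```shell
out=$(printf 'a b c\nd e\n' |
  awk '{ print NR ": " NF " fields, last is " $NF }')

# NR is the line number so far; NF changes with the length of each line.
printf '%s\n' "$out"
```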

1. OFS Variable (Output Field Separator)

By default, the Output Field Separator (OFS) variable in AWK is a space. However, you can specify a different output separator using the OFS variable. Let’s look at the example below.

awk 'BEGIN{FS=":"; OFS="---"} {print $1, $6}' /etc/passwd

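One detail worth knowing: OFS is inserted only between items that are separated by commas in the print statement. The sketch below, using a made-up line in /etc/passwd format, compares print with and without the comma.

```shell
line='root:x:0:0:root:/root:/bin/bash'

# With a comma, print joins the two fields with OFS.
with_comma=$(printf '%s\n' "$line" |
  awk 'BEGIN { FS = ":"; OFS = "---" } { print $1, $6 }')

# Without a comma, the fields are concatenated and OFS is not used.
no_comma=$(printf '%s\n' "$line" |
  awk 'BEGIN { FS = ":"; OFS = "---" } { print $1 $6 }')

printf '%s\n%s\n' "$with_comma" "$no_comma"
```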

2. The RS Variable (Record Separator)

Let’s look at a sample file that contains the following records about company employees.

John Doe
Branch A CEO
Phone No: 23445678

Jane Doe
Branch B Manager
Phone No: 78564423

We won’t get the correct output when we use AWK to process this file like we have been doing before because a new line separates the values, and a blank line separates the records. Therefore, we will set the FS variable to \n, which specifies that the field separator is a new line, and RS to the empty string "", which tells AWK that records are separated by blank lines.

awk 'BEGIN{FS="\n"; RS=""}{print $1, $2, $3}' testFile

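Here is a self-contained sketch of the same technique that does not need testFile. The records are held in a shell variable, with a blank line between them, and OFS is set to " | " only to make the field boundaries visible in the output.

```shell
records='John Doe
Branch A CEO
Phone No: 23445678

Jane Doe
Branch B Manager
Phone No: 78564423'

# RS="" makes each blank-line-separated block one record;
# FS="\n" makes each line of the block one field.
out=$(printf '%s\n' "$records" |
  awk 'BEGIN { FS = "\n"; RS = ""; OFS = " | " } { print $1, $2, $3 }')

printf '%s\n' "$out"
```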

3. The ARGC and ARGV Variables (Argument Count and Argument Vector)

The ARGC variable specifies the total number of parameters passed to the AWK script, while the ARGV variable is an array that stores all arguments passed to AWK. Let’s use the example below to know the number of parameters passed to AWK.

awk 'BEGIN{print ARGC}' testFile

The command prints 2, because the program name counts as the first argument and testFile as the second. Now let’s use the ARGV variable. Since ARGV is an array, we will use indexes [ ] to retrieve the arguments. Remember, the array starts from index zero.

awk 'BEGIN{print ARGV[0],ARGV[1]}' testFile

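To list every argument rather than picking indexes by hand, you can loop from 0 to ARGC - 1, as in the sketch below. The file names first.txt and second.txt are hypothetical; because the program has only a BEGIN block, AWK never tries to open them.

```shell
out=$(awk 'BEGIN { for (i = 0; i < ARGC; i++) print i, ARGV[i] }' first.txt second.txt)

# Index 0 holds the program name, which can differ between AWK implementations.
printf '%s\n' "$out"
```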

4. ENVIRON Variable

The ENVIRON variable is an array containing the values of environment variables for the current process. To retrieve a single variable, such as PATH, index the array by the variable name, as in the command below.

awk 'BEGIN{print ENVIRON["PATH"]}'


We can also pass a shell variable into AWK without ENVIRON by using the -v option.

awk -v homeDir="$HOME" 'BEGIN{print "My home Directory is " homeDir}'

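The two approaches can be compared side by side. In the sketch below, GREETING_NAME is a hypothetical variable created only for this example. It must be exported for ENVIRON to see it, while -v works with any shell value.

```shell
# Hypothetical variable, exported so that child processes such as awk can see it.
export GREETING_NAME="John Doe"

from_env=$(awk 'BEGIN { print "Hello " ENVIRON["GREETING_NAME"] }')
from_v=$(awk -v who="$GREETING_NAME" 'BEGIN { print "Hello " who }')

printf '%s\n%s\n' "$from_env" "$from_v"
```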

5. The NF Variable (Number of Fields)

We can use the NF (Number of Fields) variable to print the last value in a line. Look at the example below. Here we want to print the user (first word of every line) and the user shell (last word in every line) from the /etc/passwd file.

awk 'BEGIN{FS=":"; OFS=":"} {print $1,$NF}' /etc/passwd


Additionally, we can use the NR variable to print a range of lines. Let’s modify the employees.txt file to look as follows.

1. John Manager Branch 1
2. Stacy CEO Branch 2
3. Duke Manager Branch 3
4. Kate CEO Branch 5
5. Sunil Manager Branch 14
6. Duke CEO Branch 3
7. Kate Manager Branch 5
8. Sunil CEO Branch 14

To print lines 3 to 6, we can use the command below.

awk 'NR==3, NR==6 {print $0}' employees.txt

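The same range can also be written as an ordinary condition on NR. The sketch below runs both forms on eight made-up single-letter lines so you can compare the results.

```shell
letters=$(printf 'a\nb\nc\nd\ne\nf\ng\nh\n')

# Range pattern: starts at line 3 and stops after line 6.
range_out=$(printf '%s\n' "$letters" | awk 'NR==3, NR==6 { print NR ": " $0 }')

# Equivalent condition on the line number.
cond_out=$(printf '%s\n' "$letters" | awk 'NR >= 3 && NR <= 6 { print NR ": " $0 }')

printf '%s\n' "$range_out"
```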

6. User-Defined Variables

AWK allows user-defined variables. However, as in any other programming language, there are rules for naming a variable. For example, the name shouldn’t start with a number. See the example below.

awk 'BEGIN{name = "My name is John Doe"; age = "I am 30"; print name, age}'

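User-defined variables do not need to be declared: a variable that has not been assigned yet behaves as zero in arithmetic. The sketch below relies on that to keep a running total; the variable name total and the three numbers are only for illustration.

```shell
# total starts out unset, which AWK treats as 0 in arithmetic.
out=$(printf '23\n34\n45\n' | awk '{ total += $1 } END { print "Total:", total }')

printf '%s\n' "$out"
```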

7. IF Statement

Like any other programming language, AWK also uses If-Else statements and Loops. Let’s take a sample text file below called numbers.txt.

23
34
45
56
78
89

To print all numbers greater than 70, we will use the command below.

awk '{if ($1 > 70) print $0}' numbers.txt

Quite simple! Right? AWK prints only 78 and 89. If you want the body of the if statement to contain more than one statement, you will need curly braces, as shown below.

awk '{if ($1 > 70){
    num1 = $1 *10;
    print num1}
}' numbers.txt

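For a simple filter like the first if example, AWK also lets you write the condition as a pattern in front of the action. The sketch below runs both forms on the same numbers as numbers.txt, supplied inline so that the example is self-contained.

```shell
nums=$(printf '23\n34\n45\n56\n78\n89\n')

# Condition inside an if statement.
with_if=$(printf '%s\n' "$nums" | awk '{ if ($1 > 70) print $0 }')

# Same condition as a pattern; with no action, AWK prints the matching line.
with_pattern=$(printf '%s\n' "$nums" | awk '$1 > 70')

printf '%s\n' "$with_if"
```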

8. IF – ELSE statement

If you have worked with if-else statements in other programming languages, I believe you have already joined the dots on how we can write the if-else statement in AWK. See the example below.

awk '{if ($1 > 30){
    x = $1 * 3;
    print x
} else{
    x = $1 / 2
    print x}
}' numbers.txt

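To make it easier to see which branch ran, here is a sketch of the same if-else with a label added to each output line. It uses only the first two numbers from numbers.txt, supplied inline.

```shell
out=$(printf '23\n34\n' | awk '{
    if ($1 > 30) {
        print $1, "-> tripled:", $1 * 3
    } else {
        print $1, "-> halved:", $1 / 2
    }
}')

printf '%s\n' "$out"
```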

9. While Loop

We can also write loops with AWK. The example below adds up every field on a line and prints the average for that line.

awk 'NF > 0 {
    sum = 0;
    i = 1;
    while (i <= NF) {
        sum += $i;
        i++
    }
    average = sum / NF;
    print "Average:", average
}' numbers.txt

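Since numbers.txt holds a single number per line, a loop over the fields of a line is easier to appreciate on input with several columns. The sketch below uses two made-up lines and stops the loop at NF, the number of fields on the current line.

```shell
out=$(printf '10 20 30\n4 6\n' | awk '{
    sum = 0
    i = 1
    while (i <= NF) {      # visit every field on this line
        sum += $i
        i++
    }
    print "Average:", sum / NF
}')

printf '%s\n' "$out"
```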

10. For Loop

To write a For loop in AWK, you can use the syntax in the command below. As before, the loop adds up every field on a line and prints the average for that line.

awk 'NF > 0 {
    total = 0;
    for (var = 1; var <= NF; var++) {
        total += $var;
    }
    avg = total / NF;
    print "Average:", avg;
}' numbers.txt

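A for loop is also handy in an END block. As a sketch, the command below stores every number in an array (the name value is only for illustration), then loops over the array once all input has been read to compute the average of the whole column. The numbers are the ones from numbers.txt, supplied inline.

```shell
out=$(printf '23\n34\n45\n56\n78\n89\n' | awk '
    { value[NR] = $1 }                     # remember each number by line number
    END {
        for (i = 1; i <= NR; i++) total += value[i]
        printf "Average: %.2f\n", total / NR
    }')

printf '%s\n' "$out"
```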

Conclusion

This post has given you a comprehensive beginner’s guide to using the AWK command. If you encountered any errors or difficulties executing any of the commands above, feel free to let us know in the comments, and we’ll get back to you as soon as we can.
