Text processing matters to any computer user, because the computer can perform tedious tasks that a human would not even consider attempting.
Using text processing tools enables the user to search for a specific pattern match, replace matches with other text of the user’s choice, invoke an action upon the presence of a certain condition, or even do more complex tasks.
This article explains the differences between the three best-known text processing tools in Linux: awk, sed, and grep.
The tools are covered in order from the richest and most complex (awk) to the simplest (grep). Before delving into the body of the article, however, we need to know a little about regex.
What Is Regex (Regular Expressions)?
A regular expression (regex) specifies a search pattern in text: you describe what you are looking for in the text as a sequence of characters. All three tools discussed here use regex.
For example, to find every occurrence of the word text, you would write \btext\b, where \b stands for a word boundary. In the paragraph above, that results in two matches.
There are more complex patterns that can be achieved using regex, such as finding numerals or a text that satisfies a certain set of rules.
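As a quick check of the word-boundary idea, the sketch below counts standalone occurrences of text in a made-up sentence. It assumes GNU grep, where \b is supported and -o prints each match on its own line; the sentence and the variable names are invented for illustration.

```shell
# Hypothetical sample sentence: "text" appears twice as a whole word
# and twice more inside longer words (textures, context).
sentence='a text editor searches text, not textures or context'

# -o prints every match on its own line; the second grep counts those lines.
# \b is a GNU extension; grep -ow 'text' is a similar whole-word search.
matches=$(printf '%s\n' "$sentence" | grep -o '\btext\b' | grep -c 'text')
echo "$matches"
```

Only the two whole-word occurrences are counted; textures and context are skipped because the pattern requires a boundary on both sides.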
Let’s assume we have this text file (called feb_groc.txt); all examples will refer to this file:
In February 2021, I bought these groceries:
2-2-2021 apples 10$
3-2-2021 sugar 12$
3-2-2021 toast 20$
3-2-2021 apples 07$
3-2-2021 tomatoes 05$
4-2-2021 meat 35$
4-2-2021 toast 10$
…
Using the Awk Command
Being the most powerful tool of the three, awk is a text processing and scanning language (a scripting language). The name is derived from the initials of its authors (Aho, Weinberger, Kernighan).
You can use this tool for anything from simple tasks, such as printing matches, to complex ones, such as doing arithmetic on numerals. You can also set conditions under which commands are executed. Another strength of awk is that it can process several files in a single invocation while keeping track of the current file and line (through the built-in FILENAME and FNR variables).
Search & Print All Occurrences of a Sequence of Characters
Search and print all occurrences of a sequence of characters in a text file (say apples in feb_groc.txt):
awk '/apples/{print $0}' feb_groc.txt
2-2-2021 apples 10$
3-2-2021 apples 07$
Find & Replace
Find and replace (replace apples with bananas and save the output to bfeb_groc.txt):
awk '{gsub(/apples/, "bananas")}{print}' feb_groc.txt > bfeb_groc.txt
In February 2021, I bought these groceries:
2-2-2021 bananas 10$
3-2-2021 sugar 12$
3-2-2021 toast 20$
3-2-2021 bananas 07$
3-2-2021 tomatoes 05$
4-2-2021 meat 35$
4-2-2021 toast 10$
…
Sum a Column of Numerals
Sum a column of numerals (the cost column, omitting the $ and skipping the first line):
awk -v sum=0 'NR>1{sum += int($3)} END {print sum}' feb_groc.txt
99
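-v: sets a variable (here sum, initialized to 0).
NR>1: NR is the number of the current line, so the action only runs on lines after the first.
$3: the third field of the line.
int: trims the $ by converting the field to an integer.
END: the block that follows runs once, after all input lines have been processed.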
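To see what these pieces evaluate to, a minimal sketch can feed a single line (copied from the sample file) to awk and print the values it works with. The variable names line and fields are only illustrative.

```shell
# One data line from feb_groc.txt, fed on standard input.
line='3-2-2021 toast 20$'

# NR is the record (line) number, NF the number of fields,
# $2 and $3 are the second and third whitespace-separated fields,
# and int($3) keeps the leading digits of "20$" as a number.
fields=$(printf '%s\n' "$line" | awk '{print NR, NF, $2, $3, int($3)}')
echo "$fields"   # prints: 1 3 toast 20$ 20
```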
Sum of Specific Numerals
Sum the cost of a specific grocery from a set of invoices (say toast):
awk -v sum=0 'NR>1 && /toast/ {sum += int($3)} END {print sum}' feb_groc.txt
30
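The same idea can be extended from one grocery to all of them. The sketch below is one possible approach, not part of the original examples: it keeps a running total per item in an awk associative array (named cost here purely for illustration) and recreates the sample file in a scratch directory so it can be run as is.

```shell
cd "$(mktemp -d)"   # scratch directory so no real file is overwritten

# Sample data copied from the article (the trailing "…" line is left out).
cat > feb_groc.txt <<'EOF'
In February 2021, I bought these groceries:
2-2-2021 apples 10$
3-2-2021 sugar 12$
3-2-2021 toast 20$
3-2-2021 apples 07$
3-2-2021 tomatoes 05$
4-2-2021 meat 35$
4-2-2021 toast 10$
EOF

# cost[] is an associative array keyed by the item name (second field).
# The order of "for (item in cost)" is unspecified, so the result is sorted.
totals=$(awk 'NR>1 {cost[$2] += int($3)}
              END  {for (item in cost) print item, cost[item]}' feb_groc.txt | sort)
echo "$totals"
```

For the sample data this prints one line per item, for example apples 17 and toast 30, which agrees with the toast total computed above.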
Search & Print for Multiple Files
Search and print across multiple files (here, the file was copied under a different extension and some changes were made):
awk '/\$/{print $0}' feb_groc*
2-2-2021 apples 10$
3-2-2021 sugar 12$
3-2-2021 toast 20$
3-2-2021 apples 07$
3-2-2021 tomatoes 05$
4-2-2021 meat 35$
4-2-2021 toast 10$
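When several files are read at once, it can help to show where each match came from. The following sketch uses awk's built-in FILENAME and FNR variables for that; the two small files, and in particular the copy named feb_groc.bak and its contents, are hypothetical stand-ins for the edited copy mentioned above.

```shell
cd "$(mktemp -d)"   # scratch directory so no real file is overwritten

# Two tiny stand-in files; feb_groc.bak is a hypothetical edited copy.
printf '%s\n' 'Header line' '2-2-2021 apples 10$' > feb_groc.txt
printf '%s\n' 'Header line' '5-2-2021 apples 08$' > feb_groc.bak

# FILENAME is the file being read, FNR the line number within that file.
labelled=$(awk '/\$/ {print FILENAME ":" FNR ": " $0}' feb_groc*)
echo "$labelled"
```

Each matching line is prefixed with its file name and its line number inside that file.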
Add Header & Footer
Add header and footer (let’s take the previous command and edit it):
awk 'BEGIN{print "\n\nDate Item Cost\n-----------------------"} /\$/{print $0} END {print "-----------------------\n total cost = $$$"}' feb_groc*
Date Item Cost
-----------------------
2-2-2021 apples 10$
3-2-2021 sugar 12$
3-2-2021 toast 20$
3-2-2021 apples 07$
3-2-2021 tomatoes 05$
4-2-2021 meat 35$
4-2-2021 toast 10$
-----------------------
 total cost = $$$
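The footer above prints a placeholder for the total. As a hedged variation, the header, the matching lines, and a computed total can be combined in one program by accumulating the cost while printing. This sketch runs on a single recreated copy of the sample file, and the variable name report is only illustrative.

```shell
cd "$(mktemp -d)"   # scratch directory so no real file is overwritten

# Sample data copied from the article (the trailing "…" line is left out).
cat > feb_groc.txt <<'EOF'
In February 2021, I bought these groceries:
2-2-2021 apples 10$
3-2-2021 sugar 12$
3-2-2021 toast 20$
3-2-2021 apples 07$
3-2-2021 tomatoes 05$
4-2-2021 meat 35$
4-2-2021 toast 10$
EOF

# BEGIN prints the header, the middle rule prints and sums each priced line,
# and END prints the footer with the accumulated total.
report=$(awk 'BEGIN {print "Date Item Cost"; print "-----------------------"}
              /\$/  {print $0; sum += int($3)}
              END   {print "-----------------------"; print "total cost = " sum "$"}' feb_groc.txt)
echo "$report"
```

For the sample data the last line reads total cost = 99$, matching the sum computed earlier.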
These examples are only demos; awk has more uses, such as:
- input/output statements
- getting the index of a substring within a text
- splitting a text into an array of elements on a separator
- changing the case of the characters of a text
- math functions (sin(), cos(), rand(), exp(), and so on)
To learn more, run: man awk
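A few of the built-in functions mentioned above can be tried without any input file. This is a small sketch using standard awk functions (index, split, toupper, exp); the strings and the array name parts are made up for the example.

```shell
# BEGIN runs before any input is read, so no input file is needed here.
result=$(awk 'BEGIN {
    print index("groceries", "cer")      # position of a substring (1-based)
    n = split("2-2-2021", parts, "-")    # split on "-" into the array parts
    print n, parts[3]                    # number of pieces and the third piece
    print toupper("toast")               # change case
    print int(exp(0))                    # a math function: e to the power 0
}')
echo "$result"
```

This prints 4, then 3 2021, then TOAST, then 1, each on its own line.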
Using the Sed Command
The name is an abbreviation of stream editor. Sed is simpler to use than awk and is best suited for finding and substituting patterns, but you can also perform other tasks with it.
Search & Print All Occurrences of a Sequence of Characters
Search and print all occurrences of a sequence of characters in a text file (let’s search for toast):
sed -n '/toast/ p' feb_groc.txt
3-2-2021 toast 20$
4-2-2021 toast 10$
-n: limits the default printing (all lines) to just the lines we are working on.
p: the print command.
Find & Replace
Sed uses the substitute command (s) to replace a given text with another:
sed 's/tomatoes/potatoes/' feb_groc.txt
In February 2021, I bought these groceries:
2-2-2021 apples 10$
3-2-2021 sugar 12$
3-2-2021 toast 20$
3-2-2021 apples 07$
3-2-2021 potatoes 05$
4-2-2021 meat 35$
4-2-2021 toast 10$
…
Instead of redirecting the output to a file, we can use the in-place flag (-i), which edits feb_groc.txt itself. A suffix given right after the flag makes sed keep a backup of the original file under the old name plus that suffix (here feb_groc.txtpot):
sed -ipot 's/tomatoes/potatoes/' feb_groc.txt
In February 2021, I bought these groceries:
2-2-2021 apples 10$
3-2-2021 sugar 12$
3-2-2021 toast 20$
3-2-2021 apples 07$
3-2-2021 potatoes 05$
4-2-2021 meat 35$
4-2-2021 toast 10$
…
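It is worth checking which file ends up holding what after an in-place edit. In this sketch (run in a scratch directory on a shortened, two-line stand-in for the sample file), the edited text lands in feb_groc.txt, while the file with the pot suffix keeps the original content as a backup. The variable names are illustrative.

```shell
cd "$(mktemp -d)"   # scratch directory so no real file is overwritten

# Shortened stand-in for the sample file.
printf '%s\n' '3-2-2021 tomatoes 05$' '4-2-2021 meat 35$' > feb_groc.txt

# -ipot edits feb_groc.txt in place and saves the original as feb_groc.txtpot.
sed -ipot 's/tomatoes/potatoes/' feb_groc.txt

edited=$(head -n 1 feb_groc.txt)      # the modified file
backup=$(head -n 1 feb_groc.txtpot)   # the untouched original
echo "edited: $edited"
echo "backup: $backup"
```

The first line reports potatoes and the second still reports tomatoes, so the suffix names the backup rather than the edited result.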
Limit Search/Substitution Within Specific Lines
You can limit a search or substitution to a single line or a range of lines.
Changing the first a on every line to j (without the g flag, s replaces only the first match on each line):
sed 's/a/j/' feb_groc.txt
In Februjry 2021, I bought these groceries:
2-2-2021 jpples 10$
3-2-2021 sugjr 12$
3-2-2021 tojst 20$
3-2-2021 jpples 07$
3-2-2021 potjtoes 05$
4-2-2021 mejt 35$
4-2-2021 tojst 10$
…
Changing the first a in the 4th line to j:
sed '4s/a/j/' feb_groc.txt
In February 2021, I bought these groceries:
2-2-2021 apples 10$
3-2-2021 sugar 12$
3-2-2021 tojst 20$
3-2-2021 apples 07$
3-2-2021 potatoes 05$
4-2-2021 meat 35$
4-2-2021 toast 10$
…
Changing the first a of each line within the range of lines (3,7) to j:
sed '3,7s/a/j/' feb_groc.txt
In February 2021, I bought these groceries:
2-2-2021 apples 10$
3-2-2021 sugjr 12$
3-2-2021 tojst 20$
3-2-2021 jpples 07$
3-2-2021 potjtoes 05$
4-2-2021 mejt 35$
4-2-2021 toast 10$
…
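Note that the s command replaces only the first match on each line unless the g (global) flag is appended, which is why potatoes becomes potjtoes in the outputs above. A minimal sketch of the difference, using a made-up one-word input:

```shell
word='bananas'

# Without g, only the first a on the line is replaced.
first_only=$(printf '%s\n' "$word" | sed 's/a/j/')

# With g, every a on the line is replaced.
every_match=$(printf '%s\n' "$word" | sed 's/a/j/g')

echo "$first_only $every_match"   # prints: bjnanas bjnjnjs
```

The g flag can be combined with line addresses in the same way, for example 3,7s/a/j/g.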
Note: an address can also be a regex, so a range can end at the first line that matches a pattern. Here, printing starts at line 4 and stops at the next line that contains es:
sed -n '4,/es/ p' feb_groc.txt
3-2-2021 toast 20$
3-2-2021 apples 07$
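Both ends of a range can be patterns as well. The sketch below, which feeds the sample lines on standard input, prints everything from the first line containing sugar through the next line containing tomatoes; the choice of these two words is only for illustration.

```shell
# Lines copied from the sample file, fed on standard input.
range=$(printf '%s\n' \
  'In February 2021, I bought these groceries:' \
  '2-2-2021 apples 10$' \
  '3-2-2021 sugar 12$' \
  '3-2-2021 toast 20$' \
  '3-2-2021 apples 07$' \
  '3-2-2021 tomatoes 05$' \
  '4-2-2021 meat 35$' \
  '4-2-2021 toast 10$' | sed -n '/sugar/,/tomatoes/p')
echo "$range"
```

This prints four lines, starting with the sugar entry and ending with the tomatoes entry.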
Using the Grep Command
Grep is the simplest of the three. Its name stands for Global Regular Expression Print.
Grep works simply: you pass it a pattern and a text file in which you want to find all occurrences of that pattern. Grep reads the file line by line into a buffer and checks each line for a match; if one is found, it prints the matching line. So, the best use of grep is finding and printing pattern matches.
Search & Print All Occurrences of a Sequence of Characters
Search and print all occurrences of a sequence of characters in a text file (let’s print every line of our file that contains an s):
grep s feb_groc.txt
There are various flags that can be used with grep, such as:
-A [n_lines]: outputs n_lines after any match.
-B [n_lines]: outputs n_lines before any match.
-C [n_lines]: outputs n_lines before and n_lines after any match.
-F: searches for literal strings.
-l: when searching in multiple files, shows just the names of the files that contain a match.
-i: ignores case.
-n: adds a line number before every output line.
-v: negates the match (outputs lines that do not match the pattern).
-x: matches only whole lines.
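A few of these flags can be tried on the sample file. The sketch below recreates the file in a scratch directory and captures the result of each command in an illustrative variable; the -F case shows why literal matching matters when the pattern contains a character such as $ that is special in a regex.

```shell
cd "$(mktemp -d)"   # scratch directory so no real file is overwritten

# Sample data copied from the article (the trailing "…" line is left out).
cat > feb_groc.txt <<'EOF'
In February 2021, I bought these groceries:
2-2-2021 apples 10$
3-2-2021 sugar 12$
3-2-2021 toast 20$
3-2-2021 apples 07$
3-2-2021 tomatoes 05$
4-2-2021 meat 35$
4-2-2021 toast 10$
EOF

numbered=$(grep -n 'toast' feb_groc.txt)     # -n: prefix each match with its line number
header=$(grep -v '\$' feb_groc.txt)          # -v: lines WITHOUT a literal $
literal=$(grep -F '10$' feb_groc.txt)        # -F: "10$" is taken literally
# As a regex, "10$" means "10 at the end of the line", which never occurs here.
# grep exits non-zero when nothing matches, hence the "|| true".
as_regex=$(grep -c '10$' feb_groc.txt || true)
after=$(grep -A 1 'sugar' feb_groc.txt)      # -A 1: the match plus one line after it

echo "$numbered"
echo "$header"
echo "$literal"
echo "$as_regex"
echo "$after"
```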
Conclusion
In this article we covered some examples of how Awk, Sed and Grep work with the aim of giving you an idea of how they are different. If you have any feedback or questions feel free to leave a comment and we’ll get back to you as soon as we can.