Powerful Text Processing in Bash Using sed, awk, grep, and More

Processing text is an essential part of many bash scripts. Bash and its companion utilities provide plenty of handy tools for manipulating plain text that belong in every scripter's toolkit.

This guide will demonstrate how to use text processing utilities like sed, awk, grep, cut, tr, and more to search, edit, and analyze text data in your bash shell scripts.

Finding Text with grep

The grep command finds matching lines of text in files or input streams. It supports regular expressions, which allow flexible pattern matching.

Some examples of using grep to find text:

Basic Matching

# Find lines containing text
grep "some_text" myfile.txt

This prints any lines in myfile.txt that contain "some_text".
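
grep also has a number of handy flags. Two of the most common are -i for case-insensitive matching and -n to print the line number of each match:

# Case-insensitive search, showing line numbers
grep -in "some_text" myfile.txt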

Inverted Match

# Print lines that do NOT match 
grep -v "some_text" myfile.txt

The -v flag inverts the match, printing only non-matching lines.

Recursive Search

# Search all files in a directory recursively
grep -R "some_text" /path/to/dir

The -R flag enables recursive directory traversal to search files in subdirectories.
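
Regex Matching

Because grep understands regular expressions, you can match patterns rather than fixed strings. The -E flag enables extended regex syntax; for example, to find lines that start with either ERROR or WARN:

# Match lines starting with ERROR or WARN
grep -E "^(ERROR|WARN)" myfile.txt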

As you can see, grep is a quick and flexible way to find text patterns in files. However, it only searches; it cannot edit or modify text. For that, we need more powerful utilities like sed and awk.

Advanced Text Processing with sed and awk

sed (a stream editor) and awk (a pattern-scanning language) are small programming languages built for processing text. They allow more advanced search-and-replace operations, numeric calculations, data formatting, and more.

Replacing Text with sed

The sed utility is ideal for simple search-and-replace operations on text. For example:

# Replace text 
sed 's/day/night/' myfile.txt

This replaces the first occurrence of "day" with "night" on each line of myfile.txt and prints the result to standard output. The s signifies substitution.
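
To replace every occurrence on a line, append the g flag. To write the changes back to the file instead of printing them, GNU sed provides the -i option (BSD/macOS sed needs an argument to -i, such as -i ''):

# Replace all occurrences on every line, editing the file in place (GNU sed)
sed -i 's/day/night/g' myfile.txt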

You can also use regular expressions:

# Replace with regex
sed 's/[0-9][0-9]/XX/' myfile.txt

This replaces the first pair of consecutive digits on each line with XX.
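
Most modern sed implementations also support extended regular expressions via -E, including capture groups that can be referenced in the replacement. As a small illustration, this turns the first "foo bar" on each line into "bar foo":

# Swap "foo bar" to "bar foo" using capture groups
sed -E 's/(foo) (bar)/\2 \1/' myfile.txt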

Parsing Text with awk

awk is a more full-featured tool for structured text processing. For example, to print the 5th column of a simple comma-separated file:

# Print 5th CSV column
awk -F, '{print $5}' myfile.csv

The -F flag sets the field delimiter, and $5 refers to the 5th field (column) of each line.
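
awk also exposes built-in variables such as NR (the current record/line number) and NF (the number of fields on the line). For example, to print a line number alongside the 1st and 3rd fields of the same file:

# Print the line number plus the 1st and 3rd fields
awk -F, '{print NR, $1, $3}' myfile.csv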

awk can also filter lines based on a condition:

# Print lines based on condition 
awk '{if ($3 > 10) print $0}' myfile.txt

This prints lines where the 3rd column is greater than 10.
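
awk handles the numeric calculations mentioned earlier as well. Variables accumulate across lines, and an END block runs after the last line has been read; for instance, to total the values in the 3rd column of the same file:

# Sum the 3rd column and print the total
awk '{sum += $3} END {print sum}' myfile.txt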

As you can see, awk enables more advanced text processing and analysis.

Cutting, Sorting, Changing Case

A few more useful Bash text processing utilities:

  • cut - Cut columns from a file:

      # Get username from /etc/passwd
      cut -d: -f1 /etc/passwd
    
  • sort - Sort lines alphabetically:

      # Sort contents of file
      sort myfile.txt
    
  • tr - Translate or delete characters (a deletion example follows this list):

      # Change lowercase to uppercase
      tr '[:lower:]' '[:upper:]' < myfile.txt
    
  • fmt - Format text by wrapping lines at width:

      # Wrap lines at 80 chars
      fmt -w 80 myfile.txt
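
The tr example above translates characters; with -d, tr deletes them instead. A common use (the output file name here is just an illustration) is stripping carriage returns from files with Windows-style line endings:

# Delete carriage returns (\r) from a DOS-formatted file
tr -d '\r' < myfile.txt > myfile_unix.txt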
    

Mastering these text processing tools will enable you to easily search, parse, transform, and analyze text in your bash scripts. They can be combined and piped together for incredibly flexible data processing capabilities.
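
As a quick sketch of that kind of pipeline, the following counts how many users on the system use each login shell by extracting the 7th field of /etc/passwd, sorting it, and tallying duplicates:

# Count how many users have each login shell
cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn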

Refer back to this guide for syntax examples of grep, sed, awk, cut, tr, fmt, and more. Text processing is essential for generating reports, parsing logs and CSV files, and handling everyday string manipulation.