GuideDevOps
Lesson 9 of 15

Text Processing (awk, sed, jq)

Part of the Shell Scripting (Bash) tutorial series.

Text processing is at the heart of Linux and DevOps. Whether you're parsing log files, updating configuration files, or extracting data from APIs, these tools are essential.

awk: The Field Processor

awk is a complete programming language designed for text processing. It's best used when your data is organized into columns or fields.

Basic Syntax

awk 'pattern { action }' file

Examples with Results

1. Print Specific Fields

By default, awk uses whitespace as a delimiter. $1, $2, etc., represent fields.

Command:

# Get the first and third columns of a file
echo "Alice 25 Engineer
Bob 30 Designer
Charlie 22 Developer" | awk '{ print $1, $3 }'

Result:

Alice Engineer
Bob Designer
Charlie Developer
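awk also exposes built-in variables you can use inside the action: NR is the current line number, NF is the number of fields, and $NF is the last field. A quick sketch using the same sample data:

```shell
# NR = current line number, $NF = last field on the line
echo "Alice 25 Engineer
Bob 30 Designer" | awk '{ print NR ": " $NF }'
```

Result:

1: Engineer
2: Designer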

2. Using a Custom Delimiter

Use -F to specify a delimiter (e.g., : for /etc/passwd).

Command:

echo "root:x:0:0:root:/root:/bin/bash" | awk -F: '{ print "User: " $1 ", Shell: " $7 }'

Result:

User: root, Shell: /bin/bash
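The -F flag is shorthand for setting the FS (field separator) variable. Setting FS in a BEGIN block is equivalent, and the companion OFS variable controls the separator awk puts between output fields. The arrow separator below is just an illustration:

```shell
# FS controls input splitting, OFS controls output joining
echo "root:x:0:0:root:/root:/bin/bash" | awk 'BEGIN { FS=":"; OFS=" -> " } { print $1, $7 }'
```

Result:

root -> /bin/bash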

3. Filtering by Condition

Command:

# Print users with UID (field 3) greater than 500
echo "user1:x:100:
user2:x:1000:
user3:x:450:" | awk -F: '$3 > 500 { print $1 }'

Result:

user2
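Conditions are not limited to numeric comparisons; a field can also be matched against a regular expression with the ~ operator. A sketch that prints only users whose shell field contains "bash":

```shell
# ~ matches a field against a regex
echo "root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin" | awk -F: '$7 ~ /bash/ { print $1 }'
```

Result:

root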

4. Advanced: Calculating Sums

Command:

# Calculate the total file size in the current directory
ls -l | awk '{ sum += $5 } END { print "Total size: " sum " bytes" }'

Result: (Depends on your directory)

Total size: 45023 bytes
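Beyond simple sums, awk's associative arrays make it easy to group and count, a common DevOps task when summarizing logs. The log lines below are made up for illustration:

```shell
# count[] is an associative array keyed by the status code (field 2)
echo "GET 200
GET 404
POST 200" | awk '{ count[$2]++ } END { for (c in count) print c, count[c] }' | sort
```

Result:

200 2
404 1

Note the trailing sort: awk's for (c in count) loop visits keys in an unspecified order.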

sed: The Stream Editor

sed is primarily used for finding and replacing text in files or streams.

Basic Syntax

sed 's/old/new/g' file

Examples with Results

1. Basic Substitution

Command:

echo "The server is DOWN" | sed 's/DOWN/UP/'

Result:

The server is UP
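sed substitutions can also capture parts of the match with \( \) groups and reuse them as \1, \2 in the replacement. A sketch that swaps the two sides of a key=value pair:

```shell
# \(...\) captures a group; \1 and \2 reuse them in the replacement
echo "status=running" | sed 's/\(.*\)=\(.*\)/\2=\1/'
```

Result:

running=status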

2. In-place File Editing

Use -i to save changes directly to the file. You can pass a suffix (e.g. -i.bak) to keep a backup copy of the original; note that on macOS/BSD sed the suffix argument is required, so use -i '' for no backup.

sed -i 's/localhost/127.0.0.1/g' config.yaml

3. Delete Lines

Command:

# Delete the second line
echo -e "Line 1\nLine 2\nLine 3" | sed '2d'

Result:

Line 1
Line 3
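sed can also print selected lines instead of deleting them: -n suppresses the default output, and the p command prints only the addressed lines.

```shell
# -n suppresses default output; 2,3p prints only lines 2 through 3
printf 'Line 1\nLine 2\nLine 3\nLine 4\n' | sed -n '2,3p'
```

Result:

Line 2
Line 3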

jq: The JSON Processor

In modern DevOps, working with JSON output (from APIs, the AWS CLI, kubectl) is unavoidable. jq is the industry-standard tool for parsing it.

Examples with Results

1. Extract a Field

Command:

echo '{"name": "production", "status": "running"}' | jq '.status'

Result:

"running"
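Note the quotes in the result: jq emits valid JSON by default. Add -r for raw output, which is usually what you want when assigning to a shell variable:

```shell
# -r strips the JSON quotes from string results
echo '{"name": "production", "status": "running"}' | jq -r '.status'
```

Result:

running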

2. Extract from an Array

Command:

echo '[{"id": 1, "name": "web"}, {"id": 2, "name": "db"}]' | jq '.[0].name'

Result:

"web"
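To extract a field from every element rather than one index, use .[] to iterate the array:

```shell
# .[] iterates the array; .name extracts the field from each element
echo '[{"id": 1, "name": "web"}, {"id": 2, "name": "db"}]' | jq -r '.[].name'
```

Result:

web
db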

3. Filtering Arrays

Command:

echo '[{"name": "app1", "cpu": 80}, {"name": "app2", "cpu": 10}]' | jq '.[] | select(.cpu > 50) | .name'

Result:

"app1"
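jq can also reshape data, not just extract it. A sketch using map() to build new objects (the service/load key names here are arbitrary):

```shell
# map() applies the object constructor to every array element; -c prints compact JSON
echo '[{"name": "app1", "cpu": 80}]' | jq -c 'map({service: .name, load: .cpu})'
```

Result:

[{"service":"app1","load":80}]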

DevOps Tool Comparison

Tool   Best Use Case
grep   Searching for patterns in lines
sed    Modifying text (search and replace)
awk    Processing columnar data and reports
jq     Parsing and transforming JSON data
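These tools combine naturally in pipelines. A sketch with made-up JSON: jq flattens the data to tab-separated fields with @tsv, awk filters and formats, and sed relabels the result:

```shell
# jq -> TSV, awk filters rows where cpu > 50, sed tweaks the wording
echo '[{"name": "web", "cpu": 80}, {"name": "db", "cpu": 10}]' \
  | jq -r '.[] | [.name, .cpu] | @tsv' \
  | awk '$2 > 50 { print $1 " is hot: " $2 "%" }' \
  | sed 's/hot/overloaded/'
```

Result:

web is overloaded: 80%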

Practice Exercise

Try parsing the output of df -h to find filesystems that are more than 80% full:

# $5 looks like "82%"; adding 0 forces awk to treat it as a number
df -h | awk 'NR > 1 && $5+0 > 80 { print $1 ": " $5 }'