Shell Course 3
Chapter 3: Working with Text
Text processing is where the shell truly shines. In our increasingly data-driven world, the ability to extract, transform, and analyze text efficiently is invaluable. Whether you’re parsing log files, cleaning data sets, or automating document processing, the shell offers a remarkable toolkit for text manipulation.
Let’s dive deeper into the tools that make text processing in the shell so powerful.
Before we explore text processing commands, we need to understand how to control where text comes from and where it goes.
Standard Streams
The shell uses three standard “streams” for input and output:
- stdin (0): Standard input - where commands read their input
- stdout (1): Standard output - where commands send their normal output
- stderr (2): Standard error - where commands send error messages
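To see the two output streams in action, here is a small self-contained sketch (demo.txt, out.txt, and err.txt are throwaway names invented for the example). Listing one file that exists and one that doesn't sends the listing to stdout and the complaint to stderr, so each redirection captures a different stream:

```shell
# Create a scratch file so the example is self-contained
touch demo.txt

# The listing of demo.txt goes to stdout; the complaint about
# missing.txt goes to stderr, so each file captures one stream.
# (|| true ignores ls's nonzero exit status for the missing file)
ls demo.txt missing.txt > out.txt 2> err.txt || true

cat out.txt    # the normal listing
cat err.txt    # the "No such file or directory" message
```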
Redirecting Output
To save command output to a file instead of displaying it on the screen:
$ ls -l > file_list.txt    # Save output to file (overwrites existing file)
To redirect error messages:
$ find / -name "*.conf" 2> errors.txt    # Save only errors to file
To redirect both standard output and errors to the same file:
$ ls -l non_existent_file > output.txt 2>&1    # Redirect both to output.txt
$ ls -l non_existent_file &> output.txt        # Redirect both to output.txt (bash shorthand)
Redirecting Input
To use a file as input to a command:
$ sort < unsorted_list.txt    # Use file content as input to sort
Pipes: Connecting Commands
The pipe operator (|) connects the output of one command to the input of another, allowing you to build powerful command chains.
$ ls -l | grep "Mar"    # List files and filter for those containing "Mar"
Pipes can be chained to create complex data processing workflows:
$ cat access.log | grep "ERROR" | sort | uniq -c | sort -nr
This pipeline:
- Reads the log file
- Filters for lines containing “ERROR”
- Sorts the matching lines
- Counts unique occurrences
- Sorts numerically in reverse order (most frequent first)
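You can try the same pipeline on synthetic data. The sketch below fabricates a tiny access.log first (the file name and messages are invented for illustration):

```shell
# Fabricate a tiny log file for the demonstration
printf 'ERROR disk full\nINFO started\nERROR net down\nERROR disk full\n' > access.log

# Same pipeline as above: the most frequent ERROR line comes out first
cat access.log | grep "ERROR" | sort | uniq -c | sort -nr
```

The first output line should show a count of 2 next to "ERROR disk full", since that message appears twice.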
Now let’s explore the essential text processing tools in your shell toolkit.
The Swiss Army Knives: grep, sed, and awk
These three commands form the cornerstone of text processing in the shell. They each deserve their own book, but we'll cover the essentials.
grep: Pattern Matching and Text Search
We introduced grep in the previous chapter, but let's explore it further:
$ grep "pattern" file.txt    # Find lines matching pattern
grep comes from the ed editor command g/re/p (globally search for a regular expression and print matching lines) - a delightful piece of computing archaeology that reminds us how command names that seem arbitrary today often have perfectly logical historical origins.
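A few everyday grep options, sketched against a throwaway sample.txt (the file and its contents are invented for the example):

```shell
printf 'Error: disk\nall good\nerror: net\n' > sample.txt

grep -i "error" sample.txt    # -i: case-insensitive match (finds both spellings)
grep -v "error" sample.txt    # -v: invert - print lines that do NOT match
grep -n "net" sample.txt      # -n: prefix each match with its line number
grep -c "good" sample.txt     # -c: just count the matching lines
```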
sed: Stream Editor
sed is designed for transforming text with search-and-replace operations:
$ sed 's/old/new/' file.txt    # Replace first occurrence on each line
By default, sed prints every line, modified or not. To print only the modified lines:
$ sed -n 's/old/new/p' file.txt    # Print only lines that were changed
To delete lines:
$ sed '5d' file.txt    # Delete line 5
Remember that sed doesn't modify the original file. To save changes:
$ sed 's/old/new/g' file.txt > new_file.txt    # Save to new file
The -i option for in-place editing is like performing surgery without a backup plan. Consider using -i.bak instead, which creates a backup file with the .bak extension:
$ sed -i.bak 's/old/new/g' file.txt    # Creates file.txt.bak before modifying
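Here is a self-contained sketch of those sed operations on a throwaway items.txt (file name and contents invented for the example):

```shell
printf 'old shoes\nnew hat\nold coat\n' > items.txt

sed 's/old/vintage/' items.txt        # every line printed, first match per line replaced
sed -n 's/old/vintage/p' items.txt    # only the two changed lines printed
sed '2d' items.txt                    # line 2 dropped from the output (file untouched)
```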
awk: Text Processing Language
While grep finds patterns and sed performs replacements, awk is a full-featured text processing language for more complex transformations:
$ awk '{print $1}' file.txt    # Print first column of each line
Combining patterns with actions is what makes awk particularly powerful:
$ awk '/pattern/ {print $1}' file.txt    # Print first column of matching lines
awk can format output with printf:
$ awk '{printf "Name: %-10s Age: %d\n", $1, $2}' people.txt
awk is rather like discovering that your adjustable spanner is actually a complete workshop in disguise. What seems like a simple command at first reveals itself to be a full programming language with variables, functions, and control structures.
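As a taste of awk as a language, this sketch totals a column using a running variable and an END block (fruit.txt is invented for the example):

```shell
printf 'apples 3\nbananas 5\npears 2\n' > fruit.txt

# Add column 2 to a running total on every line;
# the END block runs once, after the last line is read
awk '{ total += $2 } END { print "total:", total }' fruit.txt
# total: 10
```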
Sorting and Uniqueness
sort: Arranging Text
The sort command arranges lines of text:
$ sort file.txt    # Sort alphabetically
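Keep in mind that sort compares text, not numbers, unless told otherwise. A sketch with a throwaway nums.txt:

```shell
printf '10\n2\n33\n' > nums.txt

sort nums.txt      # lexical order: 10, 2, 33 (because "1" < "2" < "3")
sort -n nums.txt   # -n: numeric order: 2, 10, 33
sort -nr nums.txt  # -r: reversed, so largest first: 33, 10, 2
```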
uniq: Finding Unique Lines
The uniq command reports or filters out repeated lines:
$ uniq file.txt    # Remove adjacent duplicate lines
Because uniq only compares adjacent lines, it is usually paired with sort:
$ sort file.txt | uniq -c | sort -nr    # Count and sort by frequency
Cutting and Pasting Text
cut: Extracting Columns
The cut command extracts sections from each line:
$ cut -c 1-5 file.txt    # Characters 1-5 from each line
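Columns are more often delimited than fixed-width; -d sets the delimiter and -f picks fields. A sketch using an invented users.txt in passwd-like format:

```shell
printf 'alice:x:1000\nbob:x:1001\n' > users.txt

cut -d: -f1 users.txt      # first field only: the usernames
cut -d: -f1,3 users.txt    # fields 1 and 3, still colon-separated
```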
paste: Merging Lines
The paste command merges lines from different files:
$ paste file1.txt file2.txt    # Combine lines side by side with tabs
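The default separator is a tab; -d swaps in one of your own. A sketch with two invented files:

```shell
printf 'alice\nbob\n' > names.txt
printf 'London\nLeeds\n' > cities.txt

# Join corresponding lines with a comma instead of a tab
paste -d, names.txt cities.txt
# alice,London
# bob,Leeds
```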
Text Transformation
tr: Translating Characters
The tr command translates or deletes characters:
$ cat file.txt | tr 'a-z' 'A-Z'    # Convert to uppercase
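Beyond translating, tr can also delete characters (-d) and squeeze runs of repeats into one (-s):

```shell
echo "ab1cd2" | tr -d '0-9'            # -d: delete every digit -> abcd
echo "too    many spaces" | tr -s ' '  # -s: squeeze runs of spaces to one
```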
rev: Reversing Lines
The rev command reverses the characters in each line:
$ echo "hello" | rev    # Outputs "olleh"
rev can be surprisingly useful for certain text processing tasks, such as extracting parts of filenames from the end.
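For example, to grab the final dot-separated part of a filename, reverse the line so the last field becomes the first, cut it, then reverse back:

```shell
# The last field counted from the end becomes the first field once reversed
echo "archive.tar.gz" | rev | cut -d. -f1 | rev
# gz
```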
fold: Wrapping Lines
The fold command wraps lines to a specified width:
$ fold -w 80 file.txt    # Wrap at 80 characters
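Plain fold will happily split mid-word; adding -s makes it break at spaces instead. A small sketch:

```shell
# Wrap a long line at 20 columns, breaking only at spaces
echo "a quick demonstration of wrapping a longer line" | fold -s -w 20
# every output line is at most 20 characters wide
```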
Sometimes you need to edit text directly rather than processing it through pipes. Let’s explore the most common text editors available in the shell.
nano: Beginner-Friendly Editor
Nano is a simple, user-friendly editor that displays commands at the bottom of the screen:
$ nano file.txt
Common commands (^ means Ctrl):
- ^O: Save file
- ^X: Exit
- ^K: Cut line
- ^U: Paste
- ^W: Search
- ^G: Get help
vim: The Programmer's Editor
Vim is a powerful, modal editor with a steeper learning curve but tremendous capabilities:
$ vim file.txt
Vim has different modes:
- Normal mode: For navigation and commands (default)
- Insert mode: For typing text (press i to enter)
- Visual mode: For selecting text (press v to enter)
- Command mode: For saving, quitting, etc. (press : to enter)
If you find yourself stuck, press Esc and then type :q! to quit without saving or :wq to save and quit - consider this your emergency exit information.
Basic vim commands (in normal mode):
- i: Enter insert mode
- Esc: Return to normal mode
- :w: Save file
- :q: Quit
- :wq: Save and quit
- :q!: Quit without saving
- /pattern: Search for pattern
- n: Next search result
- dd: Delete line
- yy: Copy line
- p: Paste
emacs: The Extensible Editor
Emacs is another powerful editor with its own ecosystem:
$ emacs file.txt
Common commands:
- C-x C-s: Save file (Ctrl+x, Ctrl+s)
- C-x C-c: Exit
- C-k: Cut line
- C-y: Paste
- C-s: Search forward
Let’s cement our understanding with some practical examples of text processing in the shell.
Example 1: Analyzing Log Files
Extract all ERROR messages from a log file, count their occurrences, and sort by frequency:
$ grep "ERROR" application.log | cut -d: -f4 | sort | uniq -c | sort -nr
Example 2: CSV Processing
Extract and format specific columns from a CSV file:
$ cut -d, -f1,3,5 data.csv | sort -t, -k2,2 | sed 's/,/ | /g'    # Extract columns, sort by the second extracted field, then format
Example 3: Finding Duplicate Files
List potential duplicate files based on size:
$ find . -type f -exec ls -l {} \; | awk '{print $9, $5}' | sort -k2,2n | uniq -D -f1    # Files sharing a size are printed together
Example 4: Word Count in a Document
Count words in a document and find the most common ones:
$ cat document.txt | tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr | head -10
This pipeline:
- Extracts words by converting non-letters to newlines
- Converts everything to lowercase
- Sorts the words
- Counts occurrences
- Sorts by frequency (most common first)
- Shows the top 10
Example 5: Converting File Formats
Convert a Windows text file to Unix format (removing carriage returns):
$ cat windows.txt | tr -d '\r' > unix.txt
Many text processing tools use regular expressions (regex) for pattern matching. While a complete regex tutorial is beyond our scope, here are some fundamental patterns:
- . - Matches any single character
- ^ - Matches the start of a line
- $ - Matches the end of a line
- * - Matches zero or more of the previous character
- + - Matches one or more of the previous character
- ? - Matches zero or one of the previous character
- [abc] - Matches any one of the characters in brackets
- [^abc] - Matches any character NOT in brackets
- \d - Matches a digit
- \w - Matches a word character (alphanumeric)
- \s - Matches a whitespace character
Note that \d, \w, and \s are Perl-style shorthands; plain grep uses POSIX classes such as [[:digit:]] instead (or grep -P where available).
$ grep "^#" file.txt    # Lines starting with #
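A few of these patterns in action against a throwaway src.txt (contents invented for the example; alternation with | needs extended regex, enabled by grep -E):

```shell
printf '# a comment\ncode line\nend;\n' > src.txt

grep "^#" src.txt                 # anchored at the start of the line
grep ";$" src.txt                 # anchored at the end of the line
grep -E "co(de|mment)" src.txt    # -E enables alternation, + and ?
```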
Let’s practice what we’ve learned with some hands-on exercises:
Create a text file named people.txt with the following content:

Alice,25,London
Bob,32,Manchester
Charlie,45,Birmingham
Diana,28,Glasgow
Edward,39,Liverpool

Extract only the names and ages:
$ cut -d, -f1,2 people.txt

Sort the file by age (numerically):
$ sort -t, -k2,2n people.txt

Replace all commas with tabs:
$ cat people.txt | tr ',' '\t'

Add line numbers to the file:
$ cat -n people.txt

Find all people from cities that contain the letter 'm':
$ grep -i "m" people.txt

List the unique cities:
$ cut -d, -f3 people.txt | sort | uniq
Text processing is one of the shell’s greatest strengths. The commands we’ve explored in this chapter provide a powerful toolkit for manipulating, analyzing, and transforming text data of all kinds. While there’s certainly more to learn, mastering these fundamental tools will enable you to handle a vast array of text processing tasks efficiently.
Remember that the real power comes from combining these tools using pipes and redirection. Each command does one thing well, and their strength emerges when you connect them in creative ways to solve complex problems.
In the next chapter, we’ll explore process management – how to control running programs, monitor system resources, and multitask effectively in the shell environment.