
Text Manipulation

Tool: Use it for
cat / tac: View / reverse-view file contents
less: Page through large files
head / tail: First/last N lines
tail -f: Follow a growing log file
grep: Search for patterns in text
sed: Stream editing - find/replace, delete lines
awk: Field extraction, structured data processing
sort / uniq: Sort and deduplicate lines
cut / paste / join: Column operations
tr: Translate or delete characters
wc: Count lines, words, characters
tee: Split stdout - to screen AND a file
split: Break large files into segments
strings: Extract printable text from binary files

cat reads and prints files; its main purpose is to combine (concatenate) multiple files. The tac command prints lines in reverse order.

Terminal window
cat filename # view file
cat file1 file2 # concatenate and display
cat file1 file2 > newfile # combine into new file
cat file >> existingfile # append to existing file
tac file # reverse-order lines

If no file argument is given, cat reads from stdin (your keyboard):

Terminal window
cat > newfile # type content, CTRL-D to save
cat >> existingfile # append typed content
cat > newfile << EOF # heredoc - type until you enter 'EOF' on its own line
Hello world
EOF

echo prints its arguments to stdout - handy for strings and variable values:

Terminal window
echo "Hello world" # print string
echo $USER # print environment variable value
echo -e "Line1\nLine2" # -e enables escape sequences (\n, \t)
echo string > newfile # write to file (overwrite)
echo string >> existingfile # append to file

Opening large files in editors causes problems - the editor tries to load the entire file into memory. less avoids this by streaming:

Terminal window
less filename # open and page
cat filename | less # pipe into less
# Inside less:
# /pattern - search forward
# ?pattern - search backward
# n / N - next / previous match
# q - quit
# g / G - first / last line
Terminal window
head filename # first 10 lines (default)
head -n 20 filename # first 20 lines
head -20 filename # shorthand
tail filename # last 10 lines (default)
tail -n 20 filename # last 20 lines
tail -f logfile # follow mode - stream new lines as they appear
tail -F logfile # follow by filename - handles log rotation

tail -f is essential for monitoring logs during deployments, troubleshooting, or service restarts.


sed (stream editor) is a powerful text processing tool. It modifies the contents of a file or input stream according to a set of editing commands, and outputs the result to stdout without modifying the original file (unless -i is used).

How it works internally: sed reads input line by line into a working area called the pattern space. The entire list of sed operations is applied to the pattern space, then the (possibly modified) line is written to stdout and the cycle repeats for the next line.

Terminal window
sed -e 'command' filename # specify editing command directly
sed -f scriptfile filename # read commands from a script file
echo "I hate Mondays" | sed 's/hate/love/' # filter stdin

The -e option allows multiple editing commands at once; it’s unnecessary for a single operation.
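To see that in action, a tiny sketch with two -e commands (the sample strings are arbitrary):

```shell
# Each input line passes through BOTH substitutions before moving to stdout:
# the first command rewrites line 1, the second rewrites line 2.
printf 'cat\ndog\n' | sed -e 's/cat/CAT/' -e 's/dog/DOG/'
# CAT
# DOG
```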

Terminal window
sed 's/pattern/replace/' file # replace first occurrence per line
sed 's/pattern/replace/g' file # replace all occurrences (global)
sed '1,3s/pattern/replace/g' file # replace in lines 1–3 only
sed 's/foo/bar/2' file # replace only 2nd occurrence per line
Terminal window
sed -i 's/old/new/g' file # modify file in place
Terminal window
sed '/pattern/d' file # delete lines matching pattern
sed -n '5,10p' file # print only lines 5 through 10
sed 's/^ *//' file # strip leading whitespace

awk extracts and processes structured text - files with field-separated columns. It’s named after its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan (Bell Labs, 1970s).

Key concepts:

  • A record is a line of input
  • A field is a piece of data within a record (a column)
  • The default field separator is whitespace; use -F to change it
  • $0 is the whole line; $1, $2, $3 are individual fields
Terminal window
awk '{ print $0 }' /etc/passwd # print entire file
awk -F: '{ print $1 }' /etc/passwd # print first field (username)
awk -F: '{ print $1, $7 }' /etc/passwd # print username and shell
awk 'NR==5, NR==10 { print }' file # print lines 5–10
awk '/pattern/ { print $1 }' file # print field 1 where line matches
awk -F, '{ sum += $3 } END { print sum }' data.csv # sum third column (set , as separator for CSV)
Terminal window
awk -f script.awk file # read awk program from file
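Tying the field concepts together, a small end-to-end sketch on a throwaway file (the path and sample data are made up for illustration):

```shell
# Whitespace-separated records: name role hours
printf 'alice admin 8\nbob dev 6\ncarol dev 7\n' > /tmp/awk_demo.txt

awk '{ print $1 }' /tmp/awk_demo.txt                          # field 1 of every record
awk '/dev/ { sum += $3 } END { print sum }' /tmp/awk_demo.txt # sum hours on matching lines: 13
```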

sort rearranges lines in ascending or descending order. By default it compares whole lines using the current locale's collation order (plain ASCII order in the C locale - effectively alphabetical).

Terminal window
sort filename # alphabetical sort
sort -r filename # reverse order
sort -k 3 filename # sort by 3rd field
sort -u filename # sort + deduplicate (equivalent to sort | uniq)
sort -n filename # numeric sort (not lexicographic)
sort -rn filename # numeric reverse sort
cat file1 file2 | sort # sort combined output of two files
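The lexicographic-vs-numeric difference is easy to demonstrate:

```shell
printf '10\n9\n2\n' | sort     # lexicographic: 10, 2, 9 (compares '1' < '2' < '9')
printf '10\n9\n2\n' | sort -n  # numeric: 2, 9, 10
```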

uniq removes consecutive duplicate lines. Because it only operates on adjacent lines, it almost always follows sort in a pipeline:

Terminal window
sort file | uniq # remove all duplicate lines
sort -u file # same, in one step
sort file1 file2 | uniq > file3 # combine, deduplicate, save
uniq -c filename # count occurrences (prefix with count)
uniq -d filename # show only duplicate lines
uniq -u filename # show only unique (non-duplicate) lines
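A classic application is a frequency count: sort groups the duplicates, uniq -c counts them, and a final numeric sort ranks them (sample data invented):

```shell
# Count occurrences of each line, most frequent first
printf 'error\nok\nerror\nerror\nok\n' | sort | uniq -c | sort -rn
```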
Terminal window
paste file1 file2 # merge line-by-line (tab-separated by default)
paste -d, file1 file2 # use comma as delimiter
paste -s file1 # merge all lines of file1 into one line
join file1 file2 # join files on common first field

join is an enhanced paste - it checks for matching fields before joining, like a database inner join.
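A minimal sketch of that inner-join behavior; note that both inputs must already be sorted on the join field (file paths are illustrative):

```shell
printf '1 alice\n2 bob\n' > /tmp/ids.txt          # id name
printf '1 bash\n2 zsh\n3 fish\n' > /tmp/shells.txt # id shell
join /tmp/ids.txt /tmp/shells.txt  # only ids present in BOTH files appear
# 1 alice bash
# 2 bob zsh
```

Id 3 is dropped because it has no match in the first file - the inner-join behavior described above.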

split breaks large files into equal-sized segments. The original file is unchanged; new files get an auto-incremented suffix:

Terminal window
split filename # split into 1000-line segments (default)
split -l 500 filename prefix_ # 500-line segments with custom prefix
split -b 10M largefile.tar.gz seg_ # split by size (10 MB chunks)
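Segments reassemble with cat; a quick round-trip check under assumed /tmp paths:

```shell
seq 1 2500 > /tmp/big.txt                 # 2500-line sample file
split -l 1000 /tmp/big.txt /tmp/part_     # part_aa (1000), part_ab (1000), part_ac (500)
cat /tmp/part_a? > /tmp/rebuilt.txt       # glob expands in sorted (original) order
cmp /tmp/big.txt /tmp/rebuilt.txt && echo "files match"
```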

cut extracts columns from structured text:

Terminal window
cut -d: -f1 /etc/passwd # field 1 using : as delimiter (username)
cut -d: -f1,7 /etc/passwd # fields 1 and 7
cut -c1-10 file # characters 1–10 of each line
ls -l | tr -s ' ' | cut -d' ' -f3 # third column of ls output (squeeze repeated spaces first)

Regular expressions define patterns to match - used in grep, sed, awk, and many other tools.

Pattern: Matches
. : Any single character
* : Preceding item 0 or more times
+ : Preceding item 1 or more times (ERE)
? : Preceding item 0 or 1 times (ERE)
^ : Beginning of line
$ : End of line
[abc] : Any of a, b, or c
[^abc] : Not a, b, or c
a|b : a or b
(abc) : Group (ERE)
Terminal window
grep pattern filename # print lines matching pattern
grep -v pattern filename # print lines NOT matching
grep -i pattern filename # case-insensitive
grep -n pattern filename # include line numbers
grep -r pattern /dir # recursive search in directory
grep -l pattern /dir/* # list only filenames that match
grep -c pattern filename # count matching lines
grep -C 3 pattern filename # 3 lines of context around matches
grep '[0-9]' filename # lines containing any digit
grep '^ERROR' logfile # lines starting with ERROR
# Extended regex (no need to escape | + ?)
grep -E 'error|warn|critical' log
egrep 'error|warn|critical' log # same thing (deprecated alias for grep -E)
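Combining a few of these flags on a generated log (contents invented for illustration):

```shell
printf 'INFO start\nERROR disk full\nWARN low memory\nERROR timeout\n' > /tmp/demo.log
grep -c '^ERROR' /tmp/demo.log       # 2 lines start with ERROR
grep -n 'WARN' /tmp/demo.log         # 3:WARN low memory
grep -vE 'ERROR|WARN' /tmp/demo.log  # INFO start
```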
Terminal window
strings binary_file # print all printable strings
strings book1.xls | grep my_string # find text in spreadsheet

Useful for inspecting compiled programs or data files to find embedded strings, version information, or error messages.


tr translates (replaces) or deletes characters from the input stream. It reads only from stdin, so it is always fed via a pipe or input redirection.

Terminal window
tr 'a-z' 'A-Z' < file.txt # convert lowercase to uppercase
tr '{}' '()' < input.txt > output.txt # replace { with (, } with )
echo "hello world" | tr -s ' ' # squeeze repeated spaces to one
echo "a1b2 c3!" | tr -cd '[:alpha:] \n' # keep only letters, spaces, and newlines
echo "a1b2c3" | tr -d '[:digit:]' # delete all digits
tr -s '\n' ' ' < file.txt # join all lines into one line
tr -cd '[:print:]' < file.txt # remove non-printable characters

tee copies its input to both a file and the terminal simultaneously. Essential for logging script output while still seeing it live:

Terminal window
ls -l | tee listing.txt # see output AND save to file
command 2>&1 | tee install.log # capture stdout + stderr to file
cat listing.txt # verify the saved output
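Because tee passes its input through unchanged, the same text lands on screen and in the file; a quick check with an illustrative path:

```shell
echo "deploy started"  | tee /tmp/deploy.log     # printed AND written (file overwritten)
echo "deploy finished" | tee -a /tmp/deploy.log  # -a appends instead of overwriting
wc -l < /tmp/deploy.log                          # 2
```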
Terminal window
wc filename # lines, words, bytes
wc -l filename # line count only
wc -w filename # word count only
wc -c filename # byte count
wc -m filename # character count (differs from bytes for multibyte)
wc /etc/passwd # all three counts; line count = number of user accounts

Many standard text tools won’t work on compressed files - use the z* variants for gzip-compressed files, or bz*/xz* equivalents.

Command: Description
zcat file.gz: View compressed file
zless file.gz: Page through compressed file
zgrep pattern file.gz: Search inside compressed file
zdiff file1.gz file2.gz: Compare two compressed files
bzcat file.bz2: View bzip2-compressed file
xzcat file.xz: View xz-compressed file
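A round-trip sketch with gzip and the z* tools (paths and contents are illustrative):

```shell
printf 'alpha\nbeta\nERROR gamma\n' > /tmp/notes.txt
gzip -f /tmp/notes.txt            # produces /tmp/notes.txt.gz, removes the original
zcat /tmp/notes.txt.gz            # view without decompressing on disk
zgrep -c ERROR /tmp/notes.txt.gz  # 1
```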