
Text Manipulation

Tool: Use it for
cat / tac: View / reverse-view file contents
less: Page through large files
head / tail: First/last N lines
tail -f: Follow a growing log file
grep: Search for patterns in text
sed: Stream editing - find/replace, delete lines
awk: Field extraction, structured data processing
sort / uniq: Sort and deduplicate lines
cut / paste / join: Column operations
tr: Translate or delete characters
wc: Count lines, words, characters
tee: Split stdout - to screen AND a file
split: Break large files into segments
strings: Extract printable text from binary files

cat reads and prints files; its main purpose is to combine (concatenate) multiple files. The tac command prints lines in reverse order.

Terminal window
cat filename # view file
cat file1 file2 # concatenate and display
cat file1 file2 > newfile # combine into new file
cat file >> existingfile # append to existing file
tac file # reverse-order lines

If no file argument is given, cat reads from stdin (your keyboard):

Terminal window
cat > newfile # type content, CTRL-D to save
cat >> existingfile # append typed content
cat > newfile << EOF # heredoc - type until you enter 'EOF' on its own line
Hello world
EOF

echo prints its arguments to stdout - handy for strings and variable values:

Terminal window
echo "Hello world" # print string
echo $USER # print environment variable value
echo -e "Line1\nLine2" # -e enables escape sequences (\n, \t)
echo string > newfile # write to file (overwrite)
echo string >> existingfile # append to file

Opening large files in editors causes problems - the editor tries to load the entire file into memory. less avoids this by streaming:

Terminal window
less filename # open and page
cat filename | less # pipe into less
# Inside less:
# /pattern - search forward
# ?pattern - search backward
# n / N - next / previous match
# q - quit
# g / G - first / last line
Terminal window
head filename # first 10 lines (default)
head -n 20 filename # first 20 lines
head -20 filename # shorthand
tail filename # last 10 lines (default)
tail -n 20 filename # last 20 lines
tail -f logfile # follow mode - stream new lines as they appear
tail -F logfile # follow by filename - handles log rotation

tail -f is essential for monitoring logs during deployments, troubleshooting, or service restarts.


sed (stream editor) is a powerful text processing tool. It modifies the contents of a file or input stream according to a set of editing commands, and outputs the result to stdout without modifying the original file (unless -i is used).

How it works internally: sed reads input line by line into a working area called the pattern space. The entire list of sed operations is applied to the pattern space, then the (possibly modified) line is written to stdout and the cycle repeats for the next line.

Terminal window
sed -e 'command' filename # specify editing command directly
sed -f scriptfile filename # read commands from a script file
echo "I hate Mondays" | sed 's/hate/love/' # filter stdin

The -e option allows multiple editing commands at once; it’s unnecessary for a single operation.
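To see that in action, a tiny sketch with two -e commands (the sample strings are arbitrary):

```shell
# Each input line passes through BOTH substitutions before moving to stdout:
# the first command rewrites line 1, the second rewrites line 2.
printf 'cat\ndog\n' | sed -e 's/cat/CAT/' -e 's/dog/DOG/'
# CAT
# DOG
```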

Terminal window
sed 's/pattern/replace/' file # replace first occurrence per line
sed 's/pattern/replace/g' file # replace all occurrences (global)
sed '1,3s/pattern/replace/g' file # replace in lines 1–3 only
sed 's/foo/bar/2' file # replace only 2nd occurrence per line
Terminal window
sed -i 's/old/new/g' file # modify file in place
Terminal window
sed '/pattern/d' file # delete lines matching pattern
sed -n '5,10p' file # print only lines 5 through 10
sed 's/^ *//' file # strip leading whitespace

awk extracts and processes structured text - files with field-separated columns. It’s named after its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan (Bell Labs, 1970s).

Key concepts:

  • A record is a line of input
  • A field is a piece of data within a record (a column)
  • The default field separator is whitespace; use -F to change it
  • $0 is the whole line; $1, $2, $3 are individual fields
Terminal window
awk '{ print $0 }' /etc/passwd # print entire file
awk -F: '{ print $1 }' /etc/passwd # print first field (username)
awk -F: '{ print $1, $7 }' /etc/passwd # print username and shell
awk 'NR==5, NR==10 { print }' file # print lines 5–10
awk '/pattern/ { print $1 }' file # print field 1 where line matches
awk -F, '{ sum += $3 } END { print sum }' data.csv # sum third column (set , as separator for CSV)
Terminal window
awk -f script.awk file # read awk program from file
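Tying the field concepts together, a small end-to-end sketch on a throwaway file (the path and sample data are made up for illustration):

```shell
# Whitespace-separated records: name role hours
printf 'alice admin 8\nbob dev 6\ncarol dev 7\n' > /tmp/awk_demo.txt

awk '{ print $1 }' /tmp/awk_demo.txt                          # field 1 of every record
awk '/dev/ { sum += $3 } END { print sum }' /tmp/awk_demo.txt # sum hours on matching lines: 13
```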

sort rearranges lines in ascending or descending order. By default it compares whole lines using the current locale's collation order (plain ASCII order in the C locale - effectively alphabetical).

Terminal window
sort filename # alphabetical sort
sort -r filename # reverse order
sort -k 3 filename # sort by 3rd field
sort -u filename # sort + deduplicate (equivalent to sort | uniq)
sort -n filename # numeric sort (not lexicographic)
sort -rn filename # numeric reverse sort
cat file1 file2 | sort # sort combined output of two files
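The lexicographic-vs-numeric difference is easy to demonstrate:

```shell
printf '10\n9\n2\n' | sort     # lexicographic: 10, 2, 9 (compares '1' < '2' < '9')
printf '10\n9\n2\n' | sort -n  # numeric: 2, 9, 10
```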

uniq removes consecutive duplicate lines. Because it only operates on adjacent lines, it almost always follows sort in a pipeline:

Terminal window
sort file | uniq # remove all duplicate lines
sort -u file # same, in one step
sort file1 file2 | uniq > file3 # combine, deduplicate, save
uniq -c filename # count occurrences (prefix with count)
uniq -d filename # show only duplicate lines
uniq -u filename # show only unique (non-duplicate) lines
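A classic application is a frequency count: sort groups the duplicates, uniq -c counts them, and a final numeric sort ranks them (sample data invented):

```shell
# Count occurrences of each line, most frequent first
printf 'error\nok\nerror\nerror\nok\n' | sort | uniq -c | sort -rn
```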
Terminal window
paste file1 file2 # merge line-by-line (tab-separated by default)
paste -d, file1 file2 # use comma as delimiter
paste -s file1 # merge all lines of file1 into one line
join file1 file2 # join files on common first field

join is an enhanced paste - it checks for matching fields before joining, like a database inner join.
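A minimal sketch of that inner-join behavior; note that both inputs must already be sorted on the join field (file paths are illustrative):

```shell
printf '1 alice\n2 bob\n' > /tmp/ids.txt          # id name
printf '1 bash\n2 zsh\n3 fish\n' > /tmp/shells.txt # id shell
join /tmp/ids.txt /tmp/shells.txt  # only ids present in BOTH files appear
# 1 alice bash
# 2 bob zsh
```

Id 3 is dropped because it has no match in the first file - the inner-join behavior described above.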

split breaks large files into equal-sized segments. The original file is unchanged; new files get an auto-incremented suffix:

Terminal window
split filename # split into 1000-line segments (default)
split -l 500 filename prefix_ # 500-line segments with custom prefix
split -b 10M largefile.tar.gz seg_ # split by size (10 MB chunks)
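Segments reassemble with cat; a quick round-trip check under assumed /tmp paths:

```shell
seq 1 2500 > /tmp/big.txt                 # 2500-line sample file
split -l 1000 /tmp/big.txt /tmp/part_     # part_aa (1000), part_ab (1000), part_ac (500)
cat /tmp/part_a? > /tmp/rebuilt.txt       # glob expands in sorted (original) order
cmp /tmp/big.txt /tmp/rebuilt.txt && echo "files match"
```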

cut extracts columns from structured text:

Terminal window
cut -d: -f1 /etc/passwd # field 1 using : as delimiter (username)
cut -d: -f1,7 /etc/passwd # fields 1 and 7
cut -c1-10 file # characters 1–10 of each line
ls -l | tr -s ' ' | cut -d' ' -f3 # third column of ls output (squeeze repeated spaces first)

Regular expressions define patterns to match - used in grep, sed, awk, and many other tools.

Pattern: Matches
. : Any single character
* : Preceding item 0 or more times
+ : Preceding item 1 or more times (ERE)
? : Preceding item 0 or 1 times (ERE)
^ : Beginning of line
$ : End of line
[abc] : Any of a, b, or c
[^abc] : Not a, b, or c
a|b : a or b
(abc) : Group (ERE)
Terminal window
grep pattern filename # print lines matching pattern
grep -v pattern filename # print lines NOT matching
grep -i pattern filename # case-insensitive
grep -n pattern filename # include line numbers
grep -r pattern /dir # recursive search in directory
grep -l pattern /dir/* # list only filenames that match
grep -c pattern filename # count matching lines
grep -C 3 pattern filename # 3 lines of context around matches
grep '[0-9]' filename # lines containing any digit
grep '^ERROR' logfile # lines starting with ERROR
# Extended regex (no need to escape | + ?)
grep -E 'error|warn|critical' log
egrep 'error|warn|critical' log # same thing (deprecated alias for grep -E)
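Combining a few of these flags on a generated log (contents invented for illustration):

```shell
printf 'INFO start\nERROR disk full\nWARN low memory\nERROR timeout\n' > /tmp/demo.log
grep -c '^ERROR' /tmp/demo.log       # 2 lines start with ERROR
grep -n 'WARN' /tmp/demo.log         # 3:WARN low memory
grep -vE 'ERROR|WARN' /tmp/demo.log  # INFO start
```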
Terminal window
strings binary_file # print all printable strings
strings book1.xls | grep my_string # find text in spreadsheet

Useful for inspecting compiled programs or data files to find embedded strings, version information, or error messages.


tr translates (replaces) or deletes characters from the input stream. It reads only from stdin, so it is always fed via a pipe or input redirection.

Terminal window
tr 'a-z' 'A-Z' < file.txt # convert lowercase to uppercase
tr '{}' '()' < input.txt > output.txt # replace { with (, } with )
echo "hello world" | tr -s ' ' # squeeze repeated spaces to one
echo "a1b2 c3!" | tr -cd '[:alpha:] \n' # keep only letters, spaces, and newlines
echo "a1b2c3" | tr -d '[:digit:]' # delete all digits
tr -s '\n' ' ' < file.txt # join all lines into one line
tr -cd '[:print:]' < file.txt # remove non-printable characters

tee copies its input to both a file and the terminal simultaneously. Essential for logging script output while still seeing it live:

Terminal window
ls -l | tee listing.txt # see output AND save to file
command 2>&1 | tee install.log # capture stdout + stderr to file
cat listing.txt # verify the saved output
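Because tee passes its input through unchanged, the same text lands on screen and in the file; a quick check with an illustrative path:

```shell
echo "deploy started"  | tee /tmp/deploy.log     # printed AND written (file overwritten)
echo "deploy finished" | tee -a /tmp/deploy.log  # -a appends instead of overwriting
wc -l < /tmp/deploy.log                          # 2
```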
Terminal window
wc filename # lines, words, bytes
wc -l filename # line count only
wc -w filename # word count only
wc -c filename # byte count
wc -m filename # character count (differs from bytes for multibyte)
wc /etc/passwd # all three counts; line count = number of user accounts

Many standard text tools won’t work on compressed files - use the z* variants for gzip-compressed files, or bz*/xz* equivalents.

Command: Description
zcat file.gz: View compressed file
zless file.gz: Page through compressed file
zgrep pattern file.gz: Search inside compressed file
zdiff file1.gz file2.gz: Compare two compressed files
bzcat file.bz2: View bzip2-compressed file
xzcat file.xz: View xz-compressed file
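A round-trip sketch with gzip and the z* tools (paths and contents are illustrative):

```shell
printf 'alpha\nbeta\nERROR gamma\n' > /tmp/notes.txt
gzip -f /tmp/notes.txt            # produces /tmp/notes.txt.gz, removes the original
zcat /tmp/notes.txt.gz            # view without decompressing on disk
zgrep -c ERROR /tmp/notes.txt.gz  # 1
```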