# Text Manipulation
## Quick Reference

| Tool | Use it for |
|---|---|
| cat / tac | View / reverse-view file contents |
| less | Page through large files |
| head / tail | First/last N lines |
| tail -f | Follow a growing log file |
| grep | Search for patterns in text |
| sed | Stream editing - find/replace, delete lines |
| awk | Field extraction, structured data processing |
| sort / uniq | Sort and deduplicate lines |
| cut / paste / join | Column operations |
| tr | Translate or delete characters |
| wc | Count lines, words, characters |
| tee | Split stdout - to screen AND a file |
| split | Break large files into segments |
| strings | Extract printable text from binary files |
## Viewing Files

### cat - Concatenate and Display

cat reads and prints files; its main purpose is to combine (concatenate) multiple files. The tac command prints lines in reverse order.

```shell
cat filename              # view file
cat file1 file2           # concatenate and display
cat file1 file2 > newfile # combine into new file
cat file >> existingfile  # append to existing file
tac file                  # reverse-order lines
```

### Interactive Input with cat

If no file argument is given, cat reads from stdin (your keyboard):

```shell
cat > newfile          # type content, CTRL-D to save
cat >> existingfile    # append typed content
cat > newfile << EOF   # heredoc - type until you enter 'EOF' on its own line
Hello world
EOF
```

### echo - Display Text

```shell
echo "Hello world"          # print string
echo $USERNAME              # print environment variable value
echo -e "Line1\nLine2"      # -e enables escape sequences (\n, \t)
echo string > newfile       # write to file (overwrite)
echo string >> existingfile # append to file
```

### less - Page Through Large Files
Opening large files in editors causes problems - the editor tries to load the entire file into memory. less avoids this by streaming:

```shell
less filename         # open and page
cat filename | less   # pipe into less

# Inside less:
#   /pattern - search forward
#   ?pattern - search backward
#   n / N    - next / previous match
#   q        - quit
#   g / G    - first / last line
```

### head and tail
```shell
head filename         # first 10 lines (default)
head -n 20 filename   # first 20 lines
head -20 filename     # shorthand

tail filename         # last 10 lines (default)
tail -n 20 filename   # last 20 lines
tail -f logfile       # follow mode - stream new lines as they appear
tail -F logfile       # follow by filename - handles log rotation
```

tail -f is essential for monitoring logs during deployments, troubleshooting, or service restarts.
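head and tail also compose: to print an arbitrary middle range of a file, take the first N lines and then the tail of that. A minimal sketch (the file path and line range are arbitrary examples):

```shell
# Print lines 5-8: take the first 8 lines, then the last 4 of those.
seq 1 10 > /tmp/nums.txt              # sample data: numbers 1..10, one per line
head -n 8 /tmp/nums.txt | tail -n 4   # -> 5 6 7 8, one per line
```

The same range can be printed in one step with `sed -n '5,8p'`.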
## Stream Editor - sed

sed (stream editor) is a powerful text-processing tool. It modifies the contents of a file or input stream according to a set of editing commands, and outputs the result to stdout without modifying the original file (unless -i is used).

How it works internally: data flows line by line from the input into a working space. The entire list of sed operations is applied to each line in the working space, then the (possibly modified) line moves to stdout.

```shell
sed -e 'command' filename                    # specify editing command directly
sed -f scriptfile filename                   # read commands from a script file
echo "I hate Mondays" | sed 's/hate/love/'   # filter stdin
```

The -e option allows multiple editing commands at once; it is unnecessary for a single operation.
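For instance, two -e commands are applied in order to each line passing through the stream (the input string is invented for illustration):

```shell
# Each -e adds one editing command; both run against every input line.
echo "I hate cold Mondays" | sed -e 's/hate/love/' -e 's/cold/warm/'
# -> I love warm Mondays
```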
### Substitution

```shell
sed 's/pattern/replace/' file       # replace first occurrence per line
sed 's/pattern/replace/g' file      # replace all occurrences (global)
sed '1,3s/pattern/replace/g' file   # replace in lines 1–3 only
sed 's/foo/bar/2' file              # replace only 2nd occurrence per line
```

### Editing In-Place

```shell
sed -i 's/old/new/g' file   # modify file in place
```

### Other Operations

```shell
sed '/pattern/d' file   # delete lines matching pattern
sed -n '5,10p' file     # print only lines 5 through 10
sed 's/^ *//' file      # strip leading whitespace
```

## awk - Structured Field Processing
awk extracts and processes structured text - files with field-separated columns. It is named after its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan (Bell Labs, 1970s).

Key concepts:

- A record is a line of input
- A field is a piece of data within a record (a column)
- The default field separator is whitespace; use -F to change it
- $0 is the whole line; $1, $2, $3 are individual fields
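Putting these concepts together, a short self-contained sketch (the colon-separated sample records are invented for illustration):

```shell
# Three records, two fields each; -F: makes the colon the field separator.
# sum accumulates field 2 per record; the END block runs once after the last record.
printf 'alice:10\nbob:20\ncarol:5\n' | awk -F: '{ sum += $2 } END { print sum }'
# -> 35
```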
```shell
awk '{ print $0 }' /etc/passwd                       # print entire file
awk -F: '{ print $1 }' /etc/passwd                   # print first field (username)
awk -F: '{ print $1, $7 }' /etc/passwd               # print username and shell
awk 'NR==5, NR==10 { print }' file                   # print lines 5–10
awk '/pattern/ { print $1 }' file                    # print field 1 where line matches
awk -F, '{ sum += $3 } END { print sum }' data.csv   # sum third column (-F, for CSV)
awk -f script.awk file                               # read awk program from file
```

## File Manipulation Utilities
sort rearranges lines in ascending or descending order. The default key is ASCII character order (effectively alphabetical).

```shell
sort filename            # alphabetical sort
sort -r filename         # reverse order
sort -k 3 filename       # sort by 3rd field
sort -u filename         # sort + deduplicate (equivalent to sort | uniq)
sort -n filename         # numeric sort (not lexicographic)
sort -rn filename        # numeric reverse sort
cat file1 file2 | sort   # sort combined output of two files
```

uniq removes consecutive duplicate lines. Because it only operates on adjacent lines, it almost always follows sort in a pipeline:
```shell
sort file | uniq                  # remove all duplicate lines
sort -u file                      # same, in one step
sort file1 file2 | uniq > file3   # combine, deduplicate, save
```
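The sort-then-uniq idiom extends to the classic frequency count - count duplicates, then rank them (the sample words are invented for illustration):

```shell
# sort groups duplicates together, uniq -c counts each group,
# and sort -rn ranks the counts in descending numeric order.
printf 'apple\nbanana\napple\ncherry\napple\nbanana\n' | sort | uniq -c | sort -rn
# apple (3) is listed first, then banana (2), then cherry (1)
```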
```shell
uniq -c filename   # count occurrences (prefix with count)
uniq -d filename   # show only duplicate lines
uniq -u filename   # show only unique (non-duplicate) lines
```

### paste and join

```shell
paste file1 file2       # merge line-by-line (tab-separated by default)
paste -d, file1 file2   # use comma as delimiter
paste -s file1          # merge all lines of file1 into one line
```
```shell
join file1 file2   # join files on common first field
```

join is an enhanced paste - it checks for matching fields before joining, like a database inner join. Both inputs must already be sorted on the join field.
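A minimal sketch of that inner-join behavior (the file names and contents are invented; both inputs are already sorted on field 1):

```shell
# Two small sorted files sharing IDs in their first field.
printf '1 alice\n2 bob\n'  > /tmp/names.txt
printf '1 admin\n2 user\n' > /tmp/roles.txt
join /tmp/names.txt /tmp/roles.txt   # match on field 1, merge the remaining fields
# -> 1 alice admin
#    2 bob user
```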
split breaks large files into equal-sized segments. The original file is unchanged; the new files get an auto-incremented suffix:

```shell
split filename                        # split into 1000-line segments (default)
split -l 500 filename prefix_         # 500-line segments with custom prefix
split -b 10M largefile.tar.gz seg_    # split by size (10 MB chunks)
```

cut extracts columns from structured text:
```shell
cut -d: -f1 /etc/passwd     # field 1 using : as delimiter (username)
cut -d: -f1,7 /etc/passwd   # fields 1 and 7
cut -c1-10 file             # characters 1–10 of each line
ls -l | cut -d" " -f3       # third column from ls output (fragile: repeated
                            # spaces count as empty fields; awk is more robust)
```

## Regex and Search
Section titled “Regex and Search”Regular Expression Quick Reference
Section titled “Regular Expression Quick Reference”Regular expressions define patterns to match - used in grep, sed, awk, and many other tools.
| Pattern | Matches |
|---|---|
. | Any single character |
* | Preceding item 0 or more times |
+ | Preceding item 1 or more times (ERE) |
? | Preceding item 0 or 1 times (ERE) |
^ | Beginning of line |
$ | End of line |
[abc] | Any of a, b, or c |
[^abc] | Not a, b, or c |
a|b | a or b |
(abc) | Group (ERE) |
### grep - Search for Patterns

```shell
grep pattern filename        # print lines matching pattern
grep -v pattern filename     # print lines NOT matching
grep -i pattern filename     # case-insensitive
grep -n pattern filename     # include line numbers
grep -r pattern /dir         # recursive search in directory
grep -l pattern /dir/*       # list only filenames that match
grep -c pattern filename     # count matching lines
grep -C 3 pattern filename   # 3 lines of context around matches
grep '[0-9]' filename        # lines containing any digit
grep '^ERROR' logfile        # lines starting with ERROR
```
```shell
# Extended regex (no need to escape | + ?)
grep -E 'error|warn|critical' log
egrep 'error|warn|critical' log   # same thing
```

### strings - Extract Text from Binaries
```shell
strings binary_file                  # print all printable strings
strings book1.xls | grep my_string   # find text in spreadsheet
```

Useful for inspecting compiled programs or data files to find embedded strings, version information, or error messages.
## Miscellaneous Tools

### tr - Translate or Delete Characters

tr translates (replaces) or deletes characters from the input stream. It reads from stdin only - always used in a pipeline or with redirection.
```shell
tr 'a-z' 'A-Z' < file.txt                       # convert lowercase to uppercase
tr '{}' '()' < input.txt > output.txt           # replace { with (, } with )
echo "hello world" | tr -s ' '                  # squeeze repeated spaces to one
echo "remove digits" | tr -cd '[:alpha:] \n'    # keep only letters, spaces, newlines
echo "a1b2c3" | tr -d '[:digit:]'               # delete all digits
tr -s '\n' ' ' < file.txt                       # join all lines into one line
tr -cd '[:print:]' < file.txt                   # remove non-printable characters
```

### tee - Split the Output Stream
tee sends stdout to both a file and the terminal simultaneously. Essential for logging script output while still seeing it live:

```shell
ls -l | tee listing.txt          # see output AND save to file
command 2>&1 | tee install.log   # capture stdout + stderr to file
cat listing.txt                  # verify the saved output
```

### wc - Count Lines, Words, Characters
Section titled “wc - Count Lines, Words, Characters”wc filename # lines, words, byteswc -l filename # line count onlywc -w filename # word count onlywc -c filename # byte countwc -m filename # character count (differs from bytes for multibyte)wc /etc/passwd # count users + linesWorking with Compressed Files
Many standard text tools won't work on compressed files - use the z* variants for gzip-compressed files, or the bz*/xz* equivalents for bzip2 and xz.
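For example, zgrep searches a gzip archive without an explicit decompression step (the sample file is created here for illustration; assumes gzip and its z* wrappers are installed):

```shell
printf 'alpha\nbeta\n' | gzip -c > /tmp/sample.gz   # make a small compressed file
zgrep beta /tmp/sample.gz                           # search it directly
# -> beta
```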
| Command | Description |
|---|---|
| zcat file.gz | View compressed file |
| zless file.gz | Page through compressed file |
| zgrep pattern file.gz | Search inside compressed file |
| zdiff file1.gz file2.gz | Compare two compressed files |
| bzcat file.bz2 | View bzip2-compressed file |
| xzcat file.xz | View xz-compressed file |