Malware Analysis Basics

Malware analysis is the process of understanding what a malicious program does, how it works, and what indicators it leaves behind. This note covers static and dynamic analysis techniques, sandboxing, hash-based identification, YARA rules, and common indicators of compromise (IOCs).

Analysis Approaches

STATIC ANALYSIS              DYNAMIC ANALYSIS
─────────────────            ────────────────────────────────
Examine the file without     Execute the sample in a controlled
executing it                 environment and observe behaviour

Techniques:                  Techniques:
  File type identification     Process monitoring
  Hash computation             Network traffic capture
  String extraction            Registry/file system changes
  Disassembly / decompilation  Memory inspection

Pros: Safe (no execution)    Pros: Reveals actual runtime behaviour
Cons: Obfuscation can hide   Cons: Sandbox evasion; detonation needed
      true behaviour

File Identification

The first step: understand what you’re dealing with before touching it.

# Identify file type (don't trust the extension)
file suspicious.exe
# ELF 64-bit LSB executable...
# PE32 executable (GUI) Intel 80386...
# Zip archive data...

# Common magic bytes (file headers):
# MZ (4D 5A)            → Windows PE executable
# \x7fELF (7F 45 4C 46) → Linux ELF binary
# PK (50 4B)            → ZIP / Office documents (docx, xlsx are ZIPs)
# %PDF (25 50 44 46)    → PDF file

# Check magic bytes manually
xxd suspicious.exe | head -3
# or
hexdump -C suspicious.exe | head -5
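
The same check can be scripted when many samples need triage. This is a minimal sketch that covers only the four signatures listed above; the names `MAGIC_SIGNATURES`, `identify_magic` and `identify_file` are made up for this note, and real tools such as `file` consult a much larger signature database.

```python
# Signatures limited to the four formats listed above (an illustrative subset).
MAGIC_SIGNATURES = [
    (b"\x7fELF", "ELF binary"),
    (b"MZ", "Windows PE executable"),
    (b"PK", "ZIP archive or Office document"),
    (b"%PDF", "PDF file"),
]


def identify_magic(header: bytes) -> str:
    """Return a description for the first signature that prefixes `header`."""
    for signature, description in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first few bytes of `path` and classify them."""
    with open(path, "rb") as handle:
        return identify_magic(handle.read(8))


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(f"{path}: {identify_file(path)}")
```

Reading only the header means the sample is never parsed or executed, which keeps this step as safe as `xxd | head`.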

Hash Identification

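Cryptographic hashes create a fingerprint for a file. Matching a hash against a database of known samples identifies previously seen malware without further analysis, but only exact copies match: changing a single byte of the file produces a completely different hash.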

# Compute hashes
md5sum suspicious.exe       # MD5 - fast; still used for identification (not integrity)
sha1sum suspicious.exe      # SHA-1
sha256sum suspicious.exe    # SHA-256 - primary standard for malware IOCs

# Save hash for reference
sha256sum suspicious.exe | awk '{print $1}' > sample.hash
cat sample.hash
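
When the coreutils tools are not available (for example on a Windows analysis VM), the same three digests can be computed with Python's standard `hashlib`. A minimal sketch; `hash_chunks` and `hash_file` are hypothetical helper names, and the 1 MiB chunk size is an arbitrary choice.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256")


def hash_chunks(chunks) -> dict:
    """Feed an iterable of byte chunks to all three digests in one pass."""
    digests = {name: hashlib.new(name) for name in ALGORITHMS}
    for chunk in chunks:
        for digest in digests.values():
            digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


def read_in_chunks(path: str, chunk_size: int = 1024 * 1024):
    """Yield the file in fixed-size chunks so large samples are not loaded whole."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def hash_file(path: str) -> dict:
    return hash_chunks(read_in_chunks(path))


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        for name, value in hash_file(path).items():
            print(f"{name:7} {value}  {path}")
```

Computing all digests in a single pass avoids reading a large sample three times.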

# Look up hash on VirusTotal (vt-py Python client)
pip install vt-py
export VT_API_KEY="your_api_key_here"

python3 << 'EOF'
import os
import vt

HASH = "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f"
with vt.Client(os.environ["VT_API_KEY"]) as client:
    file = client.get_object(f"/files/{HASH}")
    print(f"Name: {file.meaningful_name}")
    print(f"Detection: {file.last_analysis_stats['malicious']}/{sum(file.last_analysis_stats.values())} engines")
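    # Not every file object carries a threat classification, so fall back to an empty dict
    classification = file.get("popular_threat_classification") or {}
    print(f"Family: {classification.get('suggested_threat_label', 'unknown')}")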
EOF

# Or via curl
curl -s --request GET \
  --url "https://www.virustotal.com/api/v3/files/$(sha256sum suspicious.exe | awk '{print $1}')" \
  --header "x-apikey: $VT_API_KEY" | python3 -m json.tool | grep -E "malicious|type_description"

Static Analysis

String Extraction

Strings embedded in binaries reveal intent - URLs, registry keys, file paths, error messages, commands.

# Extract all printable strings from a binary
strings suspicious.exe

# Filter for interesting patterns
strings suspicious.exe | grep -E "http[s]?://"          # URLs
strings suspicious.exe | grep -iE "cmd|powershell|exec" # command execution
strings suspicious.exe | grep -iE "password|passwd|cred" # credential references
strings suspicious.exe | grep -E "HKCU|HKLM|Software"  # Windows registry paths
strings suspicious.exe | grep -E "\.dll|LoadLibrary"   # DLL loading
strings suspicious.exe | grep -E "CreateProcess|ShellExecute|WinExec" # process creation

# Unicode strings (wide strings - common in Windows malware)
strings -el suspicious.exe   # 16-bit little-endian (UTF-16LE)

# Extract IPv4 addresses and email addresses
strings suspicious.exe | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
strings suspicious.exe | grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
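
For analysts who want the extraction inside a script, the behaviour of `strings` and `strings -el` can be approximated with two regular expressions. This is a rough sketch, not a reimplementation of the tool: `extract_strings` and `find_iocs` are hypothetical names, and the IOC patterns mirror the grep filters above.

```python
import re

# Patterns mirror the grep filters above; both are deliberately loose.
IOC_PATTERNS = {
    "url": re.compile(r"https?://[^\s\"'<>]+"),
    "ipv4": re.compile(r"\b[0-9]{1,3}(?:\.[0-9]{1,3}){3}\b"),
}


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs followed by UTF-16LE ("wide") runs."""
    ascii_runs = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_runs = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_runs.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_runs.finditer(data)]
    return found


def find_iocs(strings: list) -> dict:
    """Group pattern matches by IOC type, without duplicates."""
    hits = {name: [] for name in IOC_PATTERNS}
    for text in strings:
        for name, pattern in IOC_PATTERNS.items():
            for match in pattern.findall(text):
                if match not in hits[name]:
                    hits[name].append(match)
    return hits


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(find_iocs(extract_strings(handle.read())))
```

The IPv4 pattern also matches version numbers such as 1.2.3.4, so treat hits as leads to verify rather than confirmed indicators.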

PE File Analysis (Windows)

# Install pefile (Python library)
pip install pefile

python3 << 'EOF'
import pefile
pe = pefile.PE("suspicious.exe")

# Basic info
print(f"Machine: {hex(pe.FILE_HEADER.Machine)}")      # 0x14c = x86, 0x8664 = x64
print(f"Timestamp: {pe.FILE_HEADER.TimeDateStamp}")
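# Section names are NUL-padded to 8 bytes, so strip the padding before printing
section_names = [s.Name.decode(errors="replace").rstrip("\x00") for s in pe.sections]
print(f"Sections: {section_names}")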

# Check for suspicious section characteristics
for section in pe.sections:
    name = section.Name.decode().strip('\x00')
    print(f"Section: {name} | Entropy: {section.get_entropy():.2f} | RawSize: {section.SizeOfRawData}")
    # Entropy > 7.0 suggests packed/encrypted content

# Imported DLLs and functions (reveal capability)
if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
    for entry in pe.DIRECTORY_ENTRY_IMPORT:
        print(f"  DLL: {entry.dll.decode()}")
        for imp in entry.imports:
            if imp.name:
                print(f"    {imp.name.decode()}")
EOF
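
To see where the Machine and Timestamp values come from, the two header fields can be read directly with the standard `struct` module. This sketch assumes a well-formed file and reads only the DOS header pointer and the first COFF fields; `parse_pe_header` is a hypothetical helper, and pefile remains the better choice for real work.

```python
import struct
from datetime import datetime, timezone

MACHINE_TYPES = {0x14C: "x86", 0x8664: "x64"}


def parse_pe_header(data: bytes) -> dict:
    """Read Machine, NumberOfSections and TimeDateStamp from raw PE bytes."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF file header starts right after the 4-byte signature
    machine, section_count, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    return {
        "machine": MACHINE_TYPES.get(machine, hex(machine)),
        "sections": section_count,
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(parse_pe_header(handle.read(4096)))
```

The compile timestamp is set by the linker and is trivially forged, so it is a hint rather than evidence.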

Suspicious Windows API imports by category:

Category	Suspicious APIs
Process injection	`VirtualAllocEx`, `WriteProcessMemory`, `CreateRemoteThread`
Persistence	`RegSetValueEx`, `CreateService`, `NetScheduleJobAdd`
Network	`InternetOpen`, `WSAConnect`, `URLDownloadToFile`
File system	`FindFirstFile`, `CopyFile`, `DeleteFile`
Privilege escalation	`AdjustTokenPrivileges`, `ImpersonateLoggedOnUser`
Anti-analysis	`IsDebuggerPresent`, `GetTickCount`, `QueryPerformanceCounter`
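
A lookup like this table is easy to automate once the import names have been listed. The sketch below hard-codes a subset of the APIs above and groups a list of imported names by category; `CAPABILITY_APIS` and `categorize_imports` are hypothetical names, and the A/W suffix handling is an assumption about how the names appear in the import table.

```python
# Subset of the table above; extend as needed.
CAPABILITY_APIS = {
    "Process injection": {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"},
    "Persistence": {"RegSetValueEx", "CreateService"},
    "Network": {"InternetOpen", "WSAConnect", "URLDownloadToFile"},
    "Privilege escalation": {"AdjustTokenPrivileges", "ImpersonateLoggedOnUser"},
    "Anti-analysis": {"IsDebuggerPresent", "GetTickCount", "QueryPerformanceCounter"},
}
KNOWN_APIS = set().union(*CAPABILITY_APIS.values())


def base_name(api: str) -> str:
    """Map ANSI/Unicode variants (CreateServiceA / CreateServiceW) to the base name."""
    if api[-1:] in ("A", "W") and api[:-1] in KNOWN_APIS:
        return api[:-1]
    return api


def categorize_imports(imports) -> dict:
    """Group imported function names by the capability they hint at."""
    grouped = {}
    for api in imports:
        for category, names in CAPABILITY_APIS.items():
            if base_name(api) in names:
                grouped.setdefault(category, []).append(api)
    return grouped


if __name__ == "__main__":
    demo = ["VirtualAllocEx", "WriteProcessMemory", "CreateServiceW", "ExitProcess"]
    for category, apis in categorize_imports(demo).items():
        print(f"{category}: {', '.join(apis)}")
```

Many of these APIs also appear in legitimate software, so a category hit shows capability, not intent.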

Entropy Analysis

Packed or encrypted malware has high entropy (random-looking data):

# Check section entropy (maximum is 8.0 bits per byte; values above ~7.0 suggest packed/encrypted content)
python3 -c "
import pefile, math

def entropy(data):
    if not data:
        return 0
    counts = {}
    for byte in data:
        counts[byte] = counts.get(byte, 0) + 1
    return -sum((c/len(data)) * math.log2(c/len(data)) for c in counts.values())

pe = pefile.PE('suspicious.exe')
for s in pe.sections:
    data = s.get_data()
    name = s.Name.decode(errors='replace').rstrip('\x00')  # names are NUL-padded
    print(f'{name:10} entropy={entropy(data):.2f}')
"

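To get a feel for the entropy scale before applying a threshold, it helps to measure a few buffers whose contents are known. The sketch below uses only the standard library; `shannon_entropy` is a hypothetical helper equivalent to the function above, and the sample buffers are invented for illustration.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for constant data, 8.0 at most)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


SAMPLES = {
    "zero padding": bytes(4096),
    "English text": b"The quick brown fox jumps over the lazy dog. " * 90,
    "every byte value equally often": bytes(range(256)) * 16,
    "os.urandom output": os.urandom(4096),
}

if __name__ == "__main__":
    for label, buffer in SAMPLES.items():
        print(f"{label:32} {shannon_entropy(buffer):.2f}")
```

Constant data scores 0.0 and a perfectly uniform byte distribution scores 8.0; plain text lands well below the packed range, while random or encrypted data sits close to the maximum.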
Sandboxing (Dynamic Analysis)

A sandbox executes malware in a controlled, isolated environment and records everything it does.

Online Sandboxes

Service	Type	Notes
any.run	Interactive	Watch execution in real time; interact with the sample
VirusTotal	Automated	Runs 70+ AV engines + some sandbox analysis
Joe Sandbox	Automated	Detailed Windows/Linux/macOS analysis
Hybrid Analysis	Automated	Free; powered by CrowdStrike Falcon Sandbox
Triage (tria.ge)	Automated	Fast; API access; YARA matching

# Submit a sample via VirusTotal API
curl -s --request POST \
  --url "https://www.virustotal.com/api/v3/files" \
  --header "x-apikey: $VT_API_KEY" \
  --form file=@suspicious.exe \
  | python3 -m json.tool | grep -E '"id"|"type"'

# Get analysis results after a minute
ANALYSIS_ID="<id from above>"
curl -s --request GET \
  --url "https://www.virustotal.com/api/v3/analyses/$ANALYSIS_ID" \
  --header "x-apikey: $VT_API_KEY" \
  | python3 -m json.tool | grep -E "status|malicious|harmless"

Local Sandbox - Cuckoo

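# Cuckoo Sandbox - self-hosted dynamic analysis platform
# (Runs malware in a VM; hooks API calls and records network traffic)
# Note: Cuckoo 2.x requires Python 2.7 and is no longer maintained;
# CAPE (CAPEv2) is an actively maintained derivative.

# Installation overview (simplified)
pip install cuckoo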
cuckoo init
cuckoo community   # download rules and signatures

# Configure a Windows 7 analysis VM in VirtualBox
# (Cuckoo agent must be installed in the VM)

# Submit a sample
cuckoo submit suspicious.exe
cuckoo submit --timeout 120 suspicious.exe  # run for 2 minutes

# View results in web UI
cuckoo web runserver

# What Cuckoo captures:
#   Process tree (parent-child relationships)
#   API call trace (file, registry, network, process operations)
#   Network traffic (PCAP)
#   Screenshots (periodic)
#   Memory dumps
#   Extracted network IOCs (domains, IPs, URLs contacted)

Common Malware Indicators (IOCs)

IOCs are artifacts that signal a system has been compromised.

Host-Based IOCs

# 1. Unknown or suspicious processes
ps auxf                    # shows process tree (parent-child)
pstree -p                  # tree view with PIDs
# Processes whose binary was deleted or runs from a temp directory
ls -l /proc/[0-9]*/exe 2>/dev/null | grep -E '\(deleted\)|/tmp/|/dev/shm/'

# 2. Suspicious network connections (beaconing / C2 traffic)
ss -tulpan                 # all connections with processes
netstat -tulpan | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn
# High frequency connections to same IP = possible beacon

# 3. Suspicious files in common locations
find /tmp /var/tmp /dev/shm -type f -executable 2>/dev/null
find /tmp -name "*.sh" -o -name "*.py" -o -name "*.pl" 2>/dev/null
find / -name ".*" -type f -newer /etc/passwd 2>/dev/null | head -20   # hidden files newer than passwd

# 4. Recently modified files in system directories
find /etc /usr/bin /usr/sbin -type f -mtime -7 2>/dev/null   # modified in the last 7 days

# 5. Persistence mechanisms
# Crontabs
crontab -l 2>/dev/null
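for user in $(cut -f1 -d: /etc/passwd); do crontab -u "$user" -l 2>/dev/null; done
cat /etc/crontab /etc/cron.d/* /var/spool/cron/crontabs/* 2>/dev/null
ls -la /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly 2>/dev/null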

# Systemd units added by malware (compare against a known-good baseline)
systemctl list-units --type=service --state=running --no-legend
find /etc/systemd /usr/lib/systemd ~/.config/systemd -name "*.service" -exec ls -la {} + 2>/dev/null

# Linux LD_PRELOAD injection (library hijack)
cat /etc/ld.so.preload 2>/dev/null   # should be empty or not exist

# 6. Unusual user accounts
awk -F: '$3 == 0 && $1 != "root" { print "ROOT UID: "$1 }' /etc/passwd
awk -F: '$NF !~ /nologin|false/ && $3 >= 1000 { print $1 }' /etc/passwd
lastlog | grep -v "Never logged in" | tail -20
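
Check 2 above treats many connections to one IP as a possible beacon, but the stronger signal is regular timing. Given connection timestamps for one destination (for example from firewall or proxy logs), the spread of the intervals can be scored as below. This is a minimal sketch: `beacon_score` and `looks_like_beacon` are hypothetical names, and the thresholds (`max_cv=0.1`, `min_events=6`) are illustrative assumptions, not tuned values.

```python
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Return (mean interval, coefficient of variation) for connection times in seconds."""
    ordered = sorted(timestamps)
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(intervals) < 2:
        return None
    average = mean(intervals)
    if average == 0:
        return None
    return average, pstdev(intervals) / average


def looks_like_beacon(timestamps, max_cv=0.1, min_events=6):
    """Flag a destination whose connections are evenly spaced in time."""
    if len(timestamps) < min_events:
        return False
    score = beacon_score(timestamps)
    return score is not None and score[1] <= max_cv


if __name__ == "__main__":
    steady = [0, 61, 119, 181, 240, 302, 360]      # roughly every 60 seconds
    bursty = [0, 5, 200, 210, 900, 905, 3000]      # human-like browsing
    print("steady:", looks_like_beacon(steady))
    print("bursty:", looks_like_beacon(bursty))
```

Malware that adds heavy jitter to its sleep interval will slip past a low `max_cv`, so this works best as one signal among several.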

Network-Based IOCs

# Capture traffic in the network namespace of a suspicious process
# (records everything in that namespace, not only this process's packets)
PID=$(pgrep -o suspicious)
nsenter -t "$PID" -n tcpdump -i any -w "/tmp/suspect-$PID.pcap" &

# Count TCP packets per destination IP (a heavily contacted IP may be a beacon target)
tcpdump -r capture.pcap -nn 'tcp' | awk '{print $5}' | cut -d. -f1-4 | sort | uniq -c | sort -rn | head

# DNS queries (many random-looking names may indicate a domain generation algorithm - DGA)
tcpdump -r capture.pcap -nn 'udp dst port 53' | grep -oE ' [A-Z]+\? [^ ]+' | awk '{print $2}' | sort | uniq -c | sort -rn

# tshark - extract all contacted IPs
tshark -r capture.pcap -Y ip -T fields -e ip.dst | sort | uniq -c | sort -rn | head -20

# Look for long DNS names (possible DNS tunneling or DGA)
tshark -r capture.pcap -Y dns.qry.name -T fields -e dns.qry.name \
  | awk 'length($0) > 30' | sort | uniq -c | sort -rn
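
Length alone misses short DGA names, so a second heuristic is to score how random the longest label looks. The sketch below takes query names (for example the `dns.qry.name` output above) and flags names that are long or have a high-entropy label. `flag_domains` is a hypothetical helper and every threshold in it is an assumption chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the characters in `text`, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return sum((n / total) * math.log2(total / n) for n in Counter(text).values())


def flag_domains(names, max_length=30, min_label_length=12, min_entropy=3.5):
    """Return names that are unusually long or contain a random-looking label."""
    flagged = []
    for raw in names:
        name = raw.rstrip(".").lower()
        longest_label = max(name.split("."), key=len)
        too_long = len(name) > max_length
        random_looking = (
            len(longest_label) >= min_label_length
            and shannon_entropy(longest_label) >= min_entropy
        )
        if too_long or random_looking:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    import sys

    # Usage: tshark ... -e dns.qry.name | python3 flag_domains.py
    for name in flag_domains(line.strip() for line in sys.stdin if line.strip()):
        print(name)
```

CDN and cloud hostnames often trip both checks, so expect to maintain an allowlist alongside it.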

YARA Rules

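YARA is a pattern-matching tool and rule language widely used for malware identification. Each rule declares strings or byte patterns and a condition that combines them with logical operators; a file matches when the condition evaluates to true.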

/* Basic YARA rule structure */
rule ExampleMalware {
    meta:
        description = "Detects Example malware C2 URLs"
        author = "<author email>"
        date = "2025-03-11"
        reference = "https://threatintel.example.com/report/123"
        hash = "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f"

    strings:
        $url1 = "malicious-c2.ru/beacon" ascii
        $url2 = "evil-domain.cn/upload" ascii wide  // wide = UTF-16LE match
        $mutex = "Global\\MutexName123" ascii
        $pdb_path = "C:\\Users\\attacker\\malware.pdb" ascii

        // Hex byte pattern (x86 opcodes: xor eax,eax; push eax; push imm32)
        $shellcode = { 31 C0 50 68 }

        // Regex pattern (hard-coded IPv4 address)
        $ip_regex = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/ ascii

    condition:
        uint16(0) == 0x5A4D and           // PE magic bytes "MZ"
        filesize < 2MB and
        any of ($url*) and               // any URL string matches
        $mutex and
        ($shellcode or $ip_regex) and    // YARA rejects rules with unreferenced strings
        not $pdb_path                    // exclude known samples
}

# Install YARA
apt install yara

# Write a rule and test it
cat > detect_poshell.yar << 'EOF'
rule PowerShellDownloader {
    strings:
        $ps1 = "powershell" nocase
        $dl1 = "DownloadString" nocase
        $dl2 = "Invoke-Expression" nocase
        $web = "WebClient" nocase
        $enc = "FromBase64String" nocase

    condition:
        3 of them
}
EOF

# Scan a file
yara detect_poshell.yar suspicious.exe

# Scan a directory recursively
yara -r detect_poshell.yar /tmp/samples/

# Scan memory of a running process (pass the PID as the target; -p sets the thread count)
yara detect_poshell.yar 1234

# Use community rules (from Awesome-YARA, ESET, Mandiant, etc.)
# Clone signature sets
git clone https://github.com/Neo23x0/signature-base.git

# Scan with multiple rule files
yara -r signature-base/yara/mal_*.yar /tmp/suspicious.exe
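
To make the `3 of them` condition in PowerShellDownloader concrete, the sketch below models it in plain Python: each string is searched case-insensitively (the `nocase` modifier) and the rule fires when at least three distinct identifiers are found. This is only an approximation for building intuition; it is not YARA, and `matched_identifiers` and `n_of_them` are names invented for this note.

```python
# The five strings from the PowerShellDownloader rule above.
RULE_STRINGS = {
    "$ps1": b"powershell",
    "$dl1": b"DownloadString",
    "$dl2": b"Invoke-Expression",
    "$web": b"WebClient",
    "$enc": b"FromBase64String",
}


def matched_identifiers(data: bytes, strings=RULE_STRINGS) -> list:
    """Return identifiers whose text occurs in `data`, ignoring case (nocase)."""
    haystack = data.lower()
    return [ident for ident, text in strings.items() if text.lower() in haystack]


def n_of_them(data: bytes, n: int = 3) -> bool:
    """Model of `n of them`: at least n distinct strings must match."""
    return len(matched_identifiers(data)) >= n


if __name__ == "__main__":
    dropper = b"PowerShell -c (New-Object Net.WebClient).downloadstring('http://x')"
    benign = b"powershell Get-Process"
    print(matched_identifiers(dropper), n_of_them(dropper))
    print(matched_identifiers(benign), n_of_them(benign))
```

Requiring three of five strings tolerates variants that drop one technique while keeping single-keyword false positives out.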

Analysis Workflow Summary

1. SAFE ENVIRONMENT
   → Work in an isolated VM with no network access to production
   → Snapshot the VM before analysis; restore after
   → Use REMnux (remnux.org) - purpose-built Linux distro for analysis

2. IDENTIFY
   → Compute SHA-256 hash → check VirusTotal
   → file command, xxd for magic bytes
   → Check file entropy (packed = ~7.0+)

3. STATIC ANALYSIS
   → strings → filter for IPs, URLs, registry keys, commands
   → PE imports → identify capability categories
   → Disassemble: Ghidra (free) or IDA Pro (commercial)

4. DYNAMIC ANALYSIS
   → Submit to any.run or Hybrid Analysis
   → Or run in local Cuckoo sandbox
   → Capture: process tree, file/registry changes, network traffic

5. IOC EXTRACTION
   → Extract: file hashes, C2 IPs/domains, mutex names, registry keys
   → Encode as STIX objects; share them over TAXII (STIX is the format, TAXII the transport)
   → Write YARA rules for detection

6. REPORTING
   → Document: malware family, capabilities, C2 infrastructure
   → Load IOCs into your SIEM for alerting and into firewalls/EDR for blocking
   → Update EDR detection rules and blocklists
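
Step 5 can be sketched as follows: each extracted IOC becomes a STIX 2.1 Indicator object with a pattern string. The dictionaries here are hand-built for illustration and are not validated against the specification; `stix_pattern` and `make_indicator` are hypothetical helpers, and in practice a dedicated library such as `stix2` is the safer route.

```python
import json
import uuid
from datetime import datetime, timezone

PATTERN_TEMPLATES = {
    "sha256": "[file:hashes.'SHA-256' = '{}']",
    "domain": "[domain-name:value = '{}']",
    "ipv4": "[ipv4-addr:value = '{}']",
}


def stix_pattern(ioc_type: str, value: str) -> str:
    """Build a STIX pattern, escaping backslashes and single quotes in the value."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return PATTERN_TEMPLATES[ioc_type].format(escaped)


def make_indicator(ioc_type: str, value: str, name: str) -> dict:
    """Return a minimal STIX 2.1 Indicator as a plain dictionary."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": stix_pattern(ioc_type, value),
        "pattern_type": "stix",
        "valid_from": now,
    }


if __name__ == "__main__":
    sample_hash = "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f"
    print(json.dumps(make_indicator("sha256", sample_hash, "Sample file hash"), indent=2))
```

Keeping one indicator per IOC makes it easy to expire or revoke a single entry later.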