# Credit Card Search Options for Memory Dump Analysis

## Overview
This document outlines various methods for searching binary .dmp files created by the Volatility workflow for credit card numbers. This functionality would be implemented as Step 6 in the current workflow, after memory dumps are extracted and renamed.
## Current Workflow Context
The existing Volatility workflow consists of:
1. Extract PID list from memory image
2. Parse PID list and find interesting processes
3. Save interesting PIDs to JSON
4. Extract file handles for each interesting process
5. Extract memory dumps for each interesting process (creates .dmp files)
6. [NEW] Search memory dumps for credit card patterns
## Search Method Options

### 1. Pure Python Approaches

#### Option A: Memory-Mapped File Search (mmap)

Description: Use Python's mmap module for memory-mapped file access.

Pros:

- Fast performance for large files
- Memory-efficient (does not load the entire file into RAM)
- Built into the Python standard library
- Excellent for files in the GB range

Cons:

- Platform-specific behavior differences
- Requires careful encoding handling
- More complex error handling

Implementation Notes:
    import mmap

    def search_with_mmap(file_path, patterns):
        # Map the file read-only; the OS pages data in on demand
        matches = []
        with open(file_path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                for pattern in patterns:
                    offset = mm.find(pattern)
                    while offset != -1:
                        matches.append((pattern, offset))
                        offset = mm.find(pattern, offset + 1)
        return matches
Best For: Large memory dump files (>100MB)
#### Option B: Buffered File Reading

Description: Read files in chunks and search each chunk with overlap handling.

Pros:

- Simple implementation
- Cross-platform consistent behavior
- Predictable memory usage
- Good control over resource consumption

Cons:

- Slower than mmap for very large files
- Requires overlap handling for patterns spanning chunks
Implementation Notes:
    def search_with_buffering(file_path, patterns, chunk_size=1024 * 1024):
        found, tail = set(), b""
        overlap = max(len(p) for p in patterns) - 1  # carry-over so boundary-spanning matches are caught
        with open(file_path, "rb") as f:
            while chunk := f.read(chunk_size):
                data = tail + chunk
                found.update(p for p in patterns if p in data)
                tail = data[-overlap:] if overlap else b""
        return found
Best For: Moderate-sized files or when memory usage must be strictly controlled
### 2. External Tool Integration

#### Option A: Native OS Tools
Windows (PowerShell):
Linux/Mac (grep):
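The concrete command examples were not preserved in this copy of the document. As an illustrative sketch (the sample file name and the digit-run pattern are assumptions), a grep search over a dump might look like the following, with a rough PowerShell counterpart shown as a comment:

```shell
# Create a small stand-in for a .dmp file (illustrative only)
printf 'noise 4111111111111111 noise' > sample.dmp

# -a: treat binary files as text, -o: print only the match, -E: extended regex
grep -aoE '[0-9]{13,19}' sample.dmp

# Windows PowerShell rough equivalent (not executed here):
#   Select-String -Path sample.dmp -Pattern '[0-9]{13,19}' -AllMatches
```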
Pros:

- Highly optimized native tools
- Excellent binary file handling
- Familiar to security professionals
- Very fast performance

Cons:

- Platform-dependent commands
- Requires subprocess management
- Less integration with Rich console output
- Harder to handle complex patterns

#### Option B: ripgrep (rg)
Command:
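The original command was not preserved in this copy; an illustrative invocation (the sample file name and pattern are assumptions) might be:

```shell
# Skip gracefully if ripgrep is not installed (it is an external dependency)
command -v rg >/dev/null 2>&1 || exit 0

# Stand-in dump file for demonstration
printf 'noise 4111111111111111 noise' > sample.dmp

# -a: search binary files as if they were text, -o: print only the matching parts
rg -ao '[0-9]{13,19}' sample.dmp
```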
Pros:

- Extremely fast performance
- Excellent binary file support
- Consistent cross-platform behavior
- Modern, actively maintained

Cons:

- External dependency (must be installed)
- Subprocess overhead
- Less control over output formatting

### 3. Specialized Libraries

#### Option A: python-magic + Custom Search

Description: Detect file types and optimize the search strategy based on file characteristics.

Pros:

- Smart file handling
- Can optimize based on the detected file type
- Good for mixed file type environments

Cons:

- Additional dependency (python-magic)
- Complexity may not be needed for .dmp files

#### Option B: Enhanced regex Module

Description: Use the regex module instead of the built-in re module for better binary handling.

Pros:

- Better binary data handling
- More powerful regex features
- Better Unicode support

Cons:

- Additional dependency
- Overkill for simple pattern matching
## Credit Card Search Considerations

### Pattern Variations
Credit card numbers may appear in various formats in memory:

- Raw numbers: 4111111111111111
- Spaced formats: 4111 1111 1111 1111
- Dashed formats: 4111-1111-1111-1111
- Mixed separators: 4111.1111.1111.1111
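All of these separator variants can be covered by one byte-level regular expression. The following is an illustrative sketch, not part of the existing codebase; the backreference forces the separator to be consistent within a single number, and the lookarounds stop partial matches inside longer digit runs:

```python
import re

# 4-digit groups joined by an optional, consistent space/dot/dash separator,
# ending in 1-7 more digits (covers 13-19 digit numbers such as Amex's 15)
CARD_RE = re.compile(rb"(?<!\d)\d{4}([ .-]?)\d{4}\1\d{4}\1\d{1,7}(?!\d)")

data = b"junk 4111-1111-1111-1111 junk 4111 1111 1111 1111 junk"
found = [m.group(0) for m in CARD_RE.finditer(data)]
```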
### Encoding Formats

Memory dumps may contain data in different encodings:

- ASCII: single-byte encoding
- UTF-8: variable-width encoding
- UTF-16: two-byte encoding (common in Windows)
- UTF-32: four-byte encoding
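To see why this matters, the same card number produces quite different byte sequences depending on encoding; a minimal sketch:

```python
card = "4111111111111111"

ascii_form = card.encode("ascii")      # one byte per digit
utf16_form = card.encode("utf-16-le")  # each digit followed by a 0x00 byte

# A search for the ASCII bytes alone would never hit the UTF-16 form,
# so each encoding needs its own byte pattern.
```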
### Validation Strategy
Luhn Algorithm: Implement validation to reduce false positives by checking if found number sequences are valid credit card numbers.
    def luhn_checksum(card_num):
        def digits_of(n):
            return [int(d) for d in str(n)]
        digits = digits_of(card_num)
        odd_digits = digits[-1::-2]
        even_digits = digits[-2::-2]
        checksum = sum(odd_digits)
        for d in even_digits:
            checksum += sum(digits_of(d * 2))
        return checksum % 10

    def is_luhn_valid(card_num):
        return luhn_checksum(card_num) == 0
## Recommended Implementation Strategy

### Primary Approach: Hybrid Python Solution
Core Method: Use mmap for large files with fallback to buffered reading.
Features:

1. Hard-coded test card list: configurable via settings/config file
2. Command-line overrides: allow additional patterns via CLI options
3. Multiple encoding support: search for ASCII, UTF-8, and UTF-16 variants
4. Luhn validation: validate found numbers to reduce false positives
5. Rich integration: progress bars and formatted output using the existing Rich console
6. JSON output: machine-readable results for further processing
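For the JSON output feature, a single finding might be shaped like this; the field names are illustrative assumptions, not an existing schema:

```python
import json

# Hypothetical result record for one match
result = {
    "file": "pid_1234.dmp",       # assumed file naming
    "pattern": "4111111111111111",
    "encoding": "utf-16-le",
    "offset": 204800,
    "luhn_valid": True,
    "masked": "************1111",  # raw number never stored
}
print(json.dumps(result, indent=2))
```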
### Configuration Structure
    # Default test credit card numbers
    DEFAULT_TEST_CARDS = [
        "4111111111111111",  # Visa
        "5555555555554444",  # Mastercard
        "378282246310005",   # American Express
        "30569309025904",    # Diners Club
        "6011111111111117",  # Discover
    ]

    # Encoding variants to search
    ENCODING_VARIANTS = ["ascii", "utf-8", "utf-16-le", "utf-16-be"]
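A small helper could expand the configured cards into concrete byte patterns, one per encoding; the helper name build_byte_patterns is an assumption for illustration:

```python
# Mirrors the configuration above (shortened here for the example)
DEFAULT_TEST_CARDS = ["4111111111111111", "378282246310005"]
ENCODING_VARIANTS = ["ascii", "utf-8", "utf-16-le", "utf-16-be"]

def build_byte_patterns(cards, encodings):
    """One byte pattern per (card, encoding) pair."""
    return {(card, enc): card.encode(enc) for card in cards for enc in encodings}

patterns = build_byte_patterns(DEFAULT_TEST_CARDS, ENCODING_VARIANTS)
# ASCII and UTF-8 agree for digits; the UTF-16 variants interleave NUL bytes
```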
### CLI Integration
Add options to the existing volatility command:
    @click.option(
        "--search-cards",
        is_flag=True,
        default=False,
        help="Search memory dumps for credit card numbers",
    )
    @click.option(
        "--additional-patterns",
        multiple=True,
        help="Additional patterns to search for in memory dumps",
    )
    @click.option(
        "--card-search-config",
        type=click.Path(exists=True, path_type=Path),
        help="Path to custom credit card search configuration file",
    )
### Implementation Location
New Module: src/kp_ssf_tools/volatility/card_search.py
Functions:
- search_memory_dumps(): Main orchestration function
- search_file_for_patterns(): Core search implementation
- validate_card_numbers(): Luhn algorithm validation
- format_search_results(): Rich-formatted output
- save_search_results(): JSON export functionality
Integration Point: Add as Step 6 in processor.py workflow:
    # Step 6: Search memory dumps for credit card patterns (optional)
    if input_model.search_cards:
        search_memory_dumps(results_dir, renamed_files, input_model)
## Performance Considerations

### File Size Thresholds
- Small files (<10MB): Use simple file reading
- Medium files (10MB-100MB): Use buffered reading
- Large files (>100MB): Use mmap
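The thresholds above could be applied with a small dispatcher; the function and strategy names here are illustrative:

```python
import os

SMALL = 10 * 1024 * 1024    # 10 MB
LARGE = 100 * 1024 * 1024   # 100 MB

def pick_strategy(file_path):
    """Map a file's size onto the thresholds above."""
    size = os.path.getsize(file_path)
    if size < SMALL:
        return "simple"    # read the whole file at once
    if size < LARGE:
        return "buffered"  # chunked reading with overlap
    return "mmap"          # memory-mapped access
```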
### Memory Usage
- mmap: Virtual memory usage, minimal RAM impact
- Buffered: Configurable chunk size (default: 1MB)
- Progress tracking: Update every 5% of file processed
### Optimization Strategies
- Early termination: Stop searching file after finding X matches
- Pattern compilation: Pre-compile regex patterns
- Skip empty files: Quick file size check before processing
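The three strategies above can be combined in one small helper; the names and the match cap are illustrative:

```python
import re

def search_capped(data, compiled_patterns, max_matches=100):
    """Scan data with pre-compiled patterns; stop after max_matches hits."""
    hits = []
    if not data:  # quick skip for empty input
        return hits
    for pat in compiled_patterns:
        for m in pat.finditer(data):
            hits.append((m.start(), m.group(0)))
            if len(hits) >= max_matches:
                return hits  # early termination
    return hits

# Pre-compile once, reuse across all files
DIGIT_RUN = [re.compile(rb"\d{13,19}")]
```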
## Security Considerations

### Sensitive Data Handling

- Memory clearing: Explicitly clear variables that held card numbers (best-effort in Python, where string immutability and garbage collection limit what can be guaranteed)
- Secure logging: Mask card numbers in log output
- Result sanitization: Option to hash found patterns instead of storing raw numbers
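For result sanitization, one option is to store a digest rather than the raw number. A minimal sketch follows; note that real PAN handling under PCI DSS would call for keyed or salted hashing rather than a plain digest:

```python
import hashlib

def fingerprint(card_num: str) -> str:
    """Return a SHA-256 hex digest to store in place of the raw number."""
    return hashlib.sha256(card_num.encode("ascii")).hexdigest()
```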
### Output Security
    def mask_card_number(card_num):
        """Mask all but the last 4 digits of a credit card number."""
        if len(card_num) < 4:
            return "*" * len(card_num)
        return "*" * (len(card_num) - 4) + card_num[-4:]
## Future Enhancements

### Advanced Pattern Detection
- Context awareness: Look for surrounding keywords (CVV, expiry, etc.)
- Format detection: Automatically detect separator patterns
- Statistical analysis: Report on pattern frequency and distribution
### Integration Features
- Report generation: PDF/HTML reports with findings
- Compliance mapping: Map findings to PCI DSS requirements
- Baseline comparison: Compare against previous scans
- Risk scoring: Assign risk levels to different types of findings
## Testing Strategy

### Unit Tests
- Pattern matching: Test all card number formats and encodings
- Luhn validation: Test valid and invalid card numbers
- File handling: Test with various file sizes and formats
- Error conditions: Test with corrupted/inaccessible files
### Integration Tests
- End-to-end workflow: Full volatility workflow with card search
- Performance benchmarks: Measure search times for different file sizes
- Cross-platform testing: Verify behavior on Windows/Linux/Mac
### Test Data

Create synthetic memory dumps with known patterns for testing:

- Embedded test card numbers in various formats
- Different encoding scenarios
- Edge cases (partial numbers, corrupted data)
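A generator for such synthetic dumps might look like this; the file layout and helper name are illustrative:

```python
import os

def make_test_dump(path, cards=("4111111111111111",)):
    """Write a small synthetic dump embedding cards in several forms."""
    with open(path, "wb") as f:
        f.write(os.urandom(64))  # leading binary noise
        for card in cards:
            f.write(card.encode("ascii") + b"\x00")      # raw ASCII
            f.write(card.encode("utf-16-le") + b"\x00")  # Windows-style wide chars
            spaced = " ".join(card[i:i + 4] for i in range(0, len(card), 4))
            f.write(spaced.encode("ascii") + b"\x00")    # spaced format
        f.write(cards[0].encode("ascii")[:7])            # partial-number edge case
        f.write(os.urandom(64))  # trailing binary noise

make_test_dump("synthetic_test.dmp")
```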
Document Version: 1.0
Last Updated: August 9, 2025
Author: GitHub Copilot