# Credit Card Search Options for Memory Dump Analysis

## Overview
This document outlines various methods for searching binary .dmp files created by the Volatility workflow for credit card numbers. This functionality would be implemented as Step 6 in the current workflow, after memory dumps are extracted and renamed.
## Current Workflow Context
The existing Volatility workflow consists of:
1. Extract PID list from memory image
2. Parse PID list and find interesting processes
3. Save interesting PIDs to JSON
4. Extract file handles for each interesting process
5. Extract memory dumps for each interesting process (creates .dmp files)
6. [NEW] Search memory dumps for credit card patterns
## Search Method Options

### 1. Pure Python Approaches

#### Option A: Memory-Mapped File Search (mmap)

Description: Use Python's mmap module for memory-mapped file access.

Pros:

- Fast performance for large files
- Memory-efficient (does not load the entire file into RAM)
- Built into the Python standard library
- Excellent for files in the GB range

Cons:

- Platform-specific behavior differences
- Requires careful encoding handling
- More complex error handling

Implementation Notes:
    import mmap

    def search_with_mmap(file_path, patterns):
        # Map the file read-only; the OS pages data in on demand
        matches = []
        with open(file_path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                for pattern in patterns:
                    offset = mm.find(pattern)
                    while offset != -1:
                        matches.append((pattern, offset))
                        offset = mm.find(pattern, offset + 1)
        return matches
Best For: Large memory dump files (>100MB)
#### Option B: Buffered File Reading

Description: Read files in chunks and search each chunk with overlap handling.

Pros:

- Simple implementation
- Cross-platform consistent behavior
- Predictable memory usage
- Good control over resource consumption

Cons:

- Slower than mmap for very large files
- Requires overlap handling for patterns spanning chunks
Implementation Notes:
    def search_with_buffering(file_path, patterns, chunk_size=1024 * 1024):
        found, tail = set(), b""
        overlap = max(len(p) for p in patterns) - 1  # carry-over so boundary-spanning matches are caught
        with open(file_path, "rb") as f:
            while chunk := f.read(chunk_size):
                data = tail + chunk
                found.update(p for p in patterns if p in data)
                tail = data[-overlap:] if overlap else b""
        return found
Best For: Moderate-sized files or when memory usage must be strictly controlled
### 2. External Tool Integration

#### Option A: Native OS Tools
Windows (PowerShell):
Linux/Mac (grep):
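The concrete command examples were not preserved in this copy of the document. As an illustrative sketch (the sample file name and the digit-run pattern are assumptions), a grep search over a dump might look like the following, with a rough PowerShell counterpart shown as a comment:

```shell
# Create a small stand-in for a .dmp file (illustrative only)
printf 'noise 4111111111111111 noise' > sample.dmp

# -a: treat binary files as text, -o: print only the match, -E: extended regex
grep -aoE '[0-9]{13,19}' sample.dmp

# Windows PowerShell rough equivalent (not executed here):
#   Select-String -Path sample.dmp -Pattern '[0-9]{13,19}' -AllMatches
```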
Pros:

- Highly optimized native tools
- Excellent binary file handling
- Familiar to security professionals
- Very fast performance

Cons:

- Platform-dependent commands
- Requires subprocess management
- Less integration with Rich console output
- Harder to handle complex patterns

#### Option B: ripgrep (rg)
Command:
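The original command was not preserved in this copy; an illustrative invocation (the sample file name and pattern are assumptions) might be:

```shell
# Skip gracefully if ripgrep is not installed (it is an external dependency)
command -v rg >/dev/null 2>&1 || exit 0

# Stand-in dump file for demonstration
printf 'noise 4111111111111111 noise' > sample.dmp

# -a: search binary files as if they were text, -o: print only the matching parts
rg -ao '[0-9]{13,19}' sample.dmp
```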
Pros:

- Extremely fast performance
- Excellent binary file support
- Consistent cross-platform behavior
- Modern, actively maintained

Cons:

- External dependency (must be installed)
- Subprocess overhead
- Less control over output formatting

### 3. Specialized Libraries

#### Option A: python-magic + Custom Search

Description: Detect file types and optimize the search strategy based on file characteristics.

Pros:

- Smart file handling
- Can optimize based on the detected file type
- Good for mixed file type environments

Cons:

- Additional dependency (python-magic)
- Complexity may not be needed for .dmp files

#### Option B: Enhanced regex Module

Description: Use the regex module instead of the built-in re module for better binary handling.

Pros:

- Better binary data handling
- More powerful regex features
- Better Unicode support

Cons:

- Additional dependency
- Overkill for simple pattern matching
## Credit Card Search Considerations

### Pattern Variations
Credit card numbers may appear in various formats in memory:

- Raw numbers: 4111111111111111
- Spaced formats: 4111 1111 1111 1111
- Dashed formats: 4111-1111-1111-1111
- Mixed separators: 4111.1111.1111.1111
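All of these separator variants can be covered by one byte-level regular expression. The following is an illustrative sketch, not part of the existing codebase; the backreference forces the separator to be consistent within a single number, and the lookarounds stop partial matches inside longer digit runs:

```python
import re

# 4-digit groups joined by an optional, consistent space/dot/dash separator,
# ending in 1-7 more digits (covers 13-19 digit numbers such as Amex's 15)
CARD_RE = re.compile(rb"(?<!\d)\d{4}([ .-]?)\d{4}\1\d{4}\1\d{1,7}(?!\d)")

data = b"junk 4111-1111-1111-1111 junk 4111 1111 1111 1111 junk"
found = [m.group(0) for m in CARD_RE.finditer(data)]
```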
### Encoding Formats

Memory dumps may contain data in different encodings:

- ASCII: single-byte encoding
- UTF-8: variable-width encoding
- UTF-16: two-byte encoding (common in Windows)
- UTF-32: four-byte encoding
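To see why this matters, the same card number produces quite different byte sequences depending on encoding; a minimal sketch:

```python
card = "4111111111111111"

ascii_form = card.encode("ascii")      # one byte per digit
utf16_form = card.encode("utf-16-le")  # each digit followed by a 0x00 byte

# A search for the ASCII bytes alone would never hit the UTF-16 form,
# so each encoding needs its own byte pattern.
```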
### Validation Strategy
Luhn Algorithm: Implement validation to reduce false positives by checking if found number sequences are valid credit card numbers.
    def luhn_checksum(card_num):
        def digits_of(n):
            return [int(d) for d in str(n)]
        digits = digits_of(card_num)
        odd_digits = digits[-1::-2]
        even_digits = digits[-2::-2]
        checksum = sum(odd_digits)
        for d in even_digits:
            checksum += sum(digits_of(d * 2))
        return checksum % 10

    def is_luhn_valid(card_num):
        return luhn_checksum(card_num) == 0
## Recommended Implementation Strategy

### Primary Approach: Hybrid Python Solution
Core Method: Use mmap for large files with fallback to buffered reading.
Features:

1. Hard-coded test card list: configurable via settings/config file
2. Command-line overrides: allow additional patterns via CLI options
3. Multiple encoding support: search for ASCII, UTF-8, and UTF-16 variants
4. Luhn validation: validate found numbers to reduce false positives
5. Rich integration: progress bars and formatted output using the existing Rich console
6. JSON output: machine-readable results for further processing
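For the JSON output feature, a single finding might be shaped like this; the field names are illustrative assumptions, not an existing schema:

```python
import json

# Hypothetical result record for one match
result = {
    "file": "pid_1234.dmp",       # assumed file naming
    "pattern": "4111111111111111",
    "encoding": "utf-16-le",
    "offset": 204800,
    "luhn_valid": True,
    "masked": "************1111",  # raw number never stored
}
print(json.dumps(result, indent=2))
```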
### Configuration Structure
    # Default test credit card numbers
    DEFAULT_TEST_CARDS = [
        "4111111111111111",  # Visa
        "5555555555554444",  # Mastercard
        "378282246310005",   # American Express
        "30569309025904",    # Diners Club
        "6011111111111117",  # Discover
    ]

    # Encoding variants to search
    ENCODING_VARIANTS = ["ascii", "utf-8", "utf-16-le", "utf-16-be"]
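A small helper could expand the configured cards into concrete byte patterns, one per encoding; the helper name build_byte_patterns is an assumption for illustration:

```python
# Mirrors the configuration above (shortened here for the example)
DEFAULT_TEST_CARDS = ["4111111111111111", "378282246310005"]
ENCODING_VARIANTS = ["ascii", "utf-8", "utf-16-le", "utf-16-be"]

def build_byte_patterns(cards, encodings):
    """One byte pattern per (card, encoding) pair."""
    return {(card, enc): card.encode(enc) for card in cards for enc in encodings}

patterns = build_byte_patterns(DEFAULT_TEST_CARDS, ENCODING_VARIANTS)
# ASCII and UTF-8 agree for digits; the UTF-16 variants interleave NUL bytes
```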
### CLI Integration
Add options to the existing volatility command:
    @click.option(
        "--search-cards",
        is_flag=True,
        default=False,
        help="Search memory dumps for credit card numbers",
    )
    @click.option(
        "--additional-patterns",
        multiple=True,
        help="Additional patterns to search for in memory dumps",
    )
    @click.option(
        "--card-search-config",
        type=click.Path(exists=True, path_type=Path),
        help="Path to custom credit card search configuration file",
    )
### Implementation Location
New Module: src/kp_ssf_tools/volatility/card_search.py
Functions:
- search_memory_dumps(): Main orchestration function
- search_file_for_patterns(): Core search implementation
- validate_card_numbers(): Luhn algorithm validation
- format_search_results(): Rich-formatted output
- save_search_results(): JSON export functionality
Integration Point: Add as Step 6 in processor.py workflow:
    # Step 6: Search memory dumps for credit card patterns (optional)
    if input_model.search_cards:
        search_memory_dumps(results_dir, renamed_files, input_model)
## Performance Considerations

### File Size Thresholds
- Small files (<10MB): Use simple file reading
- Medium files (10MB-100MB): Use buffered reading
- Large files (>100MB): Use mmap
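The thresholds above could be applied with a small dispatcher; the function and strategy names here are illustrative:

```python
import os

SMALL = 10 * 1024 * 1024    # 10 MB
LARGE = 100 * 1024 * 1024   # 100 MB

def pick_strategy(file_path):
    """Map a file's size onto the thresholds above."""
    size = os.path.getsize(file_path)
    if size < SMALL:
        return "simple"    # read the whole file at once
    if size < LARGE:
        return "buffered"  # chunked reading with overlap
    return "mmap"          # memory-mapped access
```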
### Memory Usage
- mmap: Virtual memory usage, minimal RAM impact
- Buffered: Configurable chunk size (default: 1MB)
- Progress tracking: Update every 5% of file processed
### Optimization Strategies
- Early termination: Stop searching file after finding X matches
- Pattern compilation: Pre-compile regex patterns
- Skip empty files: Quick file size check before processing
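The three strategies above can be combined in one small helper; the names and the match cap are illustrative:

```python
import re

def search_capped(data, compiled_patterns, max_matches=100):
    """Scan data with pre-compiled patterns; stop after max_matches hits."""
    hits = []
    if not data:  # quick skip for empty input
        return hits
    for pat in compiled_patterns:
        for m in pat.finditer(data):
            hits.append((m.start(), m.group(0)))
            if len(hits) >= max_matches:
                return hits  # early termination
    return hits

# Pre-compile once, reuse across all files
DIGIT_RUN = [re.compile(rb"\d{13,19}")]
```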
## Security Considerations

### Sensitive Data Handling

- Memory clearing: Explicitly clear variables that held card numbers (best-effort in Python, where string immutability and garbage collection limit what can be guaranteed)
- Secure logging: Mask card numbers in log output
- Result sanitization: Option to hash found patterns instead of storing raw numbers
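For result sanitization, one option is to store a digest rather than the raw number. A minimal sketch follows; note that real PAN handling under PCI DSS would call for keyed or salted hashing rather than a plain digest:

```python
import hashlib

def fingerprint(card_num: str) -> str:
    """Return a SHA-256 hex digest to store in place of the raw number."""
    return hashlib.sha256(card_num.encode("ascii")).hexdigest()
```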
### Output Security
    def mask_card_number(card_num):
        """Mask all but the last 4 digits of a credit card number."""
        if len(card_num) < 4:
            return "*" * len(card_num)
        return "*" * (len(card_num) - 4) + card_num[-4:]
## Future Enhancements

### Advanced Pattern Detection
- Context awareness: Look for surrounding keywords (CVV, expiry, etc.)
- Format detection: Automatically detect separator patterns
- Statistical analysis: Report on pattern frequency and distribution
### Integration Features
- Report generation: PDF/HTML reports with findings
- Compliance mapping: Map findings to PCI DSS requirements
- Baseline comparison: Compare against previous scans
- Risk scoring: Assign risk levels to different types of findings
## Testing Strategy

### Unit Tests
- Pattern matching: Test all card number formats and encodings
- Luhn validation: Test valid and invalid card numbers
- File handling: Test with various file sizes and formats
- Error conditions: Test with corrupted/inaccessible files
### Integration Tests
- End-to-end workflow: Full volatility workflow with card search
- Performance benchmarks: Measure search times for different file sizes
- Cross-platform testing: Verify behavior on Windows/Linux/Mac
### Test Data

Create synthetic memory dumps with known patterns for testing:

- Embedded test card numbers in various formats
- Different encoding scenarios
- Edge cases (partial numbers, corrupted data)
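A generator for such synthetic dumps might look like this; the file layout and helper name are illustrative:

```python
import os

def make_test_dump(path, cards=("4111111111111111",)):
    """Write a small synthetic dump embedding cards in several forms."""
    with open(path, "wb") as f:
        f.write(os.urandom(64))  # leading binary noise
        for card in cards:
            f.write(card.encode("ascii") + b"\x00")      # raw ASCII
            f.write(card.encode("utf-16-le") + b"\x00")  # Windows-style wide chars
            spaced = " ".join(card[i:i + 4] for i in range(0, len(card), 4))
            f.write(spaced.encode("ascii") + b"\x00")    # spaced format
        f.write(cards[0].encode("ascii")[:7])            # partial-number edge case
        f.write(os.urandom(64))  # trailing binary noise

make_test_dump("synthetic_test.dmp")
```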
Document Version: 1.0
Last Updated: August 9, 2025
Author: GitHub Copilot