Credit Card Search Options for Memory Dump Analysis

Overview

This document outlines various methods for searching binary .dmp files created by the Volatility workflow for credit card numbers. This functionality would be implemented as Step 6 in the current workflow, after memory dumps are extracted and renamed.

Current Workflow Context

The existing Volatility workflow consists of:

  1. Extract PID list from memory image
  2. Parse PID list and find interesting processes
  3. Save interesting PIDs to JSON
  4. Extract file handles for each interesting process
  5. Extract memory dumps for each interesting process (creates .dmp files)
  6. [NEW] Search memory dumps for credit card patterns

Search Method Options

1. Pure Python Approaches

Option A: Memory-Mapped File Search (mmap)

Description: Use Python's mmap module for memory-mapped file access.

Pros:

  • Fast performance for large files
  • Memory-efficient (does not load the entire file into RAM)
  • Built into the Python standard library
  • Excellent for files in the GB range

Cons:

  • Platform-specific behavior differences
  • Requires careful encoding handling
  • More complex error handling

Implementation Notes:

import mmap

def search_with_mmap(file_path, patterns):
    """Yield (pattern, offset) pairs for every match in the file."""
    with open(file_path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for pattern in patterns:
                # bytes.find-style scan over the mapped region
                offset = mm.find(pattern)
                while offset != -1:
                    yield pattern, offset
                    offset = mm.find(pattern, offset + 1)

Best For: Large memory dump files (>100MB)


Option B: Buffered File Reading

Description: Read files in chunks and search each chunk with overlap handling.

Pros:

  • Simple implementation
  • Consistent cross-platform behavior
  • Predictable memory usage
  • Good control over resource consumption

Cons:

  • Slower than mmap for very large files
  • Requires overlap handling for patterns that span chunk boundaries

Implementation Notes:

def search_with_buffering(file_path, patterns, chunk_size=1024*1024):
    """Report patterns found in each chunk; the carried-over tail keeps
    matches that span a chunk boundary from being missed."""
    overlap = max(len(p) for p in patterns) - 1
    tail = b""
    with open(file_path, 'rb') as f:
        while chunk := f.read(chunk_size):
            window = tail + chunk
            for pattern in patterns:
                if pattern in window:
                    yield pattern
            tail = window[-overlap:] if overlap else b""

Best For: Moderate-sized files or when memory usage must be strictly controlled


2. External Tool Integration

Option A: Native OS Tools

Windows (PowerShell):

Select-String -Pattern "4111111111111111" -Path "*.dmp"

Note: Select-String has no byte-level mode (`-Encoding Byte` is not a valid Select-String parameter value), so it treats dumps as text and can miss matches interleaved with null bytes, such as UTF-16 data.

Linux/Mac (grep):

grep -a -o "4111111111111111" *.dmp

Pros:

  • Highly optimized native tools
  • Excellent binary file handling
  • Familiar to security professionals
  • Very fast performance

Cons:

  • Platform-dependent commands
  • Requires subprocess management
  • Less integration with Rich console output
  • Harder to handle complex patterns


Option B: ripgrep (rg)

Command:

rg --binary --only-matching "4111111111111111" *.dmp

Pros:

  • Extremely fast performance
  • Excellent binary file support
  • Consistent cross-platform behavior
  • Modern, actively maintained

Cons:

  • External dependency (must be installed)
  • Subprocess overhead
  • Less control over output formatting


3. Specialized Libraries

Option A: File Type Detection (python-magic)

Description: Detect file types and optimize the search strategy based on file characteristics.

Pros:

  • Smart file handling
  • Can optimize based on the detected file type
  • Good for mixed file type environments

Cons:

  • Additional dependency (python-magic)
  • Complexity may not be needed for .dmp files


Option B: Enhanced regex Module

Description: Use the regex module instead of built-in re for better binary handling.

Pros:

  • Better binary data handling
  • More powerful regex features
  • Better Unicode support

Cons:

  • Additional dependency
  • Overkill for simple pattern matching


Credit Card Search Considerations

Pattern Variations

Credit card numbers may appear in various formats in memory:

  1. Raw numbers: 4111111111111111
  2. Spaced formats: 4111 1111 1111 1111
  3. Dashed formats: 4111-1111-1111-1111
  4. Mixed separators: 4111.1111.1111.1111
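The four variants above can be collapsed into a single regex by allowing one optional separator between 4-digit groups. A minimal sketch (`card_pattern` is a hypothetical helper; the grouping assumes 16-digit numbers, so Amex's 4-6-5 layout would need its own group sizes):

```python
import re

def card_pattern(card: str) -> re.Pattern:
    """Build a bytes regex matching `card` with an optional space, dash,
    or dot between each 4-digit group (assumes a 16-digit number)."""
    groups = [re.escape(card[i:i + 4].encode("ascii")) for i in range(0, 16, 4)]
    return re.compile(rb"[ .-]?".join(groups))

pat = card_pattern("4111111111111111")
# matches b"4111111111111111", b"4111 1111 1111 1111",
# b"4111-1111-1111-1111", and b"4111.1111.1111.1111"
```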

Encoding Formats

Memory dumps may contain data in different encodings:

  • ASCII: Single-byte encoding
  • UTF-8: Variable-width encoding
  • UTF-16: Two-byte encoding (common in Windows)
  • UTF-32: Four-byte encoding
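A plain ASCII search will miss the same digits stored as UTF-16, which is common in Windows process memory, so each encoding variant must be searched as its own byte pattern. A quick illustration:

```python
card = "4111111111111111"

ascii_bytes = card.encode("ascii")      # one byte per digit
utf16_bytes = card.encode("utf-16-le")  # each digit followed by a null byte

# The UTF-16-LE form is twice as long and contains no contiguous run of
# ASCII digits, so the ASCII pattern will never be found inside it.
print(ascii_bytes in utf16_bytes)  # False
```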

Validation Strategy

Luhn Algorithm: Implement validation to reduce false positives by checking if found number sequences are valid credit card numbers.

def luhn_checksum(card_num):
    """Return the Luhn checksum; 0 means the number is valid."""
    def digits_of(n):
        return [int(d) for d in str(n)]

    digits = digits_of(card_num)
    odd_digits = digits[-1::-2]    # every second digit, starting from the right
    even_digits = digits[-2::-2]   # the digits that get doubled
    checksum = sum(odd_digits)
    for d in even_digits:
        checksum += sum(digits_of(d * 2))  # double, then sum the digits
    return checksum % 10

def is_luhn_valid(card_num):
    return luhn_checksum(card_num) == 0
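Extraction and validation combine naturally: scan a raw buffer for runs of 13-19 digits (the plausible card number length range) and keep only the Luhn-valid ones. A sketch, with the checksum above condensed into `luhn_valid` so the block stands alone:

```python
import re

def luhn_valid(num: str) -> bool:
    # Condensed form of the Luhn checksum defined above
    digits = [int(d) for d in num]
    digits[-2::-2] = [sum(divmod(2 * d, 10)) for d in digits[-2::-2]]
    return sum(digits) % 10 == 0

def candidate_cards(data: bytes):
    """Yield (number, offset) for each Luhn-valid digit run in raw bytes."""
    for m in re.finditer(rb"\d{13,19}", data):
        num = m.group().decode("ascii")
        if luhn_valid(num):
            yield num, m.start()
```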

Primary Approach: Hybrid Python Solution

Core Method: Use mmap for large files with fallback to buffered reading.

Features:

  1. Hard-coded test card list: Configurable via settings/config file
  2. Command-line overrides: Allow additional patterns via CLI options
  3. Multiple encoding support: Search for ASCII, UTF-8, and UTF-16 variants
  4. Luhn validation: Validate found numbers to reduce false positives
  5. Rich integration: Progress bars and formatted output using the existing Rich console
  6. JSON output: Machine-readable results for further processing

Configuration Structure

# Default test credit card numbers
DEFAULT_TEST_CARDS = [
    "4111111111111111",  # Visa
    "5555555555554444",  # Mastercard
    "378282246310005",   # American Express
    "30569309025904",    # Diners Club
    "6011111111111117",  # Discover
]

# Encoding variants to search
ENCODING_VARIANTS = ["ascii", "utf-8", "utf-16-le", "utf-16-be"]
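A small helper (hypothetical, not part of the existing codebase) can expand this configuration into the concrete byte patterns to search, de-duplicating variants that encode identically — ASCII digits produce the same bytes in UTF-8:

```python
DEFAULT_TEST_CARDS = ["4111111111111111", "5555555555554444"]
ENCODING_VARIANTS = ["ascii", "utf-8", "utf-16-le", "utf-16-be"]

def byte_patterns(cards=DEFAULT_TEST_CARDS, encodings=ENCODING_VARIANTS):
    """Yield (card, encoding, pattern_bytes), skipping duplicate byte forms."""
    seen = set()
    for card in cards:
        for enc in encodings:
            b = card.encode(enc)
            if b not in seen:   # ascii and utf-8 digits encode identically
                seen.add(b)
                yield card, enc, b
```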

CLI Integration

Add options to the existing volatility command:

@click.option(
    "--search-cards",
    is_flag=True,
    default=False,
    help="Search memory dumps for credit card numbers"
)
@click.option(
    "--additional-patterns",
    multiple=True,
    help="Additional patterns to search for in memory dumps"
)
@click.option(
    "--card-search-config",
    type=click.Path(exists=True, path_type=Path),
    help="Path to custom credit card search configuration file"
)

Implementation Location

New Module: src/kp_ssf_tools/volatility/card_search.py

Functions:

  • search_memory_dumps(): Main orchestration function
  • search_file_for_patterns(): Core search implementation
  • validate_card_numbers(): Luhn algorithm validation
  • format_search_results(): Rich-formatted output
  • save_search_results(): JSON export functionality

Integration Point: Add as Step 6 in processor.py workflow:

# Step 6: Search memory dumps for credit card patterns (optional)
if input_model.search_cards:
    search_memory_dumps(results_dir, renamed_files, input_model)

Performance Considerations

File Size Thresholds

  • Small files (<10MB): Use simple file reading
  • Medium files (10MB-100MB): Use buffered reading
  • Large files (>100MB): Use mmap

Memory Usage

  • mmap: Virtual memory usage, minimal RAM impact
  • Buffered: Configurable chunk size (default: 1MB)
  • Progress tracking: Update every 5% of file processed

Optimization Strategies

  1. Early termination: Stop searching file after finding X matches
  2. Pattern compilation: Pre-compile regex patterns
  3. Skip empty files: Quick file size check before processing

Security Considerations

Sensitive Data Handling

  1. Memory clearing: Explicitly clear variables containing card numbers
  2. Secure logging: Mask card numbers in log output
  3. Result sanitization: Option to hash found patterns instead of storing raw numbers

Output Security

def mask_card_number(card_num):
    """Mask all but last 4 digits of credit card number."""
    if len(card_num) < 4:
        return "*" * len(card_num)
    return "*" * (len(card_num) - 4) + card_num[-4:]
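For the result-sanitization option mentioned above, found numbers can be stored as salted hashes instead of raw values. A sketch — the fixed default salt is illustrative only; a per-scan random salt would be safer:

```python
import hashlib

def hash_card_number(card_num: str, salt: bytes = b"scan-salt") -> str:
    """Return a salted SHA-256 hex digest of the card number."""
    return hashlib.sha256(salt + b":" + card_num.encode("ascii")).hexdigest()
```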

Future Enhancements

Advanced Pattern Detection

  1. Context awareness: Look for surrounding keywords (CVV, expiry, etc.)
  2. Format detection: Automatically detect separator patterns
  3. Statistical analysis: Report on pattern frequency and distribution

Integration Features

  1. Report generation: PDF/HTML reports with findings
  2. Compliance mapping: Map findings to PCI DSS requirements
  3. Baseline comparison: Compare against previous scans
  4. Risk scoring: Assign risk levels to different types of findings

Testing Strategy

Unit Tests

  1. Pattern matching: Test all card number formats and encodings
  2. Luhn validation: Test valid and invalid card numbers
  3. File handling: Test with various file sizes and formats
  4. Error conditions: Test with corrupted/inaccessible files

Integration Tests

  1. End-to-end workflow: Full volatility workflow with card search
  2. Performance benchmarks: Measure search times for different file sizes
  3. Cross-platform testing: Verify behavior on Windows/Linux/Mac

Test Data

Create synthetic memory dumps with known patterns for testing:

  • Embedded test card numbers in various formats
  • Different encoding scenarios
  • Edge cases (partial numbers, corrupted data)


Document Version: 1.0
Last Updated: August 9, 2025
Author: GitHub Copilot