SSF Tools Analyze Command - Overview and Requirements

Executive Summary

This document outlines the implementation plan for the ssf_tools analyze command, a code analysis tool that provides:

  • Entropy analysis to detect deviations from expected ranges for different types of files
  • Credential scanning to detect hard-coded, common credentials in both source code and binary files (using OS-provided strings implementations)

The implementation supports assessing an application against PCI Secure Software Standard Requirement 2.3, as well as other use cases where measuring Shannon entropy variations within different types of files is helpful.

PCI SSF Requirement 2.3

Requirement: Default authentication credentials or keys for built-in accounts are not used after installation, initialization, or first use.

Relevant Testing Requirements:

  • 2.3.b: The assessor shall test the software to confirm that all default credentials, keys, certificates, and other critical assets used for authentication by the software are supported by the evidence examined.

NOTE: It is expected that this analysis will include, but not necessarily be limited to, the use of entropy analysis tools to look for:

  • Hard-coded cryptographic keys
  • Common cryptographic function calls and structures, such as S-boxes and big-number library functions (tracing these functions backward to search for hard-coded keys)
  • Strings containing common user account names or password values

Project Overview

The analyze sub-command will provide thorough entropy analysis capabilities including:

  • Shannon entropy calculation with normalization (0-8 bits/byte)
  • Sliding window analysis for localized anomaly detection
  • Credential scanning using externally provided and user-configurable wordlists
  • Cryptographic structure recognition (S-boxes, round constants, etc.)
  • File type-aware analysis with adaptive thresholds
  • Statistical anomaly detection and correlation
  • Multiple output formats for different use cases
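The two core capabilities above — normalized Shannon entropy and sliding-window analysis — can be sketched in a few lines. This is an illustrative stand-alone sketch, not the tool's implementation; the window and step sizes are arbitrary example values.

```python
import math
from collections.abc import Iterator


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, normalized to bits per byte (0.0-8.0)."""
    if not data:
        return 0.0
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def sliding_window_entropy(
    data: bytes, window: int = 256, step: int = 64
) -> Iterator[tuple[int, float]]:
    """Yield (offset, entropy) per window.

    A high-entropy window inside an otherwise low-entropy file flags a
    localized anomaly, e.g. an embedded key or compressed blob.
    """
    for offset in range(0, max(len(data) - window + 1, 1), step):
        yield offset, shannon_entropy(data[offset : offset + window])
```

Uniform random-looking data (all 256 byte values equally likely) scores the maximum 8.0 bits/byte; a run of a single repeated byte scores 0.0.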

Content-Aware Analysis Benefits

The content-aware entropy analysis approach provides significant advantages over traditional static threshold methods:

Accuracy Improvements

  • Detects File Type: Identifies programming-language and binary files, applying different entropy parameters based on each type's unique characteristics
  • Reduced False Positives: Minified JavaScript (5.8 entropy) is normal, not suspicious
  • Enhanced True Positive Detection: 7.2+ entropy in source code is genuinely anomalous
  • Context-Sensitive Scoring: Same entropy value interpreted differently per file type
  • Language-Specific Adjustments: Python vs Assembly have different normal patterns
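Context-sensitive scoring can be sketched as a lookup of per-file-type entropy ranges. The profile boundaries below are illustrative placeholders, not the tool's empirically derived thresholds; they show why 5.8 is "normal" for minified JavaScript while 7.2 is "anomalous" for ordinary source code.

```python
# Illustrative (low, high) bits/byte ranges per file type; real values
# would come from analysis of representative file corpora.
ENTROPY_PROFILES: dict[str, tuple[float, float]] = {
    "source": (3.5, 6.5),       # typical hand-written source code
    "minified-js": (5.0, 7.0),  # minified JavaScript runs hotter
    "binary": (4.0, 7.5),       # compiled executables
}


def classify(entropy: float, file_type: str) -> str:
    """Interpret the same entropy value differently per file type."""
    low, high = ENTROPY_PROFILES.get(file_type, (0.0, 8.0))
    if entropy > high:
        return "anomalous-high"  # e.g. embedded key or packed data
    if entropy < low:
        return "anomalous-low"   # e.g. padding or filler
    return "normal"
```

With these example ranges, `classify(5.8, "minified-js")` is "normal" while `classify(7.2, "source")` is "anomalous-high".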

Operational Benefits

  • Lower Alert Fatigue: Security teams see relevant findings, not noise
  • Actionable Results: Findings come with context about why they're suspicious
  • Scalable Analysis: Can process mixed repositories without manual threshold tuning
  • Compliance Ready: File-type-aware analysis better supports audit requirements

Technical Advantages

  • Empirically Derived: Thresholds based on analysis of real-world file corpora
  • Extensible: Easy to add new file types and refine existing thresholds
  • Configurable: Sensitivity levels allow tuning without losing content awareness
  • Standards-Based: Uses industry-standard MIME type classification
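MIME-type classification then selects the analysis profile. In ssf-tools the content-based detector (the puremagic dependency) would supply the MIME type; the stdlib name-based `mimetypes.guess_type` is used here only so the sketch is self-contained, and the mapping entries are illustrative.

```python
import mimetypes

# Illustrative MIME type -> analysis profile mapping.
MIME_PROFILES: dict[str, str] = {
    "text/x-python": "source",
    "application/javascript": "minified-js",
    "text/javascript": "minified-js",
    "application/octet-stream": "binary",
}


def profile_for(path: str) -> str:
    """Pick an entropy profile from the file's MIME type.

    Unknown types fall back to the conservative binary profile.
    """
    mime, _encoding = mimetypes.guess_type(path)
    return MIME_PROFILES.get(mime or "application/octet-stream", "binary")
```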

Success Criteria

Functional Requirements

  • [X] Accurately detect entropy anomalies in both source and binary files
  • [X] Identify common cryptographic structures with >95% accuracy
  • [X] Process files larger than 1 GB efficiently
  • [X] Generate actionable reports in multiple formats

Performance Requirements

  • [X] Process typical source files (<1MB) in under 1 second
  • [X] Handle large binary files (>100MB) with progress indication
  • [X] Memory usage scales linearly with file size
  • [X] False positive rate below 5%

Usability Requirements

  • [X] Intuitive CLI with detailed help
  • [X] Clear, actionable error messages
  • [X] Rich console output with progress indicators
  • [X] Complete documentation and examples

Compliance and Standards

PCI SSF 2.3 Compliance

  • Risk Assessment: Automated risk level classification (HIGH, MEDIUM, LOW) for compliance findings
  • Audit-Ready Output: XML and JSON formats suitable for enterprise compliance systems
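A minimal sketch of the finding model behind those two bullets, assuming hypothetical field names and illustrative risk thresholds (the real schema and cutoffs are defined by the tool, not here):

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    path: str
    offset: int
    entropy: float
    description: str


def risk_level(finding: Finding) -> str:
    """Classify a finding as HIGH/MEDIUM/LOW; thresholds are illustrative."""
    if finding.entropy >= 7.5:
        return "HIGH"
    if finding.entropy >= 6.5:
        return "MEDIUM"
    return "LOW"


def to_json(findings: list[Finding]) -> str:
    """Serialize findings plus risk levels for compliance tooling."""
    return json.dumps(
        [{**asdict(f), "risk": risk_level(f)} for f in findings], indent=2
    )
```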

Dependencies

Core Dependencies

# Entropy analysis dependencies are part of core ssf-tools installation
dependencies = [
    # ... existing core dependencies
    "numpy>=1.24.0",          # Optimized mathematical operations
    "scipy>=1.10.0",          # Statistical analysis and scientific computing
    "puremagic>=1.15",        # Pure Python file type detection
    "matplotlib>=3.6.0",      # Basic report visualizations
    "jinja2>=3.1.0",         # HTML template rendering
    "httpx>=0.24.0",          # SecLists wordlist downloading (modern requests replacement)
    "seaborn>=0.12.0",        # Statistical plots
    "pyahocorasick>=2.2.0",     # Aso-Corasick search engine for parallel wordlist search operations
]
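The wordlist credential scan is a multi-pattern matching problem, which is why pyahocorasick (a one-pass Aho-Corasick automaton) is listed above. As a self-contained stand-in, the same idea can be sketched with a single compiled regex alternation from the stdlib:

```python
import re


def compile_wordlist(words: list[str]) -> re.Pattern[str]:
    """Compile credential words into one pattern.

    Longest-first ordering makes overlapping entries match greedily. In
    production, an Aho-Corasick automaton (pyahocorasick) scans all words
    in a single pass over the input instead.
    """
    escaped = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return re.compile("|".join(escaped), re.IGNORECASE)


def scan_strings(
    strings: list[str], pattern: re.Pattern[str]
) -> list[tuple[str, str]]:
    """Return (extracted_string, matched_credential) pairs."""
    hits = []
    for s in strings:
        for m in pattern.finditer(s):
            hits.append((s, m.group(0)))
    return hits
```

The `strings` input here would come from source text or from an OS-provided strings pass over a binary, as described in the overview.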

Optional Advanced Features

[project.optional-dependencies]
advanced-analytics = [
    "plotly>=5.15.0",        # Interactive charts for detailed analysis
    "pandas>=2.0.0",         # Advanced data analysis and CSV processing
    "statsmodels>=0.14.0",   # Advanced statistical analysis and time series
]

Integration with Existing SSF Tools

CLI Integration

The entropy command will be registered in cli/main.py following the existing pattern:

# In cli/main.py
def register_commands() -> None:
    from kp_ssf_tools.cli.commands.volatility import volatility
    from kp_ssf_tools.cli.commands.entropy import entropy
    cli.add_command(volatility)
    cli.add_command(entropy)

Shared Utilities

The entropy command will leverage existing utilities from core/services:

  • rich_output for consistent console output
  • timestamp_service for common date/time operations
  • file_processing for common file operations
  • http_client for accessing web content such as word lists
  • cache_service for storing local copies of online content and other high-cost data structures
  • Base models from models/base.py

Testing Integration

Tests will follow the established structure:

  • tests/unit/entropy/analysis/test_entropy_*.py - Unit tests for each analysis module
  • tests/integration/entropy/test_entropy_cli.py - CLI integration tests
  • tests/e2e/workflows/test_entropy_workflows.py - End-to-end workflow tests

Risk Mitigation

Technical Risks

  1. High Memory Usage: Mitigated through streaming and chunked processing
  2. False Positives: Mitigated through statistical correlation and confidence scoring
  3. Performance Issues: Mitigated through profiling and optimization
  4. Dependency Conflicts: Mitigated through careful dependency management

Operational Risks

  1. Complex Configuration: Mitigated through sensible defaults and clear documentation
  2. Integration Difficulties: Mitigated through following established patterns
  3. Maintenance Overhead: Mitigated through comprehensive testing and documentation