SSF Tools Analyze Command - Overview and Requirements

Executive Summary

This document outlines the implementation plan for the ssf_tools analyze command, a code analysis tool that provides:

  • Entropy analysis to detect deviations from expected ranges for different types of files
  • Credential scanning to detect hard-coded, common credentials in both source code and binary files (using OS-provided strings implementations)

The implementation supports assessing an application against PCI Secure Software Standard Requirement 2.3, as well as other use cases where measuring Shannon entropy variations within different types of files is helpful.

PCI SSF Requirement 2.3

Requirement: Default authentication credentials or keys for built-in accounts are not used after installation, initialization, or first use.

Relevant Testing Requirements:

  • 2.3.b: The assessor shall test the software to confirm that all default credentials, keys, certificates, and other critical assets used for authentication by the software are supported by the evidence examined.

NOTE: It is expected that this analysis will include, but not necessarily be limited to, the use of entropy analysis tools to look for:

  • Hard-coded cryptographic keys
  • Common cryptographic function calls and structures, such as S-boxes and big-number library functions (tracing these functions backward to search for hard-coded keys)
  • Strings containing common user account names or password values

Project Overview

The analyze sub-command will provide thorough entropy analysis capabilities including:

  • Shannon entropy calculation with normalization (0-8 bits/byte)
  • Sliding window analysis for localized anomaly detection
  • Credential scanning using externally provided and user-configurable wordlists
  • Cryptographic structure recognition (S-boxes, round constants, etc.)
  • File type-aware analysis with adaptive thresholds
  • Statistical anomaly detection and correlation
  • Multiple output formats for different use cases
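The two core capabilities above — normalized Shannon entropy and sliding-window analysis — can be sketched in a few lines. This is an illustrative stand-alone sketch, not the tool's implementation; the window and step sizes are arbitrary example values.

```python
import math
from collections.abc import Iterator


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, normalized to bits per byte (0.0-8.0)."""
    if not data:
        return 0.0
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def sliding_window_entropy(
    data: bytes, window: int = 256, step: int = 64
) -> Iterator[tuple[int, float]]:
    """Yield (offset, entropy) per window.

    A high-entropy window inside an otherwise low-entropy file flags a
    localized anomaly, e.g. an embedded key or compressed blob.
    """
    for offset in range(0, max(len(data) - window + 1, 1), step):
        yield offset, shannon_entropy(data[offset : offset + window])
```

Uniform random-looking data (all 256 byte values equally likely) scores the maximum 8.0 bits/byte; a run of a single repeated byte scores 0.0.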

Content-Aware Analysis Benefits

The content-aware entropy analysis approach provides significant advantages over traditional static threshold methods:

Accuracy Improvements

  • Detects File Type: Identifies programming-language and binary files, applying different entropy parameters based on each type's unique characteristics
  • Reduced False Positives: Minified JavaScript (5.8 entropy) is normal, not suspicious
  • Enhanced True Positive Detection: 7.2+ entropy in source code is genuinely anomalous
  • Context-Sensitive Scoring: Same entropy value interpreted differently per file type
  • Language-Specific Adjustments: Python vs Assembly have different normal patterns
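Context-sensitive scoring can be sketched as a lookup of per-file-type entropy ranges. The profile boundaries below are illustrative placeholders, not the tool's empirically derived thresholds; they show why 5.8 is "normal" for minified JavaScript while 7.2 is "anomalous" for ordinary source code.

```python
# Illustrative (low, high) bits/byte ranges per file type; real values
# would come from analysis of representative file corpora.
ENTROPY_PROFILES: dict[str, tuple[float, float]] = {
    "source": (3.5, 6.5),       # typical hand-written source code
    "minified-js": (5.0, 7.0),  # minified JavaScript runs hotter
    "binary": (4.0, 7.5),       # compiled executables
}


def classify(entropy: float, file_type: str) -> str:
    """Interpret the same entropy value differently per file type."""
    low, high = ENTROPY_PROFILES.get(file_type, (0.0, 8.0))
    if entropy > high:
        return "anomalous-high"  # e.g. embedded key or packed data
    if entropy < low:
        return "anomalous-low"   # e.g. padding or filler
    return "normal"
```

With these example ranges, `classify(5.8, "minified-js")` is "normal" while `classify(7.2, "source")` is "anomalous-high".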

Operational Benefits

  • Lower Alert Fatigue: Security teams see relevant findings, not noise
  • Actionable Results: Findings come with context about why they're suspicious
  • Scalable Analysis: Can process mixed repositories without manual threshold tuning
  • Compliance Ready: File-type-aware analysis better supports audit requirements

Technical Advantages

  • Empirically Derived: Thresholds based on analysis of real-world file corpora
  • Extensible: Easy to add new file types and refine existing thresholds
  • Configurable: Sensitivity levels allow tuning without losing content awareness
  • Standards-Based: Uses industry-standard MIME type classification
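MIME-type classification then selects the analysis profile. In ssf-tools the content-based detector (the puremagic dependency) would supply the MIME type; the stdlib name-based `mimetypes.guess_type` is used here only so the sketch is self-contained, and the mapping entries are illustrative.

```python
import mimetypes

# Illustrative MIME type -> analysis profile mapping.
MIME_PROFILES: dict[str, str] = {
    "text/x-python": "source",
    "application/javascript": "minified-js",
    "text/javascript": "minified-js",
    "application/octet-stream": "binary",
}


def profile_for(path: str) -> str:
    """Pick an entropy profile from the file's MIME type.

    Unknown types fall back to the conservative binary profile.
    """
    mime, _encoding = mimetypes.guess_type(path)
    return MIME_PROFILES.get(mime or "application/octet-stream", "binary")
```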

Success Criteria

Functional Requirements

  • [X] Accurately detect entropy anomalies in both source and binary files
  • [X] Identify common cryptographic structures with >95% accuracy
  • [X] Process files larger than 1 GB efficiently
  • [X] Generate actionable reports in multiple formats

Performance Requirements

  • [X] Process typical source files (<1MB) in under 1 second
  • [X] Handle large binary files (>100MB) with progress indication
  • [X] Memory usage scales linearly with file size
  • [X] False positive rate below 5%

Usability Requirements

  • [X] Intuitive CLI with detailed help
  • [X] Clear, actionable error messages
  • [X] Rich console output with progress indicators
  • [X] Complete documentation and examples

Compliance and Standards

PCI SSF 2.3 Compliance

  • Risk Assessment: Automated risk level classification (HIGH, MEDIUM, LOW) for compliance findings
  • Audit-Ready Output: XML and JSON formats suitable for enterprise compliance systems
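A minimal sketch of the finding model behind those two bullets, assuming hypothetical field names and illustrative risk thresholds (the real schema and cutoffs are defined by the tool, not here):

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    path: str
    offset: int
    entropy: float
    description: str


def risk_level(finding: Finding) -> str:
    """Classify a finding as HIGH/MEDIUM/LOW; thresholds are illustrative."""
    if finding.entropy >= 7.5:
        return "HIGH"
    if finding.entropy >= 6.5:
        return "MEDIUM"
    return "LOW"


def to_json(findings: list[Finding]) -> str:
    """Serialize findings plus risk levels for compliance tooling."""
    return json.dumps(
        [{**asdict(f), "risk": risk_level(f)} for f in findings], indent=2
    )
```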

Dependencies

Core Dependencies

# Entropy analysis dependencies are part of core ssf-tools installation
dependencies = [
    # ... existing core dependencies
    "numpy>=1.24.0",          # Optimized mathematical operations
    "scipy>=1.10.0",          # Statistical analysis and scientific computing
    "puremagic>=1.15",        # Pure Python file type detection
    "matplotlib>=3.6.0",      # Basic report visualizations
    "jinja2>=3.1.0",         # HTML template rendering
    "httpx>=0.24.0",          # SecLists wordlist downloading (modern requests replacement)
    "seaborn>=0.12.0",        # Statistical plots
    "pyahocorasick>=2.2.0",     # Aso-Corasick search engine for parallel wordlist search operations
]
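The wordlist credential scan is a multi-pattern matching problem, which is why pyahocorasick (a one-pass Aho-Corasick automaton) is listed above. As a self-contained stand-in, the same idea can be sketched with a single compiled regex alternation from the stdlib:

```python
import re


def compile_wordlist(words: list[str]) -> re.Pattern[str]:
    """Compile credential words into one pattern.

    Longest-first ordering makes overlapping entries match greedily. In
    production, an Aho-Corasick automaton (pyahocorasick) scans all words
    in a single pass over the input instead.
    """
    escaped = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return re.compile("|".join(escaped), re.IGNORECASE)


def scan_strings(
    strings: list[str], pattern: re.Pattern[str]
) -> list[tuple[str, str]]:
    """Return (extracted_string, matched_credential) pairs."""
    hits = []
    for s in strings:
        for m in pattern.finditer(s):
            hits.append((s, m.group(0)))
    return hits
```

The `strings` input here would come from source text or from an OS-provided strings pass over a binary, as described in the overview.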

Optional Advanced Features

[project.optional-dependencies]
advanced-analytics = [
    "plotly>=5.15.0",        # Interactive charts for detailed analysis
    "pandas>=2.0.0",         # Advanced data analysis and CSV processing
    "statsmodels>=0.14.0",   # Advanced statistical analysis and time series
]

Integration with Existing SSF Tools

CLI Integration

The entropy command will be registered in cli/main.py following the existing pattern:

# In cli/main.py
def register_commands() -> None:
    from kp_ssf_tools.cli.commands.volatility import volatility
    from kp_ssf_tools.cli.commands.entropy import entropy
    cli.add_command(volatility)
    cli.add_command(entropy)

Shared Utilities

The entropy command will leverage existing utilities from core/services:

  • rich_output for consistent console output
  • timestamp_service for common date/time operations
  • file_processing for common file operations
  • http_client for accessing web content such as word lists
  • cache_service for storing local copies of online content and other high-cost data structures
  • Base models from models/base.py

Testing Integration

Tests will follow the established structure:

  • tests/unit/entropy/analysis/test_entropy_*.py - Unit tests for each analysis module
  • tests/integration/entropy/test_entropy_cli.py - CLI integration tests
  • tests/e2e/workflows/test_entropy_workflows.py - End-to-end workflow tests

Risk Mitigation

Technical Risks

  1. High Memory Usage: Mitigated through streaming and chunked processing
  2. False Positives: Mitigated through statistical correlation and confidence scoring
  3. Performance Issues: Mitigated through profiling and optimization
  4. Dependency Conflicts: Mitigated through careful dependency management

Operational Risks

  1. Complex Configuration: Mitigated through sensible defaults and clear documentation
  2. Integration Difficulties: Mitigated through following established patterns
  3. Maintenance Overhead: Mitigated through comprehensive testing and documentation