SSF Toolkit User Guide: analyze entropy Command

Overview

This guide explains how to use the ssf_tools analyze entropy sub-command to perform Shannon entropy analysis for PCI SSF 2.3 compliance. You will learn how to detect suspicious patterns in files, customize analysis options, and export results efficiently.

The entropy analysis uses an adaptive threshold system that accounts for different file types, such as binary executables versus Java or Python source code. Details on this system can be found in the Entropy Thresholds documentation.

Prerequisites

  • Python 3.13 or later
  • SSF Toolkit installed
  • Access to files or directories to analyze

Quick Start

To analyze a file for entropy-based risk patterns, run:

ssf_tools analyze entropy sample.bin

To view all available options:

ssf_tools analyze entropy --help

Usage

The analyze entropy command scans files or directories for regions with high entropy, which may indicate sensitive or obfuscated data. Results are streamed directly to Excel for efficient handling of large datasets.
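
As a point of reference, Shannon entropy over a block of bytes can be sketched as below. This is a minimal illustration, not the toolkit's implementation: the function name shannon_entropy is hypothetical, and the choice of bits per byte as the unit is an assumption.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


print(shannon_entropy(b"aaaaaaaa"))        # one repeated byte value: 0.0
print(shannon_entropy(bytes(range(256))))  # every byte value exactly once: 8.0
```

Repetitive content scores near 0, while encrypted, compressed, or random data approaches 8 bits per byte, which is why high-entropy regions are worth a closer look.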

Common Commands

  • ssf_tools analyze entropy <target>: Analyze a file or directory
  • ssf_tools analyze entropy data/ --no-recurse: Analyze a directory without recursion

Workflow

  1. Run the Analysis

    • Execute the command with your desired options:
    ssf_tools analyze entropy <target> [options]
    
    • The tool displays progress and summary information in the terminal.
    • Results are saved to an Excel file named entropy-analysis-<timestamp>.xlsx in your working directory.
  2. Open the Excel Results

    • Locate the generated Excel file.
    • Open it in Microsoft Excel or a compatible spreadsheet application.
  3. Review the Findings

    • Each row represents a region of a file with entropy above the selected risk threshold.
    • Columns include file name, region offset, entropy score, risk level and, when --include-samples is set, a data sample.
    • Use Excel's filtering and sorting features to focus on high-risk regions.
  4. Interpret and Act

    • Investigate regions flagged as high or critical risk.
    • Use the file offset and sample data (if included) to locate suspicious content.
    • Share or archive the Excel file for compliance documentation or further analysis.

Tips

  • If the analysis produces too many results, increase the risk threshold or adjust block/step sizes.
  • For large datasets:
    • Watch for Excel worksheet limits -- Excel has a 256-worksheet limit. Consider breaking large, recursive scans into smaller chunks.
    • Watch for Excel row limits -- the tool will warn you if you use a low --risk-threshold on a large file.
  • Use ignore patterns to exclude irrelevant files (for example, --ignore-pattern='__pycache__' in a Python source tree).
  • See the additional tuning considerations in the Tuning section below.
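
To get a feel for when row limits matter, the number of sliding-window positions in a file can be estimated before a scan. The sketch below assumes the worst case of one exported row per window, which is only an upper bound since the tool's actual region handling may differ. The helper names are hypothetical; 1,048,576 is the row limit of a single .xlsx worksheet.

```python
EXCEL_MAX_ROWS = 1_048_576  # row limit of a single .xlsx worksheet


def window_count(file_size: int, block_size: int = 64, step_size: int = 16) -> int:
    """Number of full sliding-window positions in a file of file_size bytes."""
    if file_size < block_size:
        return 0  # assumption: a file shorter than one block yields no full window
    return (file_size - block_size) // step_size + 1


def may_exceed_excel_rows(file_size: int, block_size: int = 64, step_size: int = 16) -> bool:
    """Worst case: every window is exported as its own row."""
    return window_count(file_size, block_size, step_size) > EXCEL_MAX_ROWS


one_mib = 1024 * 1024
print(window_count(one_mib), may_exceed_excel_rows(one_mib))            # 65533 False
print(window_count(64 * one_mib), may_exceed_excel_rows(64 * one_mib))  # 4194301 True
```

With the default 64/16 settings, a 64 MiB file has roughly four times more window positions than a worksheet has rows, so a low risk threshold on a file that size is where the warning becomes relevant.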

Configuration

You can customize the analysis using the following options:

Option | Description
------ | -----------
--ignore-pattern | Glob pattern(s) to ignore when searching for files (e.g. --ignore-pattern='__pycache__')
--risk-threshold | Minimum risk level for regions to include (very_low, low, medium, medium_high, high, critical). Default: medium_high
--file-block-size | File I/O block size in bytes. Default: 65536
--analysis-block-size | Analysis block size in bytes. Default: 64
--step-size | Step size in bytes for sliding window analysis. Default: 16
--no-recurse | Disable recursive directory analysis (analyze the target directory only)
--include-samples | Include data samples in region analysis (increases file size)
--help / -h | Show help message and exit
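
For scripted or repeated scans, the options above can be assembled programmatically. The following is a minimal sketch, not part of the toolkit: build_command and its parameters are hypothetical names, and repeating --ignore-pattern once per pattern is an assumption about how multiple patterns are passed (check ssf_tools analyze entropy --help).

```python
import shlex

RISK_LEVELS = ("very_low", "low", "medium", "medium_high", "high", "critical")


def build_command(target, *, risk_threshold="medium_high", analysis_block_size=64,
                  step_size=16, ignore_patterns=(), recurse=True, include_samples=False):
    """Build an argument list for 'ssf_tools analyze entropy' from keyword options."""
    if risk_threshold not in RISK_LEVELS:
        raise ValueError(f"unknown risk threshold: {risk_threshold!r}")
    cmd = ["ssf_tools", "analyze", "entropy", target,
           "--risk-threshold", risk_threshold,
           "--analysis-block-size", str(analysis_block_size),
           "--step-size", str(step_size)]
    # Assumption: one --ignore-pattern flag per glob pattern.
    cmd += [f"--ignore-pattern={pattern}" for pattern in ignore_patterns]
    if not recurse:
        cmd.append("--no-recurse")
    if include_samples:
        cmd.append("--include-samples")
    return cmd


cmd = build_command("src/", risk_threshold="high", ignore_patterns=["*.log"], recurse=False)
print(shlex.join(cmd))  # shell-quoted form, suitable for logging
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with glob patterns such as *.log.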

Tuning

The entropy calculations are tuned with the following parameters:

Parameter | Description | Impact
--------- | ----------- | ------
analysis-block-size | The amount of data (in bytes) over which Shannon entropy is calculated. Think of it as "resolution". | Smaller values are more precise but take more time
step-size | The number of bytes to advance for each entropy calculation | Recommended: <= analysis-block-size/4
file-block-size | The amount of data (in bytes) to read from the file at one time | Primarily impacts RAM utilization

The most important consideration for tuning is to know your secrets ahead of time:

  • A 32-byte (256-bit) AES encryption key could be missed entirely with the default settings (64/16), but a hard-coded private RSA key (256 bytes/2048 bits or larger) is right on target.
  • Increase the defaults with caution: moving through the file in larger blocks can mask smaller secrets, because they are averaged in with lower-entropy data within the same analysis block.
  • We recommend that step-size be no more than 1/4 of the analysis-block-size parameter. This reduces the likelihood of missing embedded high-entropy data.
  • Lower the defaults if you have a specific use case, such as proving that AES-128 or 3DES (168-bit) keys are not hard-coded.
  • The risk-threshold option only filters the items that are exported to Excel; it does not change the entropy calculations.
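
The masking effect described above can be reproduced with a small sliding-window sketch. This is an illustration under simplifying assumptions, not the toolkit's code: the function names are hypothetical, entropy is measured in bits per byte, and 32 distinct byte values stand in for a 256-bit key embedded in repetitive filler.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a non-empty byte string, in bits per byte."""
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def max_window_entropy(data: bytes, block_size: int, step_size: int) -> float:
    """Highest entropy found in any full window of block_size bytes."""
    starts = range(0, len(data) - block_size + 1, step_size)
    return max(shannon_entropy(data[s:s + block_size]) for s in starts)


# Stand-in for a hard-coded 256-bit key: 32 distinct bytes inside low-entropy filler.
key = bytes(range(32))
data = b"a" * 96 + key + b"a" * 96

print(max_window_entropy(data, block_size=64, step_size=16))  # 3.5
print(max_window_entropy(data, block_size=32, step_size=8))   # 5.0
```

A 64-byte window can reach at most 6 bits per byte and a 32-byte window at most 5. With the 64/16 settings, the best window is half key and half filler and scores only 3.5 of a possible 6. With 32/8, one window covers the key exactly and hits the 5.0 maximum, which makes it far easier to flag.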

Example: Customizing Analysis

ssf_tools analyze entropy src/ --risk-threshold high --analysis-block-size 32 --step-size 8 --ignore-pattern='*.log' --no-recurse

This command analyzes the src/ directory, reports only regions rated high or critical, uses a smaller analysis block and step size, ignores .log files, and disables recursion.

NOTE: The smaller analysis block (32 bytes) and step size (8 bytes) in this example are well suited to detecting AES-256 keys embedded in source code.

Example: Analyzing Binary Files

ssf_tools analyze entropy bin/

This command analyzes the bin/ directory with all default settings. Regions rated medium_high or higher are reported.
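
The "this level or higher" behavior of the risk threshold can be pictured as an ordered comparison. The sketch below is illustrative only: the Risk enum, the exported function, and the dictionary keys are hypothetical, and only the level names and their order come from the --risk-threshold option.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Risk levels in ascending order, named as in --risk-threshold."""
    very_low = 0
    low = 1
    medium = 2
    medium_high = 3
    high = 4
    critical = 5


def exported(regions, threshold="medium_high"):
    """Keep only regions whose risk level is at or above the threshold."""
    minimum = Risk[threshold]
    return [r for r in regions if Risk[r["risk"]] >= minimum]


# Hypothetical regions; the offsets and levels are made up for illustration.
regions = [
    {"offset": 0, "risk": "medium"},
    {"offset": 64, "risk": "medium_high"},
    {"offset": 128, "risk": "critical"},
]
print([r["offset"] for r in exported(regions)])          # [64, 128]
print([r["offset"] for r in exported(regions, "high")])  # [128]
```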

Advanced Features

  • Excel results export uses a constant-memory (10MB) streaming approach regardless of input file size
  • Warns if estimated regions may exceed Excel row limits
  • Supports verbose output for detailed progress

Troubleshooting

  • If no files are found, check your target path and ignore patterns
  • For Excel export issues, review worksheet and row limits and risk thresholds
  • Use verbose mode for more detailed output

FAQ

  • Q: How do I analyze a single file?

    A: Use ssf_tools analyze entropy <filename>.

  • Q: Can I limit analysis to the current directory?

    A: Yes, use --no-recurse.

  • Q: What does risk threshold mean?

    A: It sets the minimum risk level a region must reach to be included in the results. It does not change how entropy is calculated.

Additional Resources


Use this guide to get started with entropy analysis. Adjust options as needed for your workflow and refer to other guides for advanced topics.