SSF Toolkit User Guide: analyze entropy Command¶
Overview¶
This guide explains how to use the ssf_tools analyze entropy sub-command to perform Shannon entropy analysis for PCI SSF 2.3 compliance. You will learn how to detect suspicious patterns in files, customize analysis options, and export results efficiently.
The entropy analysis uses an adaptive threshold system that accounts for different file types, such as binary executables versus Java or Python source code. Details on this system can be found in the Entropy Thresholds documentation.
Prerequisites¶
- Python 3.13 or later
- SSF Toolkit installed
- Access to files or directories to analyze
Quick Start¶
To analyze a file for entropy-based risk patterns, run:
ssf_tools analyze entropy <target>
To view all available options:
ssf_tools analyze entropy --help
Usage¶
The analyze entropy command scans files or directories for regions with high entropy, which may indicate sensitive or obfuscated data. Results are streamed directly to Excel for efficient handling of large datasets.
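For readers unfamiliar with the metric, the sketch below shows how Shannon entropy can be computed over a block of bytes. It is a minimal illustration of the formula, not the toolkit's own code; the function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # H = sum(p * log2(1 / p)) over the byte values that actually occur
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


print(shannon_entropy(b"A" * 64))          # 0.0: one repeated byte value
print(shannon_entropy(bytes(range(256))))  # 8.0: every byte value exactly once
```

Plain text and source code usually fall well below the maximum, while encrypted, compressed, or random data sits close to it, which is why high-entropy regions are worth a closer look.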
Common Commands¶
- ssf_tools analyze entropy <target>: Analyze a file or directory
- ssf_tools analyze entropy data/ --no-recurse: Analyze a directory without recursion
Workflow¶
- Run the Analysis
  - Execute ssf_tools analyze entropy <target> with your desired options.
  - The tool displays progress and summary information in the terminal.
  - Results are saved to an Excel file named entropy-analysis-<timestamp>.xlsx in your working directory.
- Open the Excel Results
  - Locate the generated Excel file.
  - Open it in Microsoft Excel or a compatible spreadsheet application.
- Review the Findings
  - Each row represents a region of a file with entropy above the selected risk threshold.
  - Columns include file name, region offset, entropy score, risk level, and a data sample.
  - Use Excel's filtering and sorting features to focus on high-risk regions.
- Interpret and Act
  - Investigate regions flagged as high or critical risk.
  - Use the file offset and sample data (if included) to locate suspicious content.
  - Share or archive the Excel file for compliance documentation or further analysis.
Tips¶
- If the analysis produces too many results, increase the risk threshold or adjust block/step sizes.
- For large datasets:
  - Watch for Excel worksheet limits -- Excel has a 256-worksheet limit. Consider breaking large, recursive scans into smaller chunks.
  - Watch for Excel row limits -- the tool will warn you if using a low --risk-threshold on a large file.
- Use ignore patterns to exclude irrelevant files (e.g. --ignore-pattern='__pycache__' in a Python source tree).
- See the Tuning section below for additional considerations.
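To get a feel for which paths a glob-style ignore pattern would exclude, the sketch below matches each path component against a list of patterns. It assumes shell-style matching similar to Python's `fnmatch` module; the toolkit's actual matching rules may differ, and `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if any component of path matches any glob pattern.

    Assumption: patterns are applied to individual path components.
    """
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)


patterns = ["__pycache__", "*.log"]
print(is_ignored("src/__pycache__/mod.cpython-313.pyc", patterns))  # True
print(is_ignored("src/app.log", patterns))                          # True
print(is_ignored("src/app.py", patterns))                           # False
```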
Configuration¶
You can customize the analysis using the following options:
| Option | Description |
|---|---|
| --ignore-pattern | Glob pattern(s) to ignore when searching for files (e.g. --ignore-pattern='__pycache__') |
| --risk-threshold | Minimum risk level for regions to include (very_low, low, medium, medium_high, high, critical). Default: medium_high |
| --file-block-size | File I/O block size in bytes. Default: 65536 |
| --analysis-block-size | Analysis block size in bytes. Default: 64 |
| --step-size | Step size in bytes for sliding window analysis. Default: 16 |
| --no-recurse | Disable recursive directory analysis (analyze current directory only) |
| --include-samples | Include data samples in region analysis (increases file size) |
| --help / -h | Show help message and exit |
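The effect of --risk-threshold can be pictured as a simple "at or above" filter over an ordered list of levels. The sketch below assumes the levels are ordered as listed in the table, from very_low to critical; the region tuples and the `meets_threshold` helper are hypothetical.

```python
# Risk levels in ascending order, as listed for --risk-threshold
RISK_LEVELS = ("very_low", "low", "medium", "medium_high", "high", "critical")


def meets_threshold(level: str, threshold: str = "medium_high") -> bool:
    """Return True if level is at or above the threshold level."""
    return RISK_LEVELS.index(level) >= RISK_LEVELS.index(threshold)


# Hypothetical regions as (offset, risk level) pairs
regions = [(0, "low"), (64, "medium_high"), (128, "critical"), (192, "medium")]
kept = [region for region in regions if meets_threshold(region[1])]
print(kept)  # [(64, 'medium_high'), (128, 'critical')]
```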
Tuning¶
Tuning the entropy calculations is directly tied to the following:
| Parameter | Description | Impact |
|---|---|---|
| analysis-block-size | The amount of data (in bytes) over which Shannon entropy is calculated. Think of it as "resolution" | Smaller values are more precise but take more time |
| step-size | The number of bytes to advance for each entropy calculation | Recommended: <= analysis-block-size / 4 |
| file-block-size | The amount of data to read from the file at one time | Primarily impacts RAM utilization |
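The relationship between analysis-block-size and step-size can be sketched as a sliding window over the data. This is an illustrative model only, assuming fixed-size overlapping windows; the names `scan` and `shannon_entropy` are hypothetical and the toolkit's internals may differ.

```python
import math
from collections import Counter
from collections.abc import Iterator


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def scan(data: bytes, block_size: int = 64, step: int = 16) -> Iterator[tuple[int, float]]:
    """Yield (offset, entropy) for each block_size-byte window, advancing by step."""
    # Data shorter than one block is scored as a single short window
    last_start = max(len(data) - block_size, 0)
    for offset in range(0, last_start + 1, step):
        yield offset, shannon_entropy(data[offset:offset + block_size])


data = bytes(256)  # 256 zero bytes
print(len(list(scan(data))))                         # 13 windows with the defaults (64/16)
print(len(list(scan(data, block_size=32, step=8))))  # 29 windows with 32/8
```

Halving both values roughly doubles the number of entropy calculations for the same input, which is the time cost referred to in the table.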
The most important consideration for tuning is to know your secrets ahead of time:
- A 32-byte (256-bit) AES encryption key could be missed entirely with the default settings (analysis-block-size 64, step-size 16), but a hard-coded private RSA key (256 bytes/2048 bits or larger) is right on target
- Increase the defaults with caution: barrelling through the file in larger blocks could mask smaller secrets, since they will be averaged in with lower-entropy data within the same analysis block
- We recommend that step-size be no more than 1/4 of the analysis-block-size parameter. This reduces the likelihood of missing embedded high-entropy data in the analysis
- Lower the defaults if you have a specific use case -- maybe you need to prove that AES-128 or 3DES (168-bit) keys are not hard-coded
- The risk-threshold option only filters the items that will be exported to Excel; it does not change the entropy analysis calculations
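The averaging effect described above can be demonstrated with a small experiment. The sketch below embeds a stand-in for a 32-byte key (32 distinct byte values, chosen so the result is deterministic; a real random key would score slightly lower) in a run of zero bytes and compares the peak entropy seen with two window settings. The helper names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def peak_entropy(data: bytes, block_size: int, step: int) -> float:
    """Highest entropy found by any sliding window over data."""
    last_start = max(len(data) - block_size, 0)
    return max(
        shannon_entropy(data[i:i + block_size])
        for i in range(0, last_start + 1, step)
    )


key = bytes(range(100, 132))        # stand-in for a 32-byte key (assumption)
data = bytes(96) + key + bytes(96)  # key surrounded by zero bytes

print(round(peak_entropy(data, 64, 16), 2))  # 3.5: the key shares each window with 32 zero bytes
print(round(peak_entropy(data, 32, 8), 2))   # 5.0: one window covers exactly the key
```

A window of n bytes (n <= 256) can score at most log2(n) bits, so 5.0 is the maximum for a 32-byte window, while 3.5 is well short of the 6.0 maximum for a 64-byte window.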
Example: Customizing Analysis¶
ssf_tools analyze entropy src/ --risk-threshold high --analysis-block-size 32 --step-size 8 --ignore-pattern='*.log' --no-recurse
This command analyzes the src/ directory, includes only high-risk regions, uses a smaller analysis block and step size, ignores .log files, and disables recursion.
NOTE: The smaller analysis block and step size seen in this example would be ideally tuned to detect AES-256 keys embedded in source code.
Example: Analyzing Binary Files¶
ssf_tools analyze entropy bin/
This command analyzes the bin/ directory and uses the defaults for all options. Regions rated medium_high or higher will be reported.
Advanced Features¶
- Excel results export uses a constant-memory (10 MB) streaming approach regardless of input file size
- Warns if estimated regions may exceed Excel row limits
- Supports verbose output for detailed progress
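A rough upper bound on the number of result rows can be estimated before running a scan. The sketch below counts the sliding windows a file would produce and compares that with the 1,048,576-row limit of an .xlsx worksheet. It assumes the worst case of one exported row per window and one header row; how the tool actually estimates or merges regions is not covered here, and the function names are hypothetical.

```python
EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet in the .xlsx format


def window_count(file_size: int, block_size: int = 64, step: int = 16) -> int:
    """Number of sliding windows produced for a file of file_size bytes."""
    if file_size <= 0:
        return 0
    if file_size < block_size:
        return 1  # a single short window
    return (file_size - block_size) // step + 1


def may_exceed_row_limit(file_size: int, block_size: int = 64, step: int = 16) -> bool:
    """Worst case: every window is reported as its own row (assumption)."""
    return window_count(file_size, block_size, step) > EXCEL_MAX_ROWS - 1  # minus header row


print(window_count(1024 * 1024))                # 65533 windows for a 1 MiB file
print(may_exceed_row_limit(1024 * 1024))        # False
print(may_exceed_row_limit(100 * 1024 * 1024))  # True: about 6.5 million windows
```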
Troubleshooting¶
- If no files are found, check your target path and ignore patterns
- For Excel export issues, review worksheet and row limits and risk thresholds
- Use verbose mode for more detailed output
FAQ¶
- Q: How do I analyze a single file?
  A: Use ssf_tools analyze entropy <filename>.
- Q: Can I limit analysis to the current directory?
  A: Yes, use --no-recurse.
- Q: What does risk threshold mean?
  A: It sets the minimum risk level a region must reach to be included in the exported results. It does not change the entropy calculations.
Additional Resources¶
Use this guide to get started with entropy analysis. Adjust options as needed for your workflow and refer to other guides for advanced topics.