
Content-Aware Entropy Thresholds

This guide explains the content-aware threshold system used by SSF Tools to classify entropy levels in different file types. The system adapts entropy analysis based on the detected content type, providing more accurate risk assessment.

The content-aware thresholds have been set based on available research.

Overview

SSF Tools uses a sophisticated threshold system that adjusts entropy analysis based on file content rather than applying universal thresholds. This approach recognizes that different file types have inherently different entropy characteristics.

For example:

  • Documentation files typically have lower entropy (4.8 ± 0.65)
  • Encrypted files have maximum entropy (7.99 ± 0.01)
  • Programming languages vary significantly based on syntax and patterns
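
The values quoted in this guide fall on a scale of 0 to 8, which is consistent with Shannon entropy measured in bits per byte. The following is a minimal sketch under that assumption; the function name `shannon_entropy` is illustrative and is not taken from SSF Tools.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = sum(p * log2(1 / p)) over the byte values that actually occur
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy(b"aaaaaaaa"))        # one repeated byte: no uncertainty
print(shannon_entropy(bytes(range(256))))  # every byte value once: the maximum
```

Repetitive input scores near 0, while uniformly distributed bytes (as in encrypted data) approach 8.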

Content Detection Process

The system uses a two-stage detection process to determine appropriate thresholds:

1. MIME Type Detection

SSF Tools first identifies the file's MIME type using multiple detection methods:

  • File extension analysis - Quick identification based on file extension
  • Magic number detection - Binary header analysis using puremagic and python-magic
  • Content sampling - Analysis of file content patterns

This provides the broad file category (text, binary, executable, etc.).
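
The three detection methods above can be sketched as a simple fallback chain. This is an illustration only: the signature table, the function name `detect_mime`, and the order in which the methods are tried are assumptions, and the standard-library `mimetypes` module stands in for the puremagic and python-magic detectors that SSF Tools uses.

```python
import mimetypes

# Hypothetical magic-number table for illustration; real detectors ship
# far larger signature databases.
MAGIC_SIGNATURES = {
    b"\x7fELF": "application/x-executable",
    b"MZ": "application/vnd.microsoft.portable-executable",
    b"%PDF": "application/pdf",
}


def detect_mime(filename: str, head: bytes) -> str:
    """Guess a MIME type from magic bytes, then extension, then content."""
    # 1. Magic number: compare the first bytes against known signatures.
    for signature, mime in MAGIC_SIGNATURES.items():
        if head.startswith(signature):
            return mime
    # 2. File extension: standard-library lookup.
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed:
        return guessed
    # 3. Content sampling: data that decodes as UTF-8 is treated as text.
    try:
        head.decode("utf-8")
    except UnicodeDecodeError:
        return "application/octet-stream"
    return "text/plain"


print(detect_mime("tool", b"\x7fELF\x02\x01"))
print(detect_mime("notes.txt", b"hello"))
```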

2. Language Detection

For text-based files, SSF Tools performs language detection using:

  • Pygments lexer analysis - Identifies programming language syntax
  • Pattern recognition - Detects specific language constructs
  • Content structure analysis - Identifies documentation formats

This fine-tunes the threshold selection to the specific programming language or content type.
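
To give a feel for the pattern-recognition step, here is a dependency-free sketch that scores text against a few regular expressions per language. The patterns and the name `guess_language` are hypothetical and are not how SSF Tools or Pygments implement detection; a lexer-based detector covers far more languages and syntax.

```python
import re
from typing import Optional

# Hypothetical construct patterns, chosen only for illustration.
LANGUAGE_PATTERNS = {
    "python": [r"^\s*def \w+\(.*\):", r"^\s*import \w+", r"^\s*from \w+ import "],
    "javascript": [r"\bfunction\s+\w+\s*\(", r"\b(?:const|let|var)\s+\w+\s*=", r"=>"],
    "sql": [r"(?i)\bselect\b.+\bfrom\b", r"(?i)\binsert\s+into\b", r"(?i)\bcreate\s+table\b"],
}


def guess_language(text: str) -> Optional[str]:
    """Return the language whose patterns match most often, or None."""
    scores = {
        language: sum(len(re.findall(p, text, flags=re.MULTILINE)) for p in patterns)
        for language, patterns in LANGUAGE_PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    # A score of zero means no construct was recognized at all.
    return best if scores[best] > 0 else None


print(guess_language("import os\n\ndef main():\n    pass\n"))
print(guess_language("SELECT id FROM users;"))
```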

Threshold Classification System

Each file type has five entropy threshold levels:

Level        Description                     Typical Content
VERY_LOW     Highly repetitive content       Simple scripts, template files
LOW          Basic structured content        Well-commented code, simple documentation
MEDIUM       Normal content complexity       Typical source code, regular documentation
MEDIUM_HIGH  Complex but legitimate content  Minified code, technical documentation
HIGH         Suspicious entropy levels       Obfuscated code, packed binaries

Current Threshold Definitions

The current thresholds are defined in kp_ssf_tools.analyze.models.content_aware, which is included below.

File type-specific entropy thresholds loaded from configuration.

Source code in src\kp_ssf_tools\analyze\models\content_aware.py
class ContentAwareThresholds(SSFToolsBaseModel):
    """File type-specific entropy thresholds loaded from configuration."""

    file_type: FileType
    expected_entropy: tuple[float, float]  # (mean, std_dev) for normal content
    very_low_threshold: float  # Below this = VERY_LOW
    low_threshold: float  # Below this = LOW
    medium_threshold: float  # Normal range center
    medium_high_threshold: float  # Above this = MEDIUM_HIGH
    high_threshold: float  # Above this = HIGH

    @classmethod
    def get_default_values(cls) -> dict[FileType, dict[str, object]]:
        """
        Default threshold values for configuration file generation.

        These values are derived from extensive academic research documented
        in docs/file-entropy-research.md, including:
        - Lyda & Hamrock (2007) IEEE foundational paper
        - Davies et al. (2022) NapierOne dataset (500,000+ files)
        - Practical Security Analytics (500,000 PE file analysis)
        - Multiple peer-reviewed studies with statistical validation

        Returns a dict suitable for YAML configuration file generation.
        """
        return {
            # Top 20 Programming Languages (2025 Rankings)
            FileType.PYTHON: {
                "expected_entropy": [
                    5.5,
                    0.8,
                ],  # Mean=5.5, StdDev=0.8
                "very_low_threshold": 4.0,  # Highly repetitive code
                "low_threshold": 5.0,  # Simple scripts, lots of comments
                "medium_threshold": 6.0,  # Typical Python code
                "medium_high_threshold": 6.8,  # Complex logic, minified
                "high_threshold": 7.2,  # Obfuscated/packed code | > Likely suspicious
            },
            FileType.JAVASCRIPT: {
                "expected_entropy": [
                    5.4,
                    0.8,
                ],  # Mean=5.4, StdDev=0.8
                "very_low_threshold": 4.0,  # Simple scripts
                "low_threshold": 4.9,  # Basic JS with comments
                "medium_threshold": 5.9,  # Typical JavaScript
                "medium_high_threshold": 6.8,  # Minified/complex
                "high_threshold": 7.2,  # Obfuscated code | > Likely suspicious
            },
            FileType.JAVA: {
                "expected_entropy": [
                    5.6,
                    0.7,
                ],  # Mean=5.6, StdDev=0.7
                "very_low_threshold": 4.0,  # Verbose Java patterns
                "low_threshold": 5.0,  # Simple classes
                "medium_threshold": 6.0,  # Typical Java code
                "medium_high_threshold": 6.8,  # Complex enterprise code
                "high_threshold": 7.2,  # Bytecode/obfuscated | > Likely suspicious
            },
            FileType.CPP: {
                "expected_entropy": [
                    5.8,
                    0.9,
                ],  # Mean=5.8, StdDev=0.9
                "very_low_threshold": 4.0,  # Header files
                "low_threshold": 5.0,  # Simple implementations
                "medium_threshold": 6.2,  # Typical C++ code
                "medium_high_threshold": 7.0,  # Template-heavy code
                "high_threshold": 7.3,  # Compiled/obfuscated | > Likely suspicious
            },
            FileType.C: {
                "expected_entropy": [
                    5.7,
                    0.9,
                ],  # Mean=5.7, StdDev=0.9
                "very_low_threshold": 4.0,  # Header files
                "low_threshold": 5.0,  # Simple C code
                "medium_threshold": 6.1,  # Typical C programs
                "medium_high_threshold": 6.9,  # Complex system code
                "high_threshold": 7.3,  # Compiled/obfuscated | > Likely suspicious
            },
            FileType.CSHARP: {
                "expected_entropy": [
                    5.6,
                    0.8,
                ],  # Mean=5.6, StdDev=0.8
                "very_low_threshold": 4.0,  # Simple classes
                "low_threshold": 5.0,  # Basic C# code
                "medium_threshold": 6.0,  # Typical C# applications
                "medium_high_threshold": 6.8,  # Complex .NET code
                "high_threshold": 7.2,  # IL bytecode/obfuscated | > Likely suspicious
            },
            FileType.TYPESCRIPT: {
                "expected_entropy": [
                    5.4,
                    0.8,
                ],  # Mean=5.4, StdDev=0.8
                "very_low_threshold": 4.0,  # Type definitions
                "low_threshold": 4.9,  # Simple TypeScript
                "medium_threshold": 5.9,  # Typical TS code
                "medium_high_threshold": 6.8,  # Complex/transpiled
                "high_threshold": 7.2,  # Obfuscated output | > Likely suspicious
            },
            FileType.PHP: {
                "expected_entropy": [
                    5.3,
                    0.8,
                ],  # Mean=5.3, StdDev=0.8
                "very_low_threshold": 4.0,  # HTML mixed PHP
                "low_threshold": 4.8,  # Simple PHP scripts
                "medium_threshold": 5.8,  # Typical PHP code
                "medium_high_threshold": 6.7,  # Complex frameworks
                "high_threshold": 7.1,  # Obfuscated PHP | > Likely suspicious
            },
            FileType.GO: {
                "expected_entropy": [
                    5.5,
                    0.7,
                ],  # Mean=5.5, StdDev=0.7
                "very_low_threshold": 4.0,  # Simple Go code
                "low_threshold": 5.0,  # Basic programs
                "medium_threshold": 6.0,  # Typical Go code
                "medium_high_threshold": 6.8,  # Complex concurrent code
                "high_threshold": 7.2,  # Compiled binary data | > Likely suspicious
            },
            FileType.SQL: {
                "expected_entropy": [
                    5.2,
                    0.9,
                ],  # Mean=5.2, StdDev=0.9
                "very_low_threshold": 3.8,  # Simple queries
                "low_threshold": 4.7,  # Basic SQL statements
                "medium_threshold": 5.7,  # Complex queries
                "medium_high_threshold": 6.6,  # Stored procedures
                "high_threshold": 7.0,  # Obfuscated SQL | > Likely suspicious
            },
            FileType.RUST: {
                "expected_entropy": [
                    5.7,
                    0.8,
                ],  # Mean=5.7, StdDev=0.8
                "very_low_threshold": 4.0,  # Simple Rust code
                "low_threshold": 5.0,  # Basic implementations
                "medium_threshold": 6.1,  # Typical Rust code
                "medium_high_threshold": 6.9,  # Complex unsafe code
                "high_threshold": 7.3,  # Compiled/obfuscated | > Likely suspicious
            },
            FileType.SWIFT: {
                "expected_entropy": [
                    5.5,
                    0.8,
                ],  # Mean=5.5, StdDev=0.8
                "very_low_threshold": 4.0,  # Simple Swift code
                "low_threshold": 5.0,  # Basic iOS code
                "medium_threshold": 6.0,  # Typical Swift apps
                "medium_high_threshold": 6.8,  # Complex frameworks
                "high_threshold": 7.2,  # Compiled/obfuscated | > Likely suspicious
            },
            FileType.KOTLIN: {
                "expected_entropy": [
                    5.5,
                    0.8,
                ],  # Mean=5.5, StdDev=0.8
                "very_low_threshold": 4.0,  # Simple Kotlin code
                "low_threshold": 5.0,  # Basic Android code
                "medium_threshold": 6.0,  # Typical Kotlin apps
                "medium_high_threshold": 6.8,  # Complex coroutines
                "high_threshold": 7.2,  # Bytecode/obfuscated | > Likely suspicious
            },
            FileType.RUBY: {
                "expected_entropy": [
                    5.3,
                    0.8,
                ],  # Mean=5.3, StdDev=0.8
                "very_low_threshold": 4.0,  # Simple Ruby scripts
                "low_threshold": 4.8,  # Basic Rails code
                "medium_threshold": 5.8,  # Typical Ruby code
                "medium_high_threshold": 6.7,  # Complex metaprogramming
                "high_threshold": 7.1,  # Obfuscated Ruby | > Likely suspicious
            },
            FileType.R: {
                "expected_entropy": [
                    5.4,
                    0.9,
                ],  # Mean=5.4, StdDev=0.9
                "very_low_threshold": 3.9,  # Simple R scripts
                "low_threshold": 4.8,  # Basic statistics
                "medium_threshold": 5.9,  # Typical R analysis
                "medium_high_threshold": 6.8,  # Complex models
                "high_threshold": 7.1,  # Compiled R code | > Likely suspicious
            },
            FileType.VISUAL_BASIC: {
                "expected_entropy": [
                    5.2,
                    0.8,
                ],  # Mean=5.2, StdDev=0.8
                "very_low_threshold": 3.9,  # Simple VB code
                "low_threshold": 4.7,  # Basic VB.NET
                "medium_threshold": 5.7,  # Typical VB apps
                "medium_high_threshold": 6.6,  # Complex forms
                "high_threshold": 7.0,  # Obfuscated VB | > Likely suspicious
            },
            FileType.SCALA: {
                "expected_entropy": [
                    5.6,
                    0.8,
                ],  # Mean=5.6, StdDev=0.8
                "very_low_threshold": 4.0,  # Simple Scala code
                "low_threshold": 5.0,  # Basic functional code
                "medium_threshold": 6.0,  # Typical Scala apps
                "medium_high_threshold": 6.8,  # Complex Spark code
                "high_threshold": 7.2,  # Bytecode/obfuscated | > Likely suspicious
            },
            FileType.MATLAB: {
                "expected_entropy": [
                    5.4,
                    0.9,
                ],  # Mean=5.4, StdDev=0.9
                "very_low_threshold": 3.9,  # Simple scripts
                "low_threshold": 4.8,  # Basic computations
                "medium_threshold": 5.9,  # Typical MATLAB code
                "medium_high_threshold": 6.8,  # Complex algorithms
                "high_threshold": 7.1,  # Compiled MEX files | > Likely suspicious
            },
            FileType.PERL: {
                "expected_entropy": [
                    5.4,
                    0.9,
                ],  # Mean=5.4, StdDev=0.9
                "very_low_threshold": 3.9,  # Simple Perl scripts
                "low_threshold": 4.8,  # Basic regex code
                "medium_threshold": 5.9,  # Typical Perl code
                "medium_high_threshold": 6.8,  # Complex one-liners
                "high_threshold": 7.1,  # Obfuscated Perl | > Likely suspicious
            },
            FileType.DART: {
                "expected_entropy": [
                    5.5,
                    0.8,
                ],  # Mean=5.5, StdDev=0.8
                "very_low_threshold": 4.0,  # Simple Dart code
                "low_threshold": 5.0,  # Basic Flutter widgets
                "medium_threshold": 6.0,  # Typical Dart apps
                "medium_high_threshold": 6.8,  # Complex async code
                "high_threshold": 7.2,  # Compiled/obfuscated | > Likely suspicious
            },
            # Documentation Files
            FileType.DOCUMENTATION: {
                "expected_entropy": [
                    4.8,
                    0.65,
                ],  # Mean=4.8, StdDev=0.65 (combined plain/markdown)
                "very_low_threshold": 3.55,  # Highly repetitive text
                "low_threshold": 4.25,  # Simple documentation
                "medium_threshold": 5.1,  # Typical documentation
                "medium_high_threshold": 5.65,  # Technical docs with code
                "high_threshold": 6.15,  # Mixed content | Anomalous for docs
            },
            # Binary Executables
            FileType.WINDOWS_PE: {
                "expected_entropy": [
                    6.0,
                    1.2,
                ],  # Mean=6.0, StdDev=1.2
                "very_low_threshold": 4.5,  # Text sections
                "low_threshold": 5.2,  # Code sections
                "medium_threshold": 6.5,  # Typical PE files
                "medium_high_threshold": 7.0,  # Complex binaries
                "high_threshold": 7.2,  # Packed/compressed | > Likely suspicious
            },
            FileType.MACOS_MACHO: {
                "expected_entropy": [
                    5.9,
                    1.2,
                ],  # Mean=5.9, StdDev=1.2
                "very_low_threshold": 4.5,  # Text sections
                "low_threshold": 5.2,  # Code sections
                "medium_threshold": 6.4,  # Typical MachO files
                "medium_high_threshold": 6.9,  # Universal binaries
                "high_threshold": 7.2,  # Packed/compressed | > Likely suspicious
            },
            FileType.LINUX_ELF: {
                "expected_entropy": [
                    5.8,
                    1.1,
                ],  # Mean=5.8, StdDev=1.1
                "very_low_threshold": 4.5,  # Text sections
                "low_threshold": 5.1,  # Code sections
                "medium_threshold": 6.3,  # Typical ELF files
                "medium_high_threshold": 6.8,  # Complex binaries
                "high_threshold": 7.2,  # Packed/compressed | > Likely suspicious
            },
            # Encrypted/Suspicious Content
            FileType.ENCRYPTED: {
                "expected_entropy": [
                    7.99,
                    0.01,
                ],  # Mean=7.99, StdDev=0.01 (AES validated)
                "very_low_threshold": 7.8,  # Weak/broken encryption
                "low_threshold": 7.85,  # Poor encryption
                "medium_threshold": 7.9,  # Possible encryption
                "medium_high_threshold": 7.95,  # Likely encrypted
                "high_threshold": 7.98,  # Strong encryption | > Max entropy
            },
            FileType.BASE64_ENCODED: {
                "expected_entropy": [
                    6.0,
                    0.3,
                ],  # Mean=6.0, StdDev=0.3
                "very_low_threshold": 5.2,  # Partial encoding
                "low_threshold": 5.5,  # Simple base64
                "medium_threshold": 6.0,  # Typical base64
                "medium_high_threshold": 6.3,  # Complex encoded data
                "high_threshold": 6.5,  # Encrypted then encoded | > Suspicious encoding
            },
            FileType.HEX_ENCODED: {
                "expected_entropy": [
                    4.0,
                    0.2,
                ],  # Mean=4.0, StdDev=0.2
                "very_low_threshold": 3.5,  # Partial hex
                "low_threshold": 3.7,  # Simple hex strings
                "medium_threshold": 4.0,  # Typical hex encoding
                "medium_high_threshold": 4.2,  # Complex hex data
                "high_threshold": 4.4,  # Anomalous hex | > Suspicious pattern
            },
            # Unknown file types - Conservative thresholds
            FileType.UNKNOWN: {
                "expected_entropy": [
                    5.5,
                    1.5,
                ],  # Mean=5.5, StdDev=1.5 (conservative mixed content)
                "very_low_threshold": 3.0,  # Likely text/structured
                "low_threshold": 4.5,  # Probable code/data
                "medium_threshold": 6.0,  # Typical binary content
                "medium_high_threshold": 7.0,  # Complex binary/media
                "high_threshold": 7.2,  # Boundary suspicious | > Conservative threshold
            },
        }

    @classmethod
    def get_default_models(cls) -> dict[FileType, ContentAwareThresholds]:
        """
        Get pre-built Pydantic model instances for all file types.

        Returns validated ContentAwareThresholds models instead of raw dicts.
        Use this method for runtime threshold management to avoid dict-to-model conversion.

        """
        models = {}
        for file_type, data in cls.get_default_values().items():
            # Cast values from object to proper types
            expected_entropy = data["expected_entropy"]
            if isinstance(expected_entropy, list | tuple):
                float_values = [float(x) for x in expected_entropy]
                # Ensure at least 2 values are available for tuple[float, float]
                expected_tuple_length = 2
                if len(float_values) >= expected_tuple_length:
                    entropy_tuple = (float_values[0], float_values[1])
                else:
                    entropy_tuple = (0.0, 8.0)  # fallback
            else:
                entropy_tuple = (0.0, 8.0)  # fallback

            models[file_type] = cls(
                file_type=file_type,
                expected_entropy=entropy_tuple,
                very_low_threshold=cast("float", data["very_low_threshold"]),
                low_threshold=cast("float", data["low_threshold"]),
                medium_threshold=cast("float", data["medium_threshold"]),
                medium_high_threshold=cast("float", data["medium_high_threshold"]),
                high_threshold=cast("float", data["high_threshold"]),
            )
        return models

    @classmethod
    def for_file_type(cls, file_type: FileType) -> ContentAwareThresholds:
        """
        Factory method to create a threshold model for a specific file type.

        Args:
            file_type: The file type to get thresholds for

        Returns:
            ContentAwareThresholds model instance with validated data. File types
            without an entry in the defaults receive generic fallback thresholds
            instead of raising an error.

        """
        defaults = cls.get_default_values()
        if file_type not in defaults:
            # Return sensible defaults for unknown file types
            return cls(
                file_type=file_type,
                expected_entropy=(5.5, 1.0),
                very_low_threshold=4.0,
                low_threshold=5.0,
                medium_threshold=6.0,
                medium_high_threshold=6.8,
                high_threshold=7.2,
            )

        data = defaults[file_type]
        # Cast values from object to proper types
        expected_entropy = data["expected_entropy"]
        if isinstance(expected_entropy, list | tuple):
            float_values = [float(x) for x in expected_entropy]
            # Ensure at least 2 values are available for tuple[float, float]
            expected_tuple_length = 2
            if len(float_values) >= expected_tuple_length:
                entropy_tuple = (float_values[0], float_values[1])
            else:
                entropy_tuple = (0.0, 8.0)  # fallback
        else:
            entropy_tuple = (0.0, 8.0)  # fallback

        return cls(
            file_type=file_type,
            expected_entropy=entropy_tuple,
            very_low_threshold=cast("float", data["very_low_threshold"]),
            low_threshold=cast("float", data["low_threshold"]),
            medium_threshold=cast("float", data["medium_threshold"]),
            medium_high_threshold=cast("float", data["medium_high_threshold"]),
            high_threshold=cast("float", data["high_threshold"]),
        )

Programming Languages

The system includes thresholds for the top 20 programming languages (2025 rankings). Representative examples, grouped by category:

High-Level Languages

  • Python (5.5 ± 0.8) - Readable syntax, moderate entropy
  • JavaScript (5.4 ± 0.8) - Dynamic language patterns
  • TypeScript (5.4 ± 0.8) - Type annotations add structure
  • Java (5.6 ± 0.7) - Verbose syntax, consistent patterns

Systems Languages

  • C (5.7 ± 0.9) - Compact syntax, pointer operations
  • C++ (5.8 ± 0.9) - Template complexity increases entropy
  • Rust (5.7 ± 0.8) - Safety annotations, complex syntax
  • Go (5.5 ± 0.7) - Simple syntax, consistent patterns

Functional Languages

  • Scala (5.6 ± 0.8) - Functional constructs, complex types
  • R (5.4 ± 0.9) - Statistical patterns, data structures

Web Technologies

  • PHP (5.3 ± 0.8) - Mixed HTML/code patterns
  • SQL (5.2 ± 0.9) - Structured query patterns

Binary File Types

Executable Formats

  • Windows PE (6.0 ± 1.2) - Compiled code sections
  • macOS Mach-O (5.9 ± 1.2) - Universal binary complexity
  • Linux ELF (5.8 ± 1.1) - Symbol tables and metadata

Encoded Content

  • Base64 Encoded (6.0 ± 0.3) - Encoding artifacts
  • Hex Encoded (4.0 ± 0.2) - Limited character set
  • Encrypted (7.99 ± 0.01) - Close to the theoretical maximum of 8.0
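
The Base64 and hex means in this list line up with a simple bound: text drawn from an alphabet of k distinct symbols cannot exceed log2(k) bits per symbol. The short sketch below works that out; the helper name `max_entropy_bits` is illustrative.

```python
import math


def max_entropy_bits(alphabet_size: int) -> float:
    """Upper bound on Shannon entropy (bits per symbol) for an alphabet."""
    return math.log2(alphabet_size)


for name, size in [("hex", 16), ("base64", 64), ("raw bytes", 256)]:
    print(f"{name}: {max_entropy_bits(size):.1f} bits per symbol")
```

This gives 4.0 for hex, 6.0 for Base64, and 8.0 for raw bytes, which is why well-formed hex or Base64 content should not score far above its expected mean.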

Documentation Files

  • Documentation (4.8 ± 0.65) - Natural language patterns
  • Includes Markdown, plain text, and technical documentation
  • Lower entropy due to repetitive language patterns
  • Special handling for code blocks within documentation

Research Foundation

The threshold values are based on extensive academic research:

  • Lyda & Hamrock (2007) - IEEE foundational entropy analysis paper
  • Davies et al. (2022) - NapierOne dataset analysis (500,000+ files)
  • Practical Security Analytics - PE file analysis validation
  • Multiple peer-reviewed studies - Statistical validation across file types

For detailed research references, see File Entropy Research.

Interpreting Results

When analyzing files, consider the content-aware context:

High Entropy in Expected Context

File: complex_algorithm.py
Entropy: 6.9
Level: MEDIUM_HIGH
Context: Complex Python code (threshold: 6.8)
Assessment: Normal for algorithmic code

High Entropy in Unexpected Context

File: simple_config.txt
Entropy: 7.1
Level: HIGH
Context: Documentation (threshold: 6.15)
Assessment: Anomalous - investigate for encoding/encryption
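
The two assessments above can be reproduced with a small bucketing function. This is a sketch rather than the SSF Tools implementation: the `Thresholds` class and `classify` function are illustrative, the strict comparisons at the boundaries are an assumption, and `medium_threshold` is left out because it is described as the center of the normal range rather than a boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Illustrative stand-in holding the four boundary values."""

    very_low: float
    low: float
    medium_high: float
    high: float


def classify(entropy: float, t: Thresholds) -> str:
    # Boundary handling (strict comparisons) is an assumption of this sketch.
    if entropy > t.high:
        return "HIGH"
    if entropy > t.medium_high:
        return "MEDIUM_HIGH"
    if entropy < t.very_low:
        return "VERY_LOW"
    if entropy < t.low:
        return "LOW"
    return "MEDIUM"


# Values copied from the default definitions in this guide.
PYTHON = Thresholds(very_low=4.0, low=5.0, medium_high=6.8, high=7.2)
DOCUMENTATION = Thresholds(very_low=3.55, low=4.25, medium_high=5.65, high=6.15)

print(classify(6.9, PYTHON))         # MEDIUM_HIGH
print(classify(7.1, DOCUMENTATION))  # HIGH
```

The same entropy value can land in different levels depending on the detected file type, which is the point of content-aware thresholds.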

Best Practices

1. Consider File Context

Always interpret entropy results within the context of the detected file type:

  • Entropy between 6.0 and 7.0 in a binary executable is normal.
  • High entropy in source code files could indicate an embedded encryption secret.
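
One way to quantify "normal for this file type" is to express an observed entropy as a distance from the expected mean in standard deviations. The sketch below uses the (mean, std_dev) pairs from the default values; whether SSF Tools itself computes such a score is not stated in this guide, so treat `entropy_z_score` as a hypothetical helper.

```python
def entropy_z_score(entropy: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations between entropy and the expected mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (entropy - mean) / std_dev


# (mean, std_dev) pairs copied from the default values in this guide.
WINDOWS_PE = (6.0, 1.2)
PYTHON = (5.5, 0.8)

print(entropy_z_score(7.0, *WINDOWS_PE))
print(entropy_z_score(7.0, *PYTHON))
```

An entropy of 7.0 sits less than one standard deviation above the mean for a Windows PE file, but almost two standard deviations above the mean for Python source.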

2. Use Multiple Indicators

Combine entropy analysis with other indicators:

  • File size anomalies
  • Unexpected file extensions
  • Unusual file locations
  • ssf_tools analyze credentials results

3. Validate Detection Accuracy

Verify that MIME and language detection correctly identified the file type:

  • Check the detection confidence scores
  • Review files that fall into the UNKNOWN category
  • Manually verify suspicious classifications

4. Custom Threshold Tuning

For specialized environments, consider adjusting thresholds:

  • Corporate environments may have different baseline entropy
  • Specific applications may generate unique patterns
  • Domain-specific file types may need custom thresholds
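
Before adopting custom values, it can help to sanity-check them. The sketch below verifies that a threshold set contains all five keys used in the default definitions, stays within the 0.0 to 8.0 range, and increases strictly from very_low to high. The function `validate_thresholds` is hypothetical; this guide does not say whether SSF Tools performs an equivalent check.

```python
ORDERED_KEYS = (
    "very_low_threshold",
    "low_threshold",
    "medium_threshold",
    "medium_high_threshold",
    "high_threshold",
)


def validate_thresholds(values: dict[str, float]) -> list[str]:
    """Return a list of problems in a custom threshold set (empty if none)."""
    problems = []
    for key in ORDERED_KEYS:
        if key not in values:
            problems.append(f"missing {key}")
        elif not 0.0 <= values[key] <= 8.0:
            problems.append(f"{key}={values[key]} is outside 0.0-8.0")
    if not problems:
        levels = [values[key] for key in ORDERED_KEYS]
        # Each level must sit strictly above the previous one.
        if any(a >= b for a, b in zip(levels, levels[1:])):
            problems.append("thresholds must be strictly increasing")
    return problems


# Python defaults from this guide with a stricter, custom HIGH boundary.
custom_python = {
    "very_low_threshold": 4.0,
    "low_threshold": 5.0,
    "medium_threshold": 6.0,
    "medium_high_threshold": 6.8,
    "high_threshold": 7.0,
}
print(validate_thresholds(custom_python))
```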

Troubleshooting

Unexpected Classifications

Empty results worksheets in Excel

  • No findings at or above --risk-threshold (default: medium_high)
  • Confirm by reducing --risk-threshold on a smaller sample of files

Files classified with wrong file type

  • Check MIME detection accuracy and file extension mapping with ssf_tools utils file-info

Normal files flagged as high entropy

  • Review threshold values for the detected file type

Suspicious files not detected

  • Verify file type detection with ssf_tools utils file-info
  • Lower the reporting threshold with --risk-threshold