
SSF Tools - File Processing Service Architecture

Overview

The File Processing Service provides a unified, protocol-based interface for file operations across all SSF Tools commands. It covers encoding detection, MIME type detection, hashing, validation, and discovery, plus binary content streaming for forensic analysis and entropy calculations. Dependency injection keeps the individual services loosely coupled while giving every command the same file-handling behavior.

Architectural Principles

Design Goals

  • Protocol-Based Design: Define clear contracts for all file operations.
  • Separation of Concerns: Delegate each file operation to a dedicated service.
  • Testability: Enable easy mocking and stubbing for unit and integration tests.
  • Flexibility: Allow swappable implementations for different algorithms and strategies.
  • Error Handling: Implement robust error handling for file operations.

Key Benefits

  • Consistency: Unified approach to file processing across commands.
  • Reusability: Modular services can be reused across different workflows.
  • Extensibility: Easily add new file operations or detection algorithms.
  • Performance: Optimized for large file processing and streaming.
  • Maintainability: Clear separation of responsibilities simplifies updates.

Architecture Overview

graph TD
    CORE[Core Container<br/>Infrastructure Services]
    FPS[File Processing Service<br/>Unified File Operations]
    CORE --> FPS

    subgraph FILE_OPS ["📁 File Operations"]
        ENCODING[Encoding Detection]
        MIME[MIME Type Detection]
        HASH[Hash Generation]
        VALIDATION[File Validation]
        DISCOVERY[File Discovery]
        STREAMING[Content Streaming]
    end
    FPS --> FILE_OPS

    CLI[CLI Commands<br/>User Interaction]
    CLI --> FPS

    CONFIG[Configuration<br/>Global and Service-Specific Settings]
    CONFIG --> CORE

    classDef coreService fill:#e1f5fe,stroke:#0277bd,stroke-width:2px
    classDef component fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    classDef cli fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    classDef config fill:#fce4ec,stroke:#c2185b,stroke-width:2px

    class CORE,FPS coreService
    class FILE_OPS component
    class CLI cli
    class CONFIG config
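
Content Streaming appears in the diagram, but this page does not show its interface. As a rough illustration of the idea only, the sketch below reads a file in fixed-size chunks and computes Shannon entropy over the streamed bytes. The names `stream_chunks` and `shannon_entropy` and the 64 KiB chunk size are assumptions made for this example, not part of the SSF Tools API.

```python
import math
from collections import Counter
from collections.abc import Iterable, Iterator
from pathlib import Path


def stream_chunks(file_path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield a file's bytes in fixed-size chunks (hypothetical helper)."""
    with Path(file_path).open("rb") as handle:
        # read() returns b"" at end of file, which ends the loop.
        while chunk := handle.read(chunk_size):
            yield chunk


def shannon_entropy(chunks: Iterable[bytes]) -> float:
    """Return the Shannon entropy, in bits per byte, of all streamed chunks."""
    counts: Counter[int] = Counter()
    total = 0
    for chunk in chunks:
        counts.update(chunk)  # iterating bytes yields integer values 0..255
        total += len(chunk)
    if total == 0:
        return 0.0  # define the entropy of an empty file as zero
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Because only the byte counts are kept in memory, the memory cost stays constant regardless of file size, which is the main reason to stream rather than read the whole file.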

File Processing Protocols

FileProcessingService Protocol

FileProcessingServiceProtocol defines the contract that the FileProcessingService fulfils by orchestrating multiple specialized services:

from typing import Protocol

class FileProcessingServiceProtocol(Protocol):
    """Protocol for unified file processing."""

    def detect_encoding(self, file_path: str) -> str | None:
        """Detect the encoding of a file."""

    def detect_mime_type(self, file_path: str) -> str | None:
        """Detect the MIME type of a file."""

    def generate_hash(self, file_path: str) -> str:
        """Generate a cryptographic hash for a file."""

    def validate_file(self, file_path: str) -> bool:
        """Validate the existence and accessibility of a file."""

    def discover_files(self, directory: str, pattern: str) -> list[str]:
        """Discover files matching a pattern in a directory."""

Configuration Models

File Processing Configuration

The file processing service uses the following configuration:

from pydantic import BaseModel

class FileProcessingConfig(BaseModel):
    """Configuration for file processing services."""
    max_file_size_mb: int = 100  # Maximum file size for processing
    hash_algorithm: str = "sha256"  # Default hash algorithm

Global Configuration Integration

# In ssf-tools-config.yaml
file_processing:
  max_file_size_mb: 100  # Maximum file size for processing
  hash_algorithm: sha256  # Default hash algorithm
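
To make the two settings concrete, here is a minimal sketch of how a hash generator might honor them using only the standard library. The function name `hash_file`, the 1 MiB read size, and the interpretation of a megabyte as 1024 * 1024 bytes are assumptions for illustration. The real ConfigurableFileHashGenerator may behave differently.

```python
import hashlib
from pathlib import Path


def hash_file(
    file_path: str,
    hash_algorithm: str = "sha256",
    max_file_size_mb: int = 100,
    chunk_size: int = 1024 * 1024,
) -> str:
    """Return the hex digest of a file, refusing files over the size limit."""
    path = Path(file_path)
    limit_bytes = max_file_size_mb * 1024 * 1024  # assumed MB definition
    size = path.stat().st_size
    if size > limit_bytes:
        raise ValueError(f"{file_path} is {size} bytes; limit is {limit_bytes}")

    # hashlib.new raises ValueError if the algorithm name is not supported.
    digest = hashlib.new(hash_algorithm)
    with path.open("rb") as handle:
        # Feed the file in chunks so large files are never fully in memory.
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Checking the size before opening the file keeps the limit cheap to enforce, and hashing in chunks means the limit protects processing time rather than memory.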

Service Implementation

FileProcessingService Implementation

The FileProcessingService integrates multiple specialized services:

from kp_ssf_tools.core.services.file_processing import (
    CharsetNormalizerEncodingDetector,
    AutoMimeDetector,
    ConfigurableFileHashGenerator,
    BasicFileValidator,
    FileDiscoveryService,
)

class FileProcessingService:
    def __init__(
        self,
        encoding_detector: CharsetNormalizerEncodingDetector,
        mime_detector: AutoMimeDetector,
        hash_generator: ConfigurableFileHashGenerator,
        file_validator: BasicFileValidator,
        file_discovery: FileDiscoveryService,
    ):
        self.encoding_detector = encoding_detector
        self.mime_detector = mime_detector
        self.hash_generator = hash_generator
        self.file_validator = file_validator
        self.file_discovery = file_discovery

    def detect_encoding(self, file_path: str) -> str | None:
        return self.encoding_detector.detect(file_path)

    def detect_mime_type(self, file_path: str) -> str | None:
        return self.mime_detector.detect(file_path)

    def generate_hash(self, file_path: str) -> str:
        return self.hash_generator.generate(file_path)

    def validate_file(self, file_path: str) -> bool:
        return self.file_validator.validate(file_path)

    def discover_files(self, directory: str, pattern: str) -> list[str]:
        return self.file_discovery.discover(directory, pattern)
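
The service only delegates, so the behavior lives in the collaborators. As a hedged illustration of what `validate` and `discover` could look like, the sketch below uses pathlib and os.access. The recursive search and the sorted result order are assumptions. The actual BasicFileValidator and FileDiscoveryService are not shown on this page and may differ.

```python
import os
from pathlib import Path


def validate(file_path: str) -> bool:
    """Return True if the path is an existing, readable regular file."""
    path = Path(file_path)
    return path.is_file() and os.access(path, os.R_OK)


def discover(directory: str, pattern: str) -> list[str]:
    """Return files under the directory whose names match a glob pattern.

    Assumes a recursive search; results are sorted for deterministic output.
    """
    root = Path(directory)
    if not root.is_dir():
        return []  # a missing or non-directory root yields no matches
    return sorted(str(p) for p in root.rglob(pattern) if p.is_file())
```

Returning plain strings matches the `list[str]` return type in the protocol, and sorting makes the output stable across platforms whose directory listing order differs.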

Container Integration

CoreContainer Registration

The CoreContainer registers all file processing dependencies:

from dependency_injector import containers, providers
from kp_ssf_tools.core.services.file_processing import (
    CharsetNormalizerEncodingDetector,
    AutoMimeDetector,
    ConfigurableFileHashGenerator,
    BasicFileValidator,
    FileDiscoveryService,
    FileProcessingService,
)

class CoreContainer(containers.DeclarativeContainer):
    encoding_detector = providers.Singleton(CharsetNormalizerEncodingDetector)
    mime_detector = providers.Singleton(AutoMimeDetector)
    hash_generator = providers.Singleton(ConfigurableFileHashGenerator)
    file_validator = providers.Singleton(BasicFileValidator)
    file_discovery = providers.Singleton(FileDiscoveryService)

    file_processing = providers.Singleton(
        FileProcessingService,
        encoding_detector=encoding_detector,
        mime_detector=mime_detector,
        hash_generator=hash_generator,
        file_validator=file_validator,
        file_discovery=file_discovery,
    )

CLI Integration

Example Command

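from dependency_injector.wiring import inject, Provide
from kp_ssf_tools.containers import CoreContainer

# Injection only takes effect after the container has been wired to this
# module, for example with container.wire(modules=[__name__]).
@inject
def process_file_command(
    file_path: str,
    file_processing=Provide[CoreContainer.file_processing],
):
    """Print the detected MIME type of a file."""
    mime_type = file_processing.detect_mime_type(file_path)
    # detect_mime_type returns None when no MIME type could be determined.
    print(f"MIME type: {mime_type or 'unknown'}")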