SSF Tools - File Processing Service Architecture¶
Overview¶
The File Processing Service provides a unified, protocol-based approach to file operations across all SSF Tools commands. It supports various file operations, including encoding detection, MIME type detection, hashing, validation, discovery, and specialized binary content streaming for forensic analysis and entropy calculations. This architecture ensures consistent file handling while maintaining loose coupling through dependency injection patterns.
Architectural Principles¶
Design Goals¶
- Protocol-Based Design: Define clear contracts for all file operations.
- Separation of Concerns: Delegate each file operation to a dedicated service.
- Testability: Enable easy mocking and stubbing for unit and integration tests.
- Flexibility: Allow swappable implementations for different algorithms and strategies.
- Error Handling: Implement robust error handling for file operations.
Key Benefits¶
- Consistency: Unified approach to file processing across commands.
- Reusability: Modular services can be reused across different workflows.
- Extensibility: Easily add new file operations or detection algorithms.
- Performance: Optimized for large file processing and streaming.
- Maintainability: Clear separation of responsibilities simplifies updates.
Architecture Overview¶
File Processing Protocols¶
FileProcessingService Protocol¶
The FileProcessingService orchestrates multiple specialized services to provide comprehensive file analysis:
from typing import Protocol
class FileProcessingServiceProtocol(Protocol):
"""Protocol for unified file processing."""
def detect_encoding(self, file_path: str) -> str | None:
"""Detect the encoding of a file."""
def detect_mime_type(self, file_path: str) -> str | None:
"""Detect the MIME type of a file."""
def generate_hash(self, file_path: str) -> str:
"""Generate a cryptographic hash for a file."""
def validate_file(self, file_path: str) -> bool:
"""Validate the existence and accessibility of a file."""
def discover_files(self, directory: str, pattern: str) -> list[str]:
"""Discover files matching a pattern in a directory."""
Configuration Models¶
File Processing Configuration¶
The file processing service uses the following configuration:
from pydantic import BaseModel
class FileProcessingConfig(BaseModel):
"""Configuration for file processing services."""
max_file_size_mb: int = 100 # Maximum file size for processing
hash_algorithm: str = "sha256" # Default hash algorithm
Global Configuration Integration¶
# In ssf-tools-config.yaml
file_processing:
max_file_size_mb: 100 # Maximum file size for processing
hash_algorithm: sha256 # Default hash algorithm
Service Implementation¶
FileProcessingService Implementation¶
The FileProcessingService integrates multiple specialized services:
from kp_ssf_tools.core.services.file_processing import (
CharsetNormalizerEncodingDetector,
AutoMimeDetector,
ConfigurableFileHashGenerator,
BasicFileValidator,
FileDiscoveryService,
)
class FileProcessingService:
def __init__(
self,
encoding_detector: CharsetNormalizerEncodingDetector,
mime_detector: AutoMimeDetector,
hash_generator: ConfigurableFileHashGenerator,
file_validator: BasicFileValidator,
file_discovery: FileDiscoveryService,
):
self.encoding_detector = encoding_detector
self.mime_detector = mime_detector
self.hash_generator = hash_generator
self.file_validator = file_validator
self.file_discovery = file_discovery
def detect_encoding(self, file_path: str) -> str | None:
return self.encoding_detector.detect(file_path)
def detect_mime_type(self, file_path: str) -> str | None:
return self.mime_detector.detect(file_path)
def generate_hash(self, file_path: str) -> str:
return self.hash_generator.generate(file_path)
def validate_file(self, file_path: str) -> bool:
return self.file_validator.validate(file_path)
def discover_files(self, directory: str, pattern: str) -> list[str]:
return self.file_discovery.discover(directory, pattern)
Container Integration¶
CoreContainer Registration¶
The CoreContainer registers all file processing dependencies:
from dependency_injector import containers, providers
from kp_ssf_tools.core.services.file_processing import (
CharsetNormalizerEncodingDetector,
AutoMimeDetector,
ConfigurableFileHashGenerator,
BasicFileValidator,
FileDiscoveryService,
FileProcessingService,
)
class CoreContainer(containers.DeclarativeContainer):
encoding_detector = providers.Singleton(CharsetNormalizerEncodingDetector)
mime_detector = providers.Singleton(AutoMimeDetector)
hash_generator = providers.Singleton(ConfigurableFileHashGenerator)
file_validator = providers.Singleton(BasicFileValidator)
file_discovery = providers.Singleton(FileDiscoveryService)
file_processing = providers.Singleton(
FileProcessingService,
encoding_detector=encoding_detector,
mime_detector=mime_detector,
hash_generator=hash_generator,
file_validator=file_validator,
file_discovery=file_discovery,
)
CLI Integration¶
Example Command¶
from dependency_injector.wiring import inject, Provide
from kp_ssf_tools.containers import CoreContainer
@inject
def process_file_command(
file_path: str,
file_processing=Provide[CoreContainer.file_processing],
):
mime_type = file_processing.detect_mime_type(file_path)
print(f"MIME type: {mime_type}")