# Audit Automation Framework

## Overview

The Audit Automation Framework is a workflow-driven platform developed to support the complete lifecycle of Computer Vision dataset preparation, validation, training, model evaluation, and deployment readiness.

The framework was created to address a common challenge encountered in Computer Vision projects: While model training often receives the most attention, the majority of project effort is typically spent on:

* Dataset preparation
* Annotation validation
* Data quality checks
* Training dataset generation
* Model management
* Workflow tracking
* Operational auditing

The framework consolidates these activities into a unified application that allows project teams to configure, execute, monitor, and audit AI training pipelines through a common interface.

---

# Objectives

The framework was designed with the following goals:

* Standardize Computer Vision workflows
* Improve dataset quality
* Reduce manual processing effort
* Enable repeatable training pipelines
* Improve auditability of AI projects
* Provide operational visibility into model development activities

---

# Architecture Overview

```text
+------------------------------------------------------+
|                PROJECT CONFIGURATION                 |
+------------------------------------------------------+
  Project Details
  Event Definitions
  Class Definitions
  Global Parameters
  Model Selection

+------------------------------------------------------+
|                     UTILITY HUB                      |
+------------------------------------------------------+
  Dataset Utilities
  Annotation Utilities
  Validation Utilities
  Training Utilities
  Prediction Utilities
  Model Utilities

+------------------------------------------------------+
|                     PIPELINE HUB                     |
+------------------------------------------------------+
  Utility Selection
  Pipeline Sequencing
  Dependency Management
  Configuration Management
  Execution Planning

+------------------------------------------------------+
|                    EXECUTION HUB                     |
+------------------------------------------------------+
  Command Generation
  Pipeline Execution
  Watchdog Monitoring
  Execution Tracking
  Status Management

+------------------------------------------------------+
|                    REPORTING HUB                     |
+------------------------------------------------------+
  Dataset Health
  Pipeline Readiness
  Execution Audit
  Project Summary
  Final Reports
```

---

# Design Philosophy

The framework follows a modular approach. Each utility performs a specific task.

Rather than creating one large monolithic process, complex workflows are constructed by combining smaller reusable modules.

This provides:

* Better maintainability
* Easier testing
* Faster troubleshooting
* Reusable workflows
* Flexible project configurations

---

# Project Configuration Module

The first layer captures project-level information.

## Configuration Includes

* Project Name
* Event Count
* Class Definitions
* Model Selection
* Global Parameters
* Environment Initialization

The configuration becomes the foundation for all downstream pipeline activities.

---

# Utility Hub

The Utility Hub serves as a registry of reusable processing modules.

Each utility contains:

* Purpose
* Business Description
* Technical Description
* Required Parameters
* Input Definitions
* Output Definitions

This allows utilities to be reused across projects without modification.

---

# Example Utility Categories

## Dataset Utilities

Purpose: Prepare datasets for model development.

Examples:

* Video to Image Extraction
* Image Processing
* Dataset Splitting
* Clean Image Collection

---

## Annotation Utilities

Purpose: Support annotation generation and validation.

Examples:

* Annotation Generation
* Box Validation
* Label Review
* Annotation Quality Checks

The framework includes validation workflows that help ensure annotation consistency before training begins.

---

## Training Utilities

Purpose: Train machine learning models using prepared datasets.

Capabilities include:

* Dataset Preparation
* Training Configuration
* Model Generation
* Fine-Tuning

The framework supports transfer learning workflows designed for limited datasets.

---

## Prediction Utilities

Purpose: Evaluate trained models against new datasets.

Capabilities:

* Model Loading
* Event Classification
* Prediction Review
* Validation

---

# Pipeline Hub

The Pipeline Hub is the core orchestration layer.

Instead of running utilities manually, users assemble them into a sequence of processing steps.

Example:

```text
Video Input
     |
     V
Frame Extraction
     |
     V
Image Processing
     |
     V
Annotation Validation
     |
     V
Dataset Split
     |
     V
Model Training
     |
     V
Prediction Validation
```

Each step becomes a reusable pipeline node.

---

# Configuration Management

Each pipeline node maintains its own configuration.

Supported parameter types include:

* Folder Paths
* Files
* CSV Inputs
* Text Inputs
* Numeric Parameters
* Model Parameters

Configurations can be saved and restored for repeatable execution.

---

# Execution Hub

The Execution Hub converts configured pipeline nodes into executable commands.

Capabilities include:

* Dynamic Command Generation
* Execution Validation
* Readiness Checks
* Runtime Monitoring
* Execution Counters

The framework automatically verifies mandatory parameters before execution.

---

# Watchdog Monitoring

A monitoring layer tracks pipeline execution status.

Capabilities:

* Execution Tracking
* Folder Monitoring
* New File Detection
* Synchronization Checks
* Status Reporting

This provides visibility into long-running dataset and training operations.

---

# Dataset Health Analysis

The framework performs readiness checks before training.

Examples:

* Folder Validation
* Dataset Counts
* Missing Configuration Detection
* Path Verification

This reduces training failures caused by incomplete project setup.

---

# Reporting Framework

The reporting module provides operational visibility across the project.

Reports include:

## Project Snapshot

* Project Details
* Event Definitions
* Class Definitions

## Pipeline Readiness

* Configuration Coverage
* Missing Parameters
* Validation Status

## Dataset Health

* Dataset Volumes
* Annotation Counts
* Data Availability

## Execution Audit

* Run History
* Execution Counts
* Pipeline Status

These reports help teams assess project readiness before model training begins.

---

# Persistence Layer

The framework includes a centralized persistence mechanism.

Stored Components:

* Project Configuration
* Pipeline Definitions
* Utility Configurations
* Execution Counters
* Dynamic Parameters

This enables projects to be paused and resumed without rebuilding configurations.

---

# Technology Stack

## Application Layer

* Streamlit

## Backend

* Python

## Data Processing

* Pandas
* NumPy

## Computer Vision

* OpenCV
* YOLO Ecosystem

## Deep Learning

* TensorFlow
* Keras
* MobileNetV2

## OCR

* Surya OCR Components

## Reporting

* Plotly
* PDF Generation

---

# Practical Benefits

The framework helps standardize Computer Vision project execution by:

* Reducing repetitive setup work
* Improving dataset quality
* Encouraging reusable workflows
* Improving training consistency
* Providing execution visibility
* Supporting auditability of AI development activities

---

# Key Learning

A successful AI project depends not only on model architecture but also on the quality of datasets, validation processes, workflow governance, and execution discipline.

This framework was developed to bring structure and repeatability to those activities.

---

# Conclusion

The Audit Automation Framework is a reusable workflow platform for Computer Vision projects.

Rather than focusing solely on model training, it addresses the broader lifecycle of AI development including dataset preparation, annotation validation, pipeline orchestration, execution monitoring, reporting, and project governance.

The project demonstrates practical experience in workflow design, MLOps-style orchestration, Computer Vision operations, reusable utility development, reporting systems, and scalable AI project management.
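
To illustrate the flow described under Pipeline Hub and Execution Hub (a configured node, a mandatory-parameter check, then command generation), here is a minimal Python sketch. It is not the framework's actual code: the `PipelineNode` class, the `readiness_report` helper, the script names, and the parameter names are all hypothetical and chosen only for illustration.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class PipelineNode:
    """One pipeline step with its own configuration (hypothetical shape)."""
    name: str
    script: str                     # hypothetical utility script for this step
    required: tuple[str, ...]       # names of mandatory parameters
    params: dict[str, str] = field(default_factory=dict)

    def missing_params(self) -> list[str]:
        """Return the mandatory parameters that are absent or blank."""
        return [
            key for key in self.required
            if not str(self.params.get(key, "")).strip()
        ]

    def build_command(self) -> list[str]:
        """Convert the node configuration into an argv list.

        Raises ValueError when a mandatory parameter is missing, so an
        incomplete node never reaches execution.
        """
        missing = self.missing_params()
        if missing:
            raise ValueError(f"{self.name}: missing mandatory parameters {missing}")
        argv = ["python", self.script]
        for key, value in self.params.items():
            argv += [f"--{key}", str(value)]
        return argv


def readiness_report(nodes: list[PipelineNode]) -> dict[str, list[str]]:
    """Map each node name to its missing mandatory parameters."""
    return {node.name: node.missing_params() for node in nodes}


# Two example nodes; the second deliberately lacks `train_ratio`.
pipeline = [
    PipelineNode(
        "Frame Extraction",
        "extract_frames.py",
        ("video_dir", "output_dir"),
        {"video_dir": "data/videos", "output_dir": "data/frames"},
    ),
    PipelineNode(
        "Dataset Split",
        "split_dataset.py",
        ("input_dir", "train_ratio"),
        {"input_dir": "data/frames"},
    ),
]

report = readiness_report(pipeline)
first_command = pipeline[0].build_command()

print(report)
print(shlex.join(first_command))
```

Returning the command as an argv list rather than a single string keeps quoting out of the node logic; `shlex.join` is used only to display it. A readiness report in this shape could feed the "Missing Parameters" entry of a Pipeline Readiness report.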