# Audit Automation Framework

# Overview

The Audit Automation Framework is a workflow-driven platform developed to support the complete lifecycle of Computer Vision dataset preparation, validation, training, model evaluation, and deployment readiness.

The framework addresses a common challenge in Computer Vision projects. Model training often receives the most attention, but the majority of project effort is typically spent on:

* Dataset preparation
* Annotation validation
* Data quality checks
* Training dataset generation
* Model management
* Workflow tracking
* Operational auditing

The framework consolidates these activities into a unified application that allows project teams to configure, execute, monitor, and audit AI training pipelines through a common interface.

---

# Objectives

The framework was designed with the following goals:

* Standardize Computer Vision workflows
* Improve dataset quality
* Reduce manual processing effort
* Enable repeatable training pipelines
* Improve auditability of AI projects
* Provide operational visibility into model development activities

---

# Architecture Overview

```text
+------------------------------------------------------+
|                 PROJECT CONFIGURATION                |
+------------------------------------------------------+

 Project Details

 Event Definitions

 Class Definitions

 Global Parameters

 Model Selection

+------------------------------------------------------+
|                     UTILITY HUB                      |
+------------------------------------------------------+

 Dataset Utilities

 Annotation Utilities

 Validation Utilities

 Training Utilities

 Prediction Utilities

 Model Utilities

+------------------------------------------------------+
|                     PIPELINE HUB                     |
+------------------------------------------------------+

 Utility Selection

 Pipeline Sequencing

 Dependency Management

 Configuration Management

 Execution Planning

+------------------------------------------------------+
|                    EXECUTION HUB                     |
+------------------------------------------------------+

 Command Generation

 Pipeline Execution

 Watchdog Monitoring

 Execution Tracking

 Status Management

+------------------------------------------------------+
|                    REPORTING HUB                     |
+------------------------------------------------------+

 Dataset Health

 Pipeline Readiness

 Execution Audit

 Project Summary

 Final Reports
```

---

# Design Philosophy

The framework follows a modular approach: each utility performs one specific task, and complex workflows are built by combining these small reusable modules rather than by writing one large monolithic process.

This provides:

* Better maintainability
* Easier testing
* Faster troubleshooting
* Reusable workflows
* Flexible project configurations

---

# Project Configuration Module

The first layer captures project-level information.

## Configuration Includes

* Project Name
* Event Count
* Class Definitions
* Model Selection
* Global Parameters
* Environment Initialization

The configuration becomes the foundation for all downstream pipeline activities.
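
As an illustration only, the configuration items above could be modelled as a small validated record. The class name `ProjectConfig`, its field names, and the validation rules are assumptions made for this sketch, not the framework's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectConfig:
    """Hypothetical project-level settings shared by downstream steps."""

    project_name: str
    event_count: int
    class_names: list[str]
    model_name: str
    global_params: dict[str, object] = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means valid."""
        problems = []
        if not self.project_name.strip():
            problems.append("project_name is empty")
        if self.event_count < 1:
            problems.append("event_count must be at least 1")
        if not self.class_names:
            problems.append("at least one class must be defined")
        if len(set(self.class_names)) != len(self.class_names):
            problems.append("class names must be unique")
        if not self.model_name.strip():
            problems.append("model_name is empty")
        return problems


# Example values are invented for the sketch.
config = ProjectConfig(
    project_name="demo-project",
    event_count=2,
    class_names=["person", "vehicle"],
    model_name="mobilenetv2",
)
```

Calling `config.validate()` on the example returns an empty list. Returning every problem at once, rather than raising on the first one, suits a form-driven interface where all issues are shown together.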

---

# Utility Hub

The Utility Hub serves as a registry of reusable processing modules.

Each utility contains:

* Purpose
* Business Description
* Technical Description
* Required Parameters
* Input Definitions
* Output Definitions

Because each utility declares its parameters, inputs, and outputs explicitly, it can be reused across projects without modification.
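
One possible shape for such a registry is sketched below. `UtilitySpec` and `UtilityRegistry` are hypothetical names; the fields simply mirror the metadata list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UtilitySpec:
    """Metadata describing one reusable utility (hypothetical structure)."""

    name: str
    purpose: str
    business_description: str
    technical_description: str
    required_params: tuple[str, ...]
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


class UtilityRegistry:
    """In-memory lookup table of utility specs, keyed by name."""

    def __init__(self) -> None:
        self._specs: dict[str, UtilitySpec] = {}

    def register(self, spec: UtilitySpec) -> None:
        # Reject duplicates so one name always resolves to one utility.
        if spec.name in self._specs:
            raise ValueError(f"utility already registered: {spec.name}")
        self._specs[spec.name] = spec

    def get(self, name: str) -> UtilitySpec:
        return self._specs[name]

    def names(self) -> list[str]:
        return sorted(self._specs)


# Example entry; the descriptions and parameter names are invented.
registry = UtilityRegistry()
registry.register(
    UtilitySpec(
        name="video_to_images",
        purpose="Extract frames from video files",
        business_description="Turns raw footage into reviewable images",
        technical_description="Samples frames at a fixed rate",
        required_params=("input_dir", "output_dir"),
        inputs=("video files",),
        outputs=("image files",),
    )
)
```

Keeping the spec immutable (`frozen=True`) means a pipeline can hold a reference to it without the risk of another project changing it in place.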

---

# Example Utility Categories

## Dataset Utilities

Purpose:

Prepare datasets for model development.

Examples:

* Video to Image Extraction
* Image Processing
* Dataset Splitting
* Clean Image Collection
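
As a minimal sketch of the Dataset Splitting utility, the function below performs a reproducible shuffle-and-split. The function name, the default fractions, and the fixed seed are assumptions for illustration.

```python
import random


def split_dataset(items, train=0.7, val=0.2, seed=42):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    if not 0 < train < 1 or not 0 <= val < 1 or train + val > 1:
        raise ValueError("need 0 < train < 1, 0 <= val < 1, train + val <= 1")
    # Sort first so the result does not depend on the input order.
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    # int() truncates, so any remainder ends up in the test split.
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }
```

Using a dedicated `random.Random(seed)` instance keeps the split repeatable across runs without touching the global random state, which supports the goal of repeatable training pipelines.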

---

## Annotation Utilities

Purpose:

Support annotation generation and validation.

Examples:

* Annotation Generation
* Box Validation
* Label Review
* Annotation Quality Checks

The framework includes validation workflows that help ensure annotation consistency before training begins.
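
A box validation check might look like the sketch below. It assumes annotations are stored in the common YOLO text format (`class_id x_center y_center width height`, with coordinates normalised to the image size); the source does not state which annotation format the framework uses, and `validate_yolo_line` is a hypothetical helper.

```python
def validate_yolo_line(line: str, num_classes: int) -> list[str]:
    """Return problems found in one YOLO-format label line (empty = valid)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, found {len(parts)}"]
    try:
        class_id = int(parts[0])
        x, y, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return ["non-numeric field"]

    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        problems.append("box centre outside the image")
    if not (0.0 < w <= 1.0 and 0.0 < h <= 1.0):
        problems.append("width and height must be in (0, 1]")
    # Box edges must stay inside the image; allow slack for float rounding.
    eps = 1e-6
    if (x - w / 2 < -eps or x + w / 2 > 1 + eps
            or y - h / 2 < -eps or y + h / 2 > 1 + eps):
        problems.append("box extends past the image border")
    return problems
```

Running this over every label file before training surfaces malformed rows early, instead of letting them fail or silently degrade a long training run.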

---

## Training Utilities

Purpose:

Train machine learning models using prepared datasets.

Capabilities include:

* Dataset Preparation
* Training Configuration
* Model Generation
* Fine-Tuning

The framework supports transfer learning workflows intended for projects with limited training data.

---

## Prediction Utilities

Purpose:

Evaluate trained models against new datasets.

Capabilities:

* Model Loading
* Event Classification
* Prediction Review
* Validation

---

# Pipeline Hub

The Pipeline Hub is the core orchestration layer.

Instead of running utilities manually, users assemble them into a sequence of processing steps.

Example:

```text
Video Input
     |
     V
Frame Extraction
     |
     V
Image Processing
     |
     V
Annotation Validation
     |
     V
Dataset Split
     |
     V
Model Training
     |
     V
Prediction Validation
```

Each step becomes a reusable pipeline node.
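
The example sequence can be expressed as a dependency graph and ordered automatically. This sketch uses the standard-library `graphlib` module (Python 3.9+); the node names and the `execution_order` helper are assumptions, not the framework's own identifiers.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes that must finish before it starts.
PIPELINE = {
    "frame_extraction": set(),
    "image_processing": {"frame_extraction"},
    "annotation_validation": {"image_processing"},
    "dataset_split": {"annotation_validation"},
    "model_training": {"dataset_split"},
    "prediction_validation": {"model_training"},
}


def execution_order(pipeline):
    """Return node names in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(pipeline).static_order())


order = execution_order(PIPELINE)
```

For this linear chain, `order` lists the nodes from `frame_extraction` through `prediction_validation`. Declaring dependencies instead of hard-coding positions lets a node be inserted or removed without renumbering the rest of the pipeline.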

---

# Configuration Management

Each pipeline node maintains its own configuration.

Supported parameter types include:

* Folder Paths
* Files
* CSV Inputs
* Text Inputs
* Numeric Parameters
* Model Parameters

Configurations can be saved and restored for repeatable execution.
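
A minimal sketch of saving and restoring per-node configuration, assuming a single JSON file keyed by node name. The file layout and both function names are hypothetical.

```python
import json
from pathlib import Path


def save_node_config(path, node_name, params):
    """Store one node's parameters, keeping other nodes' entries intact."""
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    data[node_name] = params
    path.write_text(json.dumps(data, indent=2, sort_keys=True), encoding="utf-8")


def load_node_config(path, node_name):
    """Return the saved parameters for a node, or {} if none were saved."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8")).get(node_name, {})
```

JSON covers the parameter types listed above when folder paths and file names are stored as plain strings. Sorting the keys keeps the saved file stable between runs, which makes configuration changes easy to review in a diff.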

---

# Execution Hub

The Execution Hub converts configured pipeline nodes into executable commands.

Capabilities include:

* Dynamic Command Generation
* Execution Validation
* Readiness Checks
* Runtime Monitoring
* Execution Counters

Before execution, the framework automatically verifies that all mandatory parameters are set.
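
The sketch below shows one way command generation and the mandatory-parameter check could fit together. It assumes each utility is a Python script that accepts `--name value` arguments; the script name and parameter names in the usage note are invented.

```python
import sys


def build_command(script, params, required):
    """Build an argument list for one utility, refusing incomplete configs."""
    # Treat None and the empty string as "not configured".
    missing = [name for name in required if params.get(name) in (None, "")]
    if missing:
        raise ValueError(f"missing mandatory parameters: {', '.join(missing)}")
    command = [sys.executable, script]
    for name, value in params.items():
        if value in (None, ""):
            continue  # skip optional parameters that were left blank
        command += [f"--{name.replace('_', '-')}", str(value)]
    return command
```

For example, `build_command("extract_frames.py", {"input_dir": "videos", "fps": 2}, required=["input_dir"])` yields the current interpreter path followed by `extract_frames.py --input-dir videos --fps 2`. Returning a list rather than a single string means the result can be passed to `subprocess.run` without shell quoting concerns.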

---

# Watchdog Monitoring

A monitoring layer tracks pipeline execution status.

Capabilities:

* Execution Tracking
* Folder Monitoring
* New File Detection
* Synchronization Checks
* Status Reporting

This provides visibility into long-running dataset and training operations.
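
New file detection can be approximated with simple polling, as in the sketch below. It uses only the standard library and is not necessarily how the framework's monitoring layer is implemented; `FolderWatcher` is a hypothetical name.

```python
from pathlib import Path


class FolderWatcher:
    """Report files that appeared in a folder since the previous poll."""

    def __init__(self, folder, pattern="*"):
        self.folder = Path(folder)
        self.pattern = pattern
        self._seen = set()

    def poll(self):
        """Return newly seen files, sorted by path."""
        current = {p for p in self.folder.glob(self.pattern) if p.is_file()}
        new_files = sorted(current - self._seen)
        # Remember the current snapshot so each file is reported once.
        self._seen = current
        return new_files
```

A caller would invoke `poll()` on a timer, for instance once per interface refresh, and update the status display with whatever it returns. Polling is less efficient than operating-system file events but needs no extra dependencies and behaves the same on every platform.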

---

# Dataset Health Analysis

The framework performs readiness checks before training.

Examples:

* Folder Validation
* Dataset Counts
* Missing Configuration Detection
* Path Verification

This reduces training failures caused by incomplete project setup.
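
A readiness check along these lines could be sketched as follows. The `train`/`val`/`test` folder layout, the list of image suffixes, and both function names are assumptions for illustration.

```python
from pathlib import Path

# Assumed set of image file types; adjust to the project's data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def dataset_health(root, splits=("train", "val", "test")):
    """Check that each split folder exists and count the images inside it."""
    root = Path(root)
    report = {}
    for split in splits:
        folder = root / split
        if not folder.is_dir():
            report[split] = {"exists": False, "images": 0}
            continue
        images = sum(
            1
            for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        report[split] = {"exists": True, "images": images}
    return report


def is_ready(report):
    """A dataset is ready when every split exists and holds an image."""
    return all(e["exists"] and e["images"] > 0 for e in report.values())
```

Separating the report from the ready/not-ready decision lets the same data feed both a gate before training and a more detailed health display.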

---

# Reporting Framework

The reporting module provides operational visibility across the project.

Reports include:

## Project Snapshot

* Project Details
* Event Definitions
* Class Definitions

## Pipeline Readiness

* Configuration Coverage
* Missing Parameters
* Validation Status

## Dataset Health

* Dataset Volumes
* Annotation Counts
* Data Availability

## Execution Audit

* Run History
* Execution Counts
* Pipeline Status

These reports help teams assess project readiness before model training begins and review what was executed afterwards.
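
As a rough illustration of the Pipeline Readiness report, configuration coverage can be computed as the share of required parameters that have a value. The data shapes and the `readiness_report` name are hypothetical.

```python
def readiness_report(required, configured):
    """Summarise, per node, how much of its required config is filled in.

    required:   {node name: [required parameter names]}
    configured: {node name: {parameter name: value}}
    """
    rows = []
    for node, names in required.items():
        values = configured.get(node, {})
        # Treat None and the empty string as "not configured".
        missing = [n for n in names if values.get(n) in (None, "")]
        total = len(names)
        coverage = 1.0 if total == 0 else (total - len(missing)) / total
        rows.append(
            {
                "node": node,
                "coverage": coverage,
                "missing": missing,
                "ready": not missing,
            }
        )
    return rows
```

Each row is a plain dictionary, so the result can be loaded directly into a Pandas DataFrame for display or charting.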

---

# Persistence Layer

The framework includes a centralized persistence mechanism.

Stored Components:

* Project Configuration
* Pipeline Definitions
* Utility Configurations
* Execution Counters
* Dynamic Parameters

This enables projects to be paused and resumed without rebuilding configurations.
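
One simple way to persist this state is a single JSON file that is rewritten atomically. The sketch below is an assumption about how such a layer could work, not a description of the framework's storage format; the function names and state keys are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path):
    """Load saved project state, or return an empty skeleton."""
    path = Path(path)
    if not path.exists():
        return {"project": {}, "pipelines": {}, "execution_counters": {}}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path, state):
    """Write to a temp file, then swap it in, to avoid half-written files."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)


def record_run(path, node_name):
    """Increment and persist the execution counter for one pipeline node."""
    state = load_state(path)
    counters = state.setdefault("execution_counters", {})
    counters[node_name] = counters.get(node_name, 0) + 1
    save_state(path, state)
    return counters[node_name]
```

Because `os.replace` swaps the file in a single step, an interrupted save leaves the previous state intact, which matters when a project is paused in the middle of a long run.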

---

# Technology Stack

## Application Layer

* Streamlit

## Backend

* Python

## Data Processing

* Pandas
* NumPy

## Computer Vision

* OpenCV
* YOLO Ecosystem

## Deep Learning

* TensorFlow
* Keras
* MobileNetV2

## OCR

* Surya OCR Components

## Reporting

* Plotly
* PDF Generation

---

# Practical Benefits

The framework helps standardize Computer Vision project execution by:

* Reducing repetitive setup work
* Improving dataset quality
* Encouraging reusable workflows
* Improving training consistency
* Providing execution visibility
* Supporting auditability of AI development activities

---

# Key Learning

A successful AI project depends not only on model architecture but also on the quality of datasets, validation processes, workflow governance, and execution discipline.

This framework was developed to bring structure and repeatability to those activities.

---

# Conclusion

The Audit Automation Framework is a reusable workflow platform for Computer Vision projects.

Rather than focusing solely on model training, it addresses the broader lifecycle of AI development including dataset preparation, annotation validation, pipeline orchestration, execution monitoring, reporting, and project governance.

The project demonstrates practical experience in workflow design, MLOps-style orchestration, Computer Vision operations, reusable utility development, reporting systems, and scalable AI project management.