Building MANTAX: An Ethical Facial Recognition System for NGOs

python computer-vision flask facial-recognition onnx electron

Introduction

This post is a technical deep-dive into MANTAX, an ethical facial recognition system designed for NGO use cases. We'll walk through the architecture, data flows, and implementation details a developer needs to understand how the system works.

System Architecture Overview

MANTAX follows a three-tier architecture with clear separation between presentation, business logic, and machine learning components.

Technology Stack

| Layer | Technology | Purpose |
|-------|------------|---------|
| Frontend | Electron + Vanilla JS | Desktop app with custom titlebar |
| Styling | SCSS → CSS | macOS Tahoe Liquid Glass design |
| Backend | Flask (Python) | REST API with 31 endpoints |
| ML Runtime | ONNX Runtime (ArcFace) + PyTorch (FaceNet) | Dual-model embedding extraction |
| Face Detection | OpenCV DNN + MediaPipe | Face localization + 468-point landmarks |

MANTAX Interface

The Data Pipeline

When a user uploads an image, it flows through a well-defined pipeline. Let's trace this journey:

Face Detection Module

The detection module (src/detection/__init__.py) handles the first critical step: finding where faces are in an image.

Detection Flow

Primary Detection Method: OpenCV DNN

The system uses a pre-trained Caffe model for deep learning-based face detection:

Model files required:

deploy.prototxt.txt - Caffe architecture definition
res10_300x300_ssd_iter_140000.caffemodel - Pre-trained weights
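
As a rough sketch of how this detector is usually driven through OpenCV's `cv2.dnn` module: the function names, the 0.5 confidence threshold, and the file paths passed in are illustrative assumptions, not MANTAX's actual code. The mean values (104, 177, 123) are the ones conventionally used with this res10 SSD model.

```python
from typing import Iterable, List, Sequence, Tuple

Box = Tuple[int, int, int, int, float]  # x1, y1, x2, y2, confidence


def parse_detections(rows: Iterable[Sequence[float]], width: int, height: int,
                     conf_threshold: float = 0.5) -> List[Box]:
    """Convert SSD output rows into pixel boxes.

    Each row is [image_id, label, confidence, x1, y1, x2, y2] with the
    coordinates normalized to [0, 1]. The threshold is an assumed default.
    """
    boxes: List[Box] = []
    for row in rows:
        confidence = float(row[2])
        if confidence < conf_threshold:
            continue
        # Scale to pixels and clamp to the image bounds.
        x1 = max(0, min(width - 1, int(row[3] * width)))
        y1 = max(0, min(height - 1, int(row[4] * height)))
        x2 = max(0, min(width - 1, int(row[5] * width)))
        y2 = max(0, min(height - 1, int(row[6] * height)))
        if x2 > x1 and y2 > y1:
            boxes.append((x1, y1, x2, y2, confidence))
    return boxes


def detect_faces_dnn(image, prototxt_path: str, weights_path: str) -> List[Box]:
    """Run the Caffe SSD face detector on a BGR image (requires opencv-python)."""
    import cv2  # imported lazily so the parsing helper works without OpenCV

    net = cv2.dnn.readNetFromCaffe(prototxt_path, weights_path)
    height, width = image.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()  # shape (1, 1, N, 7)
    return parse_detections(detections[0, 0], width, height)
```

In a long-running API the network would normally be loaded once at startup rather than on every call; it is loaded inline here only to keep the sketch short.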

Fallback: Haar Cascade

If the DNN model fails to load, the system falls back to OpenCV's Haar cascade detector:
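
One way to structure such a fallback is shown below. This is a sketch under assumptions: `select_detector` and the two loader functions are hypothetical names, and the cascade file is the stock frontal-face cascade bundled with `opencv-python`, which may differ from what MANTAX ships.

```python
from typing import Any, Callable, Tuple


def load_dnn_detector(prototxt_path: str, weights_path: str) -> Any:
    """Load the Caffe SSD detector; raises if the files are missing or invalid."""
    import cv2
    return cv2.dnn.readNetFromCaffe(prototxt_path, weights_path)


def load_haar_detector() -> Any:
    """Load the frontal-face Haar cascade that ships with opencv-python."""
    import cv2
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    if cascade.empty():
        raise RuntimeError("Haar cascade file could not be loaded")
    return cascade


def select_detector(load_primary: Callable[[], Any],
                    load_fallback: Callable[[], Any]) -> Tuple[str, Any]:
    """Try the primary loader first and fall back if it raises."""
    try:
        return "dnn", load_primary()
    except Exception as exc:  # cv2.error, missing files, missing module, ...
        print(f"DNN detector unavailable ({exc}); falling back to Haar cascade")
        return "haar", load_fallback()
```

Note that a Haar cascade is queried differently: `cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)` takes a grayscale image and returns `(x, y, w, h)` boxes without confidence scores, so downstream code has to handle both shapes.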

Face ROI Extraction

Once faces are detected, the API extracts the region of interest (ROI):
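
A minimal sketch of the cropping step, assuming boxes in `(x1, y1, x2, y2)` pixel form and a NumPy image array. The margin parameter and both function names are assumptions added for illustration; the original may crop without padding.

```python
from typing import Tuple


def padded_box(box: Tuple[int, int, int, int], width: int, height: int,
               margin: float = 0.2) -> Tuple[int, int, int, int]:
    """Grow (x1, y1, x2, y2) by a fraction of its size, clamped to the image."""
    x1, y1, x2, y2 = box
    pad_x = int((x2 - x1) * margin)
    pad_y = int((y2 - y1) * margin)
    return (max(0, x1 - pad_x), max(0, y1 - pad_y),
            min(width, x2 + pad_x), min(height, y2 + pad_y))


def extract_roi(image, box: Tuple[int, int, int, int], margin: float = 0.2):
    """Crop the padded face region from an H x W x C NumPy image array."""
    height, width = image.shape[:2]
    x1, y1, x2, y2 = padded_box(box, width, height, margin)
    # Rows are indexed by y and columns by x; copy so the ROI owns its data.
    return image[y1:y2, x1:x2].copy()
```

Clamping matters because detectors often return boxes that extend slightly past the image border, and a negative slice index would silently wrap around.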

Face Detection Visualization

Facial Landmark Detection

After detecting faces, the system extracts 468 facial landmarks using MediaPipe Face Mesh:

Landmark Extraction Code
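
The original listing is not reproduced here, so what follows is a sketch of how 468 landmarks can be pulled from MediaPipe Face Mesh using its legacy `solutions` API. The function names and the choice to return pixel coordinates for only the first face are assumptions.

```python
from typing import Iterable, List, Tuple


def to_pixel_coords(landmarks: Iterable, width: int,
                    height: int) -> List[Tuple[int, int]]:
    """Convert normalized landmarks (objects with .x and .y in [0, 1]) to pixels."""
    return [(int(lm.x * width), int(lm.y * height)) for lm in landmarks]


def extract_landmarks(image_bgr) -> List[Tuple[int, int]]:
    """Return (x, y) pixel landmarks for the first face, or [] if none is found.

    Requires opencv-python and mediapipe. With refine_landmarks=False the
    mesh has 468 points; enabling it adds iris points.
    """
    import cv2
    import mediapipe as mp

    height, width = image_bgr.shape[:2]
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)  # Face Mesh expects RGB
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True,
                                         max_num_faces=1,
                                         refine_landmarks=False) as face_mesh:
        results = face_mesh.process(rgb)
    if not results.multi_face_landmarks:
        return []
    return to_pixel_coords(results.multi_face_landmarks[0].landmark, width, height)
```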

Geometric Feature Extraction

For comparison purposes, we extract scale-invariant geometric features:
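
A common way to get scale invariance is to divide every distance by the inter-ocular distance, so the same face photographed at different resolutions yields the same numbers. The sketch below assumes that approach; the point names and the three ratios are hypothetical and are not taken from the MANTAX source.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def geometric_features(points: Dict[str, Point]) -> Dict[str, float]:
    """Ratios of facial distances to the inter-ocular distance.

    `points` maps hypothetical names (left_eye, right_eye, nose_tip,
    mouth_left, mouth_right, chin) to (x, y) pixel coordinates.
    """
    iod = math.dist(points["left_eye"], points["right_eye"])
    if iod == 0:
        raise ValueError("eye landmarks coincide; cannot normalize")
    eye_mid = ((points["left_eye"][0] + points["right_eye"][0]) / 2,
               (points["left_eye"][1] + points["right_eye"][1]) / 2)
    return {
        "mouth_width": math.dist(points["mouth_left"], points["mouth_right"]) / iod,
        "eye_to_nose": math.dist(eye_mid, points["nose_tip"]) / iod,
        "nose_to_chin": math.dist(points["nose_tip"], points["chin"]) / iod,
    }
```

These ratios are invariant to uniform scaling and translation, but not to head pose, so they are best treated as a weak supporting signal rather than an identifier on their own.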

Landmark Detection

Embedding Extraction (Dual-Model)

This is the core of the system—converting face images into mathematical embeddings that can be compared.

ArcFace Implementation

ArcFace provides superior discrimination between different faces:
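
Here is a hedged sketch of running an ArcFace model through ONNX Runtime. The 112x112 input size and the `(pixel - 127.5) / 127.5` scaling are common for ArcFace ONNX exports, but they are assumptions here and should be checked against the model actually used; `model_path` and the function names are placeholders.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so cosine similarity is a plain dot product."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vector]


def arcface_embedding(face_bgr, model_path: str) -> List[float]:
    """Return a unit-length embedding for a cropped BGR face image.

    Requires numpy, opencv-python and onnxruntime.
    """
    import cv2
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path,
                                   providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name

    face = cv2.resize(face_bgr, (112, 112))
    face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB).astype(np.float32)
    face = (face - 127.5) / 127.5  # assumed preprocessing, see note above
    blob = np.transpose(face, (2, 0, 1))[np.newaxis, ...]  # NCHW, batch of 1

    embedding = session.run(None, {input_name: blob})[0][0]
    return l2_normalize(embedding.tolist())
```

Creating the `InferenceSession` is the expensive part, so a real service would build it once and reuse it across requests.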

FaceNet Implementation

FaceNet provides secondary signals and neural activation visualizations:

Neural Network Activations

FaceNet also extracts intermediate layer activations for visualization:

Activation layers extracted:

| Layer | Output Shape | Purpose |
|-------|--------------|---------|
| conv1 | (64, 112, 112) | First convolutions |
| bn1 | (64, 112, 112) | First batch norm |
| layer1 | (64, 56, 56) | Low-level features |
| layer2 | (128, 28, 28) | Mid-level features |
| layer3 | (256, 14, 14) | High-level features |
| layer4 | (512, 7, 7) | Final features |
| embedding | (128,) | Final embedding |

ArcFace Embedding

The Compare Endpoint (Complete Flow)

Here's the full comparison flow from the /api/compare endpoint:

Dual-Model Scoring

The scoring combines multiple signals with learned weights:
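
To make the idea concrete, here is a sketch of a weighted combination of per-model similarities. The three signals (ArcFace, FaceNet, landmark geometry) and the weights `(0.5, 0.25, 0.25)` are placeholders chosen for illustration; the actual learned weights are not given in this post.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combined_score(arcface_sim: float, facenet_sim: float, landmark_sim: float,
                   weights: Tuple[float, float, float] = (0.5, 0.25, 0.25)) -> float:
    """Weighted average of the signals, each clamped to [0, 1] first."""
    signals = (arcface_sim, facenet_sim, landmark_sim)
    clamped = [min(1.0, max(0.0, s)) for s in signals]
    return sum(w * s for w, s in zip(weights, clamped)) / sum(weights)
```

Each model's similarity is computed within its own embedding space (512-dim against 512-dim, 128-dim against 128-dim); only the resulting scalar scores are mixed.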

Confidence Bands

Rather than binary decisions, the system outputs confidence bands:
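
A small sketch of mapping a combined score to a band. The band names come from this post; the cut-off values (0.85, 0.70, 0.55) are invented for illustration and would need to be calibrated on real data.

```python
from typing import List, Tuple

# (minimum score, band name), checked from highest to lowest.
# Thresholds are hypothetical placeholders.
BANDS: List[Tuple[float, str]] = [
    (0.85, "Very High"),
    (0.70, "High"),
    (0.55, "Moderate"),
]


def confidence_band(score: float) -> str:
    """Return the first band whose threshold the score meets."""
    for threshold, name in BANDS:
        if score >= threshold:
            return name
    return "Insufficient"
```

Returning a label instead of a boolean keeps the output descriptive: even "Very High" is a prompt for a human investigator to review, not a verdict.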

API Endpoints Reference

The Flask API exposes 31 endpoints for all operations:

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /api/health | GET | System health check |
| /api/embedding-info | GET | Current model info |
| /api/diagnostics | GET | System diagnostics |
| /api/detect | POST | Face detection |
| /api/extract | POST | Embedding extraction |
| /api/add-reference | POST | Add reference image |
| /api/references | GET | List all references |
| /api/references/<id> | DELETE | Remove reference |
| /api/compare | POST | Compare embeddings |
| /api/visualizations/<type> | GET | Get visualization |
| /api/clear | POST | Clear session |
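
For quick experiments against a locally running backend, a standard-library client along these lines is enough. It assumes the API listens on Flask's default `127.0.0.1:5000` and returns JSON; neither is confirmed here, and the helper names are made up for this sketch.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://127.0.0.1:5000"  # assumed: Flask's default host and port


def build_request(path: str, method: str = "GET") -> urllib.request.Request:
    """Create a request object for one of the endpoints listed above."""
    return urllib.request.Request(BASE_URL + path, method=method)


def call(path: str, method: str = "GET", timeout: float = 5.0) -> Dict[str, Any]:
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(build_request(path, method),
                                timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `call("/api/health")` would perform the health check and `call("/api/clear", method="POST")` would reset the session.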

Visualizations (14 Types)

The system provides 14 different AI visualizations to help investigators understand why scores were computed:

Visualization Implementation Example

Neural Activations

Session State Management

The API maintains in-memory session state:
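
A sketch of what such in-memory state could look like. The class and field names are hypothetical; the methods mirror the `/api/add-reference`, `/api/references/<id>` and `/api/clear` endpoints.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SessionState:
    """Holds the current probe and reference embeddings for one session."""
    probe_embedding: Optional[List[float]] = None
    references: Dict[str, List[float]] = field(default_factory=dict)
    last_result: Optional[dict] = None

    def add_reference(self, ref_id: str, embedding: List[float]) -> None:
        self.references[ref_id] = embedding

    def remove_reference(self, ref_id: str) -> bool:
        """Return True if the reference existed and was removed."""
        return self.references.pop(ref_id, None) is not None

    def clear(self) -> None:
        """Reset everything, as a clear-session call would."""
        self.probe_embedding = None
        self.references.clear()
        self.last_result = None
```

Because this state lives in the Python process, it disappears on restart and is shared by every request. If the server handles requests on multiple threads, access should be guarded with a `threading.Lock`.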

Persistence

References are saved to JSON for persistence across restarts:
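
One way to do this safely is to write to a temporary file and rename it over the target, so a crash mid-write cannot leave a half-written file. The sketch below assumes that approach; the function names and the record layout used in the usage note are illustrative, not the actual MANTAX schema.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_references(references: Dict[str, Dict[str, Any]], path: str) -> None:
    """Write references atomically: dump to a temp file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(references, handle, indent=2)
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except Exception:
        os.unlink(tmp_path)  # the temp file still exists if we got here
        raise


def load_references(path: str) -> Dict[str, Dict[str, Any]]:
    """Load saved references, or return an empty dict on first run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A record might look like `{"ref-1": {"embedding": [...], "consent_status": "granted", "source": "...", "purpose": "..."}}`, which keeps the consent metadata next to the embedding it describes.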

Testing Infrastructure

The system includes comprehensive tests:

Running Tests

File Structure

Key Design Decisions

1. Dual-Model Architecture

Using both ArcFace (512-dim) and FaceNet (128-dim) together provides better discrimination than either alone. ArcFace handles the primary matching while FaceNet provides secondary signals and activation visualizations.

2. Confidence Bands, Not Binary Decisions

The system outputs confidence bands (Very High/High/Moderate/Insufficient) instead of "match/no-match". This ensures human investigators always make the final decision.

3. Local-Only Processing

No images are sent to external servers. All computation happens on the user's machine, addressing NGO privacy concerns.

4. Non-Reversible Embeddings

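Facial embeddings are not designed to be turned back into the original image, which adds a layer of privacy protection. They should still be handled as sensitive biometric data, because research has shown that approximate faces can sometimes be reconstructed from embeddings.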

5. Consent Tracking

Every reference image includes metadata about consent status, source, and purpose, which is essential for NGO documentation requirements.

Summary

MANTAX is a fully functional ethical facial recognition system built with:

Flask API (2,131 lines) with 31 endpoints
Dual-model embedding (ArcFace 512-dim + FaceNet 128-dim)
OpenCV DNN face detection with MediaPipe landmarks
Electron desktop app with macOS Tahoe Liquid Glass UI
Comprehensive testing (E2E, edge cases, frontend)

The system is designed for NGO use cases with:

Local-only processing (no cloud)
Human-in-the-loop verification
Consent tracking
Confidence bands instead of binary decisions

In the next blogpost, we'll explore the JavaScript refactoring journey—how we tackled a 3,429-line monolithic app.js and broke it into 7 modular files following best practices.

Demo Video

Here's a demo showing a no-match scenario:

Next: The JavaScript Refactoring Story
