Building MANTAX: An Ethical Facial Recognition System for NGOs
Introduction
This is a technical deep-dive into MANTAX, an ethical facial recognition system designed for NGO use cases. In this blogpost, we'll explore the complete architecture, data flows, and implementation details—everything a developer needs to understand how this system works.
System Architecture Overview
MANTAX follows a three-tier architecture with clear separation between presentation, business logic, and machine learning components.
Technology Stack
| Layer | Technology | Purpose | |-------|------------|---------| | Frontend | Electron + Vanilla JS | Desktop app with custom titlebar | | Styling | SCSS → CSS | macOS Tahoe Liquid Glass design | | Backend | Flask (Python) | REST API with 31 endpoints | | ML Runtime | ONNX Runtime (ArcFace) + PyTorch (FaceNet) | Dual-model embedding extraction | | Face Detection | OpenCV DNN + MediaPipe | Face localization + 468-point landmarks |

The Data Pipeline
When a user uploads an image, it flows through a well-defined pipeline. Let's trace this journey:
Face Detection Module
The detection module (src/detection/__init__.py) handles the first critical step: finding where faces are in an image.
Detection Flow
Primary Detection Method: OpenCV DNN
The system uses a pre-trained Caffe model for deep learning-based face detection:
Model files required:
deploy.prototxt.txt- Caffe architecture definitionres10_300x300_ssd_iter_140000.caffemodel- Pre-trained weights
Fallback: Haar Cascade
If DNN fails to load, the system gracefully falls back to Haar Cascade:
Face ROI Extraction
Once faces are detected, the API extracts the region of interest (ROI):

Facial Landmark Detection
After detecting faces, the system extracts 468 facial landmarks using MediaPipe Face Mesh:
Landmark Extraction Code
Geometric Feature Extraction
For comparison purposes, we extract scale-invariant geometric features:

Embedding Extraction (Dual-Model)
This is the core of the system—converting face images into mathematical embeddings that can be compared.
ArcFace Implementation
ArcFace provides superior discrimination between different faces:
FaceNet Implementation
FaceNet provides secondary signals and neural activation visualizations:
Neural Network Activations
FaceNet also extracts intermediate layer activations for visualization:
Activation layers extracted: | Layer | Output Shape | Purpose | |-------|-------------|---------| | conv1 | (64, 112, 112) | First convolutions | | bn1 | (64, 112, 112) | First batch norm | | layer1 | (64, 56, 56) | Low-level features | | layer2 | (128, 28, 28) | Mid-level features | | layer3 | (256, 14, 14) | High-level features | | layer4 | (512, 7, 7) | Final features | | embedding | (128,) | Final embedding |

The Compare Endpoint (Complete Flow)
Here's the full comparison flow from the /api/compare endpoint:
Dual-Model Scoring
The scoring combines multiple signals with learned weights:
Confidence Bands
Rather than binary decisions, the system outputs confidence bands:
API Endpoints Reference
The Flask API exposes 31 endpoints for all operations:
| Endpoint | Method | Purpose |
|----------|--------|---------|
| /api/health | GET | System health check |
| /api/embedding-info | GET | Current model info |
| /api/diagnostics | GET | System diagnostics |
| /api/detect | POST | Face detection |
| /api/extract | POST | Embedding extraction |
| /api/add-reference | POST | Add reference image |
| /api/references | GET | List all references |
| /api/references/<id> | DELETE | Remove reference |
| /api/compare | POST | Compare embeddings |
| /api/visualizations/<type> | GET | Get visualization |
| /api/clear | POST | Clear session |
Visualizations (14 Types)
The system provides 14 different AI visualizations to help investigators understand why scores were computed:
Visualization Implementation Example

Session State Management
The API maintains in-memory session state:
Persistence
References are saved to JSON for persistence across restarts:
Testing Infrastructure
The system includes comprehensive tests:
Running Tests
File Structure
Key Design Decisions
1. Dual-Model Architecture
Using both ArcFace (512-dim) and FaceNet (128-dim) together provides better discrimination than either alone. ArcFace handles the primary matching while FaceNet provides secondary signals and activation visualizations.
2. Confidence Bands, Not Binary Decisions
The system outputs confidence bands (Very High/High/Moderate/Insufficient) instead of "match/no-match". This ensures human investigators always make the final decision.
3. Local-Only Processing
No images are sent to external servers. All computation happens on the user's machine, addressing NGO privacy concerns.
4. Non-Reversible Embeddings
Facial embeddings cannot be used to reconstruct the original face—providing an additional layer of privacy protection.
5. Consent Tracking
Every reference image includes metadata about consent status, source, and purpose—essential for NGO documentation requirements.
Summary
MANTAX is a fully functional ethical facial recognition system built with:
- Flask API (2,131 lines) with 31 endpoints
- Dual-model embedding (ArcFace 512-dim + FaceNet 128-dim)
- OpenCV DNN face detection with MediaPipe landmarks
- Electron desktop app with macOS Tahoe Liquid Glass UI
- Comprehensive testing (E2E, edge cases, frontend)
The system is designed for NGO use cases with:
- Local-only processing (no cloud)
- Human-in-the-loop verification
- Consent tracking
- Confidence bands instead of binary decisions
In the next blogpost, we'll explore the JavaScript refactoring journey—how we tackled a 3,429-line monolithic app.js and broke it into 7 modular files following best practices.
Demo Video
Here's a demo showing a no-match scenario:
Next: The JavaScript Refactoring Story