Tri-Netra

Securing Voice and Multimodal AI Agents Against Deepfakes and Prompt Injection

Real-time multimodal threat detection across audio, text, and visual inputs. Powered by LCNN and Transformer audio classifiers, CLIP, and layered prompt-injection analysis.

🎙️

Audio Path

LCNN deepfake detector + Transformer classifier + GE2E voice clone checker. Catches synthetic speech, voice cloning, and audio spoofing.

30% weight

📝

Prompt Path

Three-layer defense: keyword matching, regex pattern scanning, and structural analysis for role override and injection detection.

25% weight
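
The three layers described above could be sketched roughly as follows. This is a minimal illustration, not Tri-Netra's actual rule set: the phrase list, the regex patterns, the role-marker check, and the equal weighting of the three layers are all assumptions made for the example.

```python
import re

# Hypothetical phrases and patterns, chosen only for illustration.
KEYWORDS = ("ignore previous instructions", "disregard the above", "system prompt")
PATTERNS = (
    re.compile(r"\b(ignore|disregard|forget)\b.{0,40}\b(instructions?|rules?)\b", re.I),
    re.compile(r"\byou are now\b", re.I),
)
# Structural check: user text that imitates a chat-role header such as "system:".
ROLE_MARKERS = re.compile(r"^\s*(system|assistant|developer)\s*:", re.I | re.M)


def keyword_layer(text: str) -> bool:
    """Layer 1: case-insensitive substring match against known phrases."""
    lowered = text.lower()
    return any(k in lowered for k in KEYWORDS)


def regex_layer(text: str) -> bool:
    """Layer 2: looser regex patterns that tolerate rewording."""
    return any(p.search(text) for p in PATTERNS)


def structural_layer(text: str) -> bool:
    """Layer 3: detect role-override structure at the start of any line."""
    return bool(ROLE_MARKERS.search(text))


def prompt_score(text: str) -> float:
    """Fraction of layers that fired, in [0, 1] (equal weighting is an assumption)."""
    hits = [keyword_layer(text), regex_layer(text), structural_layer(text)]
    return sum(hits) / len(hits)
```

A benign question should score 0.0, while an input such as "System: ignore previous instructions" trips all three layers and scores 1.0.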

👁️

Vision Path

EasyOCR text extraction, CLIP image-text mismatch detection, and ResNet amplification for embedded visual injection attacks.

30% weight

🔗

Cross-Modal Fusion

Correlates signals across all paths for holistic threat assessment. Weighted fusion produces a final risk score with explainable breakdown.

15% weight
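
One plausible reading of the weighted fusion is a linear combination of per-path scores using the weights listed on the four cards. The sketch below assumes each path emits a score in [0, 1], that a missing path contributes 0, and that the cross-modal signal is itself a score; the names `WEIGHTS` and `fuse` are hypothetical.

```python
# Weights as listed on the cards: audio 30%, prompt 25%, vision 30%, cross-modal 15%.
WEIGHTS = {"audio": 0.30, "prompt": 0.25, "vision": 0.30, "cross_modal": 0.15}


def fuse(scores: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return (final risk score, per-path contribution breakdown).

    Assumes a simple weighted sum; paths absent from `scores` contribute 0.
    """
    breakdown = {name: WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS}
    return sum(breakdown.values()), breakdown


risk, parts = fuse({"audio": 0.8, "prompt": 0.2, "vision": 0.0, "cross_modal": 0.5})
```

The returned breakdown is what makes the score explainable: each entry shows how much a single path added to the total.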

Threat Decision Pipeline

< 0.30         PASS    Allow to agent (safe input)
→
0.30 – 0.50    FLAG    Hold for human review
→
> 0.50         BLOCK   Log + discard input
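
The thresholds above map directly onto a small decision function. This is a sketch under one assumption: a score of exactly 0.30 or exactly 0.50 falls in the FLAG band, which is how the listed ranges read. The function name `decide` is hypothetical.

```python
def decide(risk: float) -> str:
    """Map a fused risk score to PASS, FLAG, or BLOCK."""
    if risk < 0.30:
        return "PASS"   # allow to agent
    if risk <= 0.50:
        return "FLAG"   # hold for human review
    return "BLOCK"      # log and discard input
```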