// Senior Data Scientist · AI Researcher
Building AI systems
that ship
// 01 · about
About me
I'm a Senior Data Science Specialist with 5+ years of experience designing and deploying production-grade AI/ML systems across Computer Vision, NLP, and Generative AI.
I build multi-agent architectures, hybrid RAG pipelines fusing vector and graph databases, and real-time conversational AI platforms. My work spans the full stack, from semantic chunking and embedding pipelines to LLM guardrails and production observability.
I've led patent-pending research on biometric cattle identification using DINOv2 feature extraction and wavelet-based ridge analysis, validated across 197 animals with 93.5% top-5 precision and open-set unknown detection.
I received a Shout-Out award for contributions to AI Management System documentation and preparedness for the ISO 42001:2023 surveillance audit.
// 02 · experience
Where I've built
Six roles, from NLP research fellow to senior data scientist
Senior Engineer - Data Science
Reflections Info Systems Pvt. Ltd.
Leading R&D on multi-agent systems, hybrid RAG, and real-time conversational AI for enterprise clients.
- Architected multi-agent log analytics with an event-driven pipeline for semantic error grouping and anomaly detection
- Designed hybrid RAG system fusing Qdrant vector search, Neo4j graph traversal, and Reciprocal Rank Fusion ranking
- Implemented 5-layer security pipeline with Llama Guard content safety, PII sanitization, and prompt injection prevention
- Architected real-time WebSocket streaming with buffer-then-sanitize pattern for secure LLM output delivery
- Built Vision-Language Model pipelines for multi-format document extraction with provider failover
- Built agentic outreach platform with MCP server architecture, multi-phase LLM workflows, and template-driven document generation
- Contributed to ISO 42001:2023 AI Management System surveillance audit - received Shout-Out award for documentation quality
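The buffer-then-sanitize pattern mentioned above can be sketched as a generator that holds streamed tokens back until a sentence boundary, cleans the buffered text, and only then releases it. This is a minimal illustration, not the production pipeline: the `sanitize` and `stream_sanitized` names, the email-only redaction rule, and the sentence-boundary heuristic are all assumptions made for the example.

```python
import re

# Hypothetical redaction rule: a real pipeline would cover far more PII types.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SENTENCE_END = re.compile(r"[.!?]\s")


def sanitize(text):
    """Replace email addresses with a placeholder."""
    return EMAIL.sub("[redacted-email]", text)


def stream_sanitized(tokens):
    """Yield sanitized text one sentence at a time from a stream of token strings."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Hold text back until a sentence boundary, so a pattern that is
        # split across several tokens is seen whole before anything is sent.
        while True:
            match = SENTENCE_END.search(buffer)
            if match is None:
                break
            end = match.end()
            yield sanitize(buffer[:end])
            buffer = buffer[end:]
    if buffer:
        # Flush whatever remains when the model stops generating.
        yield sanitize(buffer)


chunks = list(stream_sanitized(["Mail jo", "@ex.", "com now. ", "Bye"]))
# chunks == ["Mail [redacted-email] now. ", "Bye"]
```

Each yielded chunk would then be written to the WebSocket by the caller. The trade-off is a small delay per sentence in exchange for never emitting a partially formed sensitive string.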
Senior Data Scientist - Consultant
Digital University Kerala
Led computer vision and NLP research, including the patent-pending biometric identification system.
- Led patent-pending cattle muzzle identification research using DINOv2, later validated across 197 animals (93.5% top-5 precision)
- Built a wavelet-based ridge extraction pipeline adapting fingerprint techniques to muzzle patterns
- Built semantic document search engine with vector databases
Senior Software Engineer - AI/ML
Techversant Infotech
Built and shipped AI/ML features for enterprise products.
- Built RAG applications with memory for contextual conversations
- Developed face recognition systems with SOTA deep learning models
- Designed AI-powered proctoring tools using YOLO
Senior Engineer - Data Science
Digital University Kerala
Led development of ML-based search infrastructure and data processing systems.
- Engineered ETL pipelines for document extraction and Elasticsearch indexing
- Led team of 3 in developing semantic search infrastructure
- Created backend for 'Fun With AI' at Global Science Fest Kerala
Data Analyst
Digital University Kerala
Database optimization and analytics pipeline development.
- Created automated data pipelines reducing processing time
- Developed interactive data visualizations for reporting
Research Fellow
ICFOSS
NLP research for Malayalam language processing.
- Developed Morphological Analyzer for Malayalam
- Built sentiment analysis systems for Indian languages
- Conducted research on YouTube comment data
// 03 · research
Patent-pending research
Cattle identification from muzzle ridge patterns, no tags required
Biometric Cattle Identification System
Computer vision system for individual cattle identification using muzzle patterns as biometric markers, analogous to human fingerprint recognition. Validated across 197 animals and built out into a full enrolment and field-identification platform on FastAPI, Next.js, PostgreSQL, Neo4j lineage, and Qdrant.
Problem
RFID-based livestock identification is susceptible to tampering and loss, and it requires time-consuming manual verification.
Solution
Read the identity straight from the muzzle. A photo replaces the tag, so there is nothing to lose and nothing to tamper with.
Technical Pipeline
Key Innovation
Ablation-driven design: multi-vector enrolment with an open-set margin decision rule (AUROC 0.879) that flags unknown animals instead of forcing a match. It is paired with a wavelet ridge-extraction chain (BayesShrink denoising, biorthogonal wavelets, skeletonization) that adapts fingerprint techniques for biometric record keeping.
Research conducted at Digital University Kerala, built on a 27-paper literature review.
I created the dataset, trained the models, and designed the end-to-end system myself.
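To illustrate the idea of multi-vector enrolment with an open-set margin rule, here is a minimal sketch. It is one plausible reading of the technique, not the patent-pending method itself: the `identify` function, the max-over-vectors scoring, and the `min_margin` value are assumptions chosen for the example, and the embeddings are toy 2-D vectors rather than DINOv2 features.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query, gallery, min_margin=0.05):
    """Return the best-matching animal ID, or None if the match is ambiguous.

    gallery maps animal ID -> list of enrolled embeddings (multi-vector
    enrolment). min_margin is a hypothetical threshold.
    """
    # Score each animal by its single best-matching enrolled vector.
    scores = {
        animal_id: max(cosine(query, vec) for vec in vectors)
        for animal_id, vectors in gallery.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_id, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else -1.0
    # Open-set rule: accept only when the winner clearly beats the runner-up;
    # otherwise report "unknown" instead of forcing a match.
    if best_score - runner_up < min_margin:
        return None
    return best_id


gallery = {
    "cow_a": [[1.0, 0.0], [0.8, 0.6]],
    "cow_b": [[0.0, 1.0], [0.6, 0.8]],
}
known = identify([1.0, 0.0], gallery)    # "cow_a": clear winner
unknown = identify([1.0, 1.0], gallery)  # None: both animals score the same
```

A margin between the top two candidates is only one way to build the rejection signal. An absolute similarity floor could be combined with it.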
// 04 · projects
Industry projects
Production systems across agentic AI, RAG, document AI, and voice
PRISM - On-Prem Enterprise AI Platform
Self-hostable enterprise AI platform orchestrating LLM tool-calling across a federation of MCP servers, backed by a fully self-hosted inference stack.
- MCP-federated tool orchestration with per-tool timeouts and RBAC context propagation
- Document RAG with Docling chunking, bge-m3 embeddings, Qdrant, and cross-encoder reranking
- Self-hosted OpenAI-compatible inference (vLLM, ASR, TTS) for air-gapped deployment
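Per-tool timeouts, as listed above, can be sketched with `asyncio.wait_for`. The tool names, the timeout values, and the `call_tool` wrapper are hypothetical and not taken from PRISM. The point is only that each tool call gets its own budget, so one slow tool cannot stall the whole turn.

```python
import asyncio

# Hypothetical per-tool budgets in seconds.
TOOL_TIMEOUTS = {"search_docs": 0.5, "run_report": 2.0}
DEFAULT_TIMEOUT = 1.0


async def call_tool(name, tool, *args):
    """Run one tool coroutine under its own timeout and return a result dict."""
    timeout = TOOL_TIMEOUTS.get(name, DEFAULT_TIMEOUT)
    try:
        result = await asyncio.wait_for(tool(*args), timeout=timeout)
        return {"tool": name, "ok": True, "result": result}
    except asyncio.TimeoutError:
        # The slow tool fails alone; the orchestrator can still answer
        # with the results of the other tools.
        return {"tool": name, "ok": False, "error": f"timed out after {timeout}s"}


async def search_docs(query):
    """Stand-in for a real MCP tool call."""
    await asyncio.sleep(0.01)
    return [f"hit for {query}"]


outcome = asyncio.run(call_tool("search_docs", search_docs, "pricing"))
# outcome == {"tool": "search_docs", "ok": True, "result": ["hit for pricing"]}
```

Returning a structured error instead of raising lets the LLM see that a tool failed and decide whether to retry or continue without it.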
Hybrid RAG Sales Intelligence
Proposal search assistant fusing vector search, graph traversal, and Reciprocal Rank Fusion behind a layered security pipeline.
- 3-path query routing: metadata (Cypher) / content (semantic) / general
- Weighted Reciprocal Rank Fusion across graph, vector, and keyword retrieval
- Cross-encoder reranking with citation and groundedness checks
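Weighted Reciprocal Rank Fusion scores each document as the sum of `weight / (k + rank)` over every retriever that returned it. The sketch below shows that formula on toy data. The retriever weights, the document IDs, and the conventional `k = 60` constant are illustrative assumptions, not the values used in the project.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of document IDs into one list, best first.

    rankings: retriever name -> list of doc IDs, best first
    weights:  retriever name -> weight (missing names default to 1.0)
    """
    scores = defaultdict(float)
    for name, docs in rankings.items():
        weight = weights.get(name, 1.0)
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


rankings = {
    "graph": ["p3", "p1"],
    "vector": ["p1", "p2", "p3"],
    "keyword": ["p2", "p1"],
}
weights = {"graph": 1.0, "vector": 1.0, "keyword": 0.5}
fused = weighted_rrf(rankings, weights)
# fused == ["p1", "p3", "p2"]
```

Because RRF uses ranks rather than raw scores, graph, vector, and keyword results can be combined without normalising their very different score scales.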
Document Verification (QFMA)
RegTech assistant verifying governance reports against regulatory articles with confidence-based LLM validation.
- LangGraph pipeline for section extraction and per-article verification
- Hybrid Qdrant RAG with selective LLM validation
- Bilingual (EN/AR) reporting with PDF and Excel export
Vision-Based Document Extraction
Multi-format document-to-JSON pipeline combining classical OCR, vision-language models, and locally hosted OCR inference.
- Docling and Tesseract OCR with page-level routing to vision-LLMs
- Local DeepSeek-OCR inference with a quality-based fallback chain
- Multi-provider routing with priority-based failover
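Priority-based failover can be sketched as a loop that tries providers in priority order and falls through to the next one on any error. The provider names and the `extract_with_failover` helper are hypothetical, and a real router would likely also check output quality before accepting a result.

```python
def extract_with_failover(document, providers):
    """Try providers in ascending priority order and return the first success.

    providers: list of (priority, name, callable) tuples; lower priority
    numbers are tried first. Returns (provider_name, result).
    """
    errors = []
    for _, name, extract in sorted(providers, key=lambda p: p[0]):
        try:
            return name, extract(document)
        except Exception as exc:  # broad on purpose: any provider error triggers failover
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary_vlm(document):
    """Stand-in for a provider that is currently unavailable."""
    raise ConnectionError("service unavailable")


def local_ocr(document):
    """Stand-in for a local fallback provider."""
    return {"text": document.upper()}


used, result = extract_with_failover(
    "invoice 42",
    [(2, "local_ocr", local_ocr), (1, "primary_vlm", primary_vlm)],
)
# used == "local_ocr", result == {"text": "INVOICE 42"}
```

Collecting the per-provider errors makes the final failure message useful when every provider is down.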
AI-Powered Log Analytics Platform
Enterprise log analytics with multi-agent orchestration, semantic error grouping, and anomaly detection.
- Event-driven pipeline (Triage, Mapper, Analysis, Notifier)
- Semantic error grouping and anomaly scoring
- Decision-tree workflow routing by severity
AI-Powered Outreach Automation
Agentic LLM platform orchestrating multi-phase target discovery, data enrichment, fitness scoring, and branded document generation via MCP server architecture.
- 5-phase agentic workflow: discovery, enrichment, scoring, document generation, outreach
- Custom MCP server (~11 tools) with response trimming and self-correction
- Template-driven document pipeline (PPTX, DOCX, XLSX)
Agentic AI for Roadside Assistance
Conversational AI platform with real-time voice synthesis, sentiment analysis, and intelligent technician routing.
- Multi-model LLM pipeline for real-time analysis
- WebRTC-based live transcription with low latency
- Real-time sentiment tracking and escalation
AI-Powered Debt Collection
Voice AI platform with multi-model analysis pipeline for automated loan recovery conversations.
- Low-latency voice synthesis
- Automated promise extraction from calls
- Stage-specific conversation strategies
Production RAG System
Semantic search with two-stage retrieval, cross-encoder reranking, and content guardrails for specialized domains.
- High-accuracy query classification
- Multiple configurable chunking strategies
- Bilingual support with real-time SSE streaming
// 05 · skills
What I work with
The stack I use daily, grouped by depth
Focus Areas
Machine Learning & AI
Large Language Models
Computer Vision
NLP & RAG
Databases & Search
AI Safety & Observability
Backend & APIs
Voice & Conversational AI
Cloud & Infrastructure
Programming Languages
// 06 · education
Education & credentials
Academic background and professional certifications
Education
Master of Science in Computer Science
Data Analytics
Indian Institute of Information Technology and Management - Kerala (IIITM-K)
Cochin University of Science and Technology
Bachelor of Science
Computer Science, Mathematics, Statistics
Kristu Jayanti College (Autonomous), Bengaluru
Bangalore University
Certifications
Google Data Analytics Professional Certificate
Building Real-Time Video AI Applications
NVIDIA Deep Learning Institute
Getting Started with Deep Learning
NVIDIA Deep Learning Institute