// Senior Data Scientist · AI Researcher

Building AI systems that ship

I'm Arnol P S. I put Computer Vision, Deep Learning, and Vision-Language Models into production.

// 01 · about

About me

I'm a Senior Data Science Specialist with 5+ years of experience designing and deploying production-grade AI/ML systems across Computer Vision, NLP, and Generative AI.

I build multi-agent architectures, hybrid RAG pipelines fusing vector and graph databases, and real-time conversational AI platforms. My work spans the full stack, from semantic chunking and embedding pipelines to LLM guardrails and production observability.

I've led patent-pending research on biometric cattle identification using DINOv2 feature extraction and wavelet-based ridge analysis, validated across 197 animals with 93.5% top-5 precision and open-set unknown detection.

Shout-Out Award
ISO 42001:2023 Recognition

Received a Shout-Out Award for contributions to AI Management System documentation and to preparedness for the ISO 42001:2023 surveillance audit.

// 02 · experience

Where I've built

Six roles, from NLP research fellow to senior data scientist

Aug 2025 – Present · Reflections Info Systems Pvt. Ltd.

Senior Engineer - Data Science

Leading R&D on multi-agent systems, hybrid RAG, and real-time conversational AI for enterprise clients.

  • Architected a multi-agent log analytics platform with an event-driven pipeline for semantic error grouping and anomaly detection
  • Designed a hybrid RAG system fusing Qdrant vector search and Neo4j graph traversal with Reciprocal Rank Fusion
  • Implemented a 5-layer security pipeline with Llama Guard content safety, PII sanitization, and prompt injection prevention
  • Architected real-time WebSocket streaming with a buffer-then-sanitize pattern for secure LLM output delivery
  • Built Vision-Language Model pipelines for multi-format document extraction with provider failover
  • Built an agentic outreach platform with an MCP server architecture, multi-phase LLM workflows, and template-driven document generation
  • Contributed to the ISO 42001:2023 AI Management System surveillance audit; received a Shout-Out Award for documentation quality
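
The buffer-then-sanitize pattern can be illustrated with a short sketch. Streamed LLM tokens are held until a sentence boundary, the complete sentence is sanitized, and only then is it forwarded, so an email address or phone number split across two tokens is never delivered partially unredacted. This is an illustrative assumption, not the production code: the regex-based `redact` function is a hypothetical stand-in for a real PII sanitizer, and a plain generator stands in for the WebSocket send loop.

```python
import re
from typing import Iterable, Iterator

# Hypothetical stand-in for a real PII sanitizer: masks emails and long digit runs.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
LONG_NUMBER = re.compile(r"\d{7,}")
SENTENCE_END = re.compile(r"[.!?]\s")


def redact(text: str) -> str:
    return LONG_NUMBER.sub("[NUMBER]", EMAIL.sub("[EMAIL]", text))


def buffer_then_sanitize(tokens: Iterable[str]) -> Iterator[str]:
    """Yield sanitized text one complete sentence at a time."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Release every complete sentence currently held in the buffer.
        while (match := SENTENCE_END.search(buffer)) is not None:
            sentence, buffer = buffer[: match.end()], buffer[match.end():]
            yield redact(sentence)
    if buffer:
        # End of stream: sanitize and release whatever partial sentence is left.
        yield redact(buffer)
```

Each yielded chunk would be sent over the socket; the trade-off is a small added latency (one sentence) in exchange for never emitting unsanitized fragments.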
Dec 2024 – Jul 2025 · Digital University Kerala

Senior Data Scientist - Consultant

Led computer vision and NLP research, including the patent-pending biometric identification system.

  • Led patent-pending cattle muzzle identification research using DINOv2, later validated across 197 animals (93.5% top-5 precision)
  • Built a wavelet-based ridge extraction pipeline adapting fingerprint techniques to muzzle patterns
  • Built a semantic document search engine backed by vector databases
Jun 2024 – Nov 2024 · Techversant Infotech

Senior Software Engineer - AI/ML

Built and shipped AI/ML features for enterprise products.

  • Built RAG applications with conversational memory for context-aware dialogue
  • Developed face recognition systems using state-of-the-art deep learning models
  • Designed AI-powered proctoring tools using YOLO
Sep 2023 – Jun 2024 · Digital University Kerala

Senior Engineer - Data Science

Led development of ML-based search infrastructure and data processing systems.

  • Engineered ETL pipelines for document extraction and Elasticsearch indexing
  • Led a team of 3 developing semantic search infrastructure
  • Created the backend for 'Fun With AI' at Global Science Fest Kerala
Jun 2021 – Aug 2023 · Digital University Kerala

Data Analyst

Database optimization and analytics pipeline development.

  • Created automated data pipelines reducing processing time
  • Developed interactive data visualizations for reporting
Sep 2019 – Sep 2020 · ICFOSS

Research Fellow

NLP research for the Malayalam language.

  • Developed a morphological analyzer for Malayalam
  • Built sentiment analysis systems for Indian languages
  • Conducted research on YouTube comment data

// 03 · research

Patent-pending research

Cattle identification from muzzle ridge patterns, no tags required

Patent Pending

Biometric Cattle Identification System

Computer vision system for individual cattle identification using muzzle patterns as biometric markers, analogous to human fingerprint recognition. Validated across 197 animals and built out into a full enrolment and field-identification platform on FastAPI, Next.js, PostgreSQL, Neo4j (lineage), and Qdrant.

Problem

RFID-based livestock identification is susceptible to tampering and tag loss, and it requires time-consuming manual verification.

Solution

Read the identity straight from the muzzle. A photo replaces the tag, so there is nothing to lose and nothing to tamper with.

PyTorch · YOLO26m · ONNX · DINOv2 ViT-L/14 · PyWavelets · OpenCV · Qdrant · Neo4j · FastAPI

Technical Pipeline

1. Detection - Gamma correction + YOLO26m muzzle crop
2. DINOv2 Embeddings - ViT-L/14, 1024-dim feature vectors
3. Vector Search - Qdrant cosine, best vector per identity
4. Open-Set Decision - Margin rule accepts or flags unknown
5. Ridge Records - Wavelet + skeleton visualisations at enrolment
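
Steps 3 and 4 can be sketched in plain Python. Each identity is scored by its best-matching enrolled vector (maximum cosine similarity), and the top candidate is accepted only if it beats the runner-up by a margin. The exact margin rule and its threshold are not given here, so the `margin=0.05` default and the function names are hypothetical. In the real pipeline the nearest-neighbour search runs in Qdrant over 1024-dim DINOv2 embeddings rather than in a Python loop.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_identities(query: Vector, gallery: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Score every identity by its best-matching enrolled vector, highest first."""
    scores = {
        identity: max(cosine(query, v) for v in vectors)
        for identity, vectors in gallery.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def open_set_decision(ranked: List[Tuple[str, float]], margin: float = 0.05) -> Optional[str]:
    """Accept the top identity only if it leads the runner-up by `margin`; None means unknown."""
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]  # no runner-up to compare against (assumption: accept)
    (best_id, best_score), (_, second_score) = ranked[0], ranked[1]
    return best_id if best_score - second_score >= margin else None
```

Taking the maximum over an identity's enrolled vectors lets several photos per animal cover pose and lighting variation, while the margin check turns "closest match" into "confident match or unknown".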
93.5% Precision@5
87.5% Precision@1
197 Cattle Identities
1,312 Enrolment Images

Key Innovation

Ablation-driven design: multi-vector enrolment paired with an open-set margin decision rule (AUROC 0.879) that flags unknown animals instead of forcing a match, plus a wavelet ridge-extraction chain (BayesShrink denoising, biorthogonal wavelets, skeletonization) that adapts fingerprint techniques for biometric record keeping.
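
The BayesShrink step of the ridge-extraction chain can be illustrated on a single detail subband. This sketch assumes the wavelet decomposition (for example with a biorthogonal wavelet in PyWavelets) has already produced the coefficients as flat lists, and it leaves skeletonization out. It follows the standard BayesShrink rule: noise is estimated from the finest diagonal subband as sigma_n = median(|c|) / 0.6745, and each subband's threshold is T = sigma_n^2 / sigma_x, where sigma_x = sqrt(max(mean(c^2) - sigma_n^2, 0)). The function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def estimate_noise_sigma(finest_diagonal: Sequence[float]) -> float:
    """Robust noise estimate from the finest diagonal (HH) subband."""
    return statistics.median([abs(c) for c in finest_diagonal]) / 0.6745


def bayes_shrink_threshold(subband: Sequence[float], sigma_noise: float) -> float:
    """BayesShrink threshold T = sigma_n^2 / sigma_x for one detail subband."""
    # Detail coefficients are roughly zero-mean, so mean(c^2) approximates their variance.
    signal_power = sum(c * c for c in subband) / len(subband)
    sigma_x = math.sqrt(max(signal_power - sigma_noise ** 2, 0.0))
    if sigma_x == 0.0:
        # Subband is indistinguishable from noise: threshold every coefficient away.
        return max(abs(c) for c in subband)
    return sigma_noise ** 2 / sigma_x


def soft_threshold(subband: Sequence[float], t: float) -> List[float]:
    """Shrink each coefficient toward zero by t, zeroing those with |c| <= t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in subband]
```

Because T shrinks as a subband's estimated signal strength grows, strong ridge edges survive while noise-dominated subbands are suppressed, before the inverse transform and skeletonization.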

Research conducted at Digital University Kerala, built on a 27-paper literature review. I created the dataset, trained the models, and designed the end-to-end system myself.

// 04 · projects

Industry projects

Production systems across agentic AI, RAG, document AI, and voice

PRISM - On-Prem Enterprise AI Platform

Self-hostable enterprise AI platform orchestrating LLM tool-calling across a federation of MCP servers, backed by a fully self-hosted inference stack.

  • MCP-federated tool orchestration with per-tool timeouts and RBAC context propagation
  • Document RAG with Docling chunking, bge-m3 embeddings, Qdrant, and cross-encoder reranking
  • Self-hosted OpenAI-compatible inference (vLLM, ASR, TTS) for air-gapped deployment
MCP · vLLM · Qdrant · Keycloak · Docling
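
Per-tool timeouts can be sketched with `asyncio.wait_for`. Each tool call gets its own deadline, and on expiry the call is cancelled and returned as a structured error the LLM can reason about, instead of stalling the whole turn. The `call_tool` helper, the dict-based tool registry, and the 10-second default are hypothetical; RBAC context propagation and the MCP transport itself are not modelled.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

ToolFn = Callable[..., Awaitable[Any]]


async def call_tool(
    name: str,
    tools: Dict[str, ToolFn],
    timeouts: Dict[str, float],
    default_timeout: float = 10.0,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Run one tool with its own deadline and return a structured result either way."""
    timeout = timeouts.get(name, default_timeout)
    try:
        result = await asyncio.wait_for(tools[name](**kwargs), timeout=timeout)
        return {"tool": name, "ok": True, "result": result}
    except asyncio.TimeoutError:
        # Report the timeout to the model instead of letting the whole turn hang.
        return {"tool": name, "ok": False, "error": f"timed out after {timeout}s"}
```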

Hybrid RAG Sales Intelligence

Proposal search assistant fusing vector search, graph traversal, and Reciprocal Rank Fusion behind a layered security pipeline.

  • 3-path query routing: metadata (Cypher) / content (semantic) / general
  • Weighted Reciprocal Rank Fusion across graph, vector, and keyword retrieval
  • Cross-encoder reranking with citation and groundedness checks
Qdrant · Neo4j · Llama Guard · WebSocket · Celery
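
Weighted Reciprocal Rank Fusion scores each document as the sum over retrievers of w / (k + rank), with ranks starting at 1. A minimal sketch follows. k = 60 is the constant commonly used since the original RRF paper, while the per-retriever weights and the function name are illustrative assumptions rather than the production configuration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(
    rankings: Dict[str, Sequence[str]],
    weights: Dict[str, float],
    k: int = 60,
) -> List[str]:
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each retriever adds weight / (k + rank) for every document it returns
    (ranks start at 1); documents are returned by descending fused score.
    """
    scores: Dict[str, float] = defaultdict(float)
    for source, ranked_ids in rankings.items():
        w = weights.get(source, 1.0)  # unlisted retrievers default to weight 1.0
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += w / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

Because RRF uses only ranks, it fuses graph, vector, and keyword results without having to calibrate their incompatible raw scores; the cross-encoder then reranks the fused shortlist.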

Document Verification (QFMA)

RegTech assistant verifying governance reports against regulatory articles with confidence-based LLM validation.

  • LangGraph pipeline for section extraction and per-article verification
  • Hybrid Qdrant RAG with selective LLM validation
  • Bilingual (EN/AR) reporting with PDF and Excel export
LangGraph · Qdrant · OpenRouter · FastAPI

Vision-Based Document Extraction

Multi-format document-to-JSON pipeline combining conventional OCR, vision-language models, and locally hosted OCR models.

  • Docling and Tesseract OCR with page-level routing to vision-LLMs
  • Local DeepSeek-OCR inference with a quality-based fallback chain
  • Multi-provider routing with priority-based failover
Docling · GPT-4 Vision · Gemini · DeepSeek-OCR · FastAPI
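
Priority-based failover with a quality-based fallback can be sketched as follows. Providers are tried in priority order, a provider that raises is skipped, and the first output whose quality score clears a threshold wins; if none does, the best-scoring output seen is returned. The `Provider` dataclass, the `quality` scoring callable, and the 0.8 threshold are hypothetical placeholders, not the actual routing configuration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Provider:
    name: str
    priority: int                    # lower number = tried first
    extract: Callable[[bytes], str]  # hypothetical: page bytes -> extracted text/JSON


def extract_with_failover(
    page: bytes,
    providers: List[Provider],
    quality: Callable[[str], float],
    min_quality: float = 0.8,
) -> Tuple[str, str]:
    """Return (provider_name, output) from the first provider that succeeds and meets
    `min_quality`; otherwise the best-scoring output; raise if every provider fails."""
    best: Optional[Tuple[float, str, str]] = None
    errors: List[str] = []
    for provider in sorted(providers, key=lambda p: p.priority):
        try:
            output = provider.extract(page)
        except Exception as exc:  # outage, timeout, rate limit, ...
            errors.append(f"{provider.name}: {exc}")
            continue
        score = quality(output)
        if score >= min_quality:
            return provider.name, output
        if best is None or score > best[0]:
            best = (score, provider.name, output)
    if best is not None:
        return best[1], best[2]
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```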

AI-Powered Log Analytics Platform

Enterprise log analytics with multi-agent orchestration, semantic error grouping, and anomaly detection.

  • Event-driven pipeline (Triage, Mapper, Analysis, Notifier)
  • Semantic error grouping and anomaly scoring
  • Decision-tree workflow routing by severity
Multi-Agent · Qdrant · Redis Pub/Sub · Langfuse · Presidio

AI-Powered Outreach Automation

Agentic LLM platform orchestrating multi-phase target discovery, data enrichment, fitness scoring, and branded document generation via MCP server architecture.

  • 5-phase agentic workflow: discovery, enrichment, scoring, document generation, outreach
  • Custom MCP server (~11 tools) with response trimming and self-correction
  • Template-driven document pipeline (PPTX, DOCX, XLSX)
Claude · MCP · Pydantic · python-pptx · openpyxl

Agentic AI for Roadside Assistance

Conversational AI platform with real-time voice synthesis, sentiment analysis, and intelligent technician routing.

  • Multi-model LLM pipeline for real-time analysis
  • WebRTC-based live transcription with low latency
  • Real-time sentiment tracking and escalation
ElevenLabs · GPT-4o · Claude · WebSocket

AI-Powered Debt Collection

Voice AI platform with multi-model analysis pipeline for automated loan recovery conversations.

  • Low-latency voice synthesis
  • Automated promise extraction from calls
  • Stage-specific conversation strategies
ElevenLabs · Silero VAD · ONNX · WebRTC

Production RAG System

Semantic search with two-stage retrieval, cross-encoder reranking, and content guardrails for specialized domains.

  • High-accuracy query classification
  • Multiple configurable chunking strategies
  • Bilingual support with real-time SSE streaming
Sentence-Transformers · Qdrant · Cross-Encoders · FastAPI

// 05 · skills

What I work with

The stack I use daily, grouped by depth

Focus Areas

Specialization
Agentic AI / Multi-Agent Systems · GraphRAG & Hybrid RAG · Document AI (IDP) · Vision-Language Models · Knowledge Graphs · On-Prem LLM Serving

Machine Learning & AI

Expert
PyTorch · Transformers · Scikit-learn · ONNX Runtime · LoRA/QLoRA · vLLM

Large Language Models

Expert
Claude · Gemini · Qwen · LangChain · LangGraph · MCP · OpenRouter

Computer Vision

Expert
YOLO · DINOv2 · Vision Transformers · OpenCV · Wavelet Analysis · Docling · Tesseract OCR

NLP & RAG

Expert
RAG Systems · Semantic Search · Cross-Encoders · Sentence-Transformers · spaCy · RRF

Databases & Search

Advanced
Qdrant · Neo4j · Elasticsearch · PostgreSQL · Redis

AI Safety & Observability

Advanced
Llama Guard · Presidio · Langfuse · Prometheus/Grafana · RAGAS · LLM Guardrails · PII Detection

Backend & APIs

Advanced
FastAPI · Pydantic · Python AsyncIO · WebSockets · Celery

Voice & Conversational AI

Advanced
ElevenLabs · WebRTC · Silero VAD · Real-time Streaming

Cloud & Infrastructure

Proficient
Docker · NVIDIA CUDA · Keycloak · GPU Computing

Programming Languages

Python: Expert
TypeScript: Advanced
JavaScript: Advanced
SQL: Advanced

// 06 · education

Education & credentials

Academic background and professional certifications

Education

Master of Science in Computer Science

Specialization: Data Analytics

Indian Institute of Information Technology and Management - Kerala (IIITM-K)

Cochin University of Science and Technology

2017 - 2020 First Class, CGPA 7.60

Bachelor of Science

Computer Science, Mathematics, Statistics

Kristu Jayanti College (Autonomous), Bengaluru

Bangalore University

2014 - 2017 First Class

Certifications

Google Data Analytics Professional Certificate

Google

Building Real-Time Video AI Applications

NVIDIA Deep Learning Institute

Getting Started with Deep Learning

NVIDIA Deep Learning Institute

Spoken Languages

English: Professional
Malayalam: Native
Swedish: Beginner

// 07 · contact

Let's build something

Open to collaborations, research conversations, and interesting problems in AI/ML. The fastest way to reach me is email.