🏆 Ranked 3rd out of 47 international teams at SemEval-2026 Task 4, ACL 2026, San Diego
Junior Research Fellow · NLP · GenAI · CV · IKS

Gaurav Kumar

AI Researcher  ·  NLP  ·  Knowledge Graphs  ·  Generative AI  ·  Indian Knowledge Systems

Natural Language Processing · Knowledge Graphs · Generative AI · Computer Vision · Indian Knowledge Systems · Multimodal AI

Early-career AI researcher with peer-reviewed publications and international competition recognition, currently serving as Junior Research Fellow at Swami Rama Himalayan University, Dehradun. My work bridges cutting-edge AI with interdisciplinary domains — from NLP and Knowledge Graphs to Ayurvedic diagnosis and Indian Knowledge Systems.

Swami Rama Himalayan University · AI Centre for Research
Dehradun, India
gauravkumarsony08@gmail.com
3rd place · SemEval-2026 Task 4 · ACL 2026, San Diego

Ranked 3rd out of 47 international teams in Narrative Story Similarity, reaching 75% test accuracy against a human annotator ceiling of 78% with a hybrid embedding and LLM ensemble. Also served as a peer reviewer for 2 workshop papers.

Research Interests

NLP & Semantic Evaluation

Narrative similarity, semantic reasoning, entity recognition, language modeling, and evaluation methodology. Active SemEval participant, with a SemEval-2026 (ACL 2026) system paper in progress.

Knowledge Graphs & Archival AI

LLM-driven knowledge graph construction from unstructured documents. Source-linked evidence retrieval, interactive graph visualization, historical archives.

Generative AI & RAG Systems

Retrieval-augmented generation for domain-specific knowledge bases. Hallucination mitigation via local LLM benchmarking and iterative output verification.

Computer Vision & Biometrics

On-device image classification, contactless fingerprint matching, real-time liveness detection, and mobile-first biometric systems (UIDAI Open Innovation).

AI × Indian Knowledge Systems

AI integrated with Ayurveda, Vedic traditions, and consciousness studies. OCR for Sanskrit/Vedic texts, tongue diagnosis, spiritual well-being — IIT Mandi MBCC papers.

Clinical & Health AI

AI-based health insurance claims adjudication, clinical document classification, and structured medical record extraction (AB PM-JAY, NHA, IndiaAI, IISc).

3 Publications
3rd SemEval Rank
47 Competing Teams
2 IIT Mandi Papers
5+ Research Projects
2 JRF Positions

Academic Timeline

Jun 2025 – Mar 2026

Junior Research Fellow

CAIR · Dev Sanskriti Vishwavidyalaya, Haridwar

HistoAI full-stack research platform (RAG, Knowledge Graphs, OCR), Ayur Scan tongue diagnosis, MBCC 2025 IIT Mandi presentation.

Jan 2025 – May 2025

Research Intern

CAIR · Dev Sanskriti Vishwavidyalaya, Haridwar

LLM-powered historical data extraction, mobile tongue diagnosis models, OCR for Sanskrit/Vedic texts, full-stack tools in React, Flask, MongoDB.

2023 – 2025

MCA in Data Science

Dev Sanskriti Vishwavidyalaya, Uttarakhand · CGPA: 8.30 / 10

Graduate research focus in AI, NLP, and Data Science.

Publications

Peer-reviewed research in NLP, Generative AI, Multimodal AI, and AI for Indian Knowledge Systems.

SemEval-2026 · ACL 2026 · Publication in Progress · 🏆 3rd / 47 Teams · 2026

AI-Monitors at SemEval-2026 Task 4: A Hybrid Embedding and LLM Ensemble for Narrative Similarity

Gaurav Kumar et al.  (Team: AI-Monitors)

20th International Workshop on Semantic Evaluation (SemEval-2026), ACL 2026, San Diego, USA

Designed and evaluated structured prompting strategies for Large Language Models on narrative similarity reasoning. Built a hybrid ensemble of sentence-embedding and LLM-based models, selecting components by error diversity rather than accuracy alone, and achieved 75% test accuracy against a human annotator ceiling of 78%. Team AI-Monitors ranked 3rd out of 47 international teams at SemEval-2026 Task 4. Also served as a peer reviewer for 2 workshop papers on narrative understanding and NLP.
Prompt Engineering · LLM Evaluation · Ensemble Methods · Sentence Transformers · Narrative Reasoning
MBCC 2026 · IIT Mandi · Accepted · 2026

The Spiritual Well-being Index (SWBI): A Multimodal AI Framework for Assessing and Supporting Spiritual Growth

Gaurav Kumar et al.

3rd International Mind, Brain & Consciousness Conference (MBCC 2026), IIT Mandi — Accepted

Presents the Spiritual Well-being Index (SWBI), a multimodal AI framework designed to assess and support spiritual growth through quantifiable indicators. This interdisciplinary work integrates AI methodologies with Indian Knowledge Systems, using multimodal data sources to build a measurable, AI-assisted model of spiritual well-being. Accepted at the 3rd MBCC, IIT Mandi.
Multimodal AI · Indian Knowledge Systems · Consciousness Studies
MBCC 2025 · IIT Mandi · Publication in Progress · 2025

AI Tongue Diagnosis: Expert System with LLMs for Abdominal Disease Detection through Mobile App

Gaurav Kumar et al.

Presented at 2nd International Mind, Brain & Consciousness Conference (MBCC 2025), IIT Mandi

An expert AI system for Ayurvedic tongue-based abdominal disease diagnosis via a Flutter mobile application. Enables Ayurvedic therapists to collect structured patient data and tongue imagery. Integrates an on-device TFLite binary classifier for image quality validation at capture time, and an LLM-powered expert system for diagnostic support. Demonstrates a full pipeline from mobile data collection to AI-assisted Ayurvedic diagnosis.
LLMs · Mobile AI · TFLite · Ayurveda AI · Expert Systems

Peer Reviewer

SemEval-2026 Workshop Papers (2 papers) · ACL 2026, San Diego — Narrative Understanding and NLP Research

Research Projects

Research systems and applied AI projects spanning NLP, Knowledge Graphs, Computer Vision, and mobile AI.

AI-Monitors: Narrative Similarity

SemEval-2026 Task 4 · ACL 2026

🏆 3rd / 47

Hybrid ensemble of sentence embeddings and LLM models for narrative similarity reasoning. Components are selected by error diversity rather than accuracy alone, reaching 75% test accuracy against a human annotator ceiling of 78%.

→ Publication in progress at SemEval-2026, ACL 2026, San Diego

LLMs · Prompt Engineering · Sentence Transformers · Ensemble
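
Selecting ensemble members by error diversity can be illustrated with a minimal sketch. It is not the team's actual implementation: the model names, the toy binary predictions, the Jaccard overlap measure, and the greedy procedure are all assumptions chosen for illustration. It shows how picking members whose mistakes overlap least can help a majority vote when the most accurate models fail on the same examples.

```python
from collections import Counter


def error_set(preds, gold):
    """Indices where a model's prediction disagrees with the gold label."""
    return {i for i, (p, g) in enumerate(zip(preds, gold)) if p != g}


def error_overlap(a, b):
    """Jaccard overlap of two error sets (0.0 means no shared mistakes)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_diverse(predictions, gold, k):
    """Greedily pick k models whose errors overlap least with those already chosen."""
    errors = {name: error_set(p, gold) for name, p in predictions.items()}
    # Seed with the most accurate model; ties are broken by name for determinism.
    selected = [min(errors, key=lambda n: (len(errors[n]), n))]
    while len(selected) < min(k, len(errors)):
        def score(name):
            mean_overlap = sum(
                error_overlap(errors[name], errors[s]) for s in selected
            ) / len(selected)
            # Prefer low overlap first, then fewer errors, then name.
            return (mean_overlap, len(errors[name]), name)

        remaining = [n for n in errors if n not in selected]
        selected.append(min(remaining, key=score))
    return selected


def majority_vote(predictions, members):
    """Combine the chosen models' labels by simple majority per example."""
    columns = zip(*(predictions[m] for m in members))
    return [Counter(col).most_common(1)[0][0] for col in columns]


# Hypothetical dev-set labels and model outputs (binary, made up for the demo).
gold = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = {
    "embed_a": [1, 0, 1, 1, 0, 0, 1, 0],  # wrong on items 5, 6
    "embed_b": [1, 0, 1, 1, 0, 0, 1, 0],  # same mistakes as embed_a
    "llm_c":   [0, 1, 0, 1, 0, 1, 0, 0],  # wrong on items 0, 1, 2
    "llm_d":   [1, 0, 1, 0, 1, 1, 0, 1],  # wrong on items 3, 4, 7
}

members = select_diverse(predictions, gold, k=3)
ensemble = majority_vote(predictions, members)
print(members)
print(ensemble == gold)
```

On this toy data, an accuracy-only top three would include both embedding models, which fail on the same two items, so a majority vote would repeat their mistakes. The diversity-based pick skips the redundant model in favour of a less accurate one whose errors fall elsewhere.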

HistoAI — Historical Data Extraction

Full-Stack Research Platform · Developer Lead

RBAC web platform unifying four AI modules (OCR, RAG Chatbot, Knowledge Graph, Data Extraction) under JWT authentication. Features metadata-driven deduplication, Celery+Redis async processing, WebSocket real-time updates, and source-linked output verification.

→ Developer lead — Git strategy, PRs, agile sprint delivery

React · Flask · MongoDB · RAG · Knowledge Graphs · OCR · Redis

Knowledge Graph Generation

LLMs · NLP · Information Retrieval

Pipeline for knowledge graph construction from unstructured documents using locally hosted LLMs for entity recognition and relationship extraction. Interactive graph visualization with every node linked directly to the originating PDF page for source verification.

Ollama · LangChain · Pyvis · NetworkX · Streamlit
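
The idea of linking every node back to its originating PDF page can be sketched as follows, under stated assumptions. The triples are hard-coded placeholders standing in for LLM extraction output, plain dictionaries stand in for the graph library, and the `#page=N` URL fragment is assumed as the way a viewer opens a PDF at a given page. The names `Triple`, `build_graph`, and `source_links` are hypothetical and do not come from the project.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One extracted relation plus the place it was found (hypothetical schema)."""
    subject: str
    relation: str
    obj: str
    source_file: str
    page: int


def build_graph(triples):
    """Return (nodes, edges); each node keeps the set of (file, page) it came from."""
    nodes = defaultdict(set)
    edges = []
    for t in triples:
        ref = (t.source_file, t.page)
        nodes[t.subject].add(ref)
        nodes[t.obj].add(ref)
        edges.append((t.subject, t.relation, t.obj, ref))
    return dict(nodes), edges


def source_links(nodes, entity):
    """Build page-level links for one entity, sorted by file and page."""
    return [f"{name}#page={page}" for name, page in sorted(nodes.get(entity, ()))]


# Placeholder triples; in a real pipeline these would come from the extraction step.
triples = [
    Triple("Person A", "founded", "Institution B", "archive_vol1.pdf", 12),
    Triple("Person A", "visited", "Place C", "archive_vol1.pdf", 40),
]

nodes, edges = build_graph(triples)
print(source_links(nodes, "Person A"))
```

Keeping provenance on the node itself means a visualization layer only needs to read one attribute to render a "view source" link, whatever graph library holds the data.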

Ayur Scan — Tongue Diagnosis AI

Mobile App · Ayurvedic AI · Research Tool

Published

Flutter Android app for collecting structured patient data and tongue imagery for Ayurvedic diagnosis research. An on-device TFLite binary classifier validates tongue image quality at capture. Primary data-collection tool for the MBCC 2025 (IIT Mandi) paper.

→ Linked to MBCC 2025, IIT Mandi publication

Flutter · Firebase · TFLite · MongoDB

Contactless Fingerprint Matching

UIDAI Open Innovation Proposal

Dual biometric system: contactless fingerprint capture with database matching and confidence scoring; real-time liveness detection classifying face input as real or spoofed. Full mobile-side development integrating Flask API endpoints.

React Native · Face Recognition · Flask · Computer Vision

Astrowala.world — Mobile LMS

Vedic Astrology Learning Platform

React Native Android application for Vedic astrology and spiritual learning. Implements live class discovery, course access management, and expert astrologer interaction. Developed as part of the IKS+AI research initiative at Swami Rama Himalayan University (SRHU).

React Native · Vedic AI · IKS · Mobile LMS

AB PM-JAY Auto-Adjudication

NHA · IndiaAI Mission · IISc Bengaluru

AI-based health insurance claims adjudication system for the NHA IndiaAI Mission Hackathon. Clinical document classification and structured information extraction from medical records using NLP and document AI techniques.

Document Classification · Information Extraction · NLP · Healthcare AI

About

I am a Junior Research Fellow at the AI Centre for Research, Swami Rama Himalayan University, Dehradun, where I conduct applied research in Machine Learning, Natural Language Processing, and interdisciplinary AI domains bridging modern technology with Indian Knowledge Systems.

My research spans NLP (narrative similarity, entity recognition, language modeling), Knowledge Graphs, Generative AI (RAG, LLMs, prompt engineering), Computer Vision, and interdisciplinary AI applications in Ayurveda, healthcare, and spiritual well-being assessment. I am drawn to problems that sit at disciplinary boundaries — where AI methods illuminate non-computational domains.

In 2026, I co-authored a system ranked 3rd out of 47 international teams at SemEval-2026 Task 4 (ACL 2026, San Diego), reaching 75% test accuracy against a human annotator ceiling of 78%. I have presented research at IIT Mandi's Mind, Brain & Consciousness Conference and served as a peer reviewer for NLP workshop papers.

"Seeking opportunities at the intersection of rigorous AI research, interdisciplinary collaboration, and academic instruction — toward a PhD and contributions to AI for social good."


Master of Computer Applications (Data Science)

2023 – 2025

Dev Sanskriti Vishwavidyalaya, Uttarakhand

CGPA: 8.30 / 10

Bachelor of Science in Information Technology

2019 – 2022

Patliputra University, Patna

70%

Junior Research Fellow

AI Centre for Research

Swami Rama Himalayan University
Dehradun · Apr 2026 – Present

Certifications

  • Introduction to Data Science — Coursera
  • Introduction to Generative AI — Google Cloud
  • Git, GitHub & Markdown — Udemy
  • Big Data Certification — Udemy
  • Data Analytics with Tableau — Jobaaj

Teaching & Mentorship

Seeking Lecturer or Teaching Assistant positions in AI, Machine Learning, and Computer Science at the undergraduate and postgraduate level.

My approach to teaching is grounded in applied understanding over abstract memorization. Having come from research that spans NLP, computer vision, mobile AI, and interdisciplinary domains, I believe the best AI education bridges theory with real-world systems — showing students not just how algorithms work, but why they are designed the way they are and where they fall short.

UG · PG

Introduction to AI & Machine Learning

Supervised/unsupervised learning, evaluation, model selection, applied ML with Python.

Python · scikit-learn · Practical ML
PG

Natural Language Processing

Embeddings, transformers, prompt engineering, evaluation (SemEval-style), and LLM applications. Grounded in active research experience.

Transformers · LLMs · HuggingFace
UG · PG

Deep Learning & Neural Networks

CNNs, RNNs, transformers, on-device deployment (TFLite), and applied vision systems.

TensorFlow · TFLite · CNN
UG

Python Programming & Data Structures

Foundational programming, algorithm design, data structures, and problem-solving.

Python · Algorithms · DSA
PG

Knowledge Representation & Information Retrieval

Knowledge graphs, semantic search, RAG systems, entity-relation extraction, graph-based IR. Research-backed curriculum.

Knowledge Graphs · RAG · LangChain
PG · Research

Research Methods in AI & CS

Scientific writing, experiment design, statistical evaluation, shared task participation, and publication workflow — informed by active publication experience.

Research Design · LaTeX · Academic Writing

Curriculum Vitae

Research Positions

Junior Research Fellow

AI Centre for Research · Swami Rama Himalayan University, Dehradun
  • Applied research in ML, NLP, and interdisciplinary AI (IKS).
  • React Native Android app for Astrowala.world Vedic astrology LMS.
  • AB PM-JAY Auto-Adjudication Hackathon — NHA, IndiaAI Mission, IISc.
  • Paper accepted at the 3rd MBCC 2026, IIT Mandi, on AI × Spirituality × IKS.

Junior Research Fellow

CAIR · Dev Sanskriti Vishwavidyalaya, Haridwar
  • Developer lead on HistoAI — full-stack platform integrating RAG, Knowledge Graphs, OCR under RBAC.
  • Ayur Scan Flutter app for Ayurvedic tongue diagnosis research.
  • Presented at MBCC 2025, IIT Mandi; International Conference on Faith & Future, DSV Haridwar.

Research Intern

CAIR · Dev Sanskriti Vishwavidyalaya, Haridwar
  • LLM-powered structured data extraction from historical books.
  • Mobile tongue diagnosis model research; OCR for Sanskrit/Vedic texts.

Project Intern

CAIR · Dev Sanskriti Vishwavidyalaya, Haridwar
  • NLP-based projects, Hindi transcription, indigenous language modeling.

Education

Master of Computer Applications (Data Science)

Dev Sanskriti Vishwavidyalaya, Uttarakhand

CGPA: 8.30 / 10

Bachelor of Science in Information Technology

Patliputra University, Patna

70%

Publications

AI-Monitors at SemEval-2026 Task 4: A Hybrid Embedding and LLM Ensemble for Narrative Similarity

SemEval-2026, ACL 2026, San Diego, USA

In Progress · 🏆 3rd / 47 International Teams

The Spiritual Well-being Index (SWBI): A Multimodal AI Framework for Assessing and Supporting Spiritual Growth

3rd MBCC 2026, IIT Mandi

Accepted

AI Tongue Diagnosis: Expert System with LLMs for Abdominal Disease Detection through Mobile App

2nd MBCC 2025, IIT Mandi

Presented · Publication in Progress

Technical Skills

Programming

Python · JavaScript · Dart

AI / ML

LLMs · Prompt Engineering · RAG · Ollama · LangChain · TFLite · Sentence Transformers · Ensemble Methods

NLP & CV

OpenCV · Tesseract OCR · Entity Recognition · Knowledge Graphs

Web

React.js · Flask · MongoDB · REST API · WebSocket · JWT

Mobile

Flutter · React Native · Firebase

DevOps & Tools

Redis · Celery · Docker · GitHub Actions · Git · LaTeX