Hi! I'm Sneha, a recent graduate of Columbia's M.S. in Data Science. I've worked on AI for code at IBM, manufacturing analytics at NXP, and multimodal RAG and healthcare NLP at Columbia. My work spans pre- and post-training, evaluation, and vertical agents. I studied high-performance ML, deep learning for NLP, and agentic systems. I'm interested in AI that is embedded in existing systems as reasoning engineers can trust, not as a replacement layer. I care about observability into agent behavior, organizational memory, and tools that augment rather than just automate.

I'd love to connect if you're in the field or want to collaborate. Get in touch

Education

2024—2025

Columbia University

M.S. in Data Science

Relevant: Statistical Inference, Applied ML, Deep Learning, NLP.
Graduate TA: Data Analysis, Databases for Business Analytics, Python.

2020—2024

SRM University

B.Tech. in Computer Science and Engineering

Relevant: Data Structures, Operating Systems, Cloud Computing, AI, Databases.

Experience

Sep 2025 — Jan 2026

IBM - Data Scientist

Built code generation system using graph-based representations for structural understanding. Designed evaluation framework with AST analysis and semantic verification.

May 2025 — Dec 2025

NXP Semiconductors - Data Science Intern

Built ML defect triage integrating screening and test data. Automated root-cause analytics by clustering manufacturing logs for failure pattern detection.

Jan 2025 — May 2025

Columbia Business School - Graduate Research Assistant

Built multimodal ESG compliance system combining vision models, OCR, and semantic retrieval. Enhanced table reasoning for analyst workflows through adaptation.

May 2024 — Aug 2024

Metropolis Healthcare - Data Science Intern

Engineered clinical NLP transforming diagnostic reports into patient language. Deployed HIPAA-compliant AWS infrastructure processing reports and enabling product analytics.

Recent Projects

Chronos: Autonomous Email Agent

Llama 3, DPO, FastAPI, Gmail API, Chrome Extension

AI email assistant that adapts to individual writing styles through direct preference optimization. Production system handling drafting, scheduling, and follow-ups via a FastAPI backend and a Chrome extension.

Innovation Screener

Next.js 14, TypeScript, Google Gemini API, Vercel

AI-powered validation framework for startup concepts. Evaluates innovation potential, technical feasibility, market readiness, and risk through structured multi-factor analysis. Built as Columbia capstone.

Migration & Refugee Populations Analysis

R, Quarto, ggplot2, Data visualization

Statistical visualization exploring global migration and refugee patterns across time and geography. Uses choropleths, ridgeline plots, and temporal heatmaps, built with R, ggplot2, and Quarto.

Neural Code Search Engine

SPLADE, LightGCN, Elasticsearch, FAISS

Semantic code retrieval system using neural sparse embeddings and graph-based re-ranking. Processes 2M code snippets across 50 repositories with sub-second query latency.

Doc-Query

Python, Ollama, RAG, Streamlit

Privacy-preserving document intelligence system using retrieval-augmented generation. Runs entirely on local infrastructure with open-source models, enabling secure question-answering over sensitive documents without external API dependencies.

Creatives