Projects

Flagship ML/AI projects structured as case studies. Each entry describes the engineering problem, my approach, and measurable outcomes. Projects marked "In Progress" are actively being hardened with deeper evaluation, deployment polish, and expanded writeups. Evidence links are provided where available for direct reviewer verification.

Generative AI Journal Summarizer

Jul 2025 – Present Featured

Multi-provider LLM gateway with RAG pipeline and ReAct agentic layer for journal entry analysis. Embeds past entries with sentence-transformers + FAISS, retrieves longitudinal context to augment LLM prompts. Agent orchestrates 5 tools (search, sentiment, trends, reflect, suggest) in a multi-step reasoning loop. Supports 17+ model configs across Groq, HuggingFace, OpenAI, and Anthropic with BYOK token vault (AES-256).

Problem

Journal analysis tools treat each entry in isolation, losing insight from patterns over time. Cloud LLM APIs also differ in availability and cost, requiring flexible provider routing.

Approach

Built a FastAPI backend with a RAG pipeline (sentence-transformers all-MiniLM-L6-v2, FAISS cosine search, prompt augmentation), multi-provider LLM routing, a BYOK token vault, and a ReAct-style agentic layer built from Groq API primitives (no LangChain) with 5 tools and observable planning traces. Evaluated retrieval with a golden test set and agent accuracy with a 10-case benchmark.
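
The retrieval step can be sketched in a few lines of NumPy. This is a stand-in for the FAISS index, and the 4-dim toy vectors stand in for all-MiniLM-L6-v2's 384-dim embeddings; it shows the cosine-search core, not the production code.

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    # L2-normalize so a dot product equals cosine similarity
    # (the same trick FAISS IndexFlatIP relies on)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms

def retrieve(index: np.ndarray, query: np.ndarray, k: int = 3) -> list[int]:
    q = query / np.linalg.norm(query)
    scores = index @ q                       # cosine similarity to every entry
    return np.argsort(scores)[::-1][:k].tolist()

# Toy embeddings for three past journal entries
entries = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0, 0.0]])
index = build_index(entries)
top = retrieve(index, np.array([1.0, 0.05, 0.0, 0.0]), k=2)
print(top)  # → [0, 2]: the two entries nearest the query
```

The retrieved entries are then prepended to the LLM prompt as longitudinal context.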

Impact

RAG retrieval: 0.80 precision@3, 1.0 MRR. Agent eval: 90% pass rate, 0.92 tool recall, 0.77 precision, 4.8s avg latency. Working multi-provider gateway with live demo, provider diagnostics, and reproducible eval harness.

Retrieval: 0.80 precision@3, 1.0 MRR
Agent: 90% pass, 0.92 recall, 4.8s latency
17+ LLM model configs across 8 providers

AeroIntel — Real-Time Aviation Intelligence Dashboard

Apr 2026 Featured

A real-time anomaly detection pipeline over live ADS-B telemetry with no labeled training data. Fuses two commercial feeds, applies Kalman filtering to maintain state across sparse position updates, detects orbital and holding patterns with DBSCAN, and scores deviations with IsolationForest. Claude explains flagged aircraft in plain language. Deployed continuously on Fly.io with CI/CD.

Problem

Public ADS-B feeds expose raw aircraft positions but not interpretable signals about unusual behavior. An analyst can see thousands of points on a map, but not which aircraft deserve attention or why.

Approach

Built a FastAPI backend that ingests commercial and military telemetry, maintains in-memory aircraft state, and runs a sequential enrichment pipeline: Kalman filtering for state estimation, DBSCAN for pattern detection, and IsolationForest for anomaly scoring. The backend pushes GeoJSON to a Next.js + MapLibre frontend and exposes natural-language query, region summary, and anomaly explanation endpoints backed by Claude.
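
The state-estimation stage can be illustrated with a 1-D constant-velocity Kalman filter. This is a sketch under simplified assumptions (one position coordinate, fixed noise covariances); the production pipeline tracks lat/lon/altitude per aircraft.

```python
import numpy as np

F = lambda dt: np.array([[1.0, dt], [0.0, 1.0]])   # state transition: [pos, vel]
H = np.array([[1.0, 0.0]])                          # we only observe position
Q = np.eye(2) * 0.01                                # process noise
R = np.array([[4.0]])                               # measurement noise

x = np.array([0.0, 0.0])       # initial state estimate
P = np.eye(2) * 100.0          # high initial uncertainty

def step(x, P, z, dt):
    # Predict forward dt seconds, then correct with measurement z
    Fd = F(dt)
    x = Fd @ x
    P = Fd @ P @ Fd.T + Q
    y = z - H @ x                              # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    x = x + (K @ y).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Aircraft moving ~2 units/s, reported every 5 s with noise
for z in (10.3, 19.8, 30.1):
    x, P = step(x, P, np.array([z]), 5.0)

print(round(float(x[1]), 2))  # velocity estimate, close to the true ~2 units/s
```

The filter's value on sparse ADS-B data is the predict step: it keeps a usable position and velocity between position reports.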

Impact

Built and deployed a full ML inference pipeline processing 11,000+ live aircraft per 60-second cycle, with Kalman state estimation, DBSCAN spatial pattern detection, and IsolationForest anomaly scoring. Captured 4 real Claude anomaly explanations from live flight data, including a US military helicopter (GRZLY71) executing a racetrack surveillance orbit. Implemented Fleet Analytics, on-demand feature vector display, and a collapsible Pattern Drill-Down. Fixed a state persistence bug that was silently discarding anomaly scores between non-scoring cycles. System runs continuously on Fly.io with CI/CD via GitHub Actions. Full technical write-up in CASE_STUDY.md.

Full ML pipeline: Kalman + DBSCAN + IsolationForest, no labeled data required
11,000+ aircraft processed per 60-second inference cycle
4 real anomaly explanations from live military flight data

LearnOnTheGo — Prompt/PDF to Audio Lectures

Jan 2025 – Present

Full-stack AI application with a FastAPI backend that generates audio lecture content from user prompts or uploaded PDFs. Backend deployed on Railway; React frontend deployed on Vercel (auth-gated, frontend integration in progress).

Problem

Learners often have useful content but limited time to read. They need a way to transform raw notes and documents into listenable study material.

Approach

Built a modular backend pipeline (prompt/file ingestion → LLM summarization → TTS audio output) with a React frontend shell. Prioritized backend API completeness and evidence-linked documentation before frontend polish.
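
The stage composition can be sketched as below. Function names are illustrative, not the production API: the real pipeline calls an LLM in summarize() and a TTS engine in synthesize().

```python
def ingest(source: str) -> str:
    # Prompt text passes through; a PDF path would be text-extracted here.
    return source.strip()

def summarize(text: str) -> str:
    # Placeholder for the LLM summarization call.
    return f"Lecture outline for: {text}"

def synthesize(script: str) -> bytes:
    # Placeholder for the TTS call; returns audio bytes in production.
    return script.encode("utf-8")

def generate_lecture(source: str) -> bytes:
    # A straight composition, so each stage can be swapped and tested alone.
    return synthesize(summarize(ingest(source)))

audio = generate_lecture("  Notes on Bayesian inference  ")
print(audio.decode()[:20])  # → "Lecture outline for:"
```

Keeping the stages pure functions is what let the backend ship and be verified independently of the frontend.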

Impact

Backend API fully operational with health checks, lecture generation, and audio output endpoints. Frontend auth and layout are deployed, but the end-to-end user flow is still in progress.

FastAPI backend deployed on Railway
Prompt + PDF ingestion pipeline
AI-generated lecture content via TTS

SkillSwap — AI-Powered Skill Bartering Platform

May 2025 – Present Featured

Full-stack skill-sharing marketplace with AI-powered semantic matching. Users list skills they offer or seek, and an embedding-based backend surfaces the best trade partners using cosine similarity over 384-dim vectors.

Problem

Skill bartering platforms rely on keyword search and manual browsing, which misses relevant matches across categories (e.g., a React developer seeking woodworking and a carpenter wanting to learn coding).

Approach

Built a three-tier architecture: Next.js 14 frontend on Vercel, Supabase (PostgreSQL + pgvector + RLS) for data/auth, and a FastAPI AI backend on Railway running sentence-transformers (all-MiniLM-L6-v2). Skills are embedded on creation and matched via cosine similarity with cross-category explanations.
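
The mutual-match scoring can be sketched as follows. The 4-dim toy vectors stand in for the 384-dim embeddings, and taking the weaker direction as the score is one reasonable aggregation choice, not necessarily the production formula.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def trade_score(user_a: dict, user_b: dict) -> float:
    # A good trade needs BOTH directions to match: what A offers vs what B
    # seeks, and what B offers vs what A seeks.
    return min(cosine(user_a["offers"], user_b["seeks"]),
               cosine(user_b["offers"], user_a["seeks"]))

react_dev = {"offers": np.array([1.0, 0.1, 0.0, 0.0]),   # "React development"
             "seeks":  np.array([0.0, 0.0, 1.0, 0.2])}   # "woodworking"
carpenter = {"offers": np.array([0.0, 0.0, 0.9, 0.1]),   # "carpentry"
             "seeks":  np.array([0.9, 0.2, 0.0, 0.0])}   # "learn coding"

print(round(trade_score(react_dev, carpenter), 2))  # strong cross-category match
```

This is exactly the case keyword search misses: "React" and "carpentry" share no tokens, but the embedding directions line up.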

Impact

Deployed end-to-end with 23 routes, 10 skill categories, real-time messaging, trade proposals, ratings, and AI-powered match recommendations. Demonstrates full-stack ownership from auth flows to embedding-based matching.

384-dim semantic matching
23 routes, 10 categories
Three-tier architecture (Next.js + FastAPI + Supabase)

Production Inference Optimization Study

Apr 2026 Featured

Applied inference optimization techniques (ONNX export, INT8 quantization, adaptive batching) to SkillSwap's all-MiniLM-L6-v2 sentence transformer. Benchmarked cost-latency tradeoffs with p50/p95/p99 measurements across five deployment configurations. Fully tested with CI.

Problem

Embedding generation for SkillSwap's semantic skill-matching needed cheaper, faster inference on CPU without sacrificing match quality or requiring GPU hardware.

Approach

Exported the production sentence-transformer to ONNX via Hugging Face Optimum, applied INT8 dynamic quantization (88 MB → 23 MB, ~74% reduction), and swept batch sizes 1–64. Benchmarked p50/p95/p99 latency and throughput across PyTorch baseline, ONNX Runtime, and quantized ONNX configurations. Validated accuracy with cosine similarity against the PyTorch baseline.
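
The core of symmetric per-tensor INT8 dynamic quantization, the scheme ONNX Runtime applies to weight matrices, can be sketched in NumPy on a random matrix (not the real model weights):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0, 0.05, size=(384, 384)).astype(np.float32)

scale = np.abs(W).max() / 127.0                 # map the observed range onto int8
W_int8 = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_deq = W_int8.astype(np.float32) * scale       # dequantize for comparison

size_ratio = W_int8.nbytes / W.nbytes           # 1 byte vs 4 bytes per weight
max_err = float(np.abs(W - W_deq).max())

print(size_ratio)                    # → 0.25, i.e. 75% smaller per tensor
print(max_err <= scale / 2 + 1e-6)   # → True: error bounded by half a quantum
```

The whole-model reduction lands near 74% rather than 75% because some tensors (embeddings, norms) stay in higher precision.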

Impact

ONNX + INT8 batch=32 achieved 2.9ms p50 latency and 602 req/s — a ~9× throughput improvement over PyTorch single inference. Model size reduced ~74%. Quantized embeddings maintained 0.962 mean cosine similarity vs. baseline. 17 passing tests with GitHub Actions CI.

~9× throughput (602 req/s)
74% model size reduction
2.9ms p50 latency

Generative Modeling Study (β-VAE)

Spring 2026 Featured

From-scratch β-VAE implementation in PyTorch for MNIST, with systematic β-sweep ablation (β = 0.1–10) studying the reconstruction–disentanglement tradeoff. 19 tests, CI, and exported evidence plots.

Problem

Standard VAEs balance reconstruction fidelity against latent space regularity via a single β hyperparameter, but the impact is hard to reason about without hands-on experimentation.

Approach

Built encoder/decoder with configurable latent dimensions, trained across 6 β values, and measured reconstruction loss, KL divergence, and sample quality. Exported latent space interpolations and reconstructions as evidence.
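
The objective being swept is reconstruction loss plus β times the KL term, which has a closed form for a diagonal-Gaussian encoder. A minimal NumPy sketch with toy values (MSE as the reconstruction term is an assumption; the study's exact loss may differ):

```python
import numpy as np

def kl_divergence(mu: np.ndarray, logvar: np.ndarray) -> float:
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian
    return float(-0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar)))

def beta_vae_loss(x, x_recon, mu, logvar, beta):
    recon = float(np.sum((x - x_recon) ** 2))   # reconstruction term (MSE here)
    return recon + beta * kl_divergence(mu, logvar)

x = np.array([0.8, 0.2]); x_recon = np.array([0.7, 0.3])
mu = np.array([0.5, -0.5]); logvar = np.array([-1.0, -1.0])

low = beta_vae_loss(x, x_recon, mu, logvar, beta=0.1)
high = beta_vae_loss(x, x_recon, mu, logvar, beta=10.0)
print(low < high)  # → True: larger β penalizes latent irregularity more
```

The sweep varies only `beta`, which is why the observed tradeoff can be attributed to that single knob.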

Impact

Clear demonstration of how β controls the reconstruction–disentanglement tradeoff. Low β (0.1) gives sharp reconstructions but tangled latents; high β (10) gives smooth latent space but blurry outputs. 19/19 tests passing.

6 β-sweep configurations
19 passing tests
Evidence PNGs exported

Parameter-Efficient Fine-Tuning Study (LoRA)

Apr 2026 Featured

Systematic LoRA rank ablation on GPT-2 (124M) for dialogue summarization. Ablates rank {2, 4, 8, 16} × alpha {8, 16, 32} across 5 configurations, evaluates with ROUGE + BERTScore, and demonstrates diminishing returns beyond rank 8. CPU-only, fully reproducible.

Problem

Full fine-tuning of LLMs is expensive. LoRA reduces trainable parameters by 95%+, but how does rank affect quality? Practitioners need empirical guidance for choosing configurations.

Approach

Applied LoRA to GPT-2 attention layers across 5 rank/alpha configurations on SAMSum (500 train examples). Evaluated each with ROUGE-1/2/L and BERTScore. Visualized training curves, ablation heatmaps, and before/after comparisons.
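
The adapted layer can be sketched in NumPy: the frozen weight W plus a low-rank update scaled by alpha / r. Dimensions mimic a GPT-2 attention projection (768 × 768); the zero-init of B is standard LoRA practice, so the layer is unchanged until training begins.

```python
import numpy as np

d, r, alpha = 768, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))            # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01     # trained down-projection
B = np.zeros((d, r))                   # trained up-projection, zero-init

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Base path plus scaled low-rank path; identical to W @ x at init (B = 0)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
assert np.allclose(lora_forward(x), W @ x)    # no change until B is trained

trainable = A.size + B.size
print(round(100 * trainable / W.size, 2))     # → 2.08 (% of this layer's params)
```

Note the per-layer fraction (~2%) is higher than the whole-model 0.65% figure, since most model parameters sit in layers LoRA never touches.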

Impact

Rank 8 with alpha 16–32 captures most of the fine-tuning benefit at 0.65% of model parameters. Even rank-2 (202K params) substantially beats zero-shot. The methodology is model-size-agnostic and applies directly to Llama, Mistral, and other transformers.

5 LoRA configurations ablated
12 passing tests
0.65% trainable params (rank 8)

RL Environment + Q-Learning Study

Spring 2026 Featured

Custom Windy Gridworld environment with tabular Q-learning and DQN agents, implemented from scratch using Gymnasium and PyTorch. 24 tests with CI.

Problem

Understanding RL requires building environments and agents from scratch — not just calling library APIs. How do tabular and deep RL methods compare on the same task?

Approach

Implemented WindyGridWorld (Gymnasium API), tabular ε-greedy Q-learning, and a DQN with experience replay and target network. Systematic comparison of convergence, sample efficiency, and solution quality.
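
The tabular update at the heart of the study can be shown on a tiny deterministic chain, a stand-in for Windy Gridworld with the same agent loop: states 0..4, actions left/right, reward 1.0 on reaching state 4. Optimistic initialization here is one exploration choice among several.

```python
import numpy as np

n_states, n_actions, goal = 5, 2, 4
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = np.ones((n_states, n_actions))   # optimistic init encourages exploration
rng = np.random.default_rng(0)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
    return s2, (1.0 if s2 == goal else 0.0), s2 == goal

for _ in range(500):                 # episodes
    s = 0
    for _ in range(50):              # step cap per episode
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Core update: nudge Q(s,a) toward the bootstrapped TD target
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
        s = s2
        if done:
            break

print(round(float(Q[0, 1]), 2))  # approaches γ³ = 0.729, the discounted goal value
```

The DQN replaces the Q table with a network and adds experience replay and a target network, but the TD target in the update line is the same.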

Impact

Tabular Q-learning converges fast on small state spaces; DQN generalizes but needs replay buffer tuning. Full environment/agent separation following Gymnasium patterns. 24/24 tests passing.

24 passing tests
Tabular + Deep RL agents
Custom Gymnasium environment

Optimizer Deep Dive: From Gradient Descent to Adam

Spring 2026 Featured

Pure NumPy implementations of BGD, SGD+Momentum, and Adam optimizers with a from-scratch neural network (backpropagation) trained on MNIST. Includes loss landscape visualization and initialization sensitivity analysis. 29 tests with CI.

Problem

Most practitioners use optimizer libraries without understanding the math. How do BGD, momentum, and Adam actually differ in convergence behavior across different curvature regimes?

Approach

Implemented all three optimizers and a feedforward neural network with full backpropagation in pure NumPy. Benchmarked on Rosenbrock, ill-conditioned quadratics (κ=2–500), and MNIST classification. Visualized 3D loss landscapes and gradient norms.
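
The Adam update rule the study implements can be sketched on an ill-conditioned quadratic f(x) = 0.5·(x₁² + 100·x₂²), condition number κ = 100, where plain gradient descent at the same learning rate diverges. The specific learning rates and step counts here are illustrative choices.

```python
import numpy as np

def adam(grad_fn, x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m, v = np.zeros_like(x), np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g**2       # second-moment estimate
        m_hat = m / (1 - beta1**t)               # bias correction for zero init
        v_hat = v / (1 - beta2**t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-coordinate step size
    return x

curvature = np.array([1.0, 100.0])               # κ = 100
grad = lambda x: curvature * x

x_adam = adam(grad, np.array([3.0, 3.0]))
x_gd = np.array([3.0, 3.0])
for _ in range(50):
    x_gd = x_gd - 0.1 * grad(x_gd)               # |1 - 0.1 * 100| = 9: diverges

print(bool(np.abs(x_adam).max() < 0.5))   # → True: near the optimum on both axes
print(bool(np.abs(x_gd).max() > 1e6))     # → True: GD blows up on the steep axis
```

The per-coordinate denominator √v̂ is what equalizes progress across curvature regimes, which is the mechanism behind the κ-reduction result above.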

Impact

Adam achieves ~93% MNIST accuracy vs ~85% for SGD. The condition-number study shows Adam mitigates ill-conditioning, effectively reducing κ. He initialization is critical for ReLU networks. 29/29 tests passing, zero external ML dependencies.

29 passing tests
~93% MNIST accuracy (pure NumPy)
3 optimizers from scratch

For perspectives on AI engineering, evaluation methodology, and where the field is heading:

Read My Thinking Posts →