Skills

AI & NLP / ML

  • LLMs (OpenAI, Llama 3)
  • RAG (LangChain)
  • FAISS/Chroma
  • Embeddings
  • Semantic search
  • Agents
  • Prompt design
  • scikit-learn
  • TensorFlow
  • PyTorch
  • XGBoost
  • Time-series
  • Eval (precision/recall, F1, AUC)

Data Engineering & MLOps

  • ETL/ELT
  • Data validation
  • Schema/KPI docs
  • Lineage
  • Docker
  • CI/CD (GitHub Actions)
  • dbt
  • FastAPI
  • OpenAPI/Swagger
  • Monitoring/alerts
  • AWS (Lambda, S3)
  • Azure Blob
  • PostgreSQL
  • MySQL
  • BigQuery

Analytics & BI

  • Tableau
  • Excel
  • Cohorts/segmentation
  • A/B testing
  • Hypothesis testing
  • KPI/metric design
  • Dashboards
  • Data storytelling

Languages & Tools

  • Python
  • SQL
  • Java
  • C
  • Git
  • Jupyter
  • Linux/bash

Projects

AI Investment & Capital Deals Memo Copilot for Financial Analysis

AI system that ingests financial PDFs and auto-generates investment memos. Uses a RAG pipeline (LangChain + OpenAI + FAISS/Chroma) with AWS S3 storage and a Streamlit UI for extraction, document-aware Q&A, and memo drafting, featuring transparent agent execution. Designed to reduce analyst prep time and improve consistency.

End-to-End Data Pipeline: United States Non-Immigrant Visa Analysis and Prediction

Led a 4-member team to ingest, clean, and analyze U.S. non-immigrant visa data to study immigration patterns. Built an end-to-end pipeline with dbt + Python for transformation, feature engineering, and ML prediction.

Maximize Marketing: A Data Analysis Case Study

Analyzed fitness-app behavior; cleaned and joined data with SQL (CTEs, JOINs) and Excel (pivots, lookups). Communicated insights via Tableau dashboards and a GitHub write-up with recommendations.

SaaS Customer Churn Prediction: End-to-End MLOps

Built a production-ready churn prediction system for a SaaS dataset that cleans and validates data, engineers features, trains and compares models, and serves the best model via a FastAPI microservice. The workflow is fully reproducible (Docker + tests + CI/CD), with experiment tracking and basic drift/performance monitoring.

Figuring out Neural Networks: Classifying Breast Cancer from Mammography Images

Built a mammography classifier and compared a custom network to transfer-learning baselines (ResNet50, VGG16). Achieved 82% recall on the positive class using deep learning in Python.
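For readers unfamiliar with the metric, recall on the positive class is the share of true positive cases the model catches: TP / (TP + FN). The snippet below is a minimal sketch of that calculation; the function name and the toy labels are illustrative assumptions, not taken from the project code.

```python
def recall_positive(y_true, y_pred, positive=1):
    """Recall for one class: true positives / (true positives + false negatives)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count positives the model caught and positives it missed.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)


# Toy labels (made up for illustration): 4 positives, 3 of them detected.
print(recall_positive([1, 1, 1, 1, 0], [1, 1, 1, 0, 0]))
```

In a screening setting, recall is usually the metric to watch because a missed positive case (a false negative) tends to be costlier than a false alarm.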

Harry Potter and the Next Word: LSTM RNNs + Streamlit

Processed movie script lines (tokenization, n-grams, embeddings) and trained a 2-layer stacked LSTM to predict the next word. Improved accuracy by 15% and reduced loss by 32%, then deployed the model with Streamlit.

Summarize Anything: AI Content Compression - Summarization platform with Llama

Drop in long-form content and get an abstractive summary of the essential points in seconds, optimized for readability. Powered by a Llama 3 based summarizer and a token-aware, cost-efficient orchestrator that handles long-form inputs without truncation.
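One common way to handle long inputs without truncation is to split the text into chunks that fit a token budget, summarize each chunk, and then summarize the combined partial summaries. The sketch below illustrates that idea only; it is not the project's actual orchestrator. Whitespace-separated words stand in for model tokens, and `chunk_by_token_budget`, `summarize_long`, and the `summarize` callable are hypothetical names introduced for illustration.

```python
def chunk_by_token_budget(text, max_tokens, overlap=0):
    """Split text into chunks of at most max_tokens whitespace tokens.

    Assumption: whitespace tokens approximate model tokens; a real
    orchestrator would count tokens with the model's own tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


def summarize_long(text, summarize, max_tokens):
    """Summarize each chunk, then summarize the joined partial summaries.

    `summarize` is a hypothetical callable (str -> str), e.g. a wrapper
    around an LLM call. This does a single combine pass and does not
    re-check the budget for the joined partial summaries.
    """
    chunks = chunk_by_token_budget(text, max_tokens)
    if not chunks:
        return ""
    partials = [summarize(chunk) for chunk in chunks]
    if len(partials) == 1:
        return partials[0]
    return summarize(" ".join(partials))


# Stand-in summarizer for demonstration: keeps the first word of its input.
print(chunk_by_token_budget("a b c d e", max_tokens=2))
print(summarize_long("a b c d e", lambda s: s.split()[0], max_tokens=2))
```

A small overlap between chunks can help preserve context across chunk boundaries, at the cost of a few extra tokens per call.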

Predicting with Volatility: Stacked LSTMs for AMZN Forecasting

Forecasted the AMZN opening price with stacked LSTMs on Tiingo API data; achieved an RMSE of 7.4 on the test set and generated 30-day forecasts.
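RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted values, so it is expressed in the same units as the target. The snippet below is a minimal sketch of the calculation; the function name and sample values are made up for illustration and are not the project's data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared errors, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only: every prediction is off by exactly 2.
print(rmse([10.0, 12.0, 14.0, 16.0], [12.0, 14.0, 16.0, 18.0]))
```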

I'm passionate about building AI and data solutions that solve real-world problems, and I would love to contribute to teams that already do this at scale.

Dashboards and Visualizations

Neurodivergence across Reddit: Themes and Topics

Tableau dashboard: user sleep insights from wearable devices

User Sleep Insights: Wearable fitness devices

Tableau map: U.S. non-immigrant visa issuance (Oct–Nov FY2024)

United States Non-Immigrant Visa Issuance: October–November FY2024

Tableau dashboard: user distribution and categorization — wellness market

User Distribution & Categorization: Wellness Company Market