Data Engineer with a Master's in Data Science. I build production ETL pipelines, with hands-on experience at Databricks and Fractal Analytics. I improved Spark job throughput by 55% through Delta Lake optimization and reduced query latency by 40% in a production ML pipeline.
End-to-end ETL and ML pipeline across MySQL, Snowflake, and AWS (S3, EC2). Reduced query latency by 40% through star-schema design and warehouse optimization. Improved prediction accuracy by 25% by catching bad records before model training. Containerized with Docker for one-command deployment.
Deep learning inference pipeline using ResNet50 and CLIP for medical image classification. Added Grad-CAM explainability and automated structured report generation for clinical workflows. Deployed via Streamlit for real-time inference, replacing a manual review process.
End-to-end cancer classification pipeline built on a BiLSTM model that achieves 99.8% accuracy. Integrated PostgreSQL for data storage and Apache Airflow for pipeline orchestration, and deployed an interactive Streamlit app on AWS for real-time inference, turning a research model into a production-ready system.
Large-scale data engineering pipeline built on NYC's TLC taxi trip dataset. Ingested and processed millions of trip records using PySpark, applied geospatial and temporal feature engineering, and built an analytics layer to surface insights on trip patterns, demand hotspots, and fare trends across NYC boroughs.
Engineered an NLP pipeline processing 10,000+ clinical records using TF-IDF feature engineering and BiLSTM architectures, achieving a 0.998 F1-score. Benchmarked multiple deep learning models with rigorous hyperparameter tuning to maximize classification reliability and robustness. Findings validated through peer review and published at IEEE ICICIT 2024.
Validated expertise in designing and building data pipelines on AWS, covering data ingestion, transformation, storage, and orchestration using services like S3, Glue, Redshift, Kinesis, and Lake Formation. Demonstrates hands-on ability to architect scalable, production-grade data engineering solutions on the AWS cloud.
Completed Google's professional certificate covering the full data analytics workflow — from data cleaning and preparation to analysis, visualization, and storytelling. Gained proficiency in SQL, R, Tableau, and spreadsheets to derive actionable insights and communicate findings effectively to stakeholders.
Gained foundational knowledge of cloud computing concepts including service models (IaaS, PaaS, SaaS), deployment models, and the core benefits of cloud infrastructure. Built a solid understanding of how cloud platforms enable scalable and cost-efficient application and data workloads.
Deepened understanding of core cloud computing principles including virtualization, distributed storage, cloud networking, and security. Covered key architectural patterns used in modern cloud-native systems and how enterprises leverage cloud infrastructure for resilience and scalability.
Learned the fundamentals of the Internet of Things including how connected devices communicate, collect, and transmit data across networks. Covered IoT sensors, protocols, security considerations, and real-world use cases — providing useful context for understanding data generation at the edge.
Completed the foundational Python programming course by Dr. Chuck at the University of Michigan. Covered core programming concepts including variables, conditionals, loops, functions, and data structures — laying the groundwork for the data engineering and ML work that followed.
Gained awareness of the cybersecurity landscape including common threats, attack vectors, and best practices for protecting data and systems. Covered topics such as network security, encryption, malware, and how organizations defend against cyber threats — relevant to building secure data pipelines.
"Being part of the NSS team at Karunya was one of the most grounding experiences of my undergraduate years. Outside of all the technical work, it reminded me that the best skills you build aren't always in a classroom — they're in how you show up for people and work together toward something meaningful."