Available for opportunities

Dhinakar Yalla

Data Engineer with a Master's in Data Science and hands-on experience at Databricks and Fractal Analytics. I build production ETL pipelines, and I have improved Spark job throughput by 55% through Delta Lake optimization and reduced query latency by 40% in a production ML pipeline.

40%
Query latency reduced
55%
Spark throughput gained
2TB+
Daily data processed
IEEE
Published research

Technical skills

⭐ AWS Certified Data Engineer
Languages
Python · SQL · PySpark · C++ · R
Data Engineering
ETL/ELT Pipelines · Delta Lake · Data Modeling · Star Schema · Snowflake Schema · Query Optimization · dbt
Big Data & Orchestration
Apache Spark · Databricks · Apache Airflow · Kafka · Hadoop · Apache Flink
Cloud & Databases
AWS S3 · EC2 · Redshift · Glue · Snowflake · PostgreSQL · MongoDB · Redis
Analytics & BI
Power BI · Tableau · Pandas · NumPy · Matplotlib
DevOps & ML
Docker · Kubernetes · Git · Terraform · scikit-learn · PyTorch · TensorFlow · MLflow · NLP

Experience

Databricks
Data Engineering Intern
Jan 2025 – Jun 2025
Remote, USA
  • Built Delta Lake-based ELT pipelines processing 2TB+ of daily event data using PySpark and Databricks Workflows — improved data freshness SLAs from 4 hours to under 45 minutes.
  • Optimized Spark job performance with Z-order and liquid clustering, reducing BI dashboard query scan time by 55%.
  • Designed a data quality monitoring framework using Great Expectations integrated into CI/CD pipelines, catching schema drift and null violations before production.
  • Containerized pipeline jobs with Docker and deployed onto Kubernetes clusters, cutting deployment setup time by 65% and enabling full environment parity across dev, staging, and production.
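
As a rough illustration of the kind of check described in the data quality bullet above, the sketch below flags schema drift and null violations in a batch of records using plain Python. It does not use the Great Expectations API, and it is not the framework built at Databricks; `EXPECTED_SCHEMA`, `check_batch`, and the column names are hypothetical.

```python
from typing import Any

# Hypothetical expected schema: column name -> (Python type, nullable?)
EXPECTED_SCHEMA: dict[str, tuple[type, bool]] = {
    "event_id": (str, False),
    "user_id": (str, False),
    "amount": (float, True),
}


def check_batch(rows: list[dict[str, Any]]) -> list[str]:
    """Return human-readable violations found in a batch of records."""
    violations: list[str] = []
    for i, row in enumerate(rows):
        # Schema drift: columns dropped or added relative to the expectation.
        missing = sorted(EXPECTED_SCHEMA.keys() - row.keys())
        extra = sorted(row.keys() - EXPECTED_SCHEMA.keys())
        if missing:
            violations.append(f"row {i}: missing columns {missing}")
        if extra:
            violations.append(f"row {i}: unexpected columns {extra}")
        # Null and type checks on the columns that are present.
        for col, (col_type, nullable) in EXPECTED_SCHEMA.items():
            if col not in row:
                continue
            value = row[col]
            if value is None:
                if not nullable:
                    violations.append(f"row {i}: null in non-nullable column '{col}'")
            elif not isinstance(value, col_type):
                violations.append(f"row {i}: column '{col}' has type {type(value).__name__}")
    return violations
```

In a CI/CD job, a non-empty result from a check like this would typically fail the build so the batch never reaches production.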
Fractal Analytics
Data Engineering Intern
Aug 2023 – Feb 2024
Chennai, India
  • Built and maintained ELT pipelines ingesting structured and semi-structured data from 10+ client sources into Snowflake, reducing manual handoff time by 45%.
  • Designed Spark-based batch processing jobs for 500GB+ datasets, improving job completion time by 30% through partition pruning and broadcast join optimization.
  • Engineered 15+ features from raw transactional data, reducing feature computation latency by 20% and improving downstream model quality.
  • Automated pipeline monitoring with Apache Airflow DAGs, cutting mean time to resolution by 40%.
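
The two optimizations named in the batch processing bullet above can be sketched without Spark. The plain-Python example below is only an illustration of the ideas: partition pruning skips data whose partition key cannot match the filter, and a broadcast join builds an in-memory lookup from the small table so the large table is scanned once. The function names and sample data are hypothetical and are not taken from the original jobs.

```python
from collections import defaultdict


def prune_partitions(partitions, wanted_keys):
    """Keep only partitions whose key matches the filter; the rest are never read."""
    return {key: rows for key, rows in partitions.items() if key in wanted_keys}


def broadcast_hash_join(large_rows, small_rows, key):
    """Join two row lists by hashing the small side and streaming the large side."""
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)
    joined = []
    for row in large_rows:
        # .get avoids creating empty entries for keys with no match.
        for match in lookup.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical data: transactions partitioned by date, plus a small customer table.
partitions = {
    "2024-01-01": [{"cust": 1, "amt": 10}],
    "2024-01-02": [{"cust": 2, "amt": 20}],
}
customers = [{"cust": 2, "name": "B"}]

kept = prune_partitions(partitions, {"2024-01-02"})
result = broadcast_hash_join(kept["2024-01-02"], customers, "cust")
```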
1Stop.ai
Data Science Intern
Feb 2023 – Jul 2023
Remote, India
  • Built Python and SQL ETL pipelines to automate ingestion from multiple data sources, cutting manual processing time by 35%.
  • Integrated AWS S3 and EC2 into pipeline workflows, reducing data transfer overhead and improving end-to-end throughput.
  • Delivered Power BI and Tableau dashboards tracking 5+ KPIs in real time, adopted by business teams for weekly stakeholder reporting.

Projects

01
City Bike Price Prediction

End-to-end ETL and ML pipeline across MySQL, Snowflake, and AWS (S3, EC2). Reduced query latency by 40% through star-schema design and warehouse optimization. Improved prediction accuracy by 25% by catching bad records before model training. Containerized with Docker for one-command deployment.

MySQL · Snowflake · AWS · Docker · ML
⌥ View on GitHub ↗
02
AI-Powered Pneumonia Detection

Deep learning inference pipeline using ResNet50 and CLIP for medical image classification. Added Grad-CAM explainability and automated structured report generation for clinical workflows. Deployed via Streamlit for real-time inference, replacing a manual review process.

ResNet50 · CLIP · Grad-CAM · Streamlit · PyTorch
⌥ View on GitHub ↗
03
Cancer Classification Pipeline

End-to-end cancer classification pipeline built on a BiLSTM model that achieves 99.8% accuracy. Integrated PostgreSQL for data storage and Apache Airflow for pipeline orchestration, and deployed an interactive Streamlit app on AWS for real-time inference, turning a research model into a production-ready system.

BiLSTM · PostgreSQL · Airflow · Streamlit · AWS · Python
⌥ View on GitHub ↗
04
NYC Taxi Trip Analytics

Large-scale data engineering pipeline built on NYC's TLC taxi trip dataset. Ingested and processed millions of trip records using PySpark, applied geospatial and temporal feature engineering, and built an analytics layer to surface insights on trip patterns, demand hotspots, and fare trends across NYC boroughs.

PySpark · Python · SQL · AWS · Analytics
⌥ View on GitHub ↗

Publications

Cancer Category Classification Using BiLSTM
IEEE ICICIT 2024 — Peer Reviewed & Published
Read on IEEE Xplore ↗

Engineered an NLP pipeline processing 10,000+ clinical records using TF-IDF feature engineering and BiLSTM architectures, achieving a 0.998 F1-score. Benchmarked multiple deep learning models with rigorous hyperparameter tuning to maximize classification reliability and robustness. Findings validated through peer review and published at IEEE ICICIT 2024.

BiLSTM · TF-IDF · NLP · TensorFlow · Clinical NLP · IEEE
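
For readers unfamiliar with the TF-IDF features mentioned in the publication summary above, here is a minimal sketch of one common variant (term frequency normalized by document length, multiplied by the log of inverse document frequency). It is not the paper's implementation, which may use different weighting or smoothing; the function name and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF weights for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up tokens for illustration only.
w = tf_idf([["tumor", "lung"], ["tumor", "skin"]])
```

A term that appears in every document, such as "tumor" here, gets a weight of zero under this variant, while terms unique to one document carry the signal.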

Certifications

AWS Certified Data Engineer – Associate
Amazon Web Services (AWS)
November 2025

Validated expertise in designing and building data pipelines on AWS, covering data ingestion, transformation, storage, and orchestration using services like S3, Glue, Redshift, Kinesis, and Lake Formation. Demonstrates hands-on ability to architect scalable, production-grade data engineering solutions on the AWS cloud.

AWS · S3 · Glue · Redshift · Data Engineering
Show Credential ↗
Google Data Analytics
Google
August 2025

Completed Google's professional certificate covering the full data analytics workflow — from data cleaning and preparation to analysis, visualization, and storytelling. Gained proficiency in SQL, R, Tableau, and spreadsheets to derive actionable insights and communicate findings effectively to stakeholders.

Data Analytics · SQL · Tableau · R · BigQuery
Show Credential ↗
edX Verified Certificate for Introduction to Cloud Computing
edX
July 2021

Gained foundational knowledge of cloud computing concepts including service models (IaaS, PaaS, SaaS), deployment models, and the core benefits of cloud infrastructure. Built a solid understanding of how cloud platforms enable scalable and cost-efficient application and data workloads.

Cloud Computing · AWS · IaaS · PaaS · SaaS
Show Credential ↗
Cloud Computing Core
edX
July 2021

Deepened understanding of core cloud computing principles including virtualization, distributed storage, cloud networking, and security. Covered key architectural patterns used in modern cloud-native systems and how enterprises leverage cloud infrastructure for resilience and scalability.

Cloud Architecture · Virtualization · Storage · Networking · Security
Show Credential ↗
Introduction to IoT
Cisco
April 2021

Learned the fundamentals of the Internet of Things including how connected devices communicate, collect, and transmit data across networks. Covered IoT sensors, protocols, security considerations, and real-world use cases — providing useful context for understanding data generation at the edge.

IoT · Networking · Sensors · Protocols · Cisco
Show Credential ↗
Programming for Everybody (Getting Started with Python)
Coursera
March 2021

Completed the foundational Python programming course by Dr. Chuck at the University of Michigan. Covered core programming concepts including variables, conditionals, loops, functions, and data structures — laying the groundwork for the data engineering and ML work that followed.

Python · Programming · Data Structures · Functions · OOP
Show Credential ↗
Introduction to Cybersecurity
Cisco
December 2020

Gained awareness of the cybersecurity landscape including common threats, attack vectors, and best practices for protecting data and systems. Covered topics such as network security, encryption, malware, and how organizations defend against cyber threats — relevant to building secure data pipelines.

Cybersecurity · Network Security · Threats · Encryption · Cisco
Show Credential ↗

Education

MS, Engineering Science — Data Science
University at Buffalo, SUNY
Aug 2024 – Dec 2025 · Buffalo, NY
BTech, Computer Science and Engineering
Karunya Institute of Technology and Sciences
Aug 2020 – May 2024 · India

Volunteering

NSS Volunteer
NSS Karunya · Social Services
Aug 2020 – May 2024
Karunya Institute, India

"Being part of the NSS team at Karunya was one of the most grounding experiences of my undergraduate years. Outside of all the technical work, it reminded me that the best skills you build aren't always in a classroom — they're in how you show up for people and work together toward something meaningful."

Community Service · Leadership · Teamwork · Social Impact

Get in touch

yalladhinakar@gmail.com