The Ultimate Apache Spark Guide: Performance Tuning, PySpark Examples, and New 4.0 Features
The ultimate guide to Apache Spark. Learn performance tuning with PySpark examples, fix common issues like data skew, and explore new Spark 4.0 features.
Stay current with the essential data engineering news from June 2025. This monthly roundup covers the biggest announcements from Databricks’ Data + AI Summit, new Snowflake features, Apache Flink updates, and the growing role of AI and Apache Iceberg in the data landscape.
Data Engineering Heats Up in June 2025: A Look at the Latest Developments
Learn how to avoid 10 common data engineering pitfalls, such as Spark data skew, Airflow retry chaos, and schema drift, with practical solutions.
Don’t Get Tripped Up! 10 Common Data Engineering Pitfalls
Explore how AI in data engineering is shaping the future. This 2025 guide helps new grads build the skills, tools, and mindset to thrive in a cloud-driven, AI-first world.
Data Engineering in 2025: A Practical Guide for New Grads Entering the AI-First Era
DeepSeek SmallPond is here to shake up data engineering. See how this lightweight open-source framework offers a fresh alternative to Apache Spark and Flink for batch and streaming processes.
DeepSeek SmallPond: A Game-Changer for Data Engineers Seeking Lightweight Solutions
This story focuses on the performance of the Apache Spark union operator. We will walk through examples, show you the physical query plan, and share optimization techniques.
Boosting Spark Union Operator Performance: Optimization Tips for Improved Query Speed
I want to share 5 hidden facts about Apache Spark that I have learned throughout my career. They may save you some time reading the Apache Spark source code.
5 Hidden Apache Spark Facts That Fewer People Talk About
We will discuss an often-neglected aspect of Apache Spark performance: the difference between coalesce(1) and repartition(1). It is one of the things worth paying attention to when you check a Spark job's performance.
Uncovering the Truth About Apache Spark Performance: coalesce(1) vs. repartition(1)
Discover the Essential Reading List for Data Engineers: 10 Classic Books You Can’t Miss. While many free online resources are available, they often lack the depth and context needed to truly master the field. In this article, I will share ten classic books that cover everything from fundamental technical skills like Python and SQL, to more advanced topics like Apache Spark, Apache Flink, Apache Beam, Apache Airflow, Kubernetes, distributed systems, and dimensional modeling.
The Essential Reading List for Data Engineers: 10 Classic Books You Can’t Miss
“Why is my Spark job running slow?” is an inevitable question. We will cover how to identify Spark data skew and how to handle it with different options, including key salting.
Deep Dive into Handling Apache Spark Data Skew