I’m Chengzhi Zhao
Data Engineer | Data Content Creator | Contributor to Airflow and Flink | Creator of Data Engineering Space
I write content on Data Engineering, Productivity, and DIY projects

Featured Stories
The data engineering space is evolving. Here are the practical data engineering resources I have collected.
The list is updated regularly.
Is Apache Airflow due for a replacement? mage-ai is a new ETL tool that data engineers can check out as a possible alternative.
This story helps you identify Spark data skew and walks through different options for handling it.
My Stories
Data Engineering: Why It’s About Much More Than Just the Tools You Use
One key lesson I learned while chasing the latest tools: tools are great, but many data engineering problems are solved not by the newest tool but by people, the data engineers themselves. I want to share my thoughts on why data engineering is about much more than just the tools you use.
Building Better Data Warehouses with Dimensional Modeling: A Guide for Data Engineers
Let’s bring the data community’s attention to an essential topic: building better data warehouses with dimensional modeling.
Boosting Spark Union Operator Performance: Optimization Tips for Improved Query Speed
In this story, we focus on the performance of the Apache Spark union operator, walk through examples, show the physical query plan, and share optimization techniques.
Why R for Data Engineering is More Powerful Than You Thought
R could bring real benefits to the data engineering community. Let’s discuss why R is more powerful for data engineering than you might think.
5 Hidden Apache Spark Facts That Fewer People Talk About
I want to share 5 hidden facts about Apache Spark that I learned throughout my career. They can save you some time reading the Apache Spark source code.
Uncovering the Truth About Apache Spark Performance: coalesce(1) vs. repartition(1)
We will discuss a neglected aspect of Apache Spark performance, the difference between coalesce(1) and repartition(1), which is worth paying attention to when you check Spark job performance.
The Practical Data Engineering Resource
The data engineering space is evolving. Here are the practical data engineering resources I have collected.
How to Find the Best Deals On Time with R and Mage
Finding the best deals and coupons promptly can save you money and time. We can quickly build a weekend project that automatically finds the best deals on time with R and Mage.