I’m Chengzhi Zhao
Data Engineer | Data Engineering Content Creator | Contributor to Airflow and Flink | Founder of Data Engineering Space

Featured Stories
The data engineering space is evolving. Here are the resources I have collected for practical data engineering.
The list is continually updated.
Is Apache Airflow due for a replacement? mage-ai is a new ETL tool that data engineers can check out as an alternative.
This story helps you identify Spark data skew and shows different options for handling it.
My Stories
Airflow Schedule Interval 101
The Airflow schedule interval can be a challenging concept to comprehend; even developers who have worked with Airflow for a while find it difficult to grasp. A confusing question that arises every once in a while on Stack Overflow is “Why is my DAG not running as expected?” This problem usually indicates a misunderstanding of the Airflow schedule interval.
Bidding War on Housing Market? Let’s Use R For Exploratory Data Analysis
Exploratory Data Analysis (EDA) is a data science methodology used as an initial approach to gain insights by visualizing and summarizing data. We will use some exploratory data analysis techniques to find the reason behind the bidding war on the housing market.
Visualizing Data with ggridges: Techniques to Eliminate Density Plot Overlaps in ggplot2
Visualizing data with a histogram can be quite challenging when dealing with multiple groups. I have recently come across a useful ggplot2 extension called ggridges that has been helpful for my data exploration tasks.
Unlocking the Secrets of Slowly Changing Dimension (SCD): A Comprehensive View of 8 Types
Slowly Changing Dimension (SCD) is critical to dimensional modeling. We will discuss the eight types of SCDs. By the end, you will clearly understand each type and be able to differentiate between SCDs in dimensional modeling.
Demystifying Null in SQL: A Comprehensive Guide for Data Professionals
Sometimes writing SQL can be frustrating, especially when encountering NULL values. This article can help you better understand NULL in SQL.
How I Built a Tool to Visualize Expense In Sankey Diagram
My main goal is to enable people without programming experience to use the powerful Sankey diagram by simply uploading a transaction CSV file from the popular site Mint.com.
Data Engineering: Why It’s About Much More Than Just the Tools You Use
One key lesson I learned while chasing the latest tools is that tools are great, but many data engineering problems are resolved not by the newest tool but by humans, namely data engineers. I want to share my thoughts on why data engineering is about much more than just the tools you use.
Building Better Data Warehouses with Dimensional Modeling: A Guide for Data Engineers
Let’s bring the data community’s attention to the essentials: building better data warehouses with dimensional modeling.