Welcome
I’m Chengzhi Zhao
Content Creator on Data Engineering, Productivity, and Life | Contributor of Airflow, Flink | Creator of Data Engineering Space

Feature Stories
The data engineering space is evolving. Here are the resources I have collected for practical data engineering.
The list is updated regularly.
Is Apache Airflow due for a replacement? mage-ai is a new ETL tool that data engineers can check out as a substitute.
This story helps you identify Spark data skew and walks through different options for handling it.
My Stories
Blog
5 Lessons I Learned From a Totaled Car Accident
A car accident that results in the total loss of your vehicle is a difficult situation to deal with. I want to share what I learned with more people.
November 14, 2023 Life
Read More Data Engineering
Airflow Schedule Interval 101
The Airflow schedule interval can be a challenging concept to comprehend; even developers who have worked with Airflow for a while find it difficult to grasp. A question that comes up every once in a while on Stack Overflow is "Why is my DAG not running as expected?". This problem usually indicates a misunderstanding among ...
September 14, 2023 Apache Airflow Data Engineering Python
Read More Data Engineering
Bidding War on Housing Market? Let’s Use R For Exploratory Data Analysis
Exploratory Data Analysis (EDA) is a methodology in data science used as an initial approach to gain insights by visualizing and summarizing data. We will use some exploratory data analysis techniques to find the reason behind the bidding war on the housing market.
August 25, 2023 Data Data Engineering Data Visualization R
Read More Data Engineering
Visualizing Data with ggridges: Techniques to Eliminate Density Plot Overlaps in ggplot2
Visualizing data with a histogram can be quite challenging when you are dealing with multiple groups. I recently came across a useful ggplot2 extension called ggridges that has been helpful for my data exploration tasks.
August 7, 2023 Data Engineering Data Visualization ggplot2 R
Read More Data Engineering
Unlocking the Secrets of Slowly Changing Dimension (SCD): A Comprehensive View of 8 Types
Slowly Changing Dimension (SCD) is critical to dimensional modeling. We will discuss the eight types of SCDs. By the end, you will clearly understand each type and be able to differentiate between SCDs in dimensional modeling.
July 14, 2023 Data Engineering Data Warehouse SCD
Read More Data Engineering
Demystifying Null in SQL: A Comprehensive Guide for Data Professionals
Sometimes writing SQL can be frustrating, especially when encountering NULL values. This article can help you better understand NULL in SQL.
June 30, 2023 Data Engineering
Read More Data Engineering
How I Built a Tool to Visualize Expense In Sankey Diagram
My main goal is to enable people without programming experience to use the powerful Sankey Diagram by simply uploading the transaction CSV file from the popular site Mint.com.
June 22, 2023 Data Engineering Data Visualization R Sankey Diagram Shiny
Read More Data Engineering
Data Engineering: Why It’s About Much More Than Just the Tools You Use
One key lesson I learned while chasing the latest tools: tools are great, but many data engineering problems are resolved not by the newest tool but by humans, the data engineers themselves. I want to share my thoughts on why data engineering is about much more than just the tools you ...
May 17, 2023 Data Engineering Data Warehouse
Read More Data Engineering
Building Better Data Warehouses with Dimensional Modeling: A Guide for Data Engineers
Let's bring the data community's attention to the essentials: building better data warehouses with dimensional modeling.
May 5, 2023 Data Engineering Data Warehouse
Read More Data Engineering
Boosting Spark Union Operator Performance: Optimization Tips for Improved Query Speed
In this story, we focus on the performance of the Apache Spark union operator with examples, show you the physical query plan, and share techniques for optimization.
April 20, 2023 Apache Spark Data Engineering Spark Performance
Read More Data Engineering
Why R for Data Engineering is More Powerful Than You Thought
R could bring real benefits to the data engineering community. Let's discuss why R for data engineering is more powerful than you thought.
April 15, 2023 Data Engineering ggplot2 R Shiny
Read More Data Engineering
5 Hidden Apache Spark Facts That Fewer People Talk About
I want to share 5 hidden facts about Apache Spark that I learned throughout my career. They can save you some time reading the Apache Spark source code.
April 4, 2023 Apache Spark Data Engineering Spark Performance
Read More