Welcome

I’m Chengzhi Zhao

Data Engineer | Data Engineering Content Creator | Contributor to Airflow and Flink | Founder of Data Engineering Space

Featured Stories

The data engineering space is evolving. Here are the resources I have collected on practical data engineering.

The list is continuously updated.

Is Apache Airflow due for a replacement? mage-ai is a new ETL tool that data engineers may want to check out as an alternative.

This story shows how to identify data skew in Spark and walks through different options for handling it.

My Stories

Photo by Aron Visuals on Unsplash
Data Engineering

Airflow Schedule Interval 101

The Airflow schedule interval can be a challenging concept to comprehend; even developers who have worked with Airflow for a while find it difficult to grasp. A confusing question that comes up every once in a while on Stack Overflow is “Why is my DAG not running as expected?” This problem usually indicates a misunderstanding of the Airflow schedule interval.
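A minimal sketch of the behavior behind most of these questions, assuming Airflow's default fixed-interval scheduling: each DAG run covers one interval and is triggered only after that interval has ended, so a daily run whose logical (execution) date is 2023-01-01 starts at roughly 2023-01-02 00:00. The helper `runs_triggered_by` is hypothetical, written in plain Python for illustration; it is not part of Airflow's API or its scheduler code.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def runs_triggered_by(
    start_date: datetime, interval: timedelta, now: datetime
) -> list[tuple[datetime, datetime]]:
    """Return (logical_date, trigger_time) pairs for runs that would have started by `now`.

    Hypothetical illustration of fixed-interval scheduling: each run covers
    [logical_date, logical_date + interval) and is triggered only once that
    interval has fully elapsed.
    """
    runs = []
    logical_date = start_date
    # A run becomes eligible only when the END of its interval is in the past.
    while logical_date + interval <= now:
        trigger_time = logical_date + interval
        runs.append((logical_date, trigger_time))
        logical_date += interval
    return runs


if __name__ == "__main__":
    start = datetime(2023, 1, 1)
    # At noon on the start date the first daily interval is still open, so nothing has run.
    print(runs_triggered_by(start, timedelta(days=1), datetime(2023, 1, 1, 12)))
    # Just after midnight on 2023-01-02, the run labeled 2023-01-01 has fired.
    print(runs_triggered_by(start, timedelta(days=1), datetime(2023, 1, 2, 0, 5)))
```

Calling the helper at noon on the start date returns an empty list, which is exactly the "my DAG did not run" surprise: the run labeled with the start date only fires once its whole interval has passed.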

Read More
Photo by Nick Brunner on Unsplash
Data Engineering

Bidding War on Housing Market? Let’s Use R For Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a data science methodology used as an initial approach to gaining insights by visualizing and summarizing data. We will use some exploratory data analysis techniques to find the reason behind the bidding war on the housing market.

Read More
Photo by LoboStudio Hamburg on Unsplash
Data Engineering

Visualizing Data with ggridges: Techniques to Eliminate Density Plot Overlaps in ggplot2

Visualizing data with histograms can be quite challenging when you are dealing with multiple groups. I recently came across ggridges, a ggplot2 extension that has been helpful for my exploratory data analysis tasks.

Read More
Photo by Donald Tran on Unsplash
Data Engineering

Unlocking the Secrets of Slowly Changing Dimension (SCD): A Comprehensive View of 8 Types

Slowly Changing Dimension (SCD) is a critical concept in dimensional modeling. We will discuss the eight types of SCDs. By the end, you will clearly understand each type and be able to tell them apart.

Read More
Photo by Sunder Muthukumaran on Unsplash
Data Engineering

Demystifying Null in SQL: A Comprehensive Guide for Data Professionals

Writing SQL can sometimes be frustrating, especially when you encounter NULL values. This article will help you better understand NULL in SQL.
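A small sketch of a few NULL behaviors that commonly trip people up, assuming Python's built-in `sqlite3` module as a stand-in database; the `orders` table and its columns are hypothetical. These queries follow standard SQL three-valued logic, but check your own engine's documentation before relying on the details.

```python
import sqlite3

# In-memory SQLite database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, coupon TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "SAVE10"), (2, None), (3, None)],  # Python None is stored as SQL NULL
)

# NULL = NULL is not TRUE: it evaluates to NULL (unknown), returned to Python as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# "= NULL" matches no rows because the comparison is never TRUE; IS NULL is the right predicate.
eq_null_rows = conn.execute("SELECT COUNT(*) FROM orders WHERE coupon = NULL").fetchone()[0]
is_null_rows = conn.execute("SELECT COUNT(*) FROM orders WHERE coupon IS NULL").fetchone()[0]

# COUNT(*) counts rows, while COUNT(column) skips NULLs.
count_star, count_coupon = conn.execute("SELECT COUNT(*), COUNT(coupon) FROM orders").fetchone()

# A NULL inside a NOT IN list makes the predicate unknown for every non-matching row.
not_in_with_null = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE id NOT IN (1, NULL)"
).fetchone()[0]

print(null_equals_null, eq_null_rows, is_null_rows, count_star, count_coupon, not_in_with_null)
conn.close()
```

The `NOT IN` case is the most surprising one: rows 2 and 3 are not equal to 1, yet neither is returned, because comparing them with NULL yields unknown rather than TRUE.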

Read More
Photo by Michał Turkiewicz on Unsplash
Data Engineering

How I Built a Tool to Visualize Expense In Sankey Diagram

My main goal is to enable people without programming experience to use powerful Sankey diagrams by simply uploading the transaction CSV file from the popular site Mint.com.

Read More
Photo by Dan Cristian Pădureț on Unsplash
Data Engineering

Data Engineering: Why It’s About Much More Than Just the Tools You Use

One key lesson I learned while chasing the latest tools is that tools are great, but many data engineering problems cannot be resolved by the newest tool; they are resolved by people: data engineers. I want to share my thoughts on why data engineering is about much more than just the tools you use.

Read More
Photo by Erin Doering on Unsplash
Data Engineering

Building Better Data Warehouses with Dimensional Modeling: A Guide for Data Engineers

Let’s bring the data community’s attention to the essentials: building better data warehouses with dimensional modeling.

Read More