Welcome

I’m Chengzhi Zhao

Content Creator on Data Engineering, Productivity | Contributor to Airflow, Flink | Creator of Data Engineering Space

Feature Stories

The data engineering space is evolving. Here are the resources I have collected for practical data engineering.

The list is updated regularly.

The Airflow schedule interval can be a challenging concept to comprehend; even developers who have worked on Airflow for a while find it difficult to grasp. A confusing question that arises every once in a while on StackOverflow is “Why is my DAG not running as expected?”

This story helps you identify Spark data skew and handle it with different options.

My Stories

Photo by Mateusz Butkiewicz on Unsplash
Data Engineering

4 Faster Pandas Alternatives for Data Analysis

Pandas is no doubt one of the most popular libraries in Python. However, Pandas doesn't shine when processing large datasets. We will compare 4 faster pandas alternatives for data analysis: Polars, Dask, Vaex, and Modin.
Photo by Jeffrey Brandjes on Unsplash
Data Engineering

Think in SQL — Avoid Writing SQL in a Top to Bottom Approach

Understanding SQL's logical query processing order shows why it is worth moving away from writing SQL strictly top to bottom. It can also help you think in SQL clearly and develop your queries more effectively.
Photo by Daria Nepriakhina 🇺🇦 on Unsplash
Data Engineering

5 Fantastic Data Pipeline Orchestration Tools For R

Many modern data orchestration projects like Apache Airflow and Luigi are Python-based. Let's explore the popular data pipeline orchestration options for R.
Photo by Huyen Bui on Unsplash
Data Engineering

Get Fluent in Python Decorators by Visualizing It

A Python decorator is syntactic sugar. You can achieve everything without explicitly using a decorator. However, using a decorator can make your code more concise and readable. Ultimately, you write fewer lines of code by leveraging Python decorators.
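As a minimal sketch of the "syntactic sugar" point, the snippet below applies the same wrapper twice: once with the `@` syntax and once by calling the decorator by hand. The names `shout`, `greet`, and `greet_plain` are hypothetical and chosen only for illustration; they do not come from the story itself.

```python
import functools


def shout(func):
    """Hypothetical decorator: upper-cases the wrapped function's return value."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


# Decorator syntax: one extra line above the function definition.
@shout
def greet(name):
    return f"hello, {name}"


# The same effect without the @ syntax: define the function, then wrap it.
def greet_plain(name):
    return f"hello, {name}"


greet_manual = shout(greet_plain)

print(greet("airflow"))
print(greet_manual("airflow"))
```

Both calls go through the same `wrapper`, so they return the same string; the `@shout` line is just a shorter way of writing the explicit reassignment.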
Photo by Gene Devine on Unsplash
Data Engineering

How to Build Data Animation in R

Have you seen a beautiful racing bar chart animation on YouTube and wondered how it was built? I will show you how to use gganimate in R to animate data by creating a racing bar chart as an example.
Photo by Amy Humphries on Unsplash
Productivity

How I Found Peace of Mind After Timeboxing

I was tired of continuous, rapid context switching and of constantly being distracted. Interruptions kept occurring and continued the next day. Timeboxing helped me find peace of mind in an isolated environment to concentrate on my task.
Photo by Adi Nugroho on Unsplash
Data Engineering

6 Side Project Ideas for New and Experienced Data Engineers

Data engineers can work on side projects to gain experience. Those projects could spark impressive discussions that help you land a dream job. We will introduce 6 data engineering side project ideas for any level of experience.
Photo by Enis Yavuz on Unsplash
Data Engineering

Is Apache Airflow Due for Replacement? The First Impression Of mage-ai

Airflow has been widely used for years. Is Apache Airflow due for a replacement? mage-ai is a new ETL tool for data engineers to check out as a substitute. I have formed a first impression of mage-ai and will share my thoughts.
Photo by Karsten Würth on Unsplash
Data Engineering

Here Is What I Learned Using Apache Airflow over 6 Years

Apache Airflow has undoubtedly been the most popular open-source project for data engineering for years. It gained popularity at the right time, with The Rise of the Data Engineer. Today, I want to share my journey with Airflow and what I learned over 6 years.
Photo by Choong Deng Xiang on Unsplash
Data Engineering

How to Visualize Monthly Expenses in a Comprehensive Way: Develop a Sankey Diagram in R

Personal budgeting apps like Mint, Personal Capital, and Clarity provide only three limited types of charts. Have you ever wondered if those charts are good enough to give you a better idea of your monthly income and expenses? Are there ways to visualize monthly expenses in a comprehensive way? In this article, I will share with ...
Photo by Lizzi Sassman on Unsplash
Data Engineering

Deep Dive into Handling Apache Spark Data Skew

"Why is my Spark job running slow?" is an inevitable question. We will cover how to identify Spark data skew and how to handle it with different options, including key salting.
Photo by JIUNN-YIH LAU on Unsplash
Data Engineering

5 Tips for Self-Promotion as Data Professionals

Getting the work done isn't the journey's end. Your work should be your channel for promoting YOURSELF. I will give five tips for self-promotion as a data professional.