
The data engineering space is evolving. Here are the resources I have collected for practical data engineering.
Last Updated: 2023-04-02
Books
- The Master Book on Distributed Systems: Designing Data-Intensive Applications
- The SQL Fundamental Book: T-SQL Querying (Developer Reference)
- OLAP/Data Warehouse Must Read: The Data Warehouse Toolkit (The Definitive Guide to Dimensional Modeling, 3rd Edition)
- In-depth on Python: Fluent Python: Clear, Concise, and Effective Programming 2nd Edition
- Learning From the Creator: Spark: The Definitive Guide: Big Data Processing Made Simple
- Workflow Management Tool: Data Pipelines with Apache Airflow
- All You Need to Know About Streaming Foundation: Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing
- Understand Data Science: R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
- Data Communication: Storytelling with Data: A Data Visualization Guide for Business Professionals
- Gain Knowledge on Cloud Infra: Kubernetes in Action
- The Philosophy of Data Engineering: Fundamentals of Data Engineering: Plan and Build Robust Data Systems
- Data Mesh: Delivering Data-Driven Value at Scale
The Essential Reading List for Data Engineers: 10 Classic Books You Can’t Miss
February 20, 2023
Discover the Essential Reading List for Data Engineers: 10 Classic Books You Can't Miss. While many free online resources are available, they often lack the ...
Data Engineering Space Leaders (who to follow)
Practical Data Engineering Framework
Data Processing
- Apache Spark: Unified engine for large-scale data analytics
- Apache Flink: Stateful Computations over Data Streams
- Apache Beam: The easiest way to do batch and streaming data processing.
Deep Dive into Handling Apache Spark Data Skew
December 30, 2022
"Why is my Spark job running slow?" is an inevitable question. We will cover how to identify Spark data skew and how to handle data ...
Workflow Orchestration
Here Is What I Learned Using Apache Airflow over 6 Years
January 7, 2023
Apache Airflow has undoubtedly been the most popular open-source project for data engineering for years. It gained popularity at the right time with The Rise Of ...
Is Apache Airflow Due for Replacement? The First Impression Of mage-ai
January 12, 2023
Airflow has been widespread for years. Is Apache Airflow due for a replacement? mage-ai is the new ETL tool for data engineers to check out ...
5 Fantastic Data Pipeline Orchestration Tools For R
January 28, 2023
Many modern data orchestration projects like Apache Airflow and Luigi are Python-based. Let's explore the popular data pipeline orchestration options for R.
OLAP Query
- Druid: a high-performance, real-time analytics database that delivers sub-second queries on streaming and batch data at scale and under load.
- Trino: a query engine that runs at ludicrous speed
- ClickHouse: a column-oriented database that enables its users to generate powerful analytics, using SQL queries, in real-time.
Awesome Blogs
Classic Articles
- Classic Streaming Foundation (1): Streaming 101: The world beyond batch
- Classic Streaming Foundation (2): Streaming 102: The world beyond batch
- Spark Tuning Foundation (1): How-to: Tune Your Apache Spark Jobs (Part 1)
- Spark Tuning Foundation (2): How-to: Tune Your Apache Spark Jobs (Part 2)
- Airflow Scheduler Demystified: Airflow Scheduler 101
About Me
I hope my stories are helpful to you.
For data engineering posts, you can also subscribe to my new articles or become a referred Medium member, which also gives you full access to stories on Medium.
More Articles
How to Visualize Monthly Expenses in a Comprehensive Way: Develop a Sankey Diagram in R
Personal budgeting apps like Mint, Personal Capital, and Clarity provide only three limited types of charts. Have you ever wondered if those charts are good enough to get better ...
5 Tips for Self-Promotion as Data Professionals
Getting the work done isn't the journey's end. Your work should be your channel for promoting YOU. I will give five tips to get ...
How to Engage with Users By Storytelling: Show Data Analytics in R and Shiny
Using R and Shiny, we can build an app where the end users can interact with the data analysis we have done. I will show ...