Welcome
Build Resilient, Scalable Data Platforms with a Data Architect and Apache Contributor
I help engineering teams and startups design production-ready pipelines using Apache Airflow, Spark, Kafka, and Flink in cloud-native environments.
Contributor to Apache Airflow and Apache Flink
Creator of Data Engineering Space
Latest Articles & Tutorials
AI
The AI Wake-Up Call for Data Engineers: Why LLMs + MCP Matter Now
AI isn't coming for data engineering — it's becoming part of it. In this post, I explore how Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and Model Context Protocol (MCP) are transforming how data engineers build, query, and integrate modern data systems. Real-world tools like Cline, Cursor, and DuckDB show ...
Data Engineering
Unboxing Apache Airflow 3.0: What’s New, What’s Gone, and Why It Matters
Explore the latest features, UI updates, and key changes in Apache Airflow 3.0. This deep dive covers DAG versioning, event-driven scheduling, Docker setup, and more for data engineers and workflow automation pros.
April 25, 2025
Apache Airflow, Data Engineering
Data Engineering
DuckDB Local UI is Awesome!
Discover how DuckDB Local UI revolutionises your data exploration experience. For anyone who has spent years relying on external tools, DuckDB’s native interface offers a seamless, quick, and intuitive way to interact with data projects.
March 15, 2025
Data, Data Engineering, Data Visualization, DuckDB
Data Engineering
DeepSeek SmallPond: A Game-Changer for Data Engineers Seeking Lightweight Solutions
DeepSeek SmallPond is here to shake up data engineering. See how this lightweight open-source framework offers a fresh alternative to Apache Spark and Flink for batch and streaming processes.
March 8, 2025
Apache Spark, Data Engineering, DuckDB, SmallPond
AI
LLM for Data Visualization: How AI Shapes the Future of Analytics
Discover how to use LLMs for data visualization: generate SQL queries with an LLM, then build charts with Seaborn and Plotly. Learn how AI agents transform EDA and analysis.
January 16, 2025
AI, Data Visualization, LangChain, LangGraph
Data Engineering
Apache Airflow 3.0 Is Coming Soon: Here’s What You Can Expect
Discover the upcoming features in Apache Airflow 3.0, with insights from the Airflow 3.0 workstream. Get ready for the next big release!
January 8, 2025
Apache Airflow, Data Engineering
Data Engineering
How to Build a Web Crawler on MWAA (Amazon Managed Workflows for Apache Airflow) with CDK
MWAA makes integrating Apache Airflow with the AWS ecosystem easier than ever. This comprehensive guide uses CDK to spin up an MWAA environment and shares MWAA-specific tips to help you understand how Airflow is deployed on AWS.
October 5, 2024
Apache Airflow, Data, Data Engineering, Scrapy
Data Engineering
The Foundation of Data Validation
If you are reading this post, you have probably faced data validation challenges before, or you are struggling with them right now. My goal in this post is to share my experience with data validation.
April 29, 2024
Data, Data Engineering, Data Visualization, SQL
Data Engineering
Airflow Schedule Interval 101
The Airflow schedule interval can be a challenging concept; even developers who have worked with Airflow for a while find it difficult to grasp. A question that comes up every once in a while on Stack Overflow is "Why is my DAG not running as expected?" This problem usually indicates a misunderstanding among ...
September 14, 2023
Apache Airflow, Data Engineering, Python
Data Engineering
Bidding War on the Housing Market? Let’s Use R for Exploratory Data Analysis
Exploratory Data Analysis (EDA) is a data science methodology for gaining initial insights by visualizing and summarizing data. We will use several EDA techniques to find the reasons behind the bidding war on the housing market.
August 25, 2023
Data, Data Engineering, Data Visualization, R
Data Engineering
Visualizing Data with ggridges: Techniques to Eliminate Density Plot Overlaps in ggplot2
Visualizing the distributions of multiple groups with histograms can be quite challenging. I recently came across ggridges, a ggplot2 extension that has been helpful for my exploratory data analysis tasks.
August 7, 2023
Data Engineering, Data Visualization, ggplot2, R