Welcome

Build Resilient, Scalable Data Platforms with a Data Architect (Apache Contributor)

I help engineering teams and startups design production-ready pipelines using Apache Airflow, Spark, Kafka, and Flink in cloud-native environments.

Latest Articles & Tutorials

AI

The AI Wake-Up Call for Data Engineers: Why LLMs + MCP Matter Now

AI isn't coming for data engineering — it's becoming part of it. In this post, I explore how Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and Model Context Protocol (MCP) are transforming how data engineers build, query, and integrate modern data systems. Real-world tools like Cline, Cursor, and DuckDB show ...
Data Engineering

Unboxing Apache Airflow 3.0: What’s New, What’s Gone, and Why It Matters

Explore the latest features, UI updates, and key changes in Apache Airflow 3.0. This deep dive covers DAG versioning, event-driven scheduling, Docker setup, and more for data engineers and workflow automation pros.
Data Engineering

DuckDB Local UI is Awesome!

Discover how DuckDB Local UI revolutionises your data exploration experience. After years of using external tools, DuckDB’s native interface provides a seamless, quick, and intuitive way to interact with your data projects.
Data Engineering

DeepSeek SmallPond: A Game-Changer for Data Engineers Seeking Lightweight Solutions

DeepSeek SmallPond is here to shake up data engineering. See how this lightweight open-source framework offers a fresh alternative to Apache Spark and Flink for batch and streaming processes.
Photo by Google DeepMind on Unsplash
AI

LLM for Data Visualization: How AI Shapes the Future of Analytics

Discover how to use LLMs for data visualization by generating SQL queries and building charts with Seaborn and Plotly. Learn how AI agents transform EDA and analysis.
Photo by Jongsun Lee on Unsplash
Data Engineering

Apache Airflow 3.0 Is Coming Soon: Here’s What You Can Expect

Discover the upcoming features in Apache Airflow 3.0, with insights from the Airflow 3.0 workstream. Get ready for the next big release!
AI

How to Build an AI Agent for Data Analytics Without Writing SQL

This post builds an AI agent from scratch using LangChain and DuckDB, so you can answer business questions without SQL expertise.
Data Engineering

How to build a web crawler with MWAA (AWS Airflow) with CDK

Integrating Apache Airflow with the AWS ecosystem has become easier than ever with MWAA. To help MWAA run efficiently, I prepared a comprehensive guide that uses CDK to spin up MWAA, along with MWAA-specific tips to help you understand how Airflow is deployed on AWS.
Photo by Vardan Papikyan on Unsplash
Data Engineering

The Foundation of Data Validation

If you are reading this blog post, you may have faced the challenge of data validation before, or you might be struggling with it. My goal in this post is to share my experience with data validation.
Source: Aron Visuals from Unsplash
Data Engineering

Airflow Schedule Interval 101

The Airflow schedule interval can be a challenging concept to comprehend; even developers who have worked on Airflow for a while find it difficult to grasp. A question that arises every once in a while on Stack Overflow is "Why is my DAG not running as expected?". This problem usually indicates a misunderstanding among ...
Photo by Nick Brunner on Unsplash
Data Engineering

Bidding War on Housing Market? Let’s Use R For Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a data science methodology used as an initial approach to gain insights by visualizing and summarizing data. We will use some exploratory data analysis techniques to find the reason behind the bidding war on the housing market.
Photo by LoboStudio Hamburg on Unsplash
Data Engineering

Visualizing Data with ggridges: Techniques to Eliminate Density Plot Overlaps in ggplot2

Visualizing data with a histogram can be quite challenging when dealing with multiple groups. I recently came across a useful ggplot2 extension called ggridges that has been helpful for my data exploration tasks.