Visualizing monthly cash flow isn’t new if you use personal budgeting and finance tools like Mint, Personal Capital, or Clarity. These tools primarily provide three types of charts: pie charts, bar charts, and line charts. But are those charts good enough to give you a clear picture of your monthly income and expenses? Is there a way to visualize monthly expenses more comprehensively? In this article, I will show you how to create a Sankey diagram in R to help you gain more insight into your financial situation.
What Is a Sankey Diagram
From Wikipedia: Sankey diagrams are flow diagrams in which the width of the arrows is proportional to the flow rate. One of the most famous Sankey diagrams is Charles Minard’s map of Napoleon’s invasion of Russia. The diagram below clearly shows the time and number of troops left.
Back to personal finance: monthly cash flow is a perfect use case for a Sankey diagram, which can show how the cash flows and which account the money originates from or goes to.
The pie and bar charts in the Mint app show only the amount of money spent or earned in each category, not how cash flows between accounts and categories. That limits how deeply you can analyze your personal finances.
Here is a nice Sankey diagram from Reddit. Our goal in this article is to recreate a similar one with a data dump from a personal finance app like Mint.
How To Download Your Monthly Transactions From Mint
We will use Mint.com as an example to download the transactions. Exporting is straightforward from the web application; the mobile app doesn’t have this option. Once you log on to Mint, go to TRANSACTIONS, scroll to the bottom, and you should see an option “export all xxx transactions.” A CSV file will be downloaded when you click on that option.
The downloaded CSV file has the following fields: Date, Description, Original Description, Amount, Transaction Type, Category, Account Name, Labels, Note. We will use Amount, Transaction Type, Category, Account Name to build the Sankey Diagram.
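One detail worth checking before writing any transformation code: by default, R’s read.csv rewrites headers that are not valid R names, so a field like "Transaction Type" shows up as Transaction.Type. Here is a minimal sketch using two made-up rows (the values are invented for illustration and are not real Mint output):

```r
# Two made-up rows using the field names listed above (values are illustrative only).
csv_text <- 'Date,Description,Original Description,Amount,Transaction Type,Category,Account Name,Labels,Note
12/05/2021,Coffee Shop,COFFEE SHOP 123,4.50,debit,Coffee Shops,Credit Card,,
12/15/2021,Employer,ACME PAYROLL,2000.00,credit,Paycheck,Checking,,'

# read.csv() uses check.names = TRUE by default, which replaces
# spaces in header names with dots.
sample_df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the resulting column names and the four fields we need.
print(names(sample_df))
str(sample_df[, c("Amount", "Transaction.Type", "Category", "Account.Name")])
```

This is why the R code in this article refers to Transaction.Type and Account.Name rather than the names with spaces.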
Create a Sankey Diagram in R with ggplot2
One of the libraries we will use to build the Sankey diagram is ggalluvial. Its design and functionality were originally inspired by the alluvial package. One of the great things about ggalluvial is that it builds on top of ggplot2, so you get the benefits of the grammar of graphics.
library(ggplot2)
library(dplyr)
library(ggthemes)
library(ggalluvial)

# Read the CSV exported from Mint
df <- read.csv("~/Downloads/transactions.csv")

df <- df %>%
  # Keep only the fields needed for the diagram
  select(Date, Amount, Category, Transaction.Type, Account.Name) %>%
  # Convert the date string to a Date and keep transactions after 12/01/2021
  mutate(Date = as.Date(Date, format = "%m/%d/%Y")) %>%
  filter(Date > as.Date("12/01/2021", format = "%m/%d/%Y")) %>%
  # Total the amount for each category / transaction type / account combination
  group_by(Category, Transaction.Type, Account.Name) %>%
  summarise(Expense = sum(Amount)) %>%
  # Keep debits over $100 and drop categories that are not day-to-day spending
  filter(Expense > 100,
         Transaction.Type == "debit",
         !Category %in% c("Transfer", "Paycheck", "Credit Card Payment",
                          "Mortgage & Rent", "Investments"))

# Each axis is one stage of the diagram; y sets the width of each flow
ggplot(df, aes(axis1 = Transaction.Type, axis2 = Account.Name,
               axis3 = Category, y = Expense)) +
  scale_x_discrete(limits = c("Transaction.Type", "Account.Name", "Category"),
                   expand = c(.2, .05)) +
  geom_alluvium() +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  theme_economist() +
  scale_colour_economist()
The code above can be broken down into three steps:
- Import the libraries and read the downloaded CSV file.
- Transformation: we select only the needed fields, convert the date from a string to a Date, and sum the dollar amount grouped by transaction type, account name, and category. We then keep only debits over $100 and exclude categories such as transfers and credit card payments. The grouping columns become the stages of the Sankey diagram.
- Build the data visualization: all we have to do is map the grouping columns to the axis aesthetics (axis1, axis2, axis3), name the axes in scale_x_discrete, and then call geom_alluvium and geom_stratum from ggalluvial to draw the chart.
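To see the transformation and plotting steps on something small, here is a sketch that runs on a made-up data frame instead of a real export. The toy values, the object names toy, flows, and p, and the choice to color flows with fill = Category are my own assumptions for illustration:

```r
library(dplyr)
library(ggplot2)
library(ggalluvial)

# Made-up transactions standing in for the Mint export (values are illustrative).
toy <- data.frame(
  Transaction.Type = "debit",
  Account.Name = c("Credit Card", "Credit Card", "Checking", "Checking", "Credit Card"),
  Category = c("Groceries", "Restaurants", "Utilities", "Groceries", "Groceries"),
  Amount = c(250, 120, 180, 90, 60)
)

# One row per flow: each unique (type, account, category) path with its total.
flows <- toy %>%
  group_by(Transaction.Type, Account.Name, Category) %>%
  summarise(Expense = sum(Amount), .groups = "drop")

# Each axis is one stage of the diagram; y sets the width of each flow.
# Coloring by Category is optional and only makes the flows easier to follow.
p <- ggplot(flows, aes(axis1 = Transaction.Type, axis2 = Account.Name,
                       axis3 = Category, y = Expense)) +
  geom_alluvium(aes(fill = Category)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Transaction.Type", "Account.Name", "Category"))

print(p)
```

The two Credit Card grocery rows collapse into a single flow, which shows what the group by and sum step does before anything is drawn.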
Create a Sankey Diagram Without Coding
There is also a website, sankeymatic.com, that provides a no-code option for drawing a nice Sankey diagram. It requires you to format the input in a certain way, and then you can get a result similar to the Reddit post above.
Personal finance apps provide quick and easy data visualization, but they fall short for more advanced uses like cash flow analysis. I hope this article fills that gap in your personal finance analysis. Please let me know what you think about the Sankey diagram and using R to build such an excellent chart by leaving a comment below.
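If you have already aggregated your transactions in R, you can generate that input instead of typing it by hand. As I understand SankeyMatic’s input format, each flow is one line written as Source [Amount] Target; please check the site for the current syntax. The sketch below uses a made-up flows data frame and a hypothetical sankeymatic_lines variable:

```r
# Hypothetical aggregated flows: one row per account-to-category total.
flows <- data.frame(
  Account.Name = c("Checking", "Credit Card", "Credit Card"),
  Category = c("Utilities", "Groceries", "Restaurants"),
  Expense = c(180, 310, 120)
)

# Build one "Source [Amount] Target" line per flow.
sankeymatic_lines <- sprintf("%s [%.2f] %s",
                             flows$Account.Name, flows$Expense, flows$Category)

# Print the lines so they can be pasted into the input box on the site.
writeLines(sankeymatic_lines)
```

Using an account as the source and a category as the target mirrors the last two stages of the ggalluvial chart.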
I hope my stories are helpful to you.