How to Visualize Monthly Expenses in a Comprehensive Way: Develop a Sankey Diagram in R

Visualizing monthly cash flow isn’t new if you use personal budgeting and finance tools like Mint, Personal Capital, or Clarity. These tools primarily provide three types of charts: pie charts, bar charts, and line charts. But are those charts good enough to give you a clear picture of your monthly income and expenses? Are there ways to visualize monthly expenses more comprehensively? In this article, I will show you how to create a Sankey diagram in R to help you gain more insight into your financial situation.

What Is a Sankey Diagram

From Wikipedia: Sankey diagrams are flow diagrams in which the width of the arrows is proportional to the flow rate. One of the most famous Sankey diagrams is Charles Minard’s map of Napoleon’s invasion of Russia, which clearly shows the number of troops left over time.

Back to personal finance: monthly cash flow is a perfect use case for a Sankey diagram, because the diagram shows both the flow of money and which account the money originates from or goes to.

The pie and bar charts in the Mint app show the amount of money spent or earned in each category, but not how cash flows between accounts and categories. That limitation makes it hard to dive deep into your personal finances.

Here is one of the nice Sankey diagrams shared on Reddit. Our goal in this article is to recreate a similar one with a data dump from a personal finance app like Mint.

Prerequisite

You need to install R to follow along, but a deep understanding of R is not required. You can copy and paste the code from this post, then add a little code of your own if you want more customized results.

To get the best experience, install both R and RStudio.
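Beyond R itself, the code in this post loads four packages: ggplot2, dplyr, ggthemes, and ggalluvial. The snippet below is a minimal sketch for checking which of them are missing from your machine. The `interactive()` guard is my own assumption, added so the install only runs when you execute it from a console such as RStudio.

```r
# Packages loaded by the code later in this post
pkgs <- c("ggplot2", "dplyr", "ggthemes", "ggalluvial")

# Compare against the packages already installed in your R library
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
print(missing_pkgs)

# Install from CRAN only when something is missing and only in an
# interactive session (assumption: you run this from the RStudio console)
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

If `missing_pkgs` prints as empty, everything the post needs is already installed.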

How To Download Your Monthly Transactions From Mint

We will use Mint.com as an example to download the transactions. Exporting is straightforward from the web application; the mobile app doesn’t have this option. Once you log in to Mint, go to TRANSACTIONS, scroll to the bottom, and you should see an option “export all xxx transactions.” A CSV file is downloaded when you click that option.

The downloaded CSV file has the following fields: Date, Description, Original Description, Amount, Transaction Type, Category, Account Name, Labels, Note. We will use Amount, Transaction Type, Category, Account Name to build the Sankey Diagram.
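One detail is worth checking before you build anything: with its default settings, `read.csv()` replaces spaces in column headers with dots, so "Transaction Type" becomes `Transaction.Type`. Here is a small sketch that shows this with a two-row stand-in file. The rows are made-up values, not real Mint data, and the temporary file path is only for illustration.

```r
# A two-row stand-in for the Mint export (made-up values, same header fields)
csv_lines <- c(
  "Date,Description,Original Description,Amount,Transaction Type,Category,Account Name,Labels,Note",
  "1/05/2022,Coffee Shop,COFFEE SHOP 123,4.50,debit,Coffee Shops,Credit Card,,",
  "1/15/2022,Employer,EMPLOYER PAYROLL,2000.00,credit,Paycheck,Checking,,"
)

# Write the sample to a temporary file and read it back like a real export
sample_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, sample_path)
sample_df <- read.csv(sample_path)

# Headers with spaces come back with dots, e.g. Transaction.Type, Account.Name
print(names(sample_df))
```

These dotted names are the ones you refer to in the rest of the R code.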

Create Sankey Diagram In R and ggplot2

One of the libraries we will use to build the Sankey diagram is ggalluvial, whose design and functionality were initially inspired by the alluvial package. One of the great things about ggalluvial is that it builds on top of ggplot2, so you get the benefits of the grammar of graphics.

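library(ggplot2)
library(dplyr)
library(ggthemes)
library(ggalluvial)

# Read the CSV exported from Mint; read.csv() turns spaces in headers into dots
df <- read.csv("~/Downloads/transactions.csv")

df <- df %>%
  # Keep only the fields needed for the diagram
  select(Date, Amount, Category, Transaction.Type, Account.Name) %>%
  # Parse the date strings and keep transactions after December 1, 2021
  mutate(Date = as.Date(Date, format = "%m/%d/%Y")) %>%
  filter(Date > as.Date("12/01/2021", format = "%m/%d/%Y")) %>%
  # Total the amount for each category / transaction type / account combination
  group_by(Category, Transaction.Type, Account.Name) %>%
  summarise(Expense = sum(Amount), .groups = "drop") %>%
  # Keep debit totals above 100 and drop the categories excluded from this view
  filter(
    Expense > 100,
    Transaction.Type == "debit",
    !Category %in% c("Transfer", "Paycheck", "Credit Card Payment", "Mortgage & Rent", "Investments")
  )

# One axis per stage of the flow; each flow's width is proportional to Expense
ggplot(df, aes(axis1 = Transaction.Type, axis2 = Account.Name, axis3 = Category, y = Expense)) +
  scale_x_discrete(limits = c("Transaction.Type", "Account.Name", "Category"), expand = c(.2, .05)) +
  geom_alluvium() +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  theme_economist() +
  scale_colour_economist()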

The code above can be broken down into three steps:

Import the libraries and read the downloaded CSV file.
Transformation: select only the needed fields, convert the date from a string to a date, and sum the dollar amount grouped by transaction type, account name, and category. The group-by fields become the stages of the Sankey diagram.
Build the data visualization: map the group-by fields to the axis aesthetics, name the stages in scale_x_discrete, and then call ggalluvial’s geom_alluvium and geom_stratum to draw the chart.
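To make the transformation step more concrete, here is a small sketch that runs the same group-and-sum logic on a toy data frame. The four rows are made-up values for illustration only. The `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Made-up transactions standing in for the Mint export
toy <- data.frame(
  Category         = c("Groceries", "Groceries", "Restaurants", "Paycheck"),
  Transaction.Type = c("debit", "debit", "debit", "credit"),
  Account.Name     = c("Credit Card", "Credit Card", "Checking", "Checking"),
  Amount           = c(80, 45, 150, 2000)
)

flows <- toy %>%
  # One row per category / transaction type / account combination
  group_by(Category, Transaction.Type, Account.Name) %>%
  summarise(Expense = sum(Amount), .groups = "drop") %>%
  # Keep only debit totals above 100, as in the main pipeline
  filter(Expense > 100, Transaction.Type == "debit")

print(flows)
```

The two grocery purchases collapse into a single row with their combined total, and the paycheck row is filtered out because it is a credit. Each remaining row becomes one flow in the diagram.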

Now we can visualize the diagram, which clearly shows which account the money for each category comes from and how spending is distributed across categories.

Create Sankey Diagram Without Coding

There is also a website called sankeymatic.com that provides a no-code option for drawing a nice Sankey diagram. You need to format the input in a specific way, and then you should get a result similar to the Reddit post above.
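If you already have aggregated data in R, a few lines can produce that input. The sketch below assumes SankeyMATIC’s one-flow-per-line text format, `Source [Amount] Target`, and treats the account as the source and the category as the target. The helper name `to_sankeymatic` and the sample data frame are hypothetical, made up for illustration.

```r
# Hypothetical helper: turn aggregated flows into "Source [Amount] Target" lines
to_sankeymatic <- function(flows) {
  sprintf("%s [%.2f] %s", flows$Account.Name, flows$Expense, flows$Category)
}

# Made-up aggregated flows with the same columns as the data frame in this post
sample_flows <- data.frame(
  Account.Name = c("Credit Card", "Checking"),
  Category     = c("Groceries", "Restaurants"),
  Expense      = c(125, 150)
)

sankey_lines <- to_sankeymatic(sample_flows)

# Print one flow per line, ready to paste into the site's input box
writeLines(sankey_lines)
```

You can paste the printed lines into the input area on the site and adjust the labels or colors there.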

Final Thoughts

A personal finance app provides quick and easy data visualization, but it falls short for more advanced and comprehensive uses like cash flow analysis. I hope this article helps fill that gap in your personal finance analysis. Please let me know what you think about the Sankey diagram and about using R to build this kind of chart by leaving a comment below.

About Me

I hope my stories are helpful to you.

For data engineering posts, you can also subscribe to my new articles or become a referred Medium member, which also gives you full access to stories on Medium.

If you have questions or comments, do not hesitate to write in the comments of this story or reach me directly through LinkedIn or Twitter.

About The Author

Chengzhi Zhao
