How I Built a Tool to Visualize Expense In Sankey Diagram

Photo by Michał Turkiewicz on Unsplash
Photo by Michał Turkiewicz on Unsplash

One year ago, I wrote an article about How to Visualize Monthly Expenses in a Comprehensive Way: Develop a Sankey Diagram in R. It leverages a package called ggalluvial R for building Sankey charts. However, it isn’t interactive, and I didn’t create a UI for the tool. Now it’s time to refresh the tooling by continuing in R but making it more interactive and user-friendly. 

So I decided to rebuild this project with a better interactive view by leveraging networkD3 and Shiny. My main goal is to enable people without programming experience to use the powerful Sankey chart by simply uploading the transaction CVS file from the popular site Mint.com.

Here is what the final view looks like. You can visit it at https://chengzhizhao.shinyapps.io/PersonalFinance2Viz/; I will share how to build it in R.

Personal Finance 2 Viz Website

Getting Started with Sankey Chart

The Sankey chart is a flow diagram that can help you understand the distribution between nodes and how data flows and interact with each node. One of the famous Sankey diagrams is Napoleon‘s invasion of Russia. The diagram below clearly shows the time and number of troops left.

Here is an excellent show of the Sankey diagram on Reddit. As you can see, it has a breakdown between each bucket. You can easily understand where your money is spent from the chart below. 

Sankey Diagram - Income From Reddit

Download Your Monthly Transactions From Mint

Luckily, Mint.com support users in downloading the transactions from its website into CSV file. It is straightforward from the web application, and the mobile APP doesn’t have this option. Once you log on to Mint, go to TRANSACTIONS, scroll to the button, and you should see an option “export all xxx transactions.” A CSV file will be downloaded when you click on that option.

The downloaded CSV file has the following fields: Date, Description, Original Description, Amount, Transaction Type, Category, Account Name, Labels, and Note. We will use the Amount, Transaction Type, and Category to build the Sankey Diagram.

We can prepare data using the following R code by updating the data type of fields Amount and Date, renaming the field, and filtering only the date in the transaction since the targeted date. The targeted date will be selected from a dateInput field from Shiny’s UI. 

				
					csv_df() %>%
mutate(
  Amount = as.numeric(Amount),
  Date = as.Date(Date, "%m/%d/%Y")
) %>%
select(
  "Date",
  "Description",
  "Amount",
  "Transaction Type",
  "Category"
) %>%
rename("Transaction_Type" = "Transaction Type") %>%
filter(Date >= input$date1)
				
			

networkD3 and D3

D3 has a deep learning curve, and the learning curve of D3 gives learners a hard time comprehending. For R users who don’t know JavaScript, there are plenty of libraries to bridge the gap. networkD3 is one of those libraries that we can use its sankeyNetwork interface to create a Sankey diagram without digging deep into D3.

To properly use sankeyNetwork , a few parameters need to be passed correctly. Let’s say we have a shopping category as node1 and debit as node2, and the link between is (node1, node2). 

  • Links: a data frame object with the links between the nodes. In this case, it will be (node1, node2)
  • Nodes: a data frame containing the node id and properties of the nodes. In this case, it will be node1 and node2
  • Source: character string naming the network source variable in the Links data frame. In this case, it is node1
  • Target: character string naming the network target variable in the Links data frame. In this case, it is node 2

However, many times we have the link and node but fitting the current data frame you have into sankeyNetwork might take a lot of work. Here is something we’d need to check further.

				
					## Get the unique nodes from the link relationship
nodes = data.frame(c(df$Transaction_Type, df$Category)) %>%
  unique() %>% rename(node = 1)
## match category from the nodes and assign an id
df$source <- match(df$Category, nodes$node) - 1
## match Transaction_Type from the nodes and assign an id
df$target <- match(df$Transaction_Type, nodes$node) - 1
## build the final renderSankeyNetwork
output$plot <- renderSankeyNetwork({
        san <- sankeyNetwork(
          Links = df,
          Nodes = nodes,
          Source = "source",
          Target = "target",
          Value = "Amount",
          units = "$",
          NodeID = "node",
          fontSize = 12,
          nodeWidth = 30,
          width = "100%",
          height = "500px"
        )
				
			

More Interactive in Shiny with Sankey Diagram

I mentioned leveraging Shiny in my previous post: How to Engage with Users By Storytelling: Show Data Analytics in R and Shiny.

Second Iteration: Interactivity with User Click | Image By Author

How to Engage with Users By Storytelling: Show Data Analytics in R and Shiny

Using R and Shiny, we can build an app where the end users can interact with the data analysis we have done. I will show you how to engage with users by storytelling - show data analytics in R and Shiny.

Read More →

What’s more, we’d need to enable the click from native D3, get which link the user clicked on, then pass the value to the DT view to render it. Note I purposely return the entire category so the user can view the relevant data points.

				
					## Define helper function to select only the selected cateogry
funct <-
  function (df, source_name) {
    if (!is.null(source_name)){
      isp <- sprintf("Select * From df WHERE Category = '%s'", source_name)
    }
    else {
      isp <- sprintf("Select * From df")
    }
    isd <- sqldf::sqldf(isp)
    return(isd)
  }
## Associate with a click event
clickFun <-
  'function() {
  d3.selectAll(".link").on("click",function(d) { Shiny.onInputChange("source_name", d.source.name); });
}'
onRender(san, clickFun)
output$mint_table 
    <- DT::renderDataTable(DT::datatable(funct(df, input$source_name)))
				
			

Finally, we ask the user to input the downloaded CSV file and show the Sankey diagram. 

Here is the final view: 

Demo of Personal Finance 2 Viz website
Demo of Personal Finance 2 Viz website

Final Thoughts

I hope this article helps inspire you to build more interactive visualization and share them in the community. To try to view your Mint transaction, feel free to visit: https://chengzhizhao.shinyapps.io/PersonalFinance2Viz/

About Me

I hope my stories are helpful to you. 

For data engineering post, you can also subscribe to my new articles or becomes a referred Medium member that also gets full access to stories on Medium.

In case of questions/comments, do not hesitate to write in the comments of this story or reach me directly through Linkedin or Twitter.

More Articles

Photo by saeed karimi on Unsplash

5 Hidden Apache Spark Facts That Fewer People Talk About

I want to share 5 hidden facts about Apache Spark that I learned throughout my career. Those can be helpful to you to save you some time reading the Apache Spark source code.

Read More →
Photo by Lizzi Sassman on Unsplash

Deep Dive into Handling Apache Spark Data Skew

“Why my Spark job is running slow?” is an inevitable question. We will cover how to identify Spark data skew and how to handle data skew with different options, including key salting

Read More →
Photo by JIUNN-YIH LAU on Unsplash

5 Tips for Self-Promotion as Data Professionals

Getting the work done isn’t the journey’s end. Your work should be your channel to get YOU self-promotion. I will give five tips to get self-promotion as data professionals

Read More →

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top
Share via
Copy link