How I Built a Tool to Visualize Expense In Sankey Diagram

One year ago, I wrote an article about How to Visualize Monthly Expenses in a Comprehensive Way: Develop a Sankey Diagram in R. It leverages a package called ggalluvial R for building Sankey charts. However, it isn’t interactive, and I didn’t create a UI for the tool. Now it’s time to refresh the tooling by continuing in R but making it more interactive and user-friendly.

So I decided to rebuild this project with a better interactive view by leveraging networkD3 and Shiny. My main goal is to enable people without programming experience to use the powerful Sankey chart by simply uploading the transaction CVS file from the popular site Mint.com.

Here is what the final view looks like. You can visit it at https://chengzhizhao.shinyapps.io/PersonalFinance2Viz/; I will share how to build it in R.

Getting Started with Sankey Chart

The Sankey chart is a flow diagram that can help you understand the distribution between nodes and how data flows and interact with each node. One of the famous Sankey diagrams is Napoleon‘s invasion of Russia. The diagram below clearly shows the time and number of troops left.

Here is an excellent show of the Sankey diagram on Reddit. As you can see, it has a breakdown between each bucket. You can easily understand where your money is spent from the chart below.

Download Your Monthly Transactions From Mint

Luckily, Mint.com support users in downloading the transactions from its website into CSV file. It is straightforward from the web application, and the mobile APP doesn’t have this option. Once you log on to Mint, go to TRANSACTIONS, scroll to the button, and you should see an option “export all xxx transactions.” A CSV file will be downloaded when you click on that option.

The downloaded CSV file has the following fields: Date, Description, Original Description, Amount, Transaction Type, Category, Account Name, Labels, and Note. We will use the Amount, Transaction Type, and Category to build the Sankey Diagram.

We can prepare data using the following R code by updating the data type of fields Amount and Date, renaming the field, and filtering only the date in the transaction since the targeted date. The targeted date will be selected from a dateInput field from Shiny’s UI.

				
					csv_df() %>%
mutate(
  Amount = as.numeric(Amount),
  Date = as.Date(Date, "%m/%d/%Y")
) %>%
select(
  "Date",
  "Description",
  "Amount",
  "Transaction Type",
  "Category"
) %>%
rename("Transaction_Type" = "Transaction Type") %>%
filter(Date >= input$date1)

networkD3 and D3

D3 has a deep learning curve, and the learning curve of D3 gives learners a hard time comprehending. For R users who don’t know JavaScript, there are plenty of libraries to bridge the gap. networkD3 is one of those libraries that we can use its sankeyNetwork interface to create a Sankey diagram without digging deep into D3.

To properly use sankeyNetwork , a few parameters need to be passed correctly. Let’s say we have a shopping category as node1 and debit as node2, and the link between is (node1, node2).

Links: a data frame object with the links between the nodes. In this case, it will be (node1, node2)
Nodes: a data frame containing the node id and properties of the nodes. In this case, it will be node1 and node2
Source: character string naming the network source variable in the Links data frame. In this case, it is node1
Target: character string naming the network target variable in the Links data frame. In this case, it is node 2

However, many times we have the link and node but fitting the current data frame you have into sankeyNetwork might take a lot of work. Here is something we’d need to check further.

				
					## Get the unique nodes from the link relationship
nodes = data.frame(c(df$Transaction_Type, df$Category)) %>%
  unique() %>% rename(node = 1)

## match category from the nodes and assign an id
df$source <- match(df$Category, nodes$node) - 1

## match Transaction_Type from the nodes and assign an id
df$target <- match(df$Transaction_Type, nodes$node) - 1

## build the final renderSankeyNetwork
output$plot <- renderSankeyNetwork({
        san <- sankeyNetwork(
          Links = df,
          Nodes = nodes,
          Source = "source",
          Target = "target",
          Value = "Amount",
          units = "$",
          NodeID = "node",
          fontSize = 12,
          nodeWidth = 30,
          width = "100%",
          height = "500px"
        )

More Interactive in Shiny with Sankey Diagram

I mentioned leveraging Shiny in my previous post: How to Engage with Users By Storytelling: Show Data Analytics in R and Shiny.

Second Iteration: Interactivity with User Click | Image By Author

How to Engage with Users By Storytelling: Show Data Analytics in R and Shiny

March 11, 2023 No Comments

Using R and Shiny, we can build an app where the end users can interact with the data analysis we have done. I will show ...

What’s more, we’d need to enable the click from native D3, get which link the user clicked on, then pass the value to the DT view to render it. Note I purposely return the entire category so the user can view the relevant data points.

				
					## Define helper function to select only the selected cateogry
funct <-
  function (df, source_name) {
    if (!is.null(source_name)){
      isp <- sprintf("Select * From df WHERE Category = '%s'", source_name)
    }
    else {
      isp <- sprintf("Select * From df")
    }
    isd <- sqldf::sqldf(isp)
    return(isd)
  }

## Associate with a click event
clickFun <-
  'function() {
  d3.selectAll(".link").on("click",function(d) { Shiny.onInputChange("source_name", d.source.name); });
}'

onRender(san, clickFun)

output$mint_table 
    <- DT::renderDataTable(DT::datatable(funct(df, input$source_name)))

Finally, we ask the user to input the downloaded CSV file and show the Sankey diagram.

Here is the final view:

Final Thoughts

I hope this article helps inspire you to build more interactive visualization and share them in the community. To try to view your Mint transaction, feel free to visit: https://chengzhizhao.shinyapps.io/PersonalFinance2Viz/

About Me

I hope my stories are helpful to you.

For data engineering post, you can also subscribe to my new articles or becomes a referred Medium member that also gets full access to stories on Medium.

In case of questions/comments, do not hesitate to write in the comments of this story or reach me directly through Linkedin or Twitter.

Jan27

How I Built a Tool to Visualize Expense In Sankey Diagram

Getting Started with Sankey Chart

Download Your Monthly Transactions From Mint

networkD3 and D3

More Interactive in Shiny with Sankey Diagram

How to Engage with Users By Storytelling: Show Data Analytics in R and Shiny

Final Thoughts

About Me

More Articles

The Data Modeling Wars: Inmon vs. Kimball vs. Data Vault

Airflow Schedule Interval 101

5 Fantastic Data Pipeline Orchestration Tools For R

About The Author

Chengzhi Zhao