Have you ever seen a beautiful racing bar chart animation on YouTube and wondered how it was built?
Animated data visualizations are fun to watch, and there are various libraries for creating spectacular animations. I will show you how to use gganimate in R to animate data, using a racing bar chart as an example.
What is gganimate
gganimate is an extension package for ggplot2. It aims to be the “Grammar of Animated Graphics.” As I mentioned in my other blog post, “Why ggplot2 is so good for data visualization”, ggplot2 was built on the ideas of the Grammar of Graphics. ggplot2 does an excellent job of implementing those core ideas, but its graphics are static. To show data changing frame by frame, gganimate extends the Grammar of Graphics and fills the animation gap in ggplot2.
gganimate adds additional building blocks on top of ggplot2, which makes it extremely easy to add animation to the existing plots. Those new building blocks are:
- transition: It’s like a script in the movie. It defines how the animation looks. For example, a lot of data animation is done over time, and transition_time can be used in this case.
- view: It’s like a camera in the movie. It defines how the axis or zoom should look.
- shadow: defines how data from other points in time should be shown in the current frame. For example, it can trace movement over time and slowly fade out older data points.
- enter/exit: defines how data appear and disappear.
- ease_aes: defines how a value changes to another during tweening (tweening is a filmmaking technique for generating intermediate frames so that one image evolves smoothly into the next).
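To make the building blocks concrete, here is a minimal sketch of how they stack on top of a regular ggplot2 chart. The toy data frame and its columns (id, step, x, y) are made up for illustration, and the particular choices of view_follow(), shadow_wake(), and "cubic-in-out" are just one possible combination.

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy data: two points moving over five time steps.
toy <- data.frame(
  id   = rep(c("a", "b"), each = 5),
  step = rep(1:5, times = 2),
  x    = c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1),
  y    = c(1, 4, 9, 16, 25, 5, 4, 3, 2, 1)
)

p <- ggplot(toy, aes(x, y, color = id)) +
  geom_point(size = 4) +
  transition_time(step) +           # transition: animate over the step column
  view_follow() +                   # view: let the axes follow the data
  shadow_wake(wake_length = 0.2) +  # shadow: leave a short fading trail
  enter_fade() +                    # enter: fade in data that appears
  exit_fade() +                     # exit: fade out data that disappears
  ease_aes("cubic-in-out")          # easing used while tweening

# Building p does not render anything yet. To render (needs a renderer backend):
# animate(p, nframes = 50, fps = 10)
```

Each block is added with +, just like a ggplot2 layer, so you can drop any of them and the plot still animates as long as a transition is present.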
Installing gganimate is straightforward in R. You can do it by typing install.packages("gganimate") in RStudio. However, if you run any of the examples from the gganimate website, you might encounter the following error:
    No renderer backend detected. gganimate will default to writing frames to separate files
    Consider installing:
    - the `gifski` package for gif output
    - the `av` package for video output
    and restarting the R session
This is because gganimate needs a renderer backend to produce a GIF or video output. Without one, it writes every frame as a separate PNG file. The easiest fix, which worked for me on a MacBook, is to install gifski with install.packages("gifski") and then restart the R session.
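If you are not sure whether a renderer is already installed, a quick check like the sketch below can help. It only uses base R's requireNamespace(), and the variable names has_gifski and has_av are my own.

```r
# Check which renderer backends are installed without attaching them.
has_gifski <- requireNamespace("gifski", quietly = TRUE)
has_av     <- requireNamespace("av", quietly = TRUE)

if (has_gifski) {
  message("gifski found: gganimate can render GIFs.")
} else if (has_av) {
  message("av found: gganimate can render video.")
} else {
  message("No renderer found; try install.packages(\"gifski\") and restart R.")
}
```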
Create a gganimate with COVID-19 data
It is very straightforward to create a ggplot2 chart with the data. First, we read the downloaded CSV file into a data frame. Then, let’s filter on only the continents and convert the date from string to date type. Finally, we choose a scatter plot with the total_deaths and total_cases fields.
    library(ggplot2)
    library(gganimate)
    library(dplyr)

    df = read.csv("~/Downloads/full_data.csv")
    df <- df %>%
      filter(location %in% c("Asia", "Europe", "Africa", "North America", "South America", "Oceania")) %>%
      mutate(date=as.Date(date, format="%Y-%m-%d"))

    ggplot(df, aes(total_deaths, total_cases, color=location)) +
      geom_point()
Add animation with gganimate plot
To animate the ggplot2 chart above, we only need to add one line of code. The result is rendered as a GIF.
    ggplot(df, aes(total_deaths, total_cases, color=location)) +
      geom_point() +
      # below is gganimation section
      transition_time(date)
One very cool thing about animating this scatter plot is that you can see how the relationship between total deaths and total cases changes over time. You can also see a sudden spike in total cases for Asia in the middle of the animation.
To make a horizontal bar chart, we can apply the same pattern we used in the first animation above. The code and animation look like this:
    ggplot(df, aes(y=location, x=total_cases)) +
      geom_col() +
      # below is gganimation section
      transition_time(date)
It doesn’t look good. There are a couple of issues with the animation above:
- The y-axis (location) is not sorted by total cases, and the bars don’t reorder as the ranking changes.
- The bars stretch and shrink back multiple times, which doesn’t make sense for total_cases because a cumulative count should only grow.
Now let’s fix this. To solve both problems, we add a row_number column that ranks the locations by total_cases within each date. This way, we know the exact order of the y-axis for each day.
    df_rank <- df %>%
      group_by(date) %>%
      arrange(date, total_cases) %>%
      mutate(row_number = as.character(row_number()))
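To see what the ranking step produces, here is a minimal sketch on a made-up toy data frame. The numbers are invented for illustration and are not from the COVID-19 file; the ungroup() at the end is my addition so the result prints as a plain table.

```r
library(dplyr)

# Hypothetical toy data: three locations on two dates.
toy <- data.frame(
  date        = as.Date(rep(c("2020-03-01", "2020-03-02"), each = 3)),
  location    = rep(c("Asia", "Europe", "Africa"), times = 2),
  total_cases = c(100, 50, 10, 120, 150, 20)
)

toy_rank <- toy %>%
  group_by(date) %>%
  arrange(date, total_cases) %>%                    # smallest first within each date
  mutate(row_number = as.character(row_number())) %>%  # position within the date
  ungroup()

print(toy_rank)
# Europe moves from "2" on the first date to "3" on the second,
# while Asia drops from "3" to "2".
```

One thing to watch: because row_number is stored as text, ggplot2 orders the values alphabetically, so with ten or more bars "10" would sort before "2". With six continents this is not a problem.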
Now we can create the animation with the new data frame. This time we use transition_states, as it makes the order change easier to handle.
This will feel familiar if you have created animations in PowerPoint or Keynote. The state transition divides the data into multiple states based on the levels of a given column. In this case, we use the date as the state, so each date becomes one state. The animation moves from one date to the next and uses fades on enter and exit for smooth transitions.
    animacion <- ggplot(df_rank, aes(x=row_number, y=total_cases, fill=location)) +
      geom_col() +
      geom_text(aes(x=row_number, y=0, label = location), hjust=1.1) +
      coord_flip(clip = "off", expand = FALSE) +
      theme_minimal() +
      theme(
        panel.grid = element_blank(),
        legend.position = "none",
        axis.ticks.y = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        plot.margin = margin(1, 4, 1, 4, "cm")
      ) +
      # below is gganimation section
      transition_states(date, state_length = 0, transition_length = 2) +
      enter_fade() +
      exit_fade() +
      ease_aes('quadratic-in-out')

    animate(animacion, width = 640, height = 480, fps = 60, duration = 20, rewind = FALSE)
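As a quick sanity check on the animate() call, the number of frames to render follows from fps and duration. Here is a small sketch of the arithmetic; the draft values are my own assumption and not part of the original code.

```r
# Settings used in the animate() call.
fps      <- 60
duration <- 20                 # seconds
nframes  <- fps * duration     # frames gganimate has to render

# Hypothetical lower-quality settings for quick previews.
draft_fps     <- 10
draft_nframes <- draft_fps * duration

cat("Final render:", nframes, "frames\n")
cat("Draft render:", draft_nframes, "frames\n")
```

Rendering that many frames can take a while, so a lower fps is handy while you are still tweaking the theme and labels.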
gganimate brings static ggplot2 charts to life and makes it easy to build creative data visualizations like the racing bar chart. However, while an animation is astonishing at first glance, it doesn’t show all the information at once, so it cannot replace static plots entirely. How do you like gganimate? Are you ready to try it in your next project?
I hope my stories are helpful to you.