The Problem with Default Density Plots in ggplot2
Visualizing data with a histogram becomes challenging once multiple groups are involved. I recently came across a ggplot2 extension called ggridges that has been helpful for my exploratory data analysis.
Multiple groups are hard to display in a single plot because they rarely sit together nicely. Stacking the histograms as a bar chart can be confusing. In the following example with the classic iris dataset, the bars for the species setosa do not start from the same horizontal baseline as the others, which makes comparison extremely difficult and makes it easy to draw wrong conclusions.
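A stacked histogram like the one described can be sketched as follows. The binwidth of 0.25 and the white bar outline are arbitrary choices for illustration, and the sketch assumes ggplot2 is installed.

```r
library(ggplot2)

# Stacked histogram: each species is drawn on top of the others,
# so only the bottom group starts from the zero baseline.
p_hist <- ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = 0.25, color = "white") +  # default position is "stack"
  theme_minimal()

print(p_hist)
```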
A density plot mitigates this problem: each group is drawn from the same baseline instead of being stacked, and the curve is a smoothed version of the histogram.
Unfortunately, it still does not solve the overlap problem when multiple groups are involved. Identifying the overlapping areas and reading off the exact distribution of each group remains challenging.
You might argue that we could adjust the fill parameter to make the overlapping areas more apparent. But tuning the color and transparency level takes work.
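For reference, here is a sketch of that kind of tweak. The alpha value of 0.4 is just one illustrative choice, and in practice it usually needs a few rounds of trial and error.

```r
library(ggplot2)

# Semi-transparent fills let overlapping regions show through,
# but the right alpha depends on the data and the color palette.
p_alpha <- ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.4) +
  theme_minimal()

print(p_alpha)
```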
In this case, what we need is a “partially overlapping” layout. It avoids a crowded chart while keeping the groups close enough to compare.
“Ridgeline plots are partially overlapping line plots that create the impression of a mountain range. They can be quite useful for visualizing changes in distributions over time or space.” (ggridges wiki)
A particularly helpful use is to draw a density plot for each group with geom_density_ridges, then visualize and compare the distributions.
Let’s update our example to the following.
library(ggplot2)
library(ggridges)

## Code with default geom_density
ggplot(data = iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density() +
  theme_minimal()

## Code with geom_density_ridges
ggplot(data = iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges() +
  theme_ridges()
Now each group (species), which used to be distinguished by fill color, has its own position on the y-axis. This highlights the distribution of each group individually and makes the groups easy to compare.
If you still don't like the overlap and want the groups not to overlap at all, you can change the scale setting to a value less than 1:
ggplot(data = iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(scale = 0.9) +
  theme_ridges()
Once we understand the overall shape of each distribution, we can add quantile lines to summarize each group. Marking the p50 (median) and p95 values gives us more insight into each distribution.
ggplot(data = iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(quantile_lines = TRUE, quantiles = c(0.5, 0.95), alpha = 0.7) +
  theme_ridges()
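To cross-check the quantile lines numerically, the same p50 and p95 values can be computed per species with base R. This is a minimal sketch that uses R's default quantile method. The lines drawn by ggridges should roughly correspond to these numbers, assuming its default quantile function is used.

```r
# Split Sepal.Length by species and compute the 50th and 95th percentiles.
# The result is a matrix with one column per species and one row per quantile.
sepal_quantiles <- sapply(
  split(iris$Sepal.Length, iris$Species),
  quantile,
  probs = c(0.5, 0.95)
)

print(sepal_quantiles)
```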
To sum up, ggridges is an excellent solution when you have multiple groups and need to learn their distributions and show the insights within one chart. I hope this post introduces you to another extension in the ggplot2 ecosystem that helps you as a data professional.