Data visualization has always been a delightful area for me to work in as a data professional. Visualizing data is like an art, and many creative ideas sprout in this space. People from diverse backgrounds are helping others digest boring numbers and lengthy summaries by turning them into beautiful charts. The well-known subreddit Dataisbeautiful is one of those places.
Visualizing streaming data isn’t something new. You can take data from a live feed and build a streaming ETL pipeline using tools like Apache Flink, Apache Spark, or Kafka Streams to perform data manipulation, then dump the data into Elasticsearch or Redis to consume it.
If you have played with a Kibana dashboard, you are probably familiar with the different chart types and have created a nice dashboard with multiple charts. As streaming data flows in, the dashboard can refresh itself frequently, and we can even put the dashboard on a TV and watch it all day long.
When I started a hackathon project during my years at Meetup, however, I found this kind of visualization wasn’t interactive and, most importantly, it wasn’t fun.
So I asked myself: Can I visualize streaming data in another way? Maybe I can use the streaming data to…
Build a Game? Like Pac-Man
Let's Prepare the Streaming Data
I started this game as a hackathon project when I was an engineer at Meetup.com. Building this game with streaming data feels like yesterday, but it was six years ago. Meetup used to have a public WebSocket endpoint (ws://stream.meetup.com/2/rsvps), and you could easily pull streaming data for free. Now you’d need to sign up for the API to use it (https://www.meetup.com/api/general/#websockets).
The Meetup streaming data has information about each RSVP: the event people signed up for, who signed up, and so on. We will use this live streaming data to build our game.
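To make the shape of the data concrete, here is a minimal Python sketch of pulling the fields the game needs out of one RSVP message. The sample message and its field names (`event.event_name`, `member.member_name`, `venue.lat`, `venue.lon`, `group.group_country`) are assumptions for illustration, not a documented schema, so adjust them to whatever the feed actually sends.

```python
import json
from typing import Optional

# Hypothetical RSVP message; the field names are assumed for illustration.
SAMPLE_RSVP = json.dumps({
    "response": "yes",
    "member": {"member_name": "Alex"},
    "event": {"event_name": "Data Viz Night"},
    "venue": {"lat": 40.73, "lon": -73.99},
    "group": {"group_country": "us"},
})


def extract_rsvp(raw: str) -> Optional[dict]:
    """Return the fields the game needs, or None if the message is unusable."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(msg, dict):
        return None
    venue = msg.get("venue")
    if not isinstance(venue, dict) or "lat" not in venue or "lon" not in venue:
        return None  # no location means there is no dot to draw
    return {
        "event_name": (msg.get("event") or {}).get("event_name", ""),
        "member_name": (msg.get("member") or {}).get("member_name", ""),
        "country": (msg.get("group") or {}).get("group_country", ""),
        "lat": float(venue["lat"]),
        "lon": float(venue["lon"]),
    }


print(extract_rsvp(SAMPLE_RSVP))
```

Messages without a venue location are dropped here because the game can only place a dot when it has a latitude and longitude.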
The Traditional Visualization of Streaming Data
After getting a streaming data source, the rest is simple. We’ll need four tools:
- A data streaming framework that can perform data transformations in a streaming fashion.
- A data store that supports ingesting a large volume of records.
- A lightweight data store to save data for building the game.
- A data visualization tool that can build charts on the data source.
When I built the project in 2018, Apache Storm was one of the most popular data streaming tools, and the main concepts of data streaming frameworks haven’t changed since. Elasticsearch and Kibana are the obvious choices for the data store and the visualization tool. Redis is the lightweight data store we can use to build the game.
The process is to pull data from Meetup, parse the JSON blob, look up the country code, and then persist the data into Elasticsearch and Redis.
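The steps above can be sketched in plain Python. This is not Storm code and not the code from my project: it only shows the order of the steps. The `country_code` field, the tiny `COUNTRY_NAMES` table, and the `InMemorySink` class (a stand-in for Elasticsearch and Redis so the sketch runs without any servers) are all hypothetical.

```python
import json

# Tiny stand-in for a real country lookup table (assumption for illustration).
COUNTRY_NAMES = {"us": "United States", "gb": "United Kingdom", "jp": "Japan"}


def parse(raw: str) -> dict:
    """Step 1: parse the JSON blob."""
    return json.loads(raw)


def add_country(record: dict) -> dict:
    """Step 2: look up the country name from the (hypothetical) country_code field."""
    code = str(record.get("country_code", "")).lower()
    return {**record, "country": COUNTRY_NAMES.get(code, "Unknown")}


class InMemorySink:
    """Stand-in for Elasticsearch or Redis so the sketch runs without servers."""

    def __init__(self) -> None:
        self.records = []

    def write(self, record: dict) -> None:
        self.records.append(record)


def process(raw: str, sinks: list) -> dict:
    """Step 3: run one message through parse, enrich, and persist to every sink."""
    record = add_country(parse(raw))
    for sink in sinks:
        sink.write(record)
    return record


search_index, game_store = InMemorySink(), InMemorySink()
process('{"event_name": "Data Viz Night", "country_code": "US"}',
        [search_index, game_store])
```

Writing the same enriched record to two sinks mirrors the idea of one store for the dashboard and one lightweight store for the game.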
In Storm, these steps are wired together as a topology.
The detailed code is in my project on GitHub. Now we can build the Kibana dashboard to render the data in Elasticsearch, as described earlier.
Build the Pac-Man Game with Streaming Data
Pac-Man is a classic video game. It has been a best-selling game for 40 years.
The objective of the game is to eat all of the dots placed in the maze while avoiding four colored ghosts — Wikipedia
How should I combine the streaming data with a Pac-Man game? In the Meetup data, I have the event RSVP location (latitude and longitude), which lets me draw dots on the world map. The streaming data feeds the location and appearance of the dots. Once Pac-Man eats a dot, I display information about the event that was RSVPed to.
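As a rough illustration of turning a latitude and longitude into a dot, and of deciding when Pac-Man has eaten it, here is a small Python sketch. It assumes the world map image uses an equirectangular projection (longitude maps linearly to x, latitude to y), which may not match a stylized map, and the function names and the pixel radius are hypothetical.

```python
import math


def to_pixel(lat: float, lon: float, width: int, height: int) -> tuple:
    """Map latitude/longitude to (x, y) on an equirectangular world map.

    x grows eastward from the left edge; y grows southward from the top edge.
    """
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return x, y


def eats(pacman_xy: tuple, dot_xy: tuple, radius: float) -> bool:
    """True when Pac-Man's center is within `radius` pixels of the dot."""
    dx = pacman_xy[0] - dot_xy[0]
    dy = pacman_xy[1] - dot_xy[1]
    return math.hypot(dx, dy) <= radius


# A dot for an RSVP in New York on an 800x400 map (coordinates are sample values).
dot = to_pixel(40.73, -73.99, 800, 400)
print(dot, eats((236.0, 110.0), dot, radius=8.0))
```

A simple distance check like this is usually enough for a dot-eating game; when it returns true, the game can remove the dot and show the event name.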
Here is what I did:
- Find a world map that fits the Pac-Man theme.
- Clean up the Pac-Man game by removing the maze, leaving only Pac-Man and the colored ghosts.
- Read data from Redis in the HTML game. I use Webdis, a very simple web server that provides an HTTP interface to Redis. The game then displays dots on the world map as data comes in from Apache Storm.
- Build the banner to display the RSVPed event name once Pac-Man eats a dot.
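To show what reading from Redis through Webdis looks like, here is a minimal Python sketch of the request and reply shape. The game itself makes these calls from the browser; Python is used here only for illustration. Webdis exposes Redis commands as URL paths and wraps the reply in a JSON object keyed by the command name. The host, the `dots` key, and the idea that each list item is a JSON string are assumptions, not details from my project.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

WEBDIS_URL = "http://127.0.0.1:7379"  # 7379 is Webdis's default port; host is assumed


def command_url(base: str, *parts) -> str:
    """Build a Webdis URL such as <base>/LRANGE/<key>/0/9."""
    path = "/".join(quote(str(p), safe="") for p in parts)
    return base.rstrip("/") + "/" + path


def parse_reply(body: str, command: str):
    """Unwrap the Redis reply from Webdis's {"COMMAND": reply} JSON envelope."""
    return json.loads(body)[command]


def fetch_dots(key: str = "dots", count: int = 10) -> list:
    """Fetch up to `count` dots; each list item is assumed to be a JSON string."""
    url = command_url(WEBDIS_URL, "LRANGE", key, 0, count - 1)
    with urlopen(url, timeout=5) as resp:
        items = parse_reply(resp.read().decode("utf-8"), "LRANGE")
    return [json.loads(item) for item in items]
```

Calling `fetch_dots()` requires a running Webdis instance in front of Redis; `command_url` and `parse_reply` can be tried on their own without one.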
Here is how the final game looks. To fit Meetup’s color, I changed Pac-Man from yellow to red as well.
Here is the GitHub repository for the game.
As a data professional, I like to explore combining data with new domains. I don’t believe many people have mixed a game and streaming data before.
I have to say, playing a game isn’t the most efficient way to visualize data. However, adding a game element to data visualization gives users a more interactive experience. It could have some potential, especially for a game driven by live events, which adds more randomness.
Building this Pac-Man game was a fun experience. Although this is a hackathon project I did years ago, I still find it a unique experience worth writing down and sharing with my readers.
Please let me know if you found this post helpful or just read it for fun. I appreciate your time.