FanGraphs Tags a Cloud Database to Keep Up with the Big Show - InformationWeek

InformationWeek is part of the Informa Tech Division of Informa PLC


08:00 AM

FanGraphs Tags a Cloud Database to Keep Up with the Big Show

Website for baseball analysis turned to MariaDB SkySQL as it looks to take on more game data from domestic and international sources.

Baseball data analysis website FanGraphs recently adopted the MariaDB SkySQL cloud database to handle the fluctuating, ever-growing volume of data coming out of the sport. FanGraphs gathers granular data, including the velocity of every pitch thrown during games, and uses the cloud database to process statistics, complex queries, projections, and playoff-odds models.

“Anything that’s baseball, we’re taking a look at,” says David Appelman, CEO and founder of FanGraphs.

With the 2021 Major League Baseball season underway, he says FanGraphs must accommodate new Statcast data introduced by the league. “The data can be pretty wide,” Appelman says. “There’s a lot of records for each individual event that happens in baseball. On a season-level, there’s something in the realm of a million records a season for data for every individual pitch thrown.”


FanGraphs also ingests data from minor league teams and from baseball leagues overseas, he says. “It’s a fairly sizeable amount of data.” To serve its audience, FanGraphs typically runs thousands of queries per second against its database, Appelman says. Adding more international data is a priority for the site, he says, along with more Statcast data from MLB.

FanGraphs was founded in 2005, and Appelman says he personally managed its database until 2019. Over the years, the company tried a variety of resources to improve efficiency, with mixed results. FanGraphs first migrated to MariaDB about seven years ago, Appelman says, and then considered moving to Linux, but that raised several potential headaches. “I didn’t want to deal with migration,” he says. “Optimizing the database for Windows is one thing. Optimizing it on a Linux box is a completely different thing.”

Appelman says he did not have time to sort that out while other operations demanded his attention. FanGraphs considered other options, such as moving the database to a turnkey managed service. “I looked at Amazon Relational Database Service and Cloud SQL,” he says.

Around the time FanGraphs was looking to move its database and offload all of its database administration, Appelman received a technical briefing on MariaDB SkySQL that opened up new possibilities. “It was fast. It seemed it would handle all my needs,” he says.

FanGraphs contracted with MariaDB to migrate first to Linux and then, in February of this year, to SkySQL. The move also took FanGraphs from dedicated servers to Google Cloud Platform (GCP). “We just needed more flexibility,” Appelman says. The infrastructure migration to GCP included both application servers and data-loading servers.

This was not FanGraphs’ first attempt at taking advantage of the cloud. In 2017, the company tried to migrate to a smaller cloud provider, Appelman says, matching its existing resources, such as RAM and processing power, exactly. “We ran into big problems,” he says. “The next morning, I had to migrate back. What I didn’t quite realize was that with the service I moved to, the hypervisor was causing really bad I/O. The database became this huge bottleneck.”

Appelman says he was also reluctant to move his infrastructure to AWS because of the learning curve its services presented. He needed another option. “GCP fit a nice middle ground,” Appelman says. “I found it a little bit easier to set up than AWS.”

The move still raised performance questions. Migrating FanGraphs from a dedicated machine with a four-SSD RAID 10 array to the cloud at first seemed like a downgrade in raw power, Appelman says. “That doesn’t seem to be the case anymore,” he says. “Things are running great. We had no problems migrating to SkySQL and GCP this time.”

FanGraphs is now considering other SkySQL offerings it might tap, such as its data warehousing technology, Appelman says. “We need second or low-second or sub-second responses for a lot of our queries,” he says. “We want people to be able to do very fast, ad hoc data analysis. With certain types of MLB data, there’s now a lot more than it used to be -- we’re hoping to take advantage of that to bring our users a lot more granular and customizable analysis without having to wait a while to get the results.” In the future, FanGraphs might also use other SkySQL capabilities to run a single query across multiple threads and cut processing time, Appelman says.

Appelman has a few wish-list items to explore now that FanGraphs has committed to the cloud. He says he has yet to scratch the surface of GCP services that might interest him, such as machine learning. For now, he is eager to see continued development of SkySQL’s database reporting tools. “Knowing exactly where the bottlenecks are in our application makes a big difference for me,” Appelman says. “I’ve used some third-party tools to figure out which queries I’ve botched. Having that available in the reporting section would be useful.”

Joao-Pierre S. Ruth has spent his career immersed in business and technology journalism, first covering local industries in New Jersey, later as the New York editor for Xconomy delving into the city's tech startup community, and then as a freelancer for such outlets as ...
