It may be impossible to predict a perfect bracket, but these academics did perfectly predict the "at-large" bids included in this year's March Madness NCAA college basketball tournament, and they have done so with 96% accuracy over the last six years.

Jessica Davis, Senior Editor

March 22, 2017

4 Min Read
(Image: Brocreative/Shutterstock)

The dance card choices are made, the brackets are filled out, and we are already several days into the annual March Madness college basketball tournament. His bracket may not be perfect, but Jay Coleman correctly predicted the "at-large" bids for this year's tournament, and he has a 96% correct prediction rate over the last six years. How did he do it?

Coleman and Mike DuMond have collaborated on predicting the dance card since 2000.

Coleman is a professor of operations management and quantitative methods at the University of North Florida and DuMond is a vice president at Economists Inc. and an adjunct professor at Florida State University. Each year since 2000 they have used analytics (and software from SAS, the sponsor of this site) to create their dance card prediction.

Coleman creates a "dance card" that predicts the 36 college basketball teams that will be invited to "at-large" spots in the tournament. These are teams that didn't earn an automatic invitation this season by winning their conference or conference tournament. Instead, these teams are invited to participate by the 10-member NCAA Tournament Selection Committee, which is made up of university athletic directors (but no analytics experts, Coleman told me). The committee decides the best at-large teams for the remaining 36 tournament slots.
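The article doesn't spell out the mechanics of the dance card itself, but conceptually it comes down to scoring every team that lacks an automatic bid and taking the top 36. Here is a minimal, purely hypothetical sketch; the team names, scores, and the predicted_scores input are invented stand-ins for whatever Coleman's model actually produces:

```python
# Purely hypothetical illustration of a "dance card": rank every team that
# lacks an automatic bid by a model score and take the top 36 as the
# predicted at-large selections. The scores and team names are invented;
# "predicted_scores" stands in for whatever Coleman's model actually outputs.

AT_LARGE_SLOTS = 36

def dance_card(predicted_scores: dict[str, float]) -> list[str]:
    """Return the teams predicted to receive at-large bids."""
    ranked = sorted(predicted_scores, key=predicted_scores.get, reverse=True)
    return ranked[:AT_LARGE_SLOTS]

# Example with a handful of made-up teams and scores.
example = {"Team A": 0.91, "Team B": 0.87, "Team C": 0.42, "Team D": 0.78}
print(dance_card(example))  # ['Team A', 'Team B', 'Team D', 'Team C']
```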

Coleman, DuMond, and another collaborator correctly predicted 36 of 36 at-large bids this year, and have now correctly predicted 209 of 218 at-large bids over the last six years combined (96% correct). Since they began their predictions in 2000, the work has drawn plenty of attention, including something like 120 media interviews.

The first year they did it, in 2000, Coleman and DuMond looked at the data they had from 1994 through 1999 and published their results in an academic journal. On a lark, Coleman called the local television station on the eve of the committee's announcement of the at-large teams to tell them about the prediction.

"We turned out to be the lead story," Coleman said.

The original prediction was based on team performance statistics, but the model has since evolved. Now they rely on the RPI, or Ratings Percentage Index, which Coleman says is the ubiquitous stat in college basketball.

"The NCAA came up with it to help them rank and categorize teams and help advise the selection process, and it is still in use," Coleman said. "There are better and more advanced analytics out there, but by and large, our research shows (the members of the committee) don't use it."

What other secret inputs does the committee consider when choosing at-large teams? A subsequent academic paper by Coleman and DuMond found evidence suggesting bias in the selection process. The results suggested that members of the committee with ties to particular teams were more likely to include those teams in the tournament. Coleman told me the idea of bias surfaced when the actual at-large selections didn't quite match what they should have been had the committee been following its usual formula. Since then, Coleman has noticed less of a bias issue than before, although "there is still evidence that there may be a lingering bias" in favor of some teams in the Pac-12 conference.

I also asked Coleman whether he filled out a bracket himself, and whom he picked to win the tournament. Coleman told me he no longer fills out a bracket because it's just too frustrating. Even when he applies analytics to his picks, he's only about as likely to win as someone using a dartboard, jersey color, or fuzziness of mascot to make their choices.

He has applied his dance card model to the bracket, but the bracket just really defies prediction.

"There's just enough of a random element that you will never have enough data to nail it down completely," he said. "If a predictive model predicts at about 75%, maybe 80%, that is the upper end of your accuracy."

Applying his dance card model to the bracket, Coleman picked Villanova as his finalist; that team has already been eliminated. Following my own personal system, which I invented on March 15 to fill out my first-ever bracket (has a good journalism school), my final team, Northwestern, has also been eliminated.

 

About the Author

Jessica Davis

Senior Editor

Jessica Davis is a Senior Editor at InformationWeek. She covers enterprise IT leadership, careers, artificial intelligence, data and analytics, and enterprise software. She has spent her career covering the intersection of business and technology. Follow her on Twitter: @jessicadavis.
