The science of soccer stats

Joe Klamar / AFP - Getty Images
Spain's Sergio Ramos and Xavi Hernandez, seen during a match in May, ranked highest in a study that used network analysis to rate soccer players.
Just in time for World Cup action, researchers have developed a rating system for soccer players that relies on network analysis of the passing game — but doesn't count goals at all.
"You could think maybe you're missing the most important piece of information," Luis Amaral, a chemical and biological engineering professor at Northwestern University, admitted during an interview. But it turns out that the ranking system that he and his colleagues came up with closely matched the general consensus from sports writers, coaches, managers and other experts.
The best part is that you should be able to judge for yourself by matching ratings on Amaral's website with actual World Cup results.
The rating system, detailed today in the open-access journal PLoS ONE, was put through a test run using performance data from the 2008 European Championship tournament, or EuroCup. During high-profile events like the EuroCup or the World Cup, the official scorers provide gobs of data about how the players are doing. "They will tell you how many shots a player took, how many were on goal, how many passes they made, who took the passes," Amaral told me.
To judge how different players stack up, soccer-watchers (including fantasy soccer leagues) use a variety of weighted formulas that include starts, goals, saves (for goalkeepers only), assists, penalty cards, shots and misses. But chance and other hard-to-quantify factors play a big role in whether a goal is actually scored, Amaral said. You don't need to look any further than the way the U.S. team got its game-tying goal during last week's World Cup match against England to see how true that is.
"You can count how many goals someone scores, but if a player scores two goals in a match, that's amazing," the professor said in a Northwestern news release. "You can really only divide two or three goals or two or three assists among, potentially, 11 players. Most of the players will have nothing to quantify their performance at the end of the match."
Amaral and his colleagues took a different approach. "What the teams are trying to do is gain possession of the ball, and once they gain possession, they try to keep possession of the ball until they get an opportunity to make a shot and score a goal," he said. So the researchers treated a soccer team as if it were a computer network, with the ball moving between players the way data moves between nodes.
The researchers set up a computer model using statistics about the flow of passes between different members of each team, as well as information about their ability to take a shot at the goal.
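As a rough sketch of what the input to such a model might look like, the passing data can be pictured as a table of who passed to whom, plus each player's shots, converted into the chances of where the ball goes next. The player labels, the numbers and the helper name `transition_probabilities` below are all invented for illustration. They are not taken from the study, and the sketch ignores lost possession entirely.

```python
from collections import defaultdict

# Hypothetical pass log as (passer, receiver) pairs; invented for illustration.
passes = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("B", "A")]
# Hypothetical number of shots taken by each player.
shots = {"B": 1, "C": 1}

def transition_probabilities(passes, shots):
    """Return {player: {next_state: probability}}.

    next_state is either another player or the string "SHOT".
    """
    counts = defaultdict(lambda: defaultdict(int))
    for passer, receiver in passes:
        counts[passer][receiver] += 1
    for player, n in shots.items():
        counts[player]["SHOT"] += n
    probs = {}
    for player, outgoing in counts.items():
        total = sum(outgoing.values())
        probs[player] = {nxt: n / total for nxt, n in outgoing.items()}
    return probs

network = transition_probabilities(passes, shots)
```

In this toy table, player A only ever passes to B, while B splits touches evenly between passing to A, passing to C and shooting.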
Amaral et al. / PLoS
This diagram looks at soccer players as nodes on a network during the three knockout-phase matches for Spain's team in the 2008 EuroCup tournament.
"We looked at the way in which the ball can travel and finish on a shot," Amaral said. "The more ways a team has for a ball to travel and finish on a shot, the better that team is. And the more times the ball goes through a given player to finish in a shot, the better that player performed."
The computer model was designed to give one point to everyone who was involved in a sequence of passes. Then the model was run a million times to see how the average point totals for a given "network" of players stacked up. Finally, the results were normalized so that the average player was given a rating of zero. The good players ended up with positive ratings, and the not-so-good players got negative ratings.
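The steps described above can be sketched in a few lines of Python. This is only a toy version based on the description in this article, not the researchers' actual code: the three-player network and its probabilities are made up, the function name `simulate_ratings` is hypothetical, and the published method may scale the scores differently. Each simulated sequence starts with a random player, follows passes until someone shoots, and gives one point to everyone who touched the ball. The averages are then shifted so the average player sits at zero.

```python
import random

# Hypothetical next-touch probabilities; "SHOT" ends a passing sequence.
network = {
    "A": {"B": 1.0},
    "B": {"A": 0.4, "C": 0.4, "SHOT": 0.2},
    "C": {"A": 0.5, "SHOT": 0.5},
}

def simulate_ratings(network, runs=10_000, seed=0):
    """Simulate passing sequences and return zero-centered ratings."""
    rng = random.Random(seed)
    players = sorted(network)
    points = {p: 0 for p in players}
    for _ in range(runs):
        current = rng.choice(players)  # assumed: any player may start a move
        involved = set()
        while current != "SHOT":
            involved.add(current)
            options = network[current]
            current = rng.choices(
                list(options), weights=list(options.values())
            )[0]
        # One point for everyone involved in the sequence.
        for p in involved:
            points[p] += 1
    averages = {p: points[p] / runs for p in players}
    mean = sum(averages.values()) / len(averages)
    # Shift so the average player gets a rating of zero.
    return {p: averages[p] - mean for p in players}

ratings = simulate_ratings(network)
```

In this made-up network, player B sits on the most routes to a shot and so ends up with the highest rating, while the ratings as a whole sum to zero.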
The team results matched the outcome of the EuroCup tournament, with Spain coming out on top. Eight of the top 20 players in the rating system also ended up on the 20-player "best of tournament" team. That's not perfect, but it's much better than what would be predicted by chance. For what it's worth, Spain's Xavi Hernandez scored the highest for an individual match performance (3.0), while his teammate Sergio Ramos turned in the best overall tournament score (2.1).
Amaral, a native of Portugal who spent long hours during his childhood debating which soccer players were the best, said the rating system could be applied to performances in different places or at different times, for example to back up your point of view in the Pele-vs.-Maradona argument. "I don't know the answer to that one," Amaral told me, but the computer model could tell the tale if anyone was willing to go back and document the passing statistics.
"If you ask people to compare a performance today with a performance from 10 years ago, you start to romanticize performances," Amaral said. "There are always biases, but our algorithm has no biases."
The rating technique could be used in other walks of life as well: For example, businesses could use the method to evaluate the performances of individual employees working on a team project.
So how does the method stack up for the World Cup? When we spoke, Amaral and his colleagues had run the numbers only for the Argentina-Nigeria match, with Argentina's Lionel Messi emerging as the top performer.
"The preliminary result that my colleague told me is a 2.5 [for Messi]. That would be in the top five when compared to the EuroCup," Amaral said. "This was a very, very good performance. What we found in the EuroCup is that many of the teams kept a steady level of performance. If the same is true for the World Cup, the first few matches could be a very strong indicator of how these teams are going to be doing."

Check in with the Amaral Lab webpage for World Cup rankings as the tournament continues. Amaral's colleagues in the study published by PLoS ONE, "Quantifying the Performance of Individual Players in a Team Activity," include Jordi Duch and Joshua Waltzman. We'll revisit the topic in a post-Cup posting.