Lessons from Figure Skating on Quantifying Quality

By: Alex Edquist

Winners of the women’s figure skating competition from left to right: Yuna Kim, Adelina Sotnikova, and Carolina Kostner (Photo credit: flickr, KOREA.NET)

Another round of controversy over figure skating judging erupted these past Olympics when Russia’s Adelina Sotnikova upset heavily-favored reigning champion Yuna Kim of South Korea in the women’s free skate. “I just couldn’t see how Yuna and Sotnikova were so close in the components,” said Kurt Browning, a four-time figure skating world champion and prominent commentator on the sport, after the women’s figure skating final. “I was shocked. What, suddenly, [Sotnikova] just became a better skater overnight? I don’t know what happened. I’m still trying to figure it out.”

Sotnikova’s win was all the more shocking because she beat her Russian teammate Julia Lipnitskaia, who had been heralded in the media as Russia’s next great figure skating hope and who had been chosen over Sotnikova for the team figure skating event.

The “components” that Kurt Browning spoke of are skating skills, transitions, performance, choreography, and interpretation; the scores for these five components are added to the score for technical elements. The scoring method was overhauled in 2004 to make it more objective, in response to the judging scandal at the 2002 Olympics, in which a French judge agreed to score a Russian figure skating pair higher than she should have in return for a French couple receiving an advantage in ice dancing. Unfortunately for Kim, the new guidelines’ technical focus meant that her intangible grace and artistry were not enough to beat Sotnikova, whose performance was less refined but more difficult to execute.

As Browning said, “Yuna Kim outskated [Sotnikova], but it’s not just a skating competition anymore – it’s math.” Scott Hamilton, the 1984 Olympic figure skating champion, said, “I looked at the way the component score (rules) are written, and Adelina checks off every box. It’s not as aesthetically pleasing as Yuna or Carolina [Kostner], but she does everything the judges are looking for.”

The Sochi figure skating controversy highlights the age-old difficulty of measuring intangible qualities. Standards of measurement are useful and frequently necessary: they help outline a situation, record progress, and – in this case – decide which athlete goes home with the gold medal. When the thing being measured is a physical quantity, measurement works very well. For example, timing how long a slalom skier takes to clear the course or a speed skater to circle the track is an excellent way to determine the winner. There may still be controversies surrounding the results of these competitions (the American speed skaters claimed their suits were a drag on their performance), but rarely about the mechanism for obtaining the measurement or the standard the measurement is based on.

But how does someone measure something like artistry? Is it possible to do so objectively and with standards? These are questions that have plagued judged sports, like figure skating and gymnastics, for years. Committees have aimed to make the scores more standardized and structured, leaving less room for a judge’s potential bias. But then they are faced with the problem – as they were in Sochi – of athletes “checking the boxes” and focusing on scoring the most points rather than attempting the most beautiful performance, which is a break from these sports’ artistic pasts.

This isn’t a problem limited to sports. For example, universities have made a big push for diversity on their campuses in recent decades. Diversity is a noble goal, and no one can doubt the importance of students interacting with peers whose experiences differ vastly from their own. However, diversity is as hard to measure as artistry, if not harder. Racial percentages are a common measurement, but using them as targets for diversity goals is problematic, especially since the Supreme Court has ruled that race-based admissions quotas are unconstitutional. Diversity certainly includes racial diversity, but it also encompasses diversity of ideas, backgrounds, and experiences, which cannot be truly captured in simple measurements of race and state or country of origin.

The No Child Left Behind policy was a nationwide attempt to increase accountability and objectivity in measuring primary and secondary student success. However, many critics faulted the policy’s emphasis on standardized tests as the defining measure of that success. In addition, the policy’s formulas for rewarding schools and teachers helped lead to many cheating scandals, including those in Atlanta. In many ways, No Child Left Behind resembles figure skating: both try to measure intangible concepts – a skater’s artistic skill, a child’s educational development – in ways that encourage teaching and skating “to the test.” The Obama administration’s response to No Child Left Behind, Race to the Top, encourages teaching less standardized test material and more critical thinking skills. How is the program’s success evaluated? More standardized tests.

Policymakers across many fields are often stuck between a rock and a hard place when dealing with this issue. Either they try to measure intangibles and unintentionally encourage behavior that conforms to the measures rather than behavior that improves on the intangible, or they rely less on measurements and risk biased results or other accountability failures. Figure skating illustrates this dilemma beautifully. At the 2002 Salt Lake City Winter Olympics, the figure skating controversy was that the judges were biased. The rules were modified, and three Olympics later, the newest figure skating controversy was that the scoring was too mathematical.

Leaning too heavily on either objective or subjective measures of quality spells trouble. Overly subjective measures open themselves to bias; overly objective measures, to cheating. A middle road is perhaps best: the difficulties of quantifying quality can be mitigated by integrating qualitative evaluations with quantitative measures. For example, college admissions offices ask for essays in addition to applicant data, and figure skating judges still have some room for their opinions in the scores. For evidence that these systems are not perfect, look no further than Fisher v. University of Texas or the Sochi Olympics’ women’s figure skating results. However, objective measures help keep scorers from rigging the game, and subjective measures help keep competitors from doing the same.

There is no easy answer to the problem of quantifying quality. Imperfect measurements are still usually better than no measurements at all, especially in today’s big-data-driven world. But these measurements will likely be rife with unintended consequences. For those attempting to quantify quality, a willingness to change the measurements in response to those consequences as they arise will go a long way to making those imperfect measurements a little less imperfect.