Sabermetrics has come a long way over the past couple of decades. It wasn’t all that long ago that only a small cult of followers knew, or even cared to know, about advanced baseball metrics. Yet today, many advanced stats are swimming in the mainstream. The proliferation of advanced metrics has certainly been good for baseball, but these stats became popular so quickly that the casual fan has been left in the dark about how to interpret them in small sample sizes. The United States Census Bureau, which is responsible for compiling and publishing data about the American economy and its people, does a pretty good job of making its data easy to understand. The Census has a couple of features that I think could be applied to baseball stats and implemented by sites like Baseball Prospectus, Fangraphs, and Baseball Reference.
Publish a Margin of Error
For each statistic the US Census releases to the public, it releases the estimated total, but also a “margin of error” which quantifies the amount of uncertainty associated with the estimate. For example, the Census estimated that there were 456,527 unemployed people in New York City in 2012 with a margin of error of +/- 10,079. This means that we can be pretty certain that there were between 446,448 and 466,606 unemployed people. This makes it pretty clear that the 456,527 number is reasonably accurate, but not ultra-precise.
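The arithmetic behind that range is simple enough to sketch in a few lines of Python. The function name `interval` is my own, not anything the Census publishes; it just subtracts and adds the margin of error to the estimate.

```python
def interval(estimate, margin_of_error):
    """Return the (low, high) range implied by estimate +/- margin_of_error."""
    return estimate - margin_of_error, estimate + margin_of_error


# Figures quoted above: unemployed people in New York City, 2012.
low, high = interval(456_527, 10_079)
print(f"{low:,} to {high:,}")  # 446,448 to 466,606
```

The same helper would work for any stat that ships with a margin of error, including a WAR figure, if one were published.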
As often as sabermetric evangelists talk about small sample sizes, I haven’t seen any specific error bars put on WAR or any of the defensive metrics. Rather than just hoping fans know not to take small samples too seriously, why not add a column next to the number that gives a range of possibilities, such as “+/- 4”? Given the number of people with statistical chops who write about baseball, I’m sure someone can do the necessary math to make this happen.
This would give fans a better understanding of how volatile these stats can be in small sample sizes, and also prevent misuse and misunderstanding — like this infamous tweet from Jon Heyman:
i am not a hater of WAR stat, but if someone can explain to me how starling marte & bryce harper are both 1.7, please do — Jon Heyman (@JonHeymanCBS) April 29, 2013
Heyman’s tweet reflects exactly what the Baseball Reference leaderboard said, which clearly wasn’t enough information. If Starling Marte’s and Bryce Harper’s WAR had each been listed as 1.7 with a column next to it saying “+/- 0.6”, Heyman would (hopefully) have had a better sense of how we should actually be thinking about the two players, and he probably wouldn’t have sparked an ugly Twitter spat.
Compile Multiyear Samples
The American Community Survey conducted by the US Census releases 1-year, 3-year, and 5-year estimates each year for the various stats that it measures. The 1-year estimate uses only the current year’s data, but has a relatively high margin of error; the 5-year estimate is more precise, but incorporates older data; and the 3-year estimate falls in-between. It even provides guidelines on when each measure is most appropriate.
As an example: The 1-year ACS says the number of Asian children under the age of 5 living in poverty in the Bronx was 228 +/- 262. With a margin of error larger than the estimate itself, that stat’s obviously close to useless. Looking at the 5-year estimate, though, the number is 409 +/- 187, which actually gives us some idea of what the real number was (roughly between 220 and 600).
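To make the comparison concrete, here is a small Python sketch that turns each estimate into a range and expresses the margin of error as a share of the estimate. The helper name `summarize` and the choice to floor the low end at zero (a head count can't be negative) are my own assumptions, not Census methodology.

```python
def summarize(estimate, moe):
    """Return (low, high, relative_moe) for a count estimate +/- moe."""
    low = max(0, estimate - moe)   # assumption: clamp at zero for head counts
    high = estimate + moe
    relative_moe = moe / estimate  # margin of error as a share of the estimate
    return low, high, relative_moe


# The two ACS figures quoted above.
for label, est, moe in [("1-year", 228, 262), ("5-year", 409, 187)]:
    low, high, rel = summarize(est, moe)
    print(f"{label}: {low} to {high} (margin is {rel:.0%} of the estimate)")
```

Run as written, the 1-year range stretches from 0 to 490, while the 5-year range is 222 to 596, with a margin that is less than half the size of the estimate.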
Given the extreme volatility of defensive stats in small samples, I find it odd that there’s no published stat that uses multiple years of data to estimate a player’s true talent level in a single year. Obviously, the older data may no longer accurately reflect the player’s true talent level, but at the very least, a three-year stat would be more scientific than looking at a player’s page and mentally weighing this year’s data against data from the past few years.
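As a rough sketch of what such a three-year stat might look like, the Python below takes a weighted average of three seasons, weighting recent seasons more heavily, and combines the margins of error. Everything here is an assumption for illustration: the 5/4/3 weights, the made-up defensive run values, the made-up +/- 6 margins, and the simplification that each season’s error is independent of the others. It is not how any site currently computes its numbers.

```python
import math


def weighted_estimate(seasons, weights):
    """Combine per-season (value, moe) pairs, most recent first.

    Assumes the seasons' errors are independent, so the weighted margins
    add in quadrature rather than linearly.
    """
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, seasons)) / total
    moe = math.sqrt(sum((w * m) ** 2 for w, (_, m) in zip(weights, seasons))) / total
    return value, moe


# Hypothetical defensive runs saved, each with a hypothetical +/- 6 margin.
seasons = [(8.0, 6.0), (2.0, 6.0), (5.0, 6.0)]
weights = [5, 4, 3]  # illustrative weights favoring the most recent season

value, moe = weighted_estimate(seasons, weights)
print(f"{value:.2f} +/- {moe:.2f}")
```

With these invented inputs the combined margin comes out noticeably smaller than any single season’s +/- 6, which is the same trade the ACS makes with its multiyear estimates: more precision in exchange for leaning on older data.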