Re-post: How predictive can April hitting stats be?

Editor’s Note: Derek here. Before I get to Chris’ article, I just wanted to acknowledge the lack of posts over the past couple of weeks. Both of us have been busy with work, Chris has also been busy with school, and I took a mini-vacation away from New York. We’ll be getting back into the swing of things shortly. For today, here is a re-post of an article from last April, which is certainly applicable to this month’s surprises and laggards.

Every year, there are a few players who come out of the woodwork to tear the cover off of the ball for the first few weeks of the season. Most end up being nothing more than a flash in the pan, like Chris Shelton who belted 10 homers back in April of 2006, but some actually maintain their April successes: Edwin Encarnacion broke out in April of 2012 and hasn’t stopped hitting homers since. Alternatively, there are also those players who get off to uncharacteristically poor starts. Adam Dunn (2011), Derek Jeter (2004), David Ortiz (2008), and Albert Pujols (2012) have all had seasons like this in the last decade. Jeter and Ortiz returned to form without missing a beat, but Dunn and Pujols have never quite been the same.

Early-season stats stand out more than those from any other month because, at the end of April, “April stats” are synonymous with “season stats.” We’re less likely to notice when a player goes on a hot or cold spell in the middle of the season simply because we don’t see those numbers on the TV broadcast or on that player’s Baseball-Reference page. Everyone knows that one month of data shouldn’t drastically change your outlook on a player. There’s just too much random noise in such a small number of games. Things out of a batter’s control, like BABIP, can cause wild variations in small samples. But what if we drill down into specific statistics? Should we take it seriously when a hitter seems to have changed something he has a decent amount of control over? Like how often he strikes out?

To find out, I compared hitters’ April K%, BB%, ISO, and BABIP to their PECOTA projections. Then I checked to see how these stats deviated from the projections for the remainder of the season. Included in the sample is every player who came to the plate at least 70 times in the month of April from 2011 through 2013. I did not set any PA thresholds for the rest of the season. This resulted in a few weird outliers, but ensured there was no selection bias caused by poor April performers dropping out of the sample. So how predictive can an April performance be?
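Before the results, here is a rough sketch of how a sample like this could be assembled. The record layout and field names (HitterSeason, april_pa, and so on) are hypothetical, and the three hitters are made up; only the 70 PA April cutoff and the absence of a rest-of-season cutoff come from the description above.

```python
from dataclasses import dataclass

MIN_APRIL_PA = 70  # April plate-appearance threshold stated in the article


@dataclass
class HitterSeason:
    # Hypothetical record layout; one row per player-season.
    name: str
    april_pa: int
    april_k_pct: float
    ros_k_pct: float      # rest-of-season strikeout rate
    pecota_k_pct: float   # preseason projection


def build_sample(rows):
    """Keep hitters with at least 70 April PA and compute deviations.

    No rest-of-season PA filter is applied, so poor April performers
    who lose playing time stay in the sample.
    """
    sample = []
    for r in rows:
        if r.april_pa >= MIN_APRIL_PA:
            april_dev = r.april_k_pct - r.pecota_k_pct
            ros_dev = r.ros_k_pct - r.pecota_k_pct
            sample.append((r.name, april_dev, ros_dev))
    return sample


# Made-up player-seasons for illustration only.
rows = [
    HitterSeason("Hitter A", 95, 0.25, 0.21, 0.20),
    HitterSeason("Hitter B", 40, 0.30, 0.22, 0.20),  # dropped: under 70 PA
    HitterSeason("Hitter C", 70, 0.15, 0.19, 0.20),
]
sample = build_sample(rows)
```

The same deviation-from-projection pairs would be built for BB%, ISO, and BABIP.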

Let’s start with K%. A simple regression (with the intercept set to 0) yields the equation:

(RoS K% – PECOTA K%) = 0.236 × (April K% – PECOTA K%)
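For readers who want to run a fit like this themselves, here is a minimal sketch. It assumes the regression is ordinary least squares through the origin, in which case the slope is sum(x*y) / sum(x*x). The function name and the toy numbers are illustrative and are not the article's data.

```python
def slope_through_origin(x, y):
    """Least-squares slope for y = b * x with the intercept fixed at 0."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("all x values are zero; slope is undefined")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


# Toy deviations from projection, in percentage points (not real data).
april_dev = [1.0, 2.0, 3.0]
ros_dev = [0.2, 0.4, 0.6]

retention = slope_through_origin(april_dev, ros_dev)
```

With x as the April deviation and y as the rest-of-season deviation, the slope is the share of the April deviation that carries over.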

On average, then, a hitter can be expected to retain about 24% of his April deviation from his projected strikeout rate. If a hitter is projected to strike out 10% of the time but strikes out 20% of the time in April, he can be expected to whiff around 12.4% of the time going forward (10% plus 24% of the 10-point gap). Here’s what the data looks like with a smoothed loess trendline.


For walk rates, the retention is a slightly lower 17%.


Unsurprisingly, a player’s April BABIP isn’t very predictive, with just 11% of the residual carrying over through the rest of the season. This data was a little too noisy for a loess curve, so I stuck with a plain old linear trendline:


A player’s power output is even less predictive. On average, hitters only retain 7% of their April deviation from their projected isolated power. When a player hits for an uncharacteristic amount of power in April, it likely has more to do with random variation than a change in talent.


Of the stats I examined, K% and BB% were the most predictive. This jibes with research done by Russell Carleton of Baseball Prospectus on how quickly statistics stabilize. Russell found we can start to get some idea of a player’s true-talent K% and BB% much earlier than pretty much any other stat out there.
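To put the four retention rates side by side, here is a small sketch that applies them to a projection. The rates are the ones quoted above; the function name and the ISO example are illustrative assumptions rather than anything from the original analysis.

```python
# Retention rates quoted in the article: the share of the April deviation
# from projection that carries over to the rest of the season.
RETENTION = {"K%": 0.236, "BB%": 0.17, "BABIP": 0.11, "ISO": 0.07}


def expected_rest_of_season(stat, projection, april):
    """Shrink the April deviation back toward the preseason projection."""
    return projection + RETENTION[stat] * (april - projection)


# The article's strikeout example: projected 10%, struck out 20% in April.
k_estimate = expected_rest_of_season("K%", 10.0, 20.0)
print(round(k_estimate, 2))  # 12.36

# Hypothetical power example: projected .150 ISO, posted .250 in April.
iso_estimate = expected_rest_of_season("ISO", 0.150, 0.250)
print(round(iso_estimate, 3))  # 0.157
```

A 100-point April ISO surge moves the rest-of-season expectation by only about seven points, while the same-sized strikeout surprise moves it more than three times as far.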

Are April trends from certain types of players more predictive?

Of course, some unpredicted performances are a little more believable than others. For example, you might think players with a lot of power would be more likely to maintain a high walk rate as pitchers learn not to give them anything to hit. I ran some multiple linear regressions to see what characteristics make a player more (or less) likely to carry his performance through the rest of the year. Here are the results along with some notable takeaways:



A player’s April strikeout rate is fairly predictive of his rest-of-season performance, but none of the other April stats tell us anything about a player’s future strikeouts.



For walk rates, hitters who hit for more power than expected in April are more likely to out-walk their pre-season projection.



Nothing to see here. An unusual April BABIP is slightly predictive, but no other factors seem to have even a modicum of value when it comes to determining if a change in BABIP is for real.



Hitters who hit for a higher than expected BABIP in April are more likely to outperform their isolated power projection. The same goes for players who hit for a high BABIP.
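The exact specification behind these four regressions isn't given here, so the following is only a sketch of the general technique. It assumes each rest-of-season deviation is regressed on the April deviations of all four stats with no intercept, and it solves the normal equations directly. The data and the "true" coefficients are invented so that the fit can be checked; they are not the article's results.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols_no_intercept(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented April deviations from projection: (K%, BB%, BABIP, ISO).
april_devs = [
    (0.05, -0.01, 0.03, 0.02),
    (-0.03, 0.02, -0.02, 0.05),
    (0.02, 0.03, 0.01, -0.04),
    (-0.04, -0.02, 0.04, 0.01),
    (0.01, 0.01, -0.05, -0.02),
    (0.03, -0.03, -0.01, 0.03),
]
# Toy rest-of-season BB% deviations built from known, made-up coefficients.
true_coefs = [0.0, 0.2, 0.0, 0.1]
ros_bb_dev = [sum(c * v for c, v in zip(true_coefs, row)) for row in april_devs]

coefs = fit_ols_no_intercept(april_devs, ros_bb_dev)
```

Because the toy response is noise-free, the fit recovers the made-up coefficients. With real data, each coefficient would come with a standard error, which is what determines whether a factor like April ISO counts as meaningful.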

Does age matter?

Surprisingly, there’s little evidence that young players are any more likely than old players to sustain their April performances. Looking at the relationship between April performance and rest-of-season performance broken up by age, the slopes of the trend lines aren’t significantly different:



What about over-performers versus under-performers?

Is a player who cuts his strikeouts in half any more likely to keep up the pace than a guy who’s striking out twice as often? It doesn’t look that way. The graphs of K% and BB% — the least noisy variables — show that the trends for over-performers versus under-performers don’t appear to be all that different.
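One way to check a claim like this is to fit separate slopes for hitters above and below their projections and compare them. The sketch below assumes zero-intercept least squares on each group; the helper names and the numbers are made up for illustration. Note that for K%, a positive deviation means more strikeouts than projected, which is under-performance.

```python
def slope_through_origin(pairs):
    """Zero-intercept least-squares slope for (x, y) pairs."""
    sxx = sum(x * x for x, _ in pairs)
    if sxx == 0:
        raise ValueError("no usable observations in this group")
    return sum(x * y for x, y in pairs) / sxx


def split_slopes(april_dev, ros_dev):
    """Return (slope for April deviations > 0, slope for deviations < 0)."""
    pairs = list(zip(april_dev, ros_dev))
    above = [(a, r) for a, r in pairs if a > 0]
    below = [(a, r) for a, r in pairs if a < 0]
    return slope_through_origin(above), slope_through_origin(below)


# Made-up deviations from projection; both groups retain 25% by design.
april_dev = [0.02, 0.04, 0.06, -0.02, -0.05]
ros_dev = [0.005, 0.010, 0.015, -0.005, -0.0125]

above_slope, below_slope = split_slopes(april_dev, ros_dev)
```

If the two slopes are close relative to their uncertainty, the retention rate doesn't depend much on the direction of the April surprise.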


Deviations in a player’s BB%, K%, BABIP, and ISO all correlate with rest-of-season stats in the direction you’d expect, so clearly, April stats have at least some predictive value. But they don’t tell us a heck of a lot. Even for the stats that are most reliable in small samples, we can expect a player to retain only a small fraction of the deviation from his projection. This is true for pretty much all players — regardless of how old they are or whether they’re under- or over-performing. Of course there are exceptions. Some guys actually do break out and maintain their success over the course of a full season, like Chris Davis did last year. But for every Chris Davis, there are plenty of guys — like Mark Reynolds and Jean Segura from 2013 — who start the season with a bang, but quietly regress back to mediocrity once the calendar turns to May.

This article originally appeared on Beyond the Box Score.

About Chris

Chris works in economic development by day, but spends most of his nights thinking about baseball. He writes for Pinstripe Pundits, and is an occasional user of the twitter machine: @_chris_mitchell