ERA Estimators, Pt. I: Past
I semi-recently had the honor of presenting at PitcherList’s PitchCon online conference, which raised a good chunk of money for Feeding America. My presentation, “ERA Estimators: Past, Present, and Future,” discussed, well, exactly what it sounds like. Over three posts, I will recap and elaborate upon various talking points from the presentation.
I hoped to make this content accessible to all levels of (fantasy) baseball fandom. With that in mind, the content throughout, but especially in this first post, may feel a bit remedial to the common FanGraphs/RotoGraphs reader. Nor do I claim this content to be necessarily original or expansive; the array of articles comparing and arguing the merits of the “big three” ERA estimators (FIP, xFIP, SIERA) and more is broad. You can find a wealth of information in FanGraphs’ glossary already, if not elsewhere.
However, if this does happen to be your first exposure to ERA estimators or you are familiar with them but don’t necessarily understand their innards, then I hope you find this launching-off point beneficial.
So, the “big three” ERA estimators — what are they?
- Fielding independent pitching, aka FIP. Perhaps the original ERA estimator, it relies on Voros McCracken’s defense-independent pitching statistics (DIPS) theory that pitchers have little, if not zero, control over batted balls, and that any variation in batting average on balls in play (BABIP) from the league average can be chalked up to luck. It is an oversimplified theory (easy to say in hindsight, now that we are endowed with such a wealth of data), but it stood upon a solid theoretical foundation. Using DIPS as its premise, FIP fundamentally posits that only strikeouts, walks, and home runs are controllable by pitchers: those three outcomes are ownable skills, while everything else is luck.
- Expected FIP, aka xFIP. It is fairly well-established that a pitcher’s rate of home runs allowed as a percentage of his fly balls allowed (HR/FB) weakly correlates from year to year; that is, it doesn’t “stick” well. What xFIP posits, then, is that a pitcher’s fly balls allowed are more of an ownable skill than his home runs allowed. His number of fly balls allowed is then multiplied by the league-average HR/FB rate to establish an “expected” (or “deserved” or “estimated” or whatever backward-looking descriptor you want to attach to it) number of home runs allowed. Is one assumption more valid than the other? We’ll get to that.
- Skill-interactive ERA, or SIERA. Created by Matt Swartz of MLB Trade Rumors (among other bylines and accomplishments), SIERA effectively takes xFIP’s guts and scrambles them, but in a good way. (x)FIP inherently assumes that strikeouts, walks, and fly balls (or home runs) occur in isolation. An increase or decrease in any of them increases or decreases (x)FIP linearly/incrementally. SIERA, on the other hand, takes elements of (x)FIP’s three outcomes and also adds (1) nonlinear elements and (2) interactions (hence the name), capturing the interdependence of the three variables.
You don’t have to understand interactions or nonlinearity to appreciate the ingenuity of SIERA’s formula. Neither strikeouts nor walks nor home runs occur in a vacuum. FIP might assume that, all else equal, a pitcher with a 30% strikeout rate and 15% walk rate is as skilled as a pitcher with a 15% strikeout rate and 5% walk rate. SIERA does not.
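For readers who want to see the arithmetic behind that comparison, here is a minimal Python sketch of FIP and xFIP using the commonly published weights (13 for home runs, 3 for walks plus hit batters, 2 for strikeouts, all divided by innings). The constant of 3.10 and the league HR/FB rate of 0.14 are placeholder assumptions, since both are recalculated every season. The function names and the two sample stat lines are hypothetical, and SIERA is left out because its formula is considerably longer.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """FIP with the commonly published weights; `constant` is an assumed placeholder."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb, bb, hbp, k, ip, lg_hr_per_fb=0.14, constant=3.10):
    """xFIP: swap actual home runs for fly balls times the league HR/FB rate."""
    expected_hr = fb * lg_hr_per_fb  # assumed league-average HR/FB
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Two hypothetical pitchers, each facing 600 batters over 150 innings with
# 20 home runs and 5 hit batters ("all else equal").
high_k_high_bb = fip(hr=20, bb=90, hbp=5, k=180, ip=150)  # 30% K, 15% BB
low_k_low_bb = fip(hr=20, bb=30, hbp=5, k=90, ip=150)     # 15% K, 5% BB

print(round(high_k_high_bb, 2), round(low_k_low_bb, 2))
```

Because each walk adds 3 to the numerator and each strikeout subtracts 2, the extra 60 walks (+180) and extra 90 strikeouts (-180) cancel exactly, so the two pitchers land on the same FIP. That linear trade-off is what SIERA's interaction terms are meant to relax.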
The conversation inevitably steers toward the age-old question: which estimator is best? The unsatisfying answer is they all have their merits. They serve different purposes, establish different theories about pitcher skills, and ultimately describe or predict pitcher performance differently.
(When we talk about describing performance, we use it in a backward-looking capacity, applying the past to the past: how did so-and-so deserve to perform based on his outcomes? Conversely, when talking about predicting performance, we use it in a forward-looking capacity, applying the past to the unseen future: how should so-and-so perform moving forward?)
Or, to paraphrase Tom Tango (aka Tango Tiger, author of The Book and Statcast architect): to what extent do we attribute outcomes to the play and to the player?
Moreover, there appears to be a direct trade-off between ascribing value to the play or to the player — in other words, a trade-off between descriptive and predictive ability. You can think of it as a spectrum:
Descriptive |--------------------| Predictive
Play |--------------------| Player
Using a Pearson correlation coefficient (“r,” which communicates the strength of relationship between a pair of variables: -1 is perfectly negative, +1 is perfectly positive, 0 is nonexistent), we can measure how well each estimator describes same-year ERA…
FIP: r = +0.79
xFIP: r = +0.66
SIERA: r = +0.65
(*All pitcher-seasons from 2017 through 2019, min. 120 innings, n = 335)
… and how well each estimator predicts next-year ERA (e.g., 2018 ___ to 2019 ERA):
FIP: r = +0.37
xFIP: r = +0.42
SIERA: r = +0.40
(*All back-to-back pitcher-seasons of 120+ innings each, from 2017 through 2019, n = 134. Important note: The sample size here is quite small. Other studies likely show SIERA being slightly more predictive than xFIP. It’s possible the juiced ball and its fluctuating juiciness have obscured these relationships. Ultimately, just know that xFIP and SIERA are nearly interchangeable from both descriptive and predictive standpoints.)
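As a rough illustration of how the two sets of correlations above could be computed, the Python sketch below pairs each estimator with same-year ERA, then pairs each pitcher's year-N estimator with his year-N+1 ERA. The rows are made-up numbers, not real pitcher data, and the helper names are hypothetical. A real run would first filter to pitcher-seasons meeting the 120-inning minimum.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def next_year_pairs(rows, estimator):
    """Pair each pitcher's year-N estimator with his year-N+1 ERA."""
    by_key = {(r["pitcher"], r["season"]): r for r in rows}
    pairs = []
    for (pitcher, season), row in by_key.items():
        following = by_key.get((pitcher, season + 1))
        if following is not None:  # keep back-to-back seasons only
            pairs.append((row[estimator], following["era"]))
    return pairs


# Made-up rows for illustration only; not real pitcher data.
rows = [
    {"pitcher": "A", "season": 2017, "era": 3.20, "fip": 3.50},
    {"pitcher": "A", "season": 2018, "era": 3.60, "fip": 3.40},
    {"pitcher": "A", "season": 2019, "era": 3.10, "fip": 3.30},
    {"pitcher": "B", "season": 2018, "era": 4.50, "fip": 4.10},
    {"pitcher": "B", "season": 2019, "era": 4.20, "fip": 4.40},
    {"pitcher": "C", "season": 2017, "era": 2.90, "fip": 3.00},
]

# Descriptive: estimator vs. ERA in the same season.
same_year = pearson_r([r["fip"] for r in rows], [r["era"] for r in rows])

# Predictive: estimator in year N vs. ERA in year N+1.
pairs = next_year_pairs(rows, "fip")
next_year = pearson_r([x for x, _ in pairs], [y for _, y in pairs])
```

Note how the predictive sample shrinks: six pitcher-seasons yield only three back-to-back pairs here, the same effect that takes the article's sample from n = 335 down to n = 134.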
FIP is most descriptive and least predictive; xFIP and SIERA, least descriptive but most predictive. Filling out our spectra, they might look something like this, although not to scale:
Descriptive |-----FIP-----xFIP/SIERA-----| Predictive
Play |-----FIP-----xFIP/SIERA-----| Player
Why would we ever want to use something less predictive than something else? I’m chiefly a fantasy analyst, and while I’m typically in the business of gauging future value, I also need to acknowledge what has already happened. I might benefit from using FIP in a descriptive manner and xFIP and SIERA for forward-looking purposes as opposed to relying exclusively on one metric in all contexts.
Moreover (and I think this is really important, but it could be just me), none of the “big three” estimators is particularly good at predicting ERA. Squaring r gives the share of variance explained: at best, we explain 18% of the variance in next year’s ERA (xFIP); at worst, 14% (FIP). That’s not a lot! The difference is neither insubstantial nor inconsequential, but at a certain point, the “edge” or “margin” gained by favoring one estimator over another is extremely thin.
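The variance figures come from squaring each next-year correlation coefficient. A quick Python check, using only the r values reported above (the variable names are just for illustration):

```python
# Next-year correlations as reported in the article.
next_year_r = {"FIP": 0.37, "xFIP": 0.42, "SIERA": 0.40}

# r squared is the share of variance in next-year ERA explained by each estimator.
variance_explained = {name: r ** 2 for name, r in next_year_r.items()}

for name, share in variance_explained.items():
    print(f"{name}: {share:.0%}")
```

This prints 14% for FIP, 18% for xFIP, and 16% for SIERA, so the gap between the best and worst of the three is only about four percentage points of explained variance.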
Does that mean there’s little purpose in trying to predict ERA, that we’d be better off focusing on describing past ERA while sacrificing very little in the way of predictive ability? Honestly, probably not. Either way, it won’t stop me from trying.
Tomorrow: ERA Estimators, Pt. II: Present, in which I review Statcast’s expected ERA (xERA) and Connor Kurcon’s predictive classified run average (pCRA), among others, and unveil a new (albeit arguably not good) metric of my own: deserved earned run average (dERA). Stay tuned.
Alex, good read. Just curious how predictive (r = ?) ERA is for the next season. It seems like it might make a decent baseline reference (at least for me). I’m also curious about ERC for same and next year.
Thanks
Sure!
ERA: r = +0.33
FIP: r = +0.37
xFIP: r = +0.42
SIERA: r = +0.40
What is ERC?
Thanks Alex. ERC is Component ERA (although it seems like it was originally called earned run composite). It was invented by Bill James and, according to Wikipedia, the formula was published in the 2004 Bill James Handbook. I’ve always found it to be very descriptive (as intended), but not as much predictive (which it wasn’t intended to be).