ERA Estimators, Pt. II: Present

July 9, 2020

I semi-recently had the honor of presenting at PitcherList’s PitchCon online conference to help raise money for Feeding America. My presentation, “ERA Estimators: Past, Present, and Future,” discussed, well, exactly what it sounds like. Over three posts, I will recap and elaborate upon various talking points from the presentation.

If the previous post was an elementary look at the “big three” estimators (FIP, xFIP, and SIERA), I hope this one is a little more illuminating.

ERA Estimators, Part II: Present

I will waste no time introducing what I call Deserved ERA, or dERA. I want to be clear: this is much less impressive than it sounds. In fact, it is not exactly a novel concept. It simply translates weighted on-base average (wOBA), which attributes run values to events like singles, home runs, hits by pitch, etc., to an ERA scale.

In ERA Estimators, Part I: Past, I discussed how, when it comes to estimators, there has generally been a trade-off between descriptive (backward-looking) ability and predictive (forward-looking) ability. I also reframed the issue using language from Tom Tango, characterizing it as a trade-off between ascribing value to the play or the player.

I plotted the “big three” estimators along these spectra:

Descriptive |—–FIP—–xFIP/SIERA—–| Predictive
Play |—–FIP—–xFIP/SIERA—–| Player

This oversimplifies the relationships, at least in terms of each estimator’s proximity to absolute description or prediction. FIP describes past performance dramatically better than xFIP or SIERA. And while it is the weakest predictor of the three, it is not significantly so. In fact, FIP hardly pales in comparison, such that it makes me wonder if FIP is not actually the most powerful metric of the bunch. This is something I never would have considered admitting a year ago.

(I remain reluctant to admit this, simply because hanging my hat on a metric that is entirely outcomes-based violates every fiber of my being. But, again, the thin margin by which FIP is “less predictive” than xFIP and SIERA has at least partly brought it back into my good graces.)

Deserved ERA (dERA)

(Apologies to Baseball Prospectus for this nomenclature being too similar to Deserved Run Average, or DRA.)

Enter dERA. What if we throw caution to the wind, abandon altogether the pursuit of predictive ability, and lean into the pursuit of descriptive ability? Well, we threaten to get a little silly.

As noted, dERA is simply a one-to-one conversion of wOBA to ERA. For demonstrative purposes, I split strikeouts and the combination of walks and hits by pitch apart from wOBA on contact (wOBAcon). However, because strikeouts, walks, and HBPs are assigned distinct coefficients (weights) in wOBA, one need not actually do this and can simply use wOBA. I will provide equations for both, estimated from a sample of 335 pitcher-seasons of 120+ innings from 2017 through 2019:

dERA = – 4.95 + 0.57 * K% + 19.73 * (BB% + HBP%) + 28.61 * wOBAcon%

…where HBP% can also be denoted as HBP/PA (HBP per plate appearance). “wOBAcon%” is a pitcher’s wOBAcon allowed multiplied by the fraction of plate appearances that end in a batted ball event (BBE). For example, if a pitcher had a 30% strikeout rate, 5% walk rate, 1% HBP rate, and a .350 wOBAcon, his wOBAcon% = (1 - 0.30 - 0.05 - 0.01) * .350 = 0.64 * .350 = .224.
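To make the arithmetic concrete, here is a minimal Python sketch of the component equation above. The function names are mine, and I am assuming every rate is entered as a fraction (0.30, not 30), which is what the worked example implies.

```python
def woba_con_pct(k_pct, bb_pct, hbp_pct, woba_con):
    """wOBAcon scaled by the share of plate appearances ending in a batted ball.

    Assumption: all rates are fractions of plate appearances (0.30 = 30%).
    """
    bbe_share = 1.0 - k_pct - bb_pct - hbp_pct
    return bbe_share * woba_con


def dera_components(k_pct, bb_pct, hbp_pct, woba_con):
    """dERA from the component equation, using the coefficients quoted above."""
    return (
        -4.95
        + 0.57 * k_pct
        + 19.73 * (bb_pct + hbp_pct)
        + 28.61 * woba_con_pct(k_pct, bb_pct, hbp_pct, woba_con)
    )


# The example pitcher from the text: 30% K, 5% BB, 1% HBP, .350 wOBAcon
print(round(woba_con_pct(0.30, 0.05, 0.01, 0.350), 3))     # 0.224
print(round(dera_components(0.30, 0.05, 0.01, 0.350), 2))  # 2.81
```

For that example pitcher, the equation works out to a dERA of roughly 2.81.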

Or, alternatively, this is much easier:

dERA = – 4.56 + 27.74 * wOBA

No fancy transformations needed — simply grab a pitcher’s total wOBA allowed from Statcast and plug in.
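As a quick illustration, the one-variable version is a single line of Python. The function name is hypothetical, and the .300 wOBA below is an arbitrary input for demonstration, not a figure from my sample.

```python
def dera_from_woba(woba):
    """dERA from total wOBA allowed, using the coefficients quoted above."""
    return -4.56 + 27.74 * woba


# Arbitrary illustrative input: a .300 wOBA allowed
print(round(dera_from_woba(0.300), 2))  # 3.76
```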

(In fact, I would argue it might be more theoretically sound to use the second equation. The first equation is not necessarily wrong, but the coefficient for strikeouts should not be positive, in theory, even if the coefficient is negligibly small. Strikeouts prevent baserunners and, thus, runs. Ultimately, leaving strikeouts out of the equation — because they assume a value of zero in wOBA — would have been more appropriate. Just so you know!)

Regardless of approach, dERA’s r², which measures the goodness of fit of the model and describes how much of ERA’s variance we can explain, is 0.83, substantially higher than that of FIP (r² = 0.63). However, as I expected, dERA’s relationship with next-year ERA (among 134 pairs of consecutive pitcher-seasons of 120+ innings) is just r² = 0.09, so low that ERA on its own is literally more predictive of itself (r² = 0.11).
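For anyone who wants to run this kind of comparison themselves, here is a rough sketch of one way to compute r² with NumPy. It takes the squared Pearson correlation between an estimator and ERA, which is an assumption about method on my part, and the arrays below are toy numbers invented purely to show the call, not real pitcher data.

```python
import numpy as np


def r_squared(estimator, era):
    """Squared Pearson correlation between an estimator and ERA.

    Pass same-season ERA for a descriptive r², or the following
    season's ERA (paired by pitcher) for a predictive r².
    """
    r = np.corrcoef(estimator, era)[0, 1]
    return float(r ** 2)


# Toy, made-up values purely to demonstrate usage
toy_estimator = [1.0, 2.0, 3.0, 4.0]
toy_era = [1.0, 3.0, 2.0, 4.0]
print(round(r_squared(toy_estimator, toy_era), 2))  # 0.64
```

The same function serves both ends of the spectrum; only the ERA column you pair with the estimator changes.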

Descriptive |ERA—–dERA—–FIP—–xFIP/SIERA—–| Predictive
Play |—–dERA—–ERA—–FIP—–xFIP/SIERA—–| Player

ERA is more predictive than dERA! And ERA is ERA, so its same-year (descriptive) r² is technically 1.0! What purpose could dERA possibly serve?

Well, let’s not forget that ERA, to some extent, is a product of luck. dERA’s purpose is to determine how a pitcher should have performed based on what happened, regardless of the pitcher’s actual skill level. And dERA, which is our best estimation of what should have happened, says a qualified pitcher’s ERA in a given season is still plagued by a decent amount of noise/luck/whatever you want to call it.

Is this helpful? I don’t know. But I like to use dERA over ERA for pitchers’ specific pitches as shorthand for their effectiveness.

Perhaps the logical extension of this idea is substituting Statcast’s expected wOBAcon (xwOBAcon) for wOBAcon in dERA’s guts. It brings us very conveniently to our next metric:

Expected ERA (xERA)

Here’s a description of xERA straight from the horse’s mouth, with pertinent passages underlined:

Expected ERA, or xERA, is a simple 1:1 translation of Expected Weighted On-Base Average (xwOBA), converted to the ERA scale. xwOBA takes into account the amount of contact (strikeouts, walks, hit by pitch) and the quality of that contact (exit velocity and launch angle), in an attempt to credit the pitcher or hitter for the moment of contact, not for what might happen to that contact thanks to other factors like ballpark, weather, or defense.

By converting this to the ERA scale, it puts xwOBA in numbers that are more familiar, and allows it to be compared directly to the pitcher’s actual ERA. (If you’re familiar with FIP, or Fielding Independent Pitching, the idea is similar, just that now Statcast quality of contact can be included.)

xERA is not necessarily predictive, but if a pitcher has an xERA that is significantly higher than his actual ERA, it should make you want to take a closer look into how he suppressed those runs.

Fortunately, Dan Richards of PitcherList investigated the value of xERA relative to the “big three.” And, as Statcast noted, it is effectively as descriptive and predictive as FIP.

Descriptive |ERA—–dERA—–FIP/xERA—–xFIP/SIERA—–| Predictive
Play |—–dERA—–ERA—–FIP/xERA—–xFIP/SIERA—–| Player

Predictive Classified Run Average (pCRA)

Six-Man Rotation’s Connor Kurcon has a sharp intuition when it comes to this stuff. I will say, first and foremost, that if you don’t follow him, you should. I guarantee you’ll learn something.

In his unveiling of pCRA, which built off of Descriptive Classified Run Average (dCRA, originally CRA), Kurcon achieved something astoundingly clever: he took (x)FIP and added a hard-hit component. All that’s different between pCRA and xFIP is the inclusion of Statcast-classified barrels in lieu of fly balls (or, for FIP, home runs).

Barrels describe batted ball events (BBE) with high expected wOBA on contact, exceeding a certain exit velocity and falling between a certain range of launch angles. The relationship between exit velocity and launch angle is dynamic — the harder you hit the ball, the larger your margin for error regarding launch angle — and, evidently, describes and predicts ERA better than any existing estimator.

Unfortunately, pCRA, like xERA, resides within a black box to which only Kurcon is privy. But you can visit the pCRA leaderboard here while I update the descriptive-to-predictive spectrum.

Descriptive |ERA—–dERA—–FIP/xERA—–xFIP/SIERA—–pCRA—–| Predictive
Play |—–dERA—–ERA—–FIP/xERA—–xFIP/SIERA—–pCRA—–| Player

What this means, ultimately, is that preventing barrels is a legitimate skill that pitchers can own, and that’s important!

Deserved Run Average (DRA)

Back in 2015, Baseball Prospectus introduced DRA. It’s a hefty endeavor. It adjusts every event for context, including park factors, opposing hitter quality, catcher and umpire effects, framing and strike zone effects, run differentials, base-run environment — you name it, DRA accounts for it.

DRA’s most important distinction is that it estimates a pitcher’s runs allowed per nine innings (RA/9). While ERA is at the mercy of the scorer, runs allowed ignores the context of earned and unearned runs. I waded through BP’s back catalog, and while I could find evidence of DRA being superior to FIP, it’s unclear to me where it stands on the descriptive-to-predictive spectrum, especially because it operates on a different scale than traditional estimators. But, given BP is in the business of working for teams behind the scenes, I would pay it heed.

Forecasted Run Average (FRA)

Hardly a month ago, PitcherList’s Richards debuted Forecasted Run Average (FRA), which outperformed FIP, xERA, xFIP, and SIERA in terms of predicting next-year ERA. It relies on strikeout-minus-walk rate (K-BB%), average exit velocity (EV), and average launch angle (LA). Pretty simple! It’s up there with pCRA, both in terms of predictiveness and year-to-year stickiness. As far as ERA predictors go, pCRA and FRA lead the pack (and I imagine DRA, as purely a run estimator, is up there, too, but it’s an apples-to-oranges comparison).

Descriptive |ERA—–dERA—–FIP/xERA—–xFIP/SIERA—–pCRA/FRA—–| Predictive
Play |—–dERA—–ERA—–FIP/xERA—–xFIP/SIERA—–pCRA/FRA—–| Player

Tomorrow: ERA Estimators, Pt. III: Future, in which I communicate my vision for the future of ERA estimators (pitch-based!) — and, not coincidentally, review fresh, new, exciting work from Prospects365’s Ethan Moore that basically achieves my vision. I’m jazzed.
