Toward an Effective Velocity ERA Estimator
The research I’ve done on Effective Velocity to date has led to some interesting examples that in many cases suggest the strategy has an impact on pitcher success. But to be able to really take advantage of that implication, I felt I needed to create an ERA estimator that would apply to all pitchers. If that metric were more predictive of future performance than our current best ERA estimators like FIP, then fantasy owners could systematically use it to make better decisions based on pitchers’ adherence to the philosophy of Effective Velocity.
As you might have guessed, it’s pretty difficult to improve on the estimators we have. Still, despite not actually getting anywhere useful with my efforts so far, I thought it would be interesting to write about what I’ve tried and what I’ve learned, and maybe a reader will share an idea in the comments that will pull me in the right direction.
I had read and written the formula for FIP enough times to know it by heart. From the FanGraphs Glossary:
FIP = ((13 × HR) + (3 × (BB + HBP)) − (2 × K)) / IP + constant
To someone who knows DIPS theory, the inputs are intuitive. Broadly speaking, pitchers do not have much control over whether the balls put in play against them become hits. Sometimes balls are hit right at defenders. Sometimes defenders make incredible defensive plays. Weird stuff can happen. But over the course of a full season, most pitchers see their batting averages on balls in play converge near the MLB average of about .300. And because pitchers cannot control their BABIP, FIP chooses to ignore it, instead evaluating pitcher performance based on the three outcomes pitchers can control: strikeouts, walks, and home runs. Since FIP has proven to better predict future ERA than ERA itself, FIP is believed to capture actual pitcher skill better than ERA does.
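For readers who want to see the arithmetic, here is a minimal Python sketch of the standard FIP calculation with its usual weights of 13 for home runs, 3 for walks plus hit batters, and -2 for strikeouts. The function name, the example stat line, and the default constant of 3.10 are assumptions for illustration; the real constant is recalculated for each league and season.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from a season stat line.

    `ip` is innings pitched as a true decimal (200.333..., not the
    box-score shorthand 200.1). `constant` is a placeholder: the real
    value varies by league and season and is not taken from this article.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical stat line: 20 HR, 50 BB, 5 HBP, 200 K in 200 IP.
example = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0)
print(round(example, 3))
```

Notice that balls in play never enter the calculation, which is the whole point of DIPS theory.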
I understood the choice of inputs, and I also vaguely understood that their coefficients related to their relative importance in terms of predicting ERA, but I didn’t know how those coefficients had been selected. It turns out that part is pretty simple, too. I calculated strikeouts per batter faced, walks per batter faced, and home runs per batter faced for every qualified starter between 2007 and 2016. Then, I ran a multiple linear regression with those three rates as the independent variables and same-season ERA as the dependent variable. It’s same-season because FIP was meant to explain pitcher performance; predicting performance is a happy side effect. That regression produced coefficients for strikeout rate, walk rate, and home run rate that scaled to -2, 3, and 12, respectively. Voila, that’s FIP. The home run coefficient is a bit lower than the 13 in the actual formula, but I also ran my regression on a different set of pitchers and on different seasons than were used to create FIP originally.
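The regression step can be sketched in plain Python. This is not the code or data behind the numbers above: the pitcher seasons are synthetic, the "true" coefficients are made up so that they sit in a -2 : 3 : 12 ratio, and rescaling so the strikeout coefficient equals -2 is only one assumed way to put the fit on a FIP-like scale. The helper names `solve` and `ols` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares fit; returns [intercept, b1, b2, ...] via normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic pitcher seasons (NOT real data): per-batter-faced rates and ERA.
random.seed(0)
TRUE = (3.5, -8.0, 12.0, 48.0)  # made-up intercept, K/BF, BB/BF, HR/BF effects
rates, era = [], []
for _ in range(500):
    k = random.uniform(0.15, 0.30)
    bb = random.uniform(0.04, 0.11)
    hr = random.uniform(0.015, 0.040)
    rates.append((k, bb, hr))
    noise = random.gauss(0, 0.1)
    era.append(TRUE[0] + TRUE[1] * k + TRUE[2] * bb + TRUE[3] * hr + noise)

intercept, b_k, b_bb, b_hr = ols(rates, era)

# Rescale so the strikeout coefficient is exactly -2, as in the FIP formula.
scale = -2.0 / b_k
scaled = [round(b * scale, 2) for b in (b_k, b_bb, b_hr)]
print(scaled)
```

With real data, the rows of `rates` would be each qualified starter’s K/BF, BB/BF, and HR/BF, and `era` would hold the matching same-season ERAs.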
Now that I understood where FIP came from, it was an easy enough process to try to create similar formulas that also included Effective Velocity. After all, EV is unaffected by defense and luck, and so if it influences the success pitchers have, it should be a part of FIP, too.
The problem (I assume) is that EV is difficult to distill into a single number. In a previous article, I created a stat I called EV Adherence Rate (EVAdh%) that shows the percentage of pitch pairs that pitchers threw that were at least 4 mph different from each other in EV. My subsequent research showed that the pitchers in the highest quintile of EVAdh% consistently outperformed their FIPs with their ERAs. Well, one quintile of pitchers on one possible outcome of plate appearances (a batted ball) does not make a significant variable, it seems. My first multiple linear regression added EVAdh%, but it produced a coefficient near 0 and a p-value of 0.90. For my next regression, I used a binary variable that was 1 if a pitcher was above the 80th percentile in EVAdh% and 0 otherwise. Again, it produced a coefficient near 0 with a not-nearly-enough-improved p-value of 0.82. The EV stats that I readily had at my disposal were not proving to be significant additions to strikeouts, walks, and home runs for predicting pitcher same-season ERA.
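To make the two EV inputs concrete, here is a rough Python sketch of how they might be built. It assumes a "pitch pair" means two consecutive pitches and that each pitch already has an estimated EV in mph; the original EVAdh% definition may pair pitches differently. The function names and all of the sample numbers are hypothetical.

```python
import statistics


def ev_adherence_rate(ev_mph, min_gap=4.0):
    """Share of consecutive pitch pairs whose EV differs by at least min_gap mph.

    Assumes pairs are consecutive pitches; returns 0.0 if there are no pairs.
    """
    pairs = list(zip(ev_mph, ev_mph[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if abs(a - b) >= min_gap)
    return hits / len(pairs)


def top_quintile_flags(values):
    """Return 1 for each value above the 80th percentile of the list, else 0."""
    cutoff = statistics.quantiles(values, n=5)[-1]  # last of four quintile cuts
    return [1 if v > cutoff else 0 for v in values]


# Hypothetical EV readings (mph) for five pitches in sequence.
sequence = [92.0, 88.0, 90.0, 95.0, 95.5]
rate = ev_adherence_rate(sequence)

# Hypothetical season-long EVAdh% values for ten pitchers.
season_rates = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
flags = top_quintile_flags(season_rates)
print(rate, flags)
```

Either the raw rate or the 0/1 flag could then be added as a fourth predictor alongside strikeout, walk, and home run rates, and the coefficient and p-value on that predictor show whether it adds anything.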
My best guess for an explanation is that EVAdh% is just too blunt an instrument to capture the subtleties of the strategy. I may need to move away from full-season summary statistics and instead try to estimate individual pitch values that consider EV as well as other pitch properties like velocity, location, and movement. It’s also possible that EV as I’ve estimated it just isn’t as significant as I’ve assumed based on some of my other findings, but I think I have a lot more to explore before I’d be ready for that conclusion.
–
This article includes research on the theory of Effective Velocity, which was created by Perry Husband. The research presented here estimates EV but is less sophisticated than Husband’s work. To read more about Husband’s work or to learn about the services he offers, check out his web site.
Scott Spratt is a fantasy sports writer for FanGraphs and Pro Football Focus. He is a Sloan Sports Conference Research Paper Competition and FSWA award winner. Feel free to ask him questions on Twitter – @Scott_Spratt
“Then, I ran a multivariate linear regression with those three dependent variables and same-season ERA as the independent variable”
– wouldn’t ERA be the dependent variable?
Yes, you’re right. I’ll fix that.