How to Project High HR/FB% Pitchers

For the second week in a row, I had to voice my amazement that JT Brubaker was cut in the NFBC Main Event. While Brubaker’s results have not been great (4.95 ERA), there are several signs that point to him being closer to a 4.00 ERA pitcher. The stat that sticks out is his 1.9 HR/9. The home runs have pushed his FIP to 4.98 while his xFIP is a full run lower at 3.98. I wanted to see if I should blindly assume that his home run rate will drop. If it does drop, will his FIP and ERA regress downward to his xFIP? Also, are there any measurable traits that make a pitcher more home run prone? I ended up with a “maybe” and a solid “no”.

Brubaker isn’t the only pitcher who fits this mold. Bailey Ober has a 2.2 HR/9. His 4.99 ERA is almost identical to his 5.18 FIP while his xFIP is down at 4.12. Others include Yusei Kikuchi (1.6 HR/9, 4.37 FIP, 3.47 xFIP) and Adbert Alzolay (2.0 HR/9, 5.03 FIP, 3.89 xFIP). The season is more than halfway over and fantasy managers are losing patience.

A different version of this article, which now exists only in the revision history, diverges at this point. It focused on HR/9 and FIP-xFIP differences. It had a ton of math but no answers. I stepped back and focused on a single question: over a short time frame, can I use xFIP to evaluate pitchers, since an extra one or two home runs can inflate a pitcher’s FIP and ERA? But a few games turn into a few weeks and then a few months. When is it best to give up on waiting for the regression and assume it’s the pitcher’s true talent? I needed to better understand whether I could trust xFIP in a small sample over other metrics.

xFIP uses strikeouts and walks as inputs, but instead of the pitcher’s actual home run rate, it estimates home runs from the pitcher’s flyball rate and the league-average home run per flyball rate (13.6% HR/FB% in 2021). A pitcher’s flyball rate stabilizes (70 BIP) much quicker than his home run per flyball rate (400 flyballs).
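To make the difference between the two estimators concrete, here is a minimal Python sketch. The formulas follow the commonly published FanGraphs definitions of FIP and xFIP. The constant `FIP_CONSTANT` changes every season, so the 3.17 used here is an assumption, and the stat line is made up for illustration rather than taken from any pitcher named above.

```python
LEAGUE_HR_PER_FB = 0.136  # 2021 league-average HR/FB rate quoted above
FIP_CONSTANT = 3.17       # assumption: the real constant varies by season


def fip(hr, bb, hbp, k, ip, constant=FIP_CONSTANT):
    """FIP uses the home runs the pitcher actually allowed."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb, bb, hbp, k, ip, lg_hr_fb=LEAGUE_HR_PER_FB, constant=FIP_CONSTANT):
    """xFIP swaps actual home runs for fly balls times the league HR/FB rate."""
    expected_hr = fb * lg_hr_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical line: 80 IP, 17 HR on 80 fly balls (about 21% HR/FB),
# 25 BB, 4 HBP, 85 K.
sample = dict(bb=25, hbp=4, k=85, ip=80.0)
sample_fip = fip(hr=17, **sample)
sample_xfip = xfip(fb=80, **sample)
print(f"FIP {sample_fip:.2f}, xFIP {sample_xfip:.2f}")
```

With a HR/FB rate well above league average, the made-up pitcher’s FIP comes out roughly a run higher than his xFIP, which is the same kind of gap described for the pitchers above.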

The focus needs to be on high HR/FB% rates. I decided to match up each pitcher’s first-half and second-half results within the same season from 2010 to 2019 (min 20 IP in 1H, 10 IP in 2H).* There are a couple of possible additions to this study (fastball%, pitch mix, Statcast data), but I utilized all available stats in the FanGraphs Splits database. I grouped the pitchers into first-half HR/FB% increments and then found the median first-half and second-half values for each group.

Median 1st Half & 2nd Half HR/FB% Values
1st Half HR/FB% Group    Median 1st Half    Median 2nd Half
> 20%                    22.2%              13.3%
15% to 20%               16.7%              12.8%
10% to 15%               12.1%              11.1%
5% to 10%                7.8%               10.4%
0% to 5%                 3.3%               10.3%
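The grouping behind a table like this can be sketched in a few lines of Python. This is only an illustration of the approach, not the code used for the study: the `PAIRS` data is invented, and how a rate sitting exactly on a group boundary is assigned is my assumption.

```python
from statistics import median

# Hypothetical (first-half HR/FB, second-half HR/FB) pairs, one per pitcher-season.
PAIRS = [
    (0.22, 0.14), (0.25, 0.12),
    (0.17, 0.13), (0.16, 0.12),
    (0.12, 0.11), (0.08, 0.10),
    (0.03, 0.11), (0.04, 0.09),
]

# Lower edge of each group, checked from highest to lowest.
GROUPS = [(0.20, "> 20%"), (0.15, "15% to 20%"),
          (0.10, "10% to 15%"), (0.05, "5% to 10%")]
LOWEST_GROUP = "0% to 5%"


def group_label(first_half_rate):
    """Assign a first-half HR/FB rate to its group (boundary goes to the lower group)."""
    for lower_edge, label in GROUPS:
        if first_half_rate > lower_edge:
            return label
    return LOWEST_GROUP


def median_by_group(pairs):
    """Return {group: (median first-half rate, median second-half rate)}."""
    grouped = {}
    for first, second in pairs:
        grouped.setdefault(group_label(first), []).append((first, second))
    return {
        label: (median([f for f, _ in rows]), median([s for _, s in rows]))
        for label, rows in grouped.items()
    }


for label, (first, second) in median_by_group(PAIRS).items():
    print(f"{label:<12} {first:.1%}  {second:.1%}")
```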

The home run rates regress towards the league average values, but not all the way. Given enough innings, a pitcher will eventually display his true talent, and that level of knowledge is good enough to work with. For inexperienced pitchers, xFIP should be the default for the pitcher’s talent. As a pitcher gains experience, his HR/FB% needs to be taken more and more into account.
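One generic way to express “trust the pitcher’s own rate more as the sample grows” is to pad his observed fly balls with league-average fly balls. This is a standard regression-to-the-mean sketch, not the method behind the table above. The padding of 400 fly balls borrows the stabilization figure quoted earlier and is an assumption, as are the sample inputs.

```python
def regressed_hr_fb(hr, fb, league_rate=0.136, padding_fb=400):
    """Blend a pitcher's HR/FB with the league rate.

    padding_fb is the number of league-average fly balls added to the sample
    (an assumed value). With few fly balls the estimate sits near the league
    rate; with many it approaches the pitcher's own rate.
    """
    return (hr + league_rate * padding_fb) / (fb + padding_fb)


# Hypothetical pitchers who have both allowed home runs on 20% of their fly balls.
small_sample = regressed_hr_fb(hr=30, fb=150)
large_sample = regressed_hr_fb(hr=300, fb=1500)
print(f"150 fly balls: {small_sample:.1%}, 1500 fly balls: {large_sample:.1%}")
```

The small-sample pitcher gets pulled most of the way back toward the league rate, while the veteran with ten times the fly balls keeps most of his elevated rate.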

Failing at Splitting Hairs

I tried to see if I could find the traits that pointed to the pitchers who allowed a higher HR/FB%. I took a subset of the pitchers with an HR/FB% over 15% (holding the rest out so I could back-test) and divided them into those who improved their home run rate and those who continued giving up too many homers. Then I tried to find contrasting traits between the two groups. A lot of the traits correlated with each other, but I ended up with the following two values and the “best” narratives I can make up as to why they might matter.

  • Walk Rate (BB/9 < 3.1): The pitcher has good control of where the ball is going, so he can prevent home runs.
  • Soft Contact (Soft% < 19%): The pitcher normally allows weak contact but was unlucky with home runs.

Next, I created a z-score from how many standard deviations a pitcher’s stats are above or below these thresholds. Finally, I added the z-scores together. With the combined z-score value, I back-tested the two factors and they were NOT predictive in estimating who would or wouldn’t have a high HR/FB%. Just nothing. I tried a few more tests and had no luck. I just need to trust xFIP.
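The combined score described above can be sketched roughly as follows. The thresholds come from the two bullets, but the standard deviations in `STD_DEVS` are placeholders I made up, and the sign convention (negative means below the threshold) is my assumption rather than a detail from the study.

```python
THRESHOLDS = {"bb9": 3.1, "soft_pct": 19.0}  # cutoffs from the two traits above
STD_DEVS = {"bb9": 1.0, "soft_pct": 3.0}     # hypothetical spreads, not measured


def combined_z(stats):
    """Sum each stat's distance from its threshold, in standard deviations."""
    return sum(
        (stats[name] - THRESHOLDS[name]) / STD_DEVS[name]
        for name in THRESHOLDS
    )


# Hypothetical pitcher: 2.1 BB/9 and 16% Soft%, one SD under each cutoff.
example = combined_z({"bb9": 2.1, "soft_pct": 16.0})
print(round(example, 2))
```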

Going back to the original question: should JT Brubaker’s home run rate drop? Yes, it should drop from 21% to about 13% (14% HR/FB% in 2020) just based on historic regression. He has only allowed 150 career flyballs, so he’s a ways away from the 600 flyballs needed for his own home run per flyball rate to stabilize. I stand by my recommendation to buy Brubaker since he’s struggled with home runs so far this season.

 

* I use a lower innings threshold in the second half to help account for survivor bias. Assume a pitcher is a true talent 4.50 ERA pitcher. If he is unlucky in the season’s first half and has a 5.50 ERA, he may get his innings limited in the second half. If, on the other hand, the pitcher is lucky and has a 3.50 ERA, he will have a long leash before his playing time gets cut.





Jeff, one of the authors of the fantasy baseball guide, The Process, writes for RotoGraphs, The Hardball Times, Rotowire, Baseball America, and BaseballHQ. He has been nominated for two SABR Analytics Research Awards for Contemporary Analysis and won it in 2013 in tandem with Bill Petti. He has won four FSWA Awards including one for his Mining the News series. He's won Tout Wars three times, LABR twice, and got his first NFBC Main Event win in 2021. Follow him on Twitter @jeffwzimmerman.

Rotoholic (member)
2 years ago

I see that you used a lower innings threshold to account for some survivor bias, but the bad pitchers’ contribution would still be reduced, and some won’t pitch in the second half at all. Pitchers with a history of success would be over-represented. The median would appear to be better than it would if all the bad pitchers had been allowed to continue pitching.

Instead of splitting by 1H/2H, you could split it similar to how Pizza Cutter did for his original stabilization studies, i.e., sorting games chronologically and then bucketing them into odds and evens. So you still get to compare half seasons vs each other but you remove most of the survivor bias. This would also allow you to use weighted average HR/FB rates rather than the median. It could be a much more complicated way to parse the data depending on how you have it set up, though.

By the way, good timing on this article after Andrew Heaney’s display last night!