Introducing Batter xHR/FB Rate, Version 4.0: The Equation
by Mike Podhorzer
February 10, 2021

At last, it’s finally time to unmask xHR/FB v4.0! If you want a refresher on how we got here, review my xHR/FB history and v4.0 research and then check out the correlations of a variety of metrics that may or may not predict HR/FB rate.

After calculating correlations with HR/FB rate, I went to work testing out a plethora of variable combinations to determine which best predicted the metric. Let’s first find out which variables got the axe and failed to make it into the equation:

- Avg Dist FB
- Avg Dist LD
- Spin Rate FB
- Solid Contact FB%
- Straightaway FB%

If you recall from yesterday, I ran correlations for average distance of fly balls and line drives, and then average distance for each batted ball type individually. I tested my equation using both the distance from the combined batted balls and separately, and the combined won out. As a result, I said goodbye to the average distance marks for each batted ball type individually.

Next, I was excited about the possibilities that average spin rate on fly balls could offer us. I wasn’t even aware the metric was available and was hoping it would dramatically improve the predictiveness of my equation. It did not, sad face. The correlation with HR/FB rate was meaningful, but only marginally positive, so it was probably drowned out by other, more influential metrics.

I shared my enthusiasm yesterday for discovering that solid contact fly balls resulted in a 21.1% home run rate, which suggested I shouldn’t have given all my attention to just barrels. Well, maybe I should have. Despite the high home run rate, the correlation with HR/FB rate was surprisingly low, and it wasn’t a meaningful variable when tested in my equation.

Finally, straightaway fly balls had the lowest correlation with HR/FB at very slightly negative, and had no business being part of the equation gang.
It’s now time to reveal the equation and discuss:

xHR/FB+LD = -0.462976419
    + (Std Dev of Dist FB+LD * 0.001638039)
    + (Avg Dist FB+LD * 0.001176657)
    + (Barrel FB% * 0.172323325)
    + (Barrel LD% * 0.151220016)
    + (Pull FB% * 0.127471647)
    + (Pull LD% * 0.032113658)
    + (Oppo FB% * 0.038808565)

That’s not the end though. This equation is entirely dependent on Statcast data and its definition of fly balls and line drives. So you will end up with an xHR/FB+LD rate, but it will not be comparable to the HR/FB rate here on FanGraphs. Since Baseball Savant doesn’t publish its own HR/FB or HR/FB+LD rate, and we want the equation to actually be useful, we need to convert it into a friendlier version. So the final step is:

xHR/FB v4.0 = xHR/FB+LD * (Statcast FB+LD) / (FanGraphs FB)

Here, Statcast FB+LD is the batter’s count of fly balls plus line drives as classified by Statcast, and FanGraphs FB is his fly ball count on FanGraphs. Multiplying the rate by the Statcast count gives expected home runs, and dividing by FanGraphs fly balls puts the result on the familiar HR/FB scale.

Adjusted R-squared = 0.826
Adjusted R-squared of v1.0 = 0.649
Adjusted R-squared of v2.0 = 0.682
Adjusted R-squared of v3.0 = 0.792

This was a big leap forward. You might question why I would make such a proclamation with only a marginal increase in R-squared from v3.0. Well, first off, an R-squared this high already is quite awesome. How many xMetrics in the universe could brag of toting such a high mark?!

The second reason is actually more important and something you may have noticed when looking at the equation: it’s missing a park factor adjustment. In v3.0, I added a park factor adjustment variable for obvious reasons. Park dimensions, weather, etc., all have significant effects on home runs. So really, no good xHR/FB rate equation worth its salt should be missing one, right?

If it were only that simple! I wanted to include a park factor adjustment, I really did. Unfortunately, it is just too darn difficult. Let me explain why:

Players switch teams in-season all the time. I would have had to identify such players, calculate the percentage of total fly balls and line drives the player hit for each team, and then create a blended park factor of the two parks using those percentages.
That would have been crazy time consuming.

Park factors we find on many different sites, including here on FanGraphs, are essentially for the league average hitter. Take a generic hitter, move him out of Park A and into Park B, and we would typically expect his HR/FB rate to change by X%. However, an individual player isn’t the league and not always your run of the mill guy. That means that every hitter is going to be affected differently, maybe slightly, maybe dramatically so, by the park. So a one size fits all home run park factor just isn’t going to work correctly for every hitter. For example, Yankee Stadium features a short porch in right field, making it heaven for left-handed home run hitters. Yet, DJ LeMahieu, a right-handed hitter, has posted an insane 30.7% HR/FB rate at home since joining the Yankees, versus just a 10.7% mark in away parks. If I added a park factor adjustment into the equation, it would use the right-handed home run factor for Yankee Stadium, when in fact LeMahieu is benefiting the way a left-hander would given his penchant for hitting opposite field fly balls.

Given the components of my xHR/FB rate equation, park factors don’t always affect the output the way you might expect them to. Without the adjustment, you would figure that hitters in the most favorable home run parks would consistently outperform their xHR/FB rates, and vice versa. This is actually not the case. The wrinkle here is that some parks affect the batted ball distance, which is already an equation variable, because of weather or atmospheric conditions, while in others the park factor is mostly due to the dimensions. Coors Field presents the biggest challenge in incorporating park factors, and is an example of this wrinkle. Because of the thin air, the ball travels farther, which increases the Avg Dist FB+LD variable, and that is why Coors’ home run park factors are consistently at or near the top in baseball.
A park factor adjustment in the equation would end up double counting this effect. Want proof? I created a Pivot Table with my data set and summed the difference between actual HR/FB rate and xHR/FB rate. Rockies hitters in aggregate underperformed their xHR/FB rate! In fact, the team was the sixth biggest underperformer. While you might suggest I update the equation to include a park factor adjustment for every team but the Rockies, I would argue that this weather/atmospheric effect that influences the Avg Dist FB+LD variable might exist in other parks too. We are just most familiar with Coors, but I don’t know what effect, if any, other parks have on this variable.

On the other hand, most of the other underperforming and overperforming teams in the Pivot Table are those you would expect. The Giants are the second biggest underperformers behind the Tigers, who play in a park that still baffles me, as the home run park factors are basically neutral, and yet they are the most underperforming team. At the other end are the league’s biggest overperformers, the Reds, which does make sense as their park always ranks as one of the most favorable home run parks in baseball. Behind the Reds are the Yankees, who are again no surprise. Then there’s an enormous gap to the rest of the pack.

So as you can see now, applying a park factor adjustment to any xMetric is mighty, mighty difficult. It would require more time than I have and more math and modeling skills than I possess. Instead, if you have a general idea of how a park plays for that batter’s handedness, you can mentally adjust his xHR/FB rate up or down.

Getting back to that adjusted R-squared, the fact it’s so high even without a park factor adjustment is super cool.

Now let’s check out the year-over-year correlations for the equation components. We obviously want to see relatively high marks to suggest these are sustainable skills, rather than random rates that bounce around each season.
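As a rough illustration of what a year-over-year correlation measures, here is a minimal Python sketch. It assumes a Pearson correlation computed over players who appear in both seasons; the article does not state which coefficient or sample filters were actually used, and the function names and the dictionary layout (player name mapped to that season's value) are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / math.sqrt(var_x * var_y)


def yoy_correlation(year1, year2):
    """Correlate a metric across two seasons.

    year1 and year2 map a player identifier to his value for the metric in
    each season (an assumed layout). Only players present in both seasons
    are paired up; everyone else is ignored.
    """
    common = sorted(set(year1) & set(year2))
    return pearson([year1[p] for p in common], [year2[p] for p in common])
```

A value near 1 means players tend to keep their relative standing in the metric from one season to the next, while a value near 0 means the metric bounces around.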
YoY Correlations

Metric                    YoY Correlation
HR/FB                     0.621
xHR/FB                    0.705
Barrel LD%                0.683
Barrel FB%                0.664
Avg Dist FB+LD            0.652
Pull LD%                  0.604
Oppo FB%                  0.493
Pull FB%                  0.489
Std Dev of Dist FB+LD     0.352

First, it’s nice to see that xHR/FB rate is stickier than actual HR/FB rate. Again, this is even without a park factor adjustment.

Next, we find that four of the seven variables sport YoY correlations over 0.60. That seems pretty good! Interestingly, Pull LD% is more consistent than either of the two batted ball direction fly ball percentages. Finally, Std Dev of Dist FB+LD is the least consistent from season to season, but it plays an important role in forecasting HR/FB rate.

Last, and not in the table, is the correlation of xHR/FB rate in Year 1 to HR/FB rate in Year 2. That correlation is 0.632, which is barely above the 0.621 correlation of HR/FB rate itself. While I was hoping that the gap would be more significant, the missing park factor piece certainly plays a role here in the closeness of the correlations. Also remember that my equation is backward looking and not meant to be used for a Year 2 forecast. The ideal usage of the rate is as a substitute for actual HR/FB rate when projecting next season’s HR/FB rate.

xHR/FB v4.0 was used as a guide when forecasting 2021 HR/FB rates for the Pod Projections, which are available now!
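For readers who want to plug their own numbers into the equation, here is a minimal Python sketch of the two steps described above. The coefficients are copied from the article. Everything else is an assumption made for illustration: the function and key names are hypothetical, distances are taken to be in feet, the percentage inputs are taken to be decimal fractions (0.25 rather than 25), and the two denominators in the conversion step are taken to be raw counts.

```python
# Coefficients copied from the xHR/FB+LD equation in the article.
INTERCEPT = -0.462976419
COEFFICIENTS = {
    "std_dev_dist_fb_ld": 0.001638039,  # Std Dev of Dist FB+LD (assumed feet)
    "avg_dist_fb_ld": 0.001176657,      # Avg Dist FB+LD (assumed feet)
    "barrel_fb_pct": 0.172323325,       # Barrel FB% (assumed decimal fraction)
    "barrel_ld_pct": 0.151220016,       # Barrel LD%
    "pull_fb_pct": 0.127471647,         # Pull FB%
    "pull_ld_pct": 0.032113658,         # Pull LD%
    "oppo_fb_pct": 0.038808565,         # Oppo FB%
}


def xhr_per_fb_ld(inputs):
    """Step 1: expected home runs per Statcast fly ball plus line drive.

    inputs is a dict with one value per key in COEFFICIENTS.
    """
    missing = set(COEFFICIENTS) - set(inputs)
    if missing:
        raise ValueError(f"missing inputs: {sorted(missing)}")
    return INTERCEPT + sum(coef * inputs[name] for name, coef in COEFFICIENTS.items())


def xhr_per_fb_v4(rate_fb_ld, statcast_fb_ld, fangraphs_fb):
    """Step 2: rescale the Statcast-based rate onto FanGraphs fly balls.

    statcast_fb_ld and fangraphs_fb are assumed to be counts of batted balls.
    """
    if fangraphs_fb <= 0:
        raise ValueError("fangraphs_fb must be positive")
    return rate_fb_ld * statcast_fb_ld / fangraphs_fb
```

Keeping the two steps as separate functions mirrors the article: the first result is only comparable to other Statcast-based rates, and the second is the one to line up against the HR/FB rate shown on FanGraphs.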