Inducing Weak Contact: Why Rex Hudler Got Me Thinking

As part of my preseason prep, I watched a Kris Bubic start from last season. During it, Rex Hudler, who is never short on opinions, brought up an interesting point: the more pitches each batter sees, the quicker the batter becomes familiar with the pitcher’s repertoire, and the more likely the batter is to get a hit. At first, I thought someone else was speaking, but no, and the concept warranted further investigation. It’s the same theory behind the times-through-the-order penalty, but the new effect could be felt depending on how many pitches a pitcher throws per hitter and the depth of the pitcher’s arsenal. That idea started me down a wormhole that led to many questions and one subpar answer, but there seems to be at least one nugget of wisdom in Rex Hudler’s head.

First off, with less than a month before the season starts, it’s not an ideal time to start a study that could take weeks to iron out. I barely have enough time to report news, velocity readings, and draft my own teams. The following “answers” are not set in stone and there are so many more questions to investigate. I could either shelve the ideas for months or just make a snippet available and let others run with the ideas while I grind through the fantasy season. I’m giving others the chance to refine the ideas before I come back to them.

Since I was going to examine pitcher batted-ball data, I decided to include other inputs. Also for this study, I wanted to focus as much as possible on stats to help my fantasy team. Instead of projecting StatCast data and then creating standard stats, I ignored the middle ground. As I stated before, shortcuts were taken while I focused on BABIP and vISO (versus ISO).

I wanted to stay away from batted ball inputs, but one kept popping up so I included it… until I had to ignore it. The inputs I used were:

1. Pitches per Batter Faced per Arsenal Diversity Factor* (Rex Hudler Index): The bigger the value, the more of a pitcher’s arsenal is seen in each at-bat. Consider two pitchers. One throws his two pitches an equal amount and averages five pitches per at-bat. The other throws four pitches an equal amount and averages four pitches per at-bat. The first pitcher would end up with a value of 2.5 (5 P/BF / 2 Diversity Factor) and the other at 1.0 (4 P/BF / 4 Diversity Factor).

2. Non-fastball Zone% when Behind in the Count Percentage: I had been considering this concept for a while and this study was a perfect time to see if it had any merit. The basic question is whether a pitcher who is behind in the count can throw a non-fastball over the plate to get a called strike. If not, a hitter can just sit on and crush fastballs in the zone.

3. Ahead versus Behind in the Count: I’ve already determined that a higher ratio of pitches thrown while ahead to pitches thrown while behind can lead to weaker contact.

4. Difference in Fastball Velocity from the League Average: From some previous work, I had an idea that high-velocity pitchers were harder to hit. This is true, but from this exercise, I found out it’s more important to measure how much a pitcher’s fastball velocity deviates from league-average fastball velocity (i.e. 93 mph). Hitters get used to the average fastball speed and they take a while to adjust to anything different. High velocity can lead to more strikeouts but there is a reason why some sub-90 mph starters make it work.

5. Groundball rate: I didn’t start with it as an input, but it became obvious that groundball pitchers have a higher BABIP but lower vISO. Adding this batted ball data improved the BABIP and vISO projections, but in the end, I had to remove it. The issue comes down to MLB changing the ball each year, which changes the impact of groundballs and flyballs. What used to be a flyball that didn’t get to the warning track became a home run just a few years later. Since BABIP differences are so small to begin with, the changing ball made year-to-year valuation almost impossible.

The way I determined each input may need to be refined. For example, I came up with two ways (ratio and difference) to measure ahead and behind in the count. I used the ratio version for this study. Is there a better way of measuring pitching mix diversity? For sure. Do I have time to test them right now? No.
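To make the input definitions concrete, here is a minimal Python sketch of three of them. The function names are hypothetical, and two details are assumptions rather than anything spelled out above: the velocity input is taken as an absolute deviation from the 93 mph league average, and the "difference" version of ahead/behind is taken as a simple subtraction of pitch shares.

```python
LEAGUE_AVG_FASTBALL_MPH = 93.0  # league-average fastball velocity quoted above


def rex_hudler_index(pitches_per_bf, diversity_factor):
    """Pitches per batter faced divided by the arsenal diversity factor."""
    if diversity_factor <= 0:
        raise ValueError("diversity_factor must be positive")
    return pitches_per_bf / diversity_factor


def ahead_behind_ratio(share_ahead, share_behind):
    """Ratio version: share of pitches thrown ahead over share thrown behind."""
    return share_ahead / share_behind


def ahead_behind_difference(share_ahead, share_behind):
    """Difference version (assumed form): ahead share minus behind share."""
    return share_ahead - share_behind


def fastball_velocity_deviation(fastball_mph, league_avg=LEAGUE_AVG_FASTBALL_MPH):
    """Assumed to be absolute, so slow and fast outliers both score high."""
    return abs(fastball_mph - league_avg)


# The two pitchers from the Rex Hudler Index example above.
two_pitch_guy = rex_hudler_index(5, 2)    # 5 P/BF, diversity factor 2
four_pitch_guy = rex_hudler_index(4, 4)   # 4 P/BF, diversity factor 4
print(two_pitch_guy, four_pitch_guy)
```

The absolute-value choice is why a soft-tosser and a flamethrower can both land near the top of a velocity-deviation column.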

Along with the inputs, here are some limitations I could think of:

  • Data from 2015 to 2020 was used so StatCast data can be incorporated at a later date. This timeframe just happens to coincide with when MLB decided to keep juicing up the baseball. It’s impossible to create a batted ball metric when the ball keeps changing.
  • Only starting pitchers (min 60 IP) were included. I wanted to get the pitchers who would possibly see a batter two to three times a game.
  • The defense behind the pitchers was not accounted for.
  • The quality of the opposing hitters was not used.
  • Ballpark effects were not taken into account.

The plan was to see what works and what doesn’t. Besides the above limitations, some biases could exist that would need to be addressed at a later date.

  • I treated each at-bat the same regardless of the number of times the batter had seen the pitcher. A similar study might need to be run separately on the first, second, and third times through the order.
  • There are possible overlaps with the count-related factors. The pitchers who were always behind in the count might be the ones who throw the most pitches. The problem is to find a way to efficiently separate each factor so they aren’t related.

Next, I used a linear regression equation to see how the five values combine to project BABIP and vISO. The first conclusion is that “Non-Fastball Zone% When Behind” and “Pitches per Batter Faced per Arsenal Diversity Factor” are not relevant (i.e. not statistically significant) for vISO. Now, some other version of them (e.g. different strike zone size, only behind by one strike, etc.) might work better, so I’m not discarding the information. Without any regression, I ended up with projected BABIP values from .267 to .311 and vISO values ranging from .136 to .294.
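For anyone who wants to run with the idea, here is a rough sketch of what fitting that kind of linear equation looks like, using ordinary least squares in plain Python. The input rows are the first eight pitchers from the 2020 leaderboard (NonFBZone, Pitch/Div, Ahead/Behind, FBvDiff), but the targets are synthetic: they are generated from made-up coefficients purely to show the solver recovering them. None of the coefficients are the study's fitted values, and the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, targets):
    """Return [intercept, coef_1, ...] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# (NonFBZone, Pitch/Div, Ahead/Behind, FBvDiff) from the leaderboard rows.
INPUTS = [
    (0.13, 1.0, 1.66, 2.4), (0.10, 1.5, 1.31, 5.5),
    (0.07, 1.2, 1.48, 4.9), (0.05, 1.0, 1.68, 5.7),
    (0.07, 1.1, 1.04, 6.0), (0.06, 1.7, 1.37, 4.0),
    (0.13, 1.1, 1.29, 1.7), (0.13, 0.9, 1.08, 3.5),
]

# MADE-UP coefficients (intercept first), used only to build synthetic targets.
FAKE_COEFS = [0.300, -0.05, 0.004, -0.010, -0.001]
SYNTHETIC_BABIP = [
    FAKE_COEFS[0] + sum(c * x for c, x in zip(FAKE_COEFS[1:], row))
    for row in INPUTS
]

recovered = fit_ols(INPUTS, SYNTHETIC_BABIP)
print([round(c, 4) for c in recovered])
```

With real BABIP or vISO targets in place of the synthetic ones, the same call would return the fitted weights, though a stats package would also be needed to judge which inputs are significant.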

With this solution, the groundball rate swings projections to the extremes. For example, groundballer Dallas Keuchel would have a projected .310 BABIP (nearly the highest) and .136 vISO (the lowest). I wanted to learn more than whether a pitcher has a groundball or flyball tendency. While the groundball rate is a relevant factor for estimating BABIP and vISO, it outweighs the other factors, so I removed it.

The new projected BABIP range shrank to .277 to .299 and the vISO range to .209 to .238. Again, the groundball rate matters, but now the other factors are isolated and their weight alone can be seen.

Finally, I put the BABIP and vISO estimates on a 0% to 100% scale with 100% being the pitcher who limits hard contact the most. I didn’t like going this route but if I put out any BABIP or vISO projections, I know they would be wrong. After all that, here are the starting pitchers (min 60 IP) ranked by the average of the two weak contact values.

2020 Weak Contact Leaderboard
| Name | GB% | NonFBZone | Pitch/Div | Ahead/Behind | FBvDiff | BABIP_WC | ISO_WC | Average | ERA | SIERA | xFIP | FIP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yu Darvish | 43% | 13% | 1.0 | 166% | 2.4 | 98% | 95% | 96% | 2.01 | 3.14 | 2.82 | 2.23 |
| Jacob deGrom | 42% | 10% | 1.5 | 131% | 5.5 | 92% | 90% | 91% | 2.38 | 2.70 | 2.46 | 2.26 |
| Marco Gonzales | 38% | 7% | 1.2 | 148% | 4.9 | 80% | 98% | 89% | 3.10 | 3.90 | 4.13 | 3.32 |
| Kyle Hendricks | 47% | 5% | 1.0 | 168% | 5.7 | 74% | 100% | 87% | 2.88 | 4.00 | 3.78 | 3.55 |
| Zack Greinke | 41% | 7% | 1.1 | 104% | 6.0 | 82% | 72% | 77% | 4.03 | 3.72 | 3.51 | 2.80 |
| Dinelson Lamet | 37% | 6% | 1.7 | 137% | 4.0 | 64% | 87% | 76% | 2.09 | 3.16 | 3.30 | 2.48 |
| Kenta Maeda | 49% | 13% | 1.1 | 129% | 1.7 | 85% | 64% | 74% | 2.70 | 2.92 | 2.63 | 3.00 |
| Hyun-Jin Ryu | 51% | 13% | 0.9 | 108% | 3.5 | 95% | 51% | 73% | 2.69 | 3.67 | 3.32 | 3.01 |
| Adam Wainwright | 43% | 11% | 1.0 | 112% | 3.8 | 87% | 57% | 72% | 3.15 | 4.39 | 4.23 | 4.11 |
| Luis Castillo | 58% | 7% | 1.0 | 119% | 4.4 | 67% | 74% | 71% | 3.21 | 3.35 | 2.82 | 2.65 |
| Dallas Keuchel | 53% | 12% | 1.1 | 77% | 5.8 | 100% | 41% | 71% | 1.99 | 4.57 | 3.98 | 3.08 |
| Gerrit Cole | 37% | 5% | 1.5 | 138% | 3.6 | 54% | 85% | 69% | 2.84 | 3.21 | 3.38 | 3.89 |
| Brandon Woodruff | 49% | 3% | 1.1 | 154% | 3.4 | 36% | 92% | 64% | 3.05 | 3.30 | 3.29 | 3.20 |
| Dylan Bundy | 41% | 6% | 0.9 | 124% | 2.9 | 57% | 69% | 63% | 3.29 | 3.80 | 3.75 | 2.95 |
| Aaron Civale | 44% | 12% | 0.9 | 122% | 1.3 | 72% | 49% | 60% | 4.74 | 4.11 | 3.92 | 4.03 |
| Max Scherzer | 33% | 5% | 1.2 | 149% | 1.6 | 39% | 80% | 59% | 3.74 | 3.56 | 3.53 | 3.46 |
| Zach Davies | 41% | 12% | 1.4 | 79% | 4.5 | 90% | 26% | 58% | 2.73 | 4.32 | 4.14 | 3.88 |
| German Marquez | 51% | 3% | 0.9 | 134% | 2.6 | 28% | 77% | 53% | 3.75 | 4.27 | 3.83 | 3.28 |
| Patrick Corbin | 44% | 5% | 1.1 | 120% | 2.9 | 44% | 62% | 53% | 4.66 | 4.42 | 4.12 | 4.17 |
| Aaron Nola | 50% | 8% | 1.0 | 130% | 0.7 | 46% | 54% | 50% | 3.28 | 3.25 | 2.79 | 3.19 |
| Jon Lester | 47% | 11% | 0.9 | 77% | 3.9 | 77% | 18% | 48% | 5.16 | 5.02 | 5.11 | 5.14 |
| Andrew Heaney | 39% | 3% | 1.7 | 152% | 1.6 | 13% | 82% | 48% | 4.46 | 4.08 | 4.15 | 3.79 |
| Shane Bieber | 48% | 10% | 1.1 | 111% | 1.1 | 59% | 33% | 46% | 1.63 | 2.52 | 2.04 | 2.07 |
| Carlos Carrasco | 44% | 9% | 1.1 | 123% | 0.5 | 49% | 44% | 46% | 2.91 | 3.91 | 3.65 | 3.59 |
| Zack Wheeler | 56% | 2% | 1.0 | 116% | 3.8 | 23% | 67% | 45% | 2.92 | 4.07 | 3.76 | 3.22 |
| Lucas Giolito | 41% | 6% | 1.6 | 135% | 0.9 | 26% | 59% | 42% | 3.48 | 3.51 | 3.35 | 3.19 |
| Kyle Freeland | 52% | 12% | 0.8 | 85% | 1.2 | 69% | 3% | 36% | 4.33 | 4.95 | 4.55 | 4.65 |
| Zac Gallen | 46% | 8% | 1.1 | 113% | 0.2 | 41% | 28% | 35% | 2.75 | 3.88 | 3.62 | 3.66 |
| Martin Perez | 38% | 11% | 0.9 | 97% | 1.0 | 62% | 8% | 35% | 4.50 | 5.43 | 5.20 | 4.88 |
| Johnny Cueto | 41% | 8% | 1.0 | 92% | 1.8 | 51% | 13% | 32% | 5.40 | 4.90 | 4.78 | 4.64 |
| Alec Mills | 47% | 3% | 0.9 | 97% | 3.1 | 21% | 39% | 30% | 4.48 | 4.81 | 4.61 | 5.44 |
| Jose Berrios | 40% | 5% | 1.0 | 111% | 1.2 | 18% | 36% | 27% | 4.00 | 4.39 | 4.28 | 4.06 |
| Lance Lynn | 36% | 4% | 1.4 | 125% | 0.4 | 8% | 46% | 27% | 3.32 | 4.08 | 4.34 | 4.19 |
| Matthew Boyd | 37% | 7% | 1.3 | 96% | 1.4 | 33% | 15% | 24% | 6.71 | 4.60 | 4.97 | 5.78 |
| Antonio Senzatela | 51% | 6% | 1.5 | 104% | 1.3 | 15% | 31% | 23% | 3.44 | 5.02 | 4.81 | 4.57 |
| Kyle Gibson | 51% | 7% | 0.9 | 99% | 0.8 | 31% | 10% | 21% | 5.35 | 4.77 | 4.36 | 5.39 |
| Trevor Bauer | 34% | 5% | 1.1 | 109% | 0.4 | 5% | 23% | 14% | 1.73 | 2.94 | 3.25 | 2.88 |
| Brady Singer | 53% | 5% | 1.9 | 105% | 0.3 | 3% | 21% | 12% | 4.06 | 4.29 | 4.05 | 4.08 |
| Chris Bassitt | 44% | 6% | 0.9 | 92% | 0.2 | 10% | 0% | 5% | 2.29 | 4.46 | 4.49 | 3.59 |
| Framber Valdez | 60% | 4% | 1.5 | 97% | 0.0 | 0% | 5% | 3% | 3.57 | 3.23 | 2.94 | 2.85 |
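The exact scaling method isn't spelled out, but the BABIP_WC and ISO_WC columns move in steps of roughly 1/39 across the 40 pitchers (0%, 3%, 5%, 8%, ...), which is what a rank-based percentile would produce. Under that assumption, a minimal sketch looks like this. The function names and the four sample projections are hypothetical.

```python
def weak_contact_scale(projections):
    """Rank-based 0..1 scale where the LOWEST projection scores 1.0.

    Each pitcher's score is the share of the other pitchers who have a
    higher (worse) projected BABIP or vISO.
    """
    n = len(projections)
    if n < 2:
        raise ValueError("need at least two pitchers to rank")
    return [
        sum(other > value for other in projections) / (n - 1)
        for value in projections
    ]


def weak_contact_average(babip_scores, iso_scores):
    """Simple average of the two weak contact values for each pitcher."""
    return [(b + i) / 2 for b, i in zip(babip_scores, iso_scores)]


# Hypothetical projections for four pitchers, for illustration only.
projected_babip = [0.277, 0.285, 0.299, 0.290]
projected_viso = [0.238, 0.209, 0.225, 0.230]

babip_wc = weak_contact_scale(projected_babip)
iso_wc = weak_contact_scale(projected_viso)
average_wc = weak_contact_average(babip_wc, iso_wc)
print(babip_wc, iso_wc, average_wc)
```

A rank-based scale throws away the size of the gaps between pitchers, which fits the goal of ranking without publishing raw BABIP or vISO projections.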

The top of the list looks perfect, but several names at the bottom seem out of place (e.g. Bauer, Bassitt, Lynn). It’s a start, but it passes the smell test.

An obvious trend emerges. Here are the results of ERA minus the three major ERA estimators.

Pitchers ERA Minus …
| Pitchers | SIERA | xFIP | FIP |
|---|---|---|---|
| Top 10 | -0.67 | -0.47 | -0.12 |
| Everyone | -0.50 | -0.33 | -0.19 |
| Bottom 10 | -0.37 | -0.32 | -0.39 |
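The Top 10 row can be checked against the leaderboard. The sketch below assumes each cell is the plain average of ERA minus the estimator over the first ten pitchers listed. The ERA, SIERA, xFIP, and FIP values are copied from the leaderboard, and the function name is hypothetical.

```python
# (name, ERA, SIERA, xFIP, FIP) for the top ten pitchers on the leaderboard.
TOP_10 = [
    ("Yu Darvish", 2.01, 3.14, 2.82, 2.23),
    ("Jacob deGrom", 2.38, 2.70, 2.46, 2.26),
    ("Marco Gonzales", 3.10, 3.90, 4.13, 3.32),
    ("Kyle Hendricks", 2.88, 4.00, 3.78, 3.55),
    ("Zack Greinke", 4.03, 3.72, 3.51, 2.80),
    ("Dinelson Lamet", 2.09, 3.16, 3.30, 2.48),
    ("Kenta Maeda", 2.70, 2.92, 2.63, 3.00),
    ("Hyun-Jin Ryu", 2.69, 3.67, 3.32, 3.01),
    ("Adam Wainwright", 3.15, 4.39, 4.23, 4.11),
    ("Luis Castillo", 3.21, 3.35, 2.82, 2.65),
]


def era_minus_estimators(rows):
    """Average of ERA minus each estimator across the given pitchers."""
    n = len(rows)
    return {
        "SIERA": sum(era - siera for _, era, siera, _, _ in rows) / n,
        "xFIP": sum(era - xfip for _, era, _, xfip, _ in rows) / n,
        "FIP": sum(era - fip for _, era, _, _, fip in rows) / n,
    }


top_10_gaps = era_minus_estimators(TOP_10)
for estimator, gap in top_10_gaps.items():
    print(f"ERA minus {estimator}: {gap:+.2f}")
```

Run on the rounded leaderboard values, this lands within about a hundredth of the published Top 10 row. Any small gap would be expected if the published row was built from unrounded numbers.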

First and foremost, nothing should be taken from the ERA being lower than all of the estimators. Since it was a shortened season, more variance would exist. Additionally, only the pitchers who were putting up good results were allowed to keep throwing. Those who struggled had their innings limited or were demoted to the alternate site.

The pitchers who were able to limit hard contact by the above methods have nearly identical ERA and FIP values. This is because they limited the hardest contact of all, home runs. On the other hand, the ERA estimators that assume league-average contact (i.e. SIERA and xFIP) are off by a decent amount with this top group. The differences converge for those who allow the hardest contact.

Sadly, that’s more than likely the end until the season is over and more time becomes available to complete longer studies. I know some people will want a robust, detailed study. Hell, I do, but at least everyone now knows some factors to consider when determining the pitchers who suppress hard contact.

 

* For the Diversity Factor, I asked Twitter for help and got plenty of suggestions. A little birdy pointed me to the inverse of the Herfindahl-Hirschman Index. The pitch arsenal diversity equation works out to be:

Diversity Factor = 1 / ((pitch1%)^2 + (pitch2%)^2 + (pitch3%)^2 + … + (pitchX%)^2)

Here is an example of possible values (Method 2) created using different pitch mixes.
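The inverse Herfindahl-Hirschman formula is easy to try on a few pitch mixes. This is a minimal sketch, with usage shares entered as fractions that sum to one. The function name and the sample mixes are hypothetical and are not taken from any pitcher's actual usage.

```python
def diversity_factor(pitch_shares):
    """Inverse Herfindahl-Hirschman Index of a pitch mix.

    pitch_shares: usage rates as fractions (0.25 means 25%) summing to 1.
    Equal usage of N pitches returns N; a lopsided mix returns less.
    """
    if abs(sum(pitch_shares) - 1.0) > 1e-6:
        raise ValueError("pitch shares must sum to 1")
    return 1.0 / sum(share ** 2 for share in pitch_shares)


# Hypothetical pitch mixes, for illustration only.
sample_mixes = {
    "one pitch": [1.0],
    "two pitches, 50/50": [0.5, 0.5],
    "two pitches, 70/30": [0.7, 0.3],
    "four pitches, equal": [0.25, 0.25, 0.25, 0.25],
    "four pitches, 60/20/10/10": [0.6, 0.2, 0.1, 0.1],
}

for label, mix in sample_mixes.items():
    print(f"{label}: {diversity_factor(mix):.2f}")
```

The lopsided four-pitch mix scores closer to a two-pitch arsenal than a four-pitch one, which is the point of weighting by usage instead of just counting pitch types.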





Jeff, one of the authors of the fantasy baseball guide, The Process, writes for RotoGraphs, The Hardball Times, Rotowire, Baseball America, and BaseballHQ. He has been nominated for two SABR Analytics Research Awards for Contemporary Analysis and won it in 2013 in tandem with Bill Petti. He has won four FSWA Awards including one for his Mining the News series. He's won Tout Wars three times, LABR twice, and got his first NFBC Main Event win in 2021. Follow him on Twitter @jeffwzimmerman.

cnote66member
3 years ago

Gotta love Wonder Dog!