Inducing Weak Contact: Why Rex Hudler Got Me Thinking

March 11, 2021

As part of my preseason prep, I watched a Kris Bubic start from last season. During it, Rex Hudler, who is never short on opinions, brought up an interesting point. The more pitches each batter sees, the quicker the batter becomes familiar with the pitcher’s repertoire, and the more likely the batter is to get a hit. At first, I thought someone else was speaking, but no, it was Hudler, and the concept warranted further investigation. It’s the same theory behind the times-through-the-order penalty, but the new effect could be felt depending on how many pitches a pitcher throws per hitter and the depth of the pitcher’s arsenal. That idea started me down a wormhole that led to many questions and one subpar answer, but there seems to be at least one nugget of wisdom in Rex Hudler’s head.

First off, with less than a month before the season starts, it’s not an ideal time to start a study that could take weeks to iron out. I barely have enough time to report news, track velocity readings, and draft my own teams. The following “answers” are not set in stone and there are many more questions to investigate. I could either shelve the ideas for months or just make a snippet available and let others run with the ideas while I grind through the fantasy season. I’m giving others the chance to refine the ideas before I come back to them.

Since I was going to examine pitcher batted-ball data, I decided to include other inputs. Also for this study, I wanted to focus as much as possible on stats that help my fantasy team. Instead of projecting StatCast data and then creating standard stats from it, I skipped the middle step. As I stated before, shortcuts were taken while I focused on BABIP and vISO (versus ISO, i.e. the ISO hitters post against the pitcher).

I wanted to stay away from batted ball inputs, but one kept popping up so I included it… until I had to ignore it. The inputs I used were:

1. Pitches per Batter Faced per Arsenal Diversity Factor* (Rex Hudler Index): The bigger the value, the more of a pitcher’s arsenal is seen in each at-bat. Consider two pitchers. One throws his two pitches an equal amount and averages five pitches per at-bat. The other pitcher equally throws four pitches and averages four pitches per at-bat. The first pitcher would end up with a value of 2.5 (5 P/BF / 2 Diversity Factor) and the other at 1.0 (4 P/BF / 4 Diversity Factor).

2. Non-fastball Zone% when Behind in the Count Percentage: I had been considering this concept for a while and this study was a perfect time to see if it had any merit. The basic question is this: when a pitcher is behind in the count, can they throw a non-fastball over the plate to get a called strike? If not, a hitter can just sit on and crush fastballs in the zone.

3. Ahead versus Behind in the Count: I’ve already determined that a higher ratio of pitches thrown while ahead to pitches thrown while behind can lead to weaker contact.

4. Difference in Fastball Velocity from the League Average: From some previous work, I had an idea that high-velocity pitchers were harder to hit. This is true, but from this exercise, I found out it’s more important to measure how much a pitcher’s fastball velocity deviates from league-average fastball velocity (i.e. 93 mph). Hitters get used to the average fastball speed and they take a while to adjust to anything different. High velocity can lead to more strikeouts but there is a reason why some sub-90 mph starters make it work.

5. Groundball rate: I didn’t start with it as an input, but it became obvious that groundball pitchers have a higher BABIP but lower vISO. Adding this batted ball data improved the BABIP and vISO projections, but in the end, I had to remove it. The issue comes down to MLB changing the ball each year, so the impact of groundballs and flyballs changes. What used to be a flyball that didn’t get to the warning track became a home run just a few years later. Since BABIP differences are so small to begin with, the changing ball made year-to-year valuation almost impossible.

The way I determined each input may need to be refined. For example, I came up with two ways (ratio and difference) to measure ahead and behind in the count. I used the ratio version for this study. Are there better ways of measuring pitch mix diversity? For sure. Do I have time to test them right now? No.
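As a rough sketch of inputs 3 and 4, the count measure can be computed either as a ratio or as a difference, and the velocity input as a distance from the 93 mph league average. The function names, the share-based form of the difference, and the use of an absolute value are assumptions made for illustration; only the ratio version and the 93 mph figure come from the study. The pitch counts are hypothetical.

```python
LEAGUE_AVG_FB_MPH = 93.0  # league-average fastball velocity quoted in input 4


def ahead_behind_ratio(pitches_ahead, pitches_behind):
    """Ratio version (the one used in the study); above 1.0 means more pitches thrown ahead."""
    return pitches_ahead / pitches_behind


def ahead_behind_difference(pitches_ahead, pitches_behind, total_pitches):
    """Difference version (assumed form): share of pitches ahead minus share behind."""
    return (pitches_ahead - pitches_behind) / total_pitches


def fastball_velo_deviation(avg_fb_mph, league_avg=LEAGUE_AVG_FB_MPH):
    """Distance from league average in either direction (absolute value is an assumption)."""
    return abs(avg_fb_mph - league_avg)


# Hypothetical pitch counts, not real data.
print(ahead_behind_ratio(830, 500))             # 1.66, shown as 166% in a table
print(ahead_behind_difference(830, 500, 2000))  # 0.165
print(fastball_velo_deviation(87.5))            # 5.5 (soft tosser)
print(fastball_velo_deviation(98.5))            # 5.5 (flamethrower)
```

Under this reading, a soft tosser and a flamethrower who sit equally far from 93 mph get the same value, which matches the idea that hitters struggle with anything unusual.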

Along with the inputs, here are some limitations I could think of:

Data from 2015 to 2020 was used so StatCast data can be incorporated at a later date. This timeframe just happens to coincide with when MLB decided to keep juicing up the baseball. It’s impossible to create a batted ball metric when the ball keeps changing.
Only starting pitchers (min 60 IP) were included. I wanted to get the pitchers who would possibly see a batter two to three times a game.
The defense behind the pitchers was not accounted for.
The quality of the opposing hitters was not used.
Ballpark effects were not taken into account.

The plan was to see what works and what doesn’t. Besides the above limitations, some biases could exist that would need to be expelled at a later date.

I treated each at-bat the same regardless of the number of times the batter had seen the pitcher. A similar study might need to be run on just the first, second, and third time through the order.
There are possible overlaps with the count-related factors. The pitchers who were always behind in the count might be the ones who throw the most pitches. The problem is to find a way to efficiently separate each factor so they aren’t related.

Next, I used a linear regression to see how the five inputs combine to project BABIP and vISO. The first conclusion is that “Non-Fastball Zone% When Behind” and “Pitches per Batter Faced per Arsenal Diversity Factor” are not relevant (i.e. not statistically significant) for vISO. Now, some other version of them (e.g. different strike zone size, only behind by one strike, etc.) might work better, so I’m not discarding the information. Without any regression, I ended up with projected BABIP values from .267 to .311 and vISO values ranging from .136 to .294.
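For readers who want to try something similar, here is a minimal sketch of fitting a multi-input linear regression by ordinary least squares in plain Python. The `fit_ols` helper is hypothetical, only two of the five inputs are used, and the target values are generated from an invented rule purely to check that the solver recovers it. None of the coefficients below are the ones from this study.

```python
def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Inputs: (ahead/behind ratio, fastball velocity deviation) for five pitchers.
features = [(1.66, 2.4), (1.31, 5.5), (1.48, 4.9), (0.77, 5.8), (1.04, 6.0)]
# Targets built from an invented rule, NOT real BABIP values.
made_up_babip = [0.300 - 0.010 * ratio - 0.002 * velo for ratio, velo in features]

intercept, ratio_coef, velo_coef = fit_ols(features, made_up_babip)
print(round(intercept, 3), round(ratio_coef, 3), round(velo_coef, 3))
```

With real data the fit would not be exact, and a statistics package would also report which inputs are significant, which is the check described above.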

With this solution, the groundball rate swings projections to the extremes. For example, groundballer Dallas Keuchel would have a projected .310 BABIP (nearly the highest) and .136 vISO (the lowest). I wanted to learn more than whether a pitcher has a groundball or flyball tendency. While the groundball rate is a relevant factor for estimating BABIP and vISO, it outweighs the other factors, so I just removed it.

The new projected BABIP range shrank to .277 to .299 and the vISO range to .209 to .238. Again, the groundball rate matters, but now the other factors are isolated and just their weight can be seen.

Finally, I put the BABIP and vISO estimates on a 0% to 100% scale with 100% being the pitcher who limits hard contact the most. I didn’t like going this route but if I put out any BABIP or vISO projections, I know they would be wrong. After all that, here are the starting pitchers (min 60 IP) ranked by the average of the two weak contact values.

2020 Weak Contact Leaderboard

Name	GB%	NonFBZone	Pitch/Div	Ahead/Behind	FBvDiff	BABIP_WC	ISO_WC	Average	ERA	SIERA	xFIP	FIP
Yu Darvish	43%	13%	1.0	166%	2.4	98%	95%	96%	2.01	3.14	2.82	2.23
Jacob deGrom	42%	10%	1.5	131%	5.5	92%	90%	91%	2.38	2.70	2.46	2.26
Marco Gonzales	38%	7%	1.2	148%	4.9	80%	98%	89%	3.10	3.90	4.13	3.32
Kyle Hendricks	47%	5%	1.0	168%	5.7	74%	100%	87%	2.88	4.00	3.78	3.55
Zack Greinke	41%	7%	1.1	104%	6.0	82%	72%	77%	4.03	3.72	3.51	2.80
Dinelson Lamet	37%	6%	1.7	137%	4.0	64%	87%	76%	2.09	3.16	3.30	2.48
Kenta Maeda	49%	13%	1.1	129%	1.7	85%	64%	74%	2.70	2.92	2.63	3.00
Hyun-Jin Ryu	51%	13%	0.9	108%	3.5	95%	51%	73%	2.69	3.67	3.32	3.01
Adam Wainwright	43%	11%	1.0	112%	3.8	87%	57%	72%	3.15	4.39	4.23	4.11
Luis Castillo	58%	7%	1.0	119%	4.4	67%	74%	71%	3.21	3.35	2.82	2.65
Dallas Keuchel	53%	12%	1.1	77%	5.8	100%	41%	71%	1.99	4.57	3.98	3.08
Gerrit Cole	37%	5%	1.5	138%	3.6	54%	85%	69%	2.84	3.21	3.38	3.89
Brandon Woodruff	49%	3%	1.1	154%	3.4	36%	92%	64%	3.05	3.30	3.29	3.20
Dylan Bundy	41%	6%	0.9	124%	2.9	57%	69%	63%	3.29	3.80	3.75	2.95
Aaron Civale	44%	12%	0.9	122%	1.3	72%	49%	60%	4.74	4.11	3.92	4.03
Max Scherzer	33%	5%	1.2	149%	1.6	39%	80%	59%	3.74	3.56	3.53	3.46
Zach Davies	41%	12%	1.4	79%	4.5	90%	26%	58%	2.73	4.32	4.14	3.88
German Marquez	51%	3%	0.9	134%	2.6	28%	77%	53%	3.75	4.27	3.83	3.28
Patrick Corbin	44%	5%	1.1	120%	2.9	44%	62%	53%	4.66	4.42	4.12	4.17
Aaron Nola	50%	8%	1.0	130%	0.7	46%	54%	50%	3.28	3.25	2.79	3.19
Jon Lester	47%	11%	0.9	77%	3.9	77%	18%	48%	5.16	5.02	5.11	5.14
Andrew Heaney	39%	3%	1.7	152%	1.6	13%	82%	48%	4.46	4.08	4.15	3.79
Shane Bieber	48%	10%	1.1	111%	1.1	59%	33%	46%	1.63	2.52	2.04	2.07
Carlos Carrasco	44%	9%	1.1	123%	0.5	49%	44%	46%	2.91	3.91	3.65	3.59
Zack Wheeler	56%	2%	1.0	116%	3.8	23%	67%	45%	2.92	4.07	3.76	3.22
Lucas Giolito	41%	6%	1.6	135%	0.9	26%	59%	42%	3.48	3.51	3.35	3.19
Kyle Freeland	52%	12%	0.8	85%	1.2	69%	3%	36%	4.33	4.95	4.55	4.65
Zac Gallen	46%	8%	1.1	113%	0.2	41%	28%	35%	2.75	3.88	3.62	3.66
Martin Perez	38%	11%	0.9	97%	1.0	62%	8%	35%	4.50	5.43	5.20	4.88
Johnny Cueto	41%	8%	1.0	92%	1.8	51%	13%	32%	5.40	4.90	4.78	4.64
Alec Mills	47%	3%	0.9	97%	3.1	21%	39%	30%	4.48	4.81	4.61	5.44
Jose Berrios	40%	5%	1.0	111%	1.2	18%	36%	27%	4.00	4.39	4.28	4.06
Lance Lynn	36%	4%	1.4	125%	0.4	8%	46%	27%	3.32	4.08	4.34	4.19
Matthew Boyd	37%	7%	1.3	96%	1.4	33%	15%	24%	6.71	4.60	4.97	5.78
Antonio Senzatela	51%	6%	1.5	104%	1.3	15%	31%	23%	3.44	5.02	4.81	4.57
Kyle Gibson	51%	7%	0.9	99%	0.8	31%	10%	21%	5.35	4.77	4.36	5.39
Trevor Bauer	34%	5%	1.1	109%	0.4	5%	23%	14%	1.73	2.94	3.25	2.88
Brady Singer	53%	5%	1.9	105%	0.3	3%	21%	12%	4.06	4.29	4.05	4.08
Chris Bassitt	44%	6%	0.9	92%	0.2	10%	0%	5%	2.29	4.46	4.49	3.59
Framber Valdez	60%	4%	1.5	97%	0.0	0%	5%	3%	3.57	3.23	2.94	2.85
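The article does not spell out how the 0% to 100% columns were built. One plausible reading is a percent rank, where the lowest projected BABIP or vISO gets 100%, followed by a simple average of the two. The sketch below assumes that reading; the function name, the pitcher labels, and the projected values are all hypothetical.

```python
def weak_contact_scale(projected):
    """Map projections to 0..1, where 1.0 is the lowest (best) projected value.

    Assumes a plain percent rank with no ties; this is a guess at the method.
    """
    order = sorted(projected, key=projected.get, reverse=True)  # worst first
    n = len(order)
    return {name: i / (n - 1) for i, name in enumerate(order)}


# Hypothetical projections for four unnamed pitchers.
proj_babip = {"A": 0.279, "B": 0.285, "C": 0.290, "D": 0.298}
proj_viso = {"A": 0.230, "B": 0.212, "C": 0.236, "D": 0.225}

babip_wc = weak_contact_scale(proj_babip)
iso_wc = weak_contact_scale(proj_viso)
average = {p: (babip_wc[p] + iso_wc[p]) / 2 for p in proj_babip}

# Rank pitchers by the average of the two weak contact values, best first.
ranking = sorted(average, key=average.get, reverse=True)
print(ranking)
```

A rank-based scale hides how small the underlying BABIP differences are, which fits the stated goal of avoiding explicit projections.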

The top of the list looks perfect, but several names at the bottom seem out of place (e.g. Bauer, Bassitt, Lynn). It’s a start, but it passes the smell test.

An obvious trend emerges. Here are the results of ERA minus the three major ERA estimators.

Pitchers ERA Minus …

Pitchers	SIERA	xFIP	FIP
Top 10	-0.67	-0.47	-0.12
Everyone	-0.50	-0.33	-0.19
Bottom 10	-0.37	-0.32	-0.39
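The Top 10 row can be roughly reproduced from the rounded leaderboard values. The sketch below averages ERA minus each estimator for the ten highest-ranked pitchers; the helper name is made up for illustration, and results may differ from the table by about a hundredth because the published figures were presumably computed from unrounded numbers.

```python
# (ERA, SIERA, xFIP, FIP) for the ten highest-ranked pitchers on the leaderboard.
top10 = {
    "Yu Darvish": (2.01, 3.14, 2.82, 2.23),
    "Jacob deGrom": (2.38, 2.70, 2.46, 2.26),
    "Marco Gonzales": (3.10, 3.90, 4.13, 3.32),
    "Kyle Hendricks": (2.88, 4.00, 3.78, 3.55),
    "Zack Greinke": (4.03, 3.72, 3.51, 2.80),
    "Dinelson Lamet": (2.09, 3.16, 3.30, 2.48),
    "Kenta Maeda": (2.70, 2.92, 2.63, 3.00),
    "Hyun-Jin Ryu": (2.69, 3.67, 3.32, 3.01),
    "Adam Wainwright": (3.15, 4.39, 4.23, 4.11),
    "Luis Castillo": (3.21, 3.35, 2.82, 2.65),
}


def mean_era_gap(rows, estimator_index):
    """Average of ERA minus the estimator stored at estimator_index."""
    gaps = [stats[0] - stats[estimator_index] for stats in rows.values()]
    return sum(gaps) / len(gaps)


gaps = {
    name: mean_era_gap(top10, i)
    for i, name in enumerate(["SIERA", "xFIP", "FIP"], start=1)
}
print({name: round(value, 2) for name, value in gaps.items()})
```

A negative value means the group’s ERA came in below what the estimator expected.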

First and foremost, nothing should be taken from the ERA being lower across all the estimators. Since it was a shortened season, more variance would exist. Additionally, only the pitchers who were putting up good results were allowed to keep throwing. Those who struggled had their innings limited or were demoted to the alternate site.

The pitchers who were able to limit hard contact by the above methods have nearly identical ERAs and FIPs. This is because they limited the hardest contact of all, home runs. On the other hand, the ERA estimators that assume league-average contact (i.e. SIERA and xFIP) are off by a decent amount with this top group. The differences converge for those who allow the hardest contact.

Sadly, that’s more than likely the end until the season is over and more time becomes available to complete longer studies. I know some people will want a robust, detailed study. Hell, I do, but at least everyone now knows some factors to consider when determining which pitchers suppress hard contact.

* For the Diversity Factor, I asked Twitter for help and got plenty of suggestions. A little birdy pointed me to the inverse of the Herfindahl-Hirschman Index. The pitch arsenal diversity equation works out to be:

1 / ((pitch1%)^2 + (pitch2%)^2 + (pitch3%)^2 + … + (pitchX%)^2)
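A short sketch of the equation, checked against the two-pitcher example given under input 1. The function names are made up for illustration, and pitch usage is passed as fractions that sum to 1.

```python
def diversity_factor(pitch_shares):
    """Inverse Herfindahl-Hirschman Index of a pitch mix (shares as fractions summing to 1)."""
    return 1.0 / sum(share ** 2 for share in pitch_shares)


def rex_hudler_index(pitches_per_bf, pitch_shares):
    """Pitches per batter faced divided by the arsenal diversity factor."""
    return pitches_per_bf / diversity_factor(pitch_shares)


# Two pitches thrown equally, five pitches per batter faced.
print(diversity_factor([0.5, 0.5]))      # 2.0
print(rex_hudler_index(5, [0.5, 0.5]))   # 2.5

# Four pitches thrown equally, four pitches per batter faced.
print(rex_hudler_index(4, [0.25] * 4))   # 1.0

# An uneven three-pitch mix counts as fewer than three "effective" pitches.
print(diversity_factor([0.6, 0.3, 0.1]))  # roughly 2.17
```

The uneven mix shows why the inverse index is useful here: a rarely used third pitch adds little to the diversity factor.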

Here is an example of possible values (Method 2) created using different pitch mixes.
