wPDI & CSW: Strikeout Rate

September 18, 2020

Introduction

This is the fourth article in my wPDI vs. CSW series. You can catch up by reading the first three articles – on called strikes, whiffs and residuals.

Here is a quick summary of some of the basics of wPDI & CSW from this series:

Last year, I developed the Weighted Plate Discipline Index (wPDI) framework, whereby all pitches can be classified into six different outcomes as follows:

wPDI: Classifying the 6 Pitching Outcomes

	Outcome	Outcome	Outcome	Outcome	Outcome	Outcome
	A	B	C	D	E	F
Zone?	Out of Zone	Out of Zone	Out of Zone	In Zone	In Zone	In Zone
Swing?	Swung On	Swung On	No Swing	Swung On	Swung On	No Swing
Contact?	No Contact	Contact Made	No Swing	No Contact	Contact Made	No Swing
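For readers who like to see the classification as logic, here is a minimal sketch of the table above. The function name and its three boolean inputs are my own illustrative choices, not part of any published wPDI code.

```python
def classify_pitch(in_zone: bool, swung: bool, contact: bool) -> str:
    """Return the wPDI outcome letter (A-F) for a single pitch.

    Hypothetical helper: the argument names are illustrative assumptions.
    """
    if not swung:
        # No swing: contact is irrelevant. C = out of zone, F = in zone.
        return "F" if in_zone else "C"
    if in_zone:
        # Swing at a pitch in the zone: D = whiff, E = contact.
        return "E" if contact else "D"
    # Swing at a pitch out of the zone: A = whiff, B = contact.
    return "B" if contact else "A"


if __name__ == "__main__":
    # A chased pitch that is swung through is Outcome A.
    print(classify_pitch(in_zone=False, swung=True, contact=False))
```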

Each outcome is then assigned a weight, or an index. A% through F% are the percent of pitches thrown in each outcome. The general formula for wPDI, the Weighted Plate Discipline Index is given as:

wPDI = Index_A * A% + Index_B * B% + Index_C * C% + Index_D * D% + Index_E * E% + Index_F * F%
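The general formula can be sketched in a few lines. This is only an illustration under my own assumptions: the helper names `outcome_rates` and `wpdi` are hypothetical, and pitches are assumed to arrive already labeled with a letter from A to F.

```python
from collections import Counter

OUTCOMES = "ABCDEF"


def outcome_rates(labels):
    """Turn a sequence of outcome letters into A% through F% (as fractions)."""
    counts = Counter(labels)
    total = sum(counts[o] for o in OUTCOMES)
    if total == 0:
        raise ValueError("no classified pitches")
    return {o: counts[o] / total for o in OUTCOMES}


def wpdi(indexes, rates):
    """Weighted sum Index_A * A% + ... + Index_F * F%.

    Outcomes missing from `indexes` are treated as having a zero index.
    """
    return sum(indexes.get(o, 0.0) * rates[o] for o in OUTCOMES)


if __name__ == "__main__":
    # Ten made-up pitches, purely to exercise the functions.
    rates = outcome_rates("AABCDEFFFF")
    print(wpdi({"A": 1.0, "F": 0.5}, rates))
```

Passing a different `indexes` dictionary is all it takes to tailor the metric to a different target.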

wPDI can generate an all-in-one sortable metric used to evaluate pitchers. The plate discipline framework may be tailored to mimic (or to correlate to) various measures of deception or effectiveness.

In the first three articles of this series, we developed indices for wPDI to approximate the PitcherList metric, CSW. The Called Strikes + Whiffs (CSW) statistic was featured in last year’s FSWA Research Article of the Year by Alex Fast, and is defined as:

CSW = (Called Strikes + Whiffs) / Total Pitches

We separately tackled the called strikes and whiffs components, and landed on the following wPDI equation to represent CSW:

wPDI_CSW: Pitching Outcome Indexes for CSW

Outcome	Description	Index
A	Out of Zone / Swung On / No Contact	105%
B	Out of Zone / Swung On / Contact Made	0%
C	Out of Zone / No Swing	10%
D	In Zone / Swung On / No Contact	110%
E	In Zone / Swung On / Contact Made	0%
F	In Zone / No Swing	90%

wPDI_CSW = 105% * A% + 10% * C% + 110% * D% + 90% * F%

Our wPDI representation of Fast’s CSW formula yielded a 92% correlation coefficient, which can be seen graphically here:

In today’s article, we will go a step further, and tackle the relationship between my plate discipline framework, Fast’s CSW, and a pitcher’s Strikeout Rate (K%). Further, we will develop a wPDI equation to express K% in terms of the six pitching outcomes. As always, along with the research, we will discuss interesting findings as they arise.

CSW & Strikeout Rate (K%)

First, let’s take a look at the relationship between CSW and K%. Below is a sample of pitchers from the top, middle and bottom ranges of their strikeout rate in 2019 (minimum 250 pitches).

Strikeout Rate (K%) to CSW – 2019 K% Leaders

Name	IP	wPDI_CSW	CSW	K%	K% to CSW
Josh Hader	75.7	36.0%	35.1%	47.8%	136.4%
Nick Anderson	65.0	36.2%	35.8%	41.7%	116.5%
Kirby Yates	60.7	31.5%	33.7%	41.6%	123.5%
Austin Adams	32.0	36.4%	35.9%	40.8%	113.8%
Ken Giles	53.0	34.7%	36.3%	39.9%	109.9%
Gerrit Cole	212.3	34.6%	35.7%	39.9%	111.9%
Edwin Diaz	58.0	33.6%	33.3%	39.0%	117.2%
Darwinzon Hernandez	30.3	32.6%	32.0%	38.8%	121.2%
Matt Barnes	64.3	31.6%	33.1%	38.6%	116.8%
Felipe Vazquez	60.0	32.1%	33.2%	38.1%	114.9%
Joshua James	61.3	33.6%	32.0%	37.6%	117.6%
Will Smith	65.3	33.5%	34.6%	37.4%	108.1%
Liam Hendriks	85.0	33.0%	33.2%	37.4%	112.8%
Brandon Workman	71.7	33.5%	34.5%	36.4%	105.6%
Aroldis Chapman	57.0	32.4%	32.3%	36.2%	112.2%
Emilio Pagan	70.0	33.5%	34.4%	36.0%	104.6%
Chris Sale	147.3	34.1%	34.3%	35.6%	103.9%
Jay Jackson	30.3	32.5%	32.9%	35.6%	108.3%
Tommy Kahnle	61.3	33.3%	32.4%	35.5%	109.7%
Justin Verlander	223.0	33.2%	34.0%	35.4%	104.1%
Max Scherzer	172.3	32.4%	34.3%	35.1%	102.5%
Colin Poche	51.7	31.9%	32.1%	34.8%	108.5%
Brad Hand	57.3	31.8%	35.2%	34.7%	98.6%
Tanner Rainey	48.3	32.4%	32.9%	34.6%	105.2%
Tyler Duffey	57.7	31.0%	31.7%	34.5%	108.7%
Ryan Pressly	54.3	36.5%	36.2%	34.1%	94.2%
Mike Clevinger	126.0	32.9%	33.8%	33.9%	100.2%
.
.
.
Anthony DeSclafani	166.7	26.4%	27.4%	24.0%	87.7%
Brandon Brennan	47.3	31.4%	30.2%	24.0%	79.5%
Jose Urquidy	41.0	31.0%	31.2%	24.0%	77.0%
Mark Melancon	67.3	28.0%	30.2%	23.9%	79.0%
Matt Hall	23.3	28.5%	31.3%	23.9%	76.3%
A.J. Minter	29.3	28.0%	28.6%	23.8%	83.3%
Dan Winkler	21.7	29.3%	30.2%	23.7%	78.5%
Kyle McGowin	16.0	29.3%	29.2%	23.7%	81.2%
Zack Wheeler	195.3	27.2%	27.9%	23.6%	84.5%
Jon Gray	150.0	29.4%	28.9%	23.6%	81.6%
Derek Law	60.7	29.4%	30.4%	23.5%	77.3%
Daniel Hudson	73.0	26.3%	27.4%	23.4%	85.3%
Drew Smyly	114.0	28.7%	28.7%	23.4%	81.6%
Michael Pineda	146.0	28.4%	28.5%	23.3%	81.8%
Fernando Rodney	47.7	29.3%	28.6%	23.3%	81.5%
Tyler Skaggs	79.7	26.8%	27.1%	23.3%	86.1%
Shawn Armstrong	58.0	27.1%	26.2%	23.3%	88.8%
Ryan Burr	19.7	26.7%	28.7%	23.3%	81.3%
Clay Holmes	50.0	29.9%	30.2%	23.3%	77.2%
Jose Berrios	200.3	28.8%	29.8%	23.2%	78.0%
Marcus Walden	78.0	28.5%	28.6%	23.2%	81.0%
Mike Minor	208.3	28.4%	29.4%	23.2%	78.9%
Cole Hamels	141.7	28.0%	29.0%	23.2%	79.9%
Tyler Mahle	129.7	27.7%	30.5%	23.2%	76.0%
Nestor Cortes	66.7	27.1%	28.8%	23.2%	80.5%
Josh Lucas	15.7	25.0%	25.3%	23.2%	91.8%
Nathan Eovaldi	67.7	28.9%	29.6%	23.2%	78.4%
Dylan Bundy	161.7	29.6%	29.9%	23.1%	77.3%
Zack Greinke	208.7	27.8%	29.8%	23.1%	77.6%
Yusmeiro Petit	83.0	25.7%	28.6%	23.1%	80.8%
Austin Brice	44.7	28.5%	30.6%	23.1%	75.6%
Jeurys Familia	60.0	30.7%	30.3%	23.0%	76.0%
.
.
.
Antonio Senzatela	124.7	23.5%	23.3%	13.1%	56.2%
Reed Garrett	15.3	22.1%	20.7%	13.0%	62.8%
Richard Bleier	55.3	24.2%	24.5%	12.8%	52.2%
Dario Agrazal	73.3	25.5%	26.1%	12.8%	49.0%
Ryan Carpenter	40.7	22.8%	24.8%	12.7%	51.2%
Mike Morin	50.7	26.0%	25.8%	12.4%	48.0%
Erick Fedde	78.0	23.2%	23.5%	12.3%	52.4%
Dan Otero	29.7	22.2%	23.6%	12.2%	51.8%
Brett Anderson	176.0	25.2%	24.6%	12.1%	49.2%
Zac Reininger	28.0	22.9%	23.6%	12.1%	51.2%
Scott Alexander	17.3	24.2%	26.3%	11.8%	44.9%
Brock Burke	26.7	22.4%	22.9%	11.7%	51.1%
Reggie McClain	21.0	25.3%	22.0%	11.6%	52.6%
Pat Neshek	18.0	27.5%	30.6%	11.4%	37.2%
Clayton Richard	45.3	21.5%	22.7%	11.0%	48.5%
James Marvel	17.3	22.8%	23.3%	10.7%	46.0%
Marco Estrada	23.7	23.1%	24.3%	10.4%	42.9%
Odrisamer Despaigne	13.3	17.5%	19.2%	10.3%	53.6%
Matt Koch	20.7	20.7%	20.3%	9.4%	46.2%
Kohl Stewart	25.3	24.5%	25.8%	9.2%	35.7%
Ervin Santana	13.3	21.3%	18.4%	7.8%	42.4%
Josh Rogers	14.3	19.1%	21.3%	7.3%	34.3%
Eric Skoglund	21.0	19.3%	18.2%	4.0%	21.9%

Minimum 250 pitches in 2019.

It is clear that CSW is positively correlated with K%: a higher CSW typically corresponds to a higher K%. For the 2019 baseball season, the corresponding R-Squared is a solid 67%. Alex Fast, in his article CSW Rate: An Intro to an Important New Metric, reported that the R-Squared over the multiple seasons he studied was an even more robust 72%.

We also notice that the ratio of K% to CSW is not constant; it rises with strikeout rate. At the top of the K% leaderboard, the K% to CSW ratio is often above 100%. In the middle of the field, the ratio hovers near 80% (which is the average ratio). Towards the very bottom, the K% to CSW ratio is closer to 50%. To note, the average ratio of K% to wPDI_CSW is a bit closer to unity, sitting at 82%.
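To make the ratio concrete, here is a small sketch that recomputes K% to CSW for one pitcher from each tier of the table. The inputs are the rounded percentages shown above, so the results can differ from the table's last digit by a tenth or two; the function name `k_to_csw` is simply my own label.

```python
def k_to_csw(k_pct: float, csw_pct: float) -> float:
    """Ratio of strikeout rate to CSW, both given in percent."""
    return k_pct / csw_pct


# (name, CSW %, K %) copied from the 2019 table above.
rows = [
    ("Josh Hader", 35.1, 47.8),      # top of the leaderboard
    ("Jose Berrios", 29.8, 23.2),    # middle of the field
    ("Brett Anderson", 24.6, 12.1),  # bottom of the field
]

if __name__ == "__main__":
    for name, csw, k in rows:
        print(f"{name}: {k_to_csw(k, csw):.1%}")
```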

Below are the associated graphs of K% vs. CSW, and K% vs. wPDI_CSW:

As Fast pointed out in his article, CSW is indeed a worthwhile predictor of strikeout rate. The question now becomes, can we do better – and more specifically, could we do better with the wPDI framework?

We have shown that CSW is best modeled using only four of the six possible pitching outcomes: Outcomes A, C, D and F. Removing the two contact outcomes, B (contact out of the zone) and E (contact in the zone), led to a more meaningful wPDI approximation formula of CSW.

Would including contact outcomes lead to better predictability of strikeouts?

Let’s find out …

wPDI & Strikeout Rate (K%)

The first order of statistical business is to take a look at the full regression equation using all six pitching outcomes, A through F. Using all variables to model K%, we obtain the following wPDI indexes:

K% – Regression Indexes for wPDI – First Attempt

Index A	Index B	Index C	Index D	Index E	Index F
146.6%	-28.8%	3.8%	226.2%	-19.7%	72.2%

The associated R-Squared of the regression is 70.8%, which is more robust than the 67.0% obtained when using CSW alone. That is a result that I had hoped for!
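For anyone who wants to reproduce this kind of fit, below is a rough sketch of a least-squares regression of K% on the six outcome rates. It rests on assumptions of mine: I use NumPy's `np.linalg.lstsq`, I fit without an intercept (A% through F% sum to 100%, so a constant term would be redundant), and the function name `fit_wpdi_indexes` is hypothetical. It is not necessarily the exact procedure behind the numbers above.

```python
import numpy as np


def fit_wpdi_indexes(outcome_rates, k_rates):
    """Least-squares indexes for K% ~ sum(Index_i * rate_i), no intercept.

    outcome_rates: array-like of shape (n_pitchers, n_outcomes), as fractions
    k_rates:       array-like of shape (n_pitchers,), as fractions
    Returns (indexes, r_squared).
    """
    X = np.asarray(outcome_rates, dtype=float)
    y = np.asarray(k_rates, dtype=float)

    # Solve min ||X @ coef - y||^2.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # R-Squared: share of the variance in K% explained by the fit.
    residual = y - X @ coef
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coef, 1.0 - ss_res / ss_tot
```

Dropping a column from `outcome_rates` before calling the function is equivalent to forcing that outcome's index to zero.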

There are a number of interesting observations that I discern from the calculated indexes (as far as predicting the strikeout rate using wPDI):

1. Contact made (Outcomes B & E) has a negative effect on the strikeout rate.
2. A swing and miss inside the zone (Outcome D) is 54% more valuable than a swing and miss outside the zone (Outcome A).
3. A called strike (generally, Outcome F) is far less valuable than a swinging strike (Outcomes A & D).
4. Not swinging outside of the zone (Outcome C) has a very slightly positive effect on the strikeout rate.
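The comparisons in this list come straight from ratios of the first-attempt indexes. Here is a quick sketch of that arithmetic; the helper name `relative_premium` is just my own shorthand.

```python
# First-attempt regression indexes from the table above, in percent.
INDEX = {"A": 146.6, "B": -28.8, "C": 3.8, "D": 226.2, "E": -19.7, "F": 72.2}


def relative_premium(numerator: float, denominator: float) -> float:
    """How much larger one index is than another, as a fraction."""
    return numerator / denominator - 1.0


# In-zone whiff (D) versus out-of-zone whiff (A): roughly 0.54, i.e. 54% more.
d_over_a = relative_premium(INDEX["D"], INDEX["A"])

# Called strike (F) versus each kind of whiff: negative means less valuable.
f_over_a = relative_premium(INDEX["F"], INDEX["A"])
f_over_d = relative_premium(INDEX["F"], INDEX["D"])

if __name__ == "__main__":
    print(f"D vs A: {d_over_a:+.0%}")
    print(f"F vs A: {f_over_a:+.0%}")
    print(f"F vs D: {f_over_d:+.0%}")
```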

Observation #1 seems fairly obvious. Aside from foul balls, we expect that contact does not assist with a strikeout. Even a foul, although it may add to the strike count without adding to the ball count, will rarely produce a strikeout directly. A foul tip caught by the catcher (or a two-strike foul bunt) would be the rare exception.

Observation #2 is interesting and insightful. I originally conjectured that a strike outside of the zone is more deceptive than a strike inside the zone. [The Maddux Plate Discipline Index (mPDI) is based off of this very premise.] However, the K% regression analysis indicates that swings and misses inside the zone are more predictive of strikeout rate than those outside of the zone.

Observation #3 is fairly understandable. Though more than half of all strikes arise from called ones (58% in 2019), approximately three-fourths of all strikeouts arise as the result of a swing. It is the final pitch that matters more. To use a football parallel, the first-string running back might get the bulk of a team’s yardage, and take the team 99 yards to the goal line. However, if the power running back comes in for the final yard, he is the player to score the touchdown and put up six points on the scoreboard. The one who finishes the drive often matters more.

Observation #4 seems counter-intuitive. Outcome C, which is taking a pitch outside of the zone, is most often called a ball. Unless the umpire is in conflict with technology, Outcome C should not generate a strikeout, and hence should produce a non-positive index. Yes, 3.8% is a fairly low figure [noise], but I would have imagined that it should have been signed negatively, if anything. As an aside, one should note that our wPDI_CSW equation did include Outcome C in the called strike portion, due to umpire error.

In attempting to clean up the regression, our next step is to perform the analysis using only the positive coefficients. Ideally, it would be better not to use negatively signed coefficients for baseball equations, so let’s give it a shot. For our second go-around, we will regress on only outcomes A, C, D & F (which correspond exactly to the wPDI_CSW indexes).

K% – Regression Indexes for wPDI – Second Attempt

Index A	Index C	Index D	Index F
152.4%	-10.9%	205.0%	60.4%

For the limited regression, the R-Squared went down to 67.8%, which is not an optimal result. What is also interesting to me is that the sign has now flipped for the Outcome C index. Without the contact indexes (Outcomes B & E) present, the regression aligned its sign with our intuition!

With the low value of the C index, and with the non-zero B & E indexes, my next regression refinement run will include all outcomes other than the “called ball.” The following indexes were generated, which produced an R-Squared of 70.7%, well above the second attempt and nearly identical to the full six-outcome regression:

K% – Regression Indexes for wPDI – Final Attempt

Index A	Index B	Index D	Index E	Index F
149.9%	-25.7%	224.8%	-17.2%	75.4%

Yes, we have a few negatively signed indexes, but they are both intuitive and mathematically needed. For the final plate discipline equation, I will round all indexes to the nearest 5%. I will officially define wPDI_K as:

wPDI_K: Pitching Outcome Indexes for K%

Outcome	Description	Index
A	Out of Zone / Swung On / No Contact	150%
B	Out of Zone / Swung On / Contact Made	-25%
C	Out of Zone / No Swing	0%
D	In Zone / Swung On / No Contact	225%
E	In Zone / Swung On / Contact Made	-20%
F	In Zone / No Swing	75%

wPDI_K = 150% * A% – 25% * B% + 225% * D% – 20% * E% + 75% * F%
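As a sketch of how the equation could be applied, here is wPDI_K computed for a single pitch mix. The coefficients are taken from the equation line above; the pitch mix itself is entirely hypothetical and only there to show the arithmetic, and the names `WPDI_K_INDEX` and `wpdi_k` are my own.

```python
# Indexes as written in the wPDI_K equation above (fractions, not percents).
WPDI_K_INDEX = {"A": 1.50, "B": -0.25, "C": 0.00, "D": 2.25, "E": -0.20, "F": 0.75}


def wpdi_k(rates: dict) -> float:
    """Estimate K% from outcome rates A% through F% (fractions summing to 1)."""
    total = sum(rates[o] for o in WPDI_K_INDEX)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("outcome rates must sum to 1")
    return sum(WPDI_K_INDEX[o] * rates[o] for o in WPDI_K_INDEX)


# Hypothetical pitch mix, NOT a real pitcher: made up for illustration only.
example_mix = {"A": 0.08, "B": 0.12, "C": 0.35, "D": 0.06, "E": 0.27, "F": 0.12}

if __name__ == "__main__":
    # 0.12 - 0.03 + 0 + 0.135 - 0.054 + 0.09 = 0.261
    print(f"{wpdi_k(example_mix):.1%}")
```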

The wPDI approximation of a pitcher’s strikeout rate (K%) yielded an 84% correlation coefficient, which can be seen graphically here:

This is a nice result, in that we can approximate a pitcher’s strikeout rate fairly well simply by tracking whether a pitch is thrown in the zone, whether it is swung on, and whether contact is made.

Although CSW does a good job of foretelling a pitcher’s strikeout rate, wPDI_K is a slightly closer representation.

–

We still have plenty more work to perform with wPDI. In my next article, I will use the plate discipline framework to model a pitcher’s walk rate.
