wPDI & CSW: Strikeout Rate

Introduction

This is the fourth article in my wPDI vs. CSW series. You can catch up by reading the first three articles – on called strikes, whiffs and residuals.

Here is a quick summary of some of the basics of wPDI & CSW from this series:

Last year, I developed the Weighted Plate Discipline Index (wPDI) framework, whereby all pitches can be classified into six different outcomes as follows:

wPDI: Classifying the 6 Pitching Outcomes
Outcome | A | B | C | D | E | F
Zone? | Out of Zone | Out of Zone | Out of Zone | In Zone | In Zone | In Zone
Swing? | Swung On | Swung On | No Swing | Swung On | Swung On | No Swing
Contact? | No Contact | Contact Made | No Swing | No Contact | Contact Made | No Swing

Each outcome is then assigned a weight, or an index. A% through F% are the percentages of all pitches thrown that fall into each outcome. The general formula for wPDI, the Weighted Plate Discipline Index, is given as:

wPDI = IndexA * A% + IndexB * B% + IndexC * C% + IndexD * D% + IndexE * E% + IndexF * F%
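
As a minimal sketch of how this formula could be computed from pitch-level data, the Python below classifies each pitch into one of the six outcomes and then applies a set of indexes. The function names, the tuple layout and the five-pitch sample are all hypothetical and are used only for illustration.

```python
from collections import Counter

OUTCOMES = "ABCDEF"

def classify_pitch(in_zone, swung, contact):
    """Map one pitch to a wPDI outcome letter (A through F)."""
    if not swung:
        return "F" if in_zone else "C"      # taken pitch
    if in_zone:
        return "E" if contact else "D"      # swing at a pitch in the zone
    return "B" if contact else "A"          # swing at a pitch out of the zone

def outcome_shares(pitches):
    """Return A% through F% as fractions of all pitches thrown."""
    counts = Counter(classify_pitch(*pitch) for pitch in pitches)
    return {o: counts[o] / len(pitches) for o in OUTCOMES}

def wpdi(shares, indexes):
    """wPDI = IndexA * A% + IndexB * B% + ... + IndexF * F%."""
    return sum(indexes[o] * shares[o] for o in OUTCOMES)

# Hypothetical pitch log: (in_zone, swung, contact)
sample = [
    (False, True, False),   # A
    (False, False, False),  # C
    (True, True, True),     # E
    (True, False, False),   # F
    (True, True, False),    # D
]
shares = outcome_shares(sample)
```

Any set of six indexes can then be passed to `wpdi` together with `shares`.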

wPDI can generate an all-in-one sortable metric used to evaluate pitchers. The plate discipline framework may be tailored to mimic (or to correlate to) various measures of deception or effectiveness.

In the first three articles of this series, we developed indices for wPDI to approximate the PitcherList metric, CSW. The Called Strikes + Whiffs (CSW) statistic was featured in last year’s FSWA Research Article of the Year by Alex Fast, and is defined as:

CSW = (Called Strikes + Whiffs) / Total Pitches

We separately tackled the called strikes and whiffs components, and landed on the following wPDI equation to represent CSW:

wPDICSW: Pitching Outcome Indexes for CSW
Outcome Description Index
A Out of Zone / Swung On / No Contact 105%
B Out of Zone / Swung On / Contact Made 0%
C Out of Zone / No Swing 10%
D In Zone / Swung On / No Contact 110%
E In Zone / Swung On / Contact Made 0%
F In Zone / No Swing 90%

wPDICSW = 105% * A% + 10% * C% + 110% * D% + 90% * F%
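
To make the arithmetic concrete, here is a short Python sketch that applies the wPDICSW indexes to one outcome mix. The mix is hypothetical (it does not belong to any real pitcher); only the indexes come from the table above.

```python
# wPDICSW indexes from the table above, written as fractions.
CSW_INDEXES = {"A": 1.05, "B": 0.0, "C": 0.10, "D": 1.10, "E": 0.0, "F": 0.90}

# Hypothetical outcome mix: fraction of all pitches in each outcome (sums to 1).
mix = {"A": 0.08, "B": 0.12, "C": 0.35, "D": 0.05, "E": 0.25, "F": 0.15}

wpdi_csw = sum(CSW_INDEXES[o] * mix[o] for o in mix)
print(f"wPDICSW = {wpdi_csw:.1%}")  # wPDICSW = 30.9%
```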

Our wPDI representation of Fast’s CSW formula yielded a 92% correlation coefficient, which can be seen graphically here:

In today’s article, we will go a step further, and tackle the relationship between my plate discipline framework, Fast’s CSW, and a pitcher’s Strikeout Rate (K%). Further, we will develop a wPDI equation to express K% in terms of the six pitching outcomes. As always, along with the research, we will discuss interesting findings as they arise.

CSW & Strikeout Rate (K%)

First, let’s take a look at the relationship between CSW and K%. Below is a sample of pitchers from the top, middle and bottom of the 2019 strikeout rate leaderboard (minimum 250 pitches).

Strikeout Rate (K%) to CSW – 2019 K% Leaders
Name IP wPDICSW CSW K% K% to CSW
Josh Hader 75.7 36.0% 35.1% 47.8% 136.4%
Nick Anderson 65.0 36.2% 35.8% 41.7% 116.5%
Kirby Yates 60.7 31.5% 33.7% 41.6% 123.5%
Austin Adams 32.0 36.4% 35.9% 40.8% 113.8%
Ken Giles 53.0 34.7% 36.3% 39.9% 109.9%
Gerrit Cole 212.3 34.6% 35.7% 39.9% 111.9%
Edwin Diaz 58.0 33.6% 33.3% 39.0% 117.2%
Darwinzon Hernandez 30.3 32.6% 32.0% 38.8% 121.2%
Matt Barnes 64.3 31.6% 33.1% 38.6% 116.8%
Felipe Vazquez 60.0 32.1% 33.2% 38.1% 114.9%
Joshua James 61.3 33.6% 32.0% 37.6% 117.6%
Will Smith 65.3 33.5% 34.6% 37.4% 108.1%
Liam Hendriks 85.0 33.0% 33.2% 37.4% 112.8%
Brandon Workman 71.7 33.5% 34.5% 36.4% 105.6%
Aroldis Chapman 57.0 32.4% 32.3% 36.2% 112.2%
Emilio Pagan 70.0 33.5% 34.4% 36.0% 104.6%
Chris Sale 147.3 34.1% 34.3% 35.6% 103.9%
Jay Jackson 30.3 32.5% 32.9% 35.6% 108.3%
Tommy Kahnle 61.3 33.3% 32.4% 35.5% 109.7%
Justin Verlander 223.0 33.2% 34.0% 35.4% 104.1%
Max Scherzer 172.3 32.4% 34.3% 35.1% 102.5%
Colin Poche 51.7 31.9% 32.1% 34.8% 108.5%
Brad Hand 57.3 31.8% 35.2% 34.7% 98.6%
Tanner Rainey 48.3 32.4% 32.9% 34.6% 105.2%
Tyler Duffey 57.7 31.0% 31.7% 34.5% 108.7%
Ryan Pressly 54.3 36.5% 36.2% 34.1% 94.2%
Mike Clevinger 126.0 32.9% 33.8% 33.9% 100.2%
.
.
.
Anthony DeSclafani 166.7 26.4% 27.4% 24.0% 87.7%
Brandon Brennan 47.3 31.4% 30.2% 24.0% 79.5%
Jose Urquidy 41.0 31.0% 31.2% 24.0% 77.0%
Mark Melancon 67.3 28.0% 30.2% 23.9% 79.0%
Matt Hall 23.3 28.5% 31.3% 23.9% 76.3%
A.J. Minter 29.3 28.0% 28.6% 23.8% 83.3%
Dan Winkler 21.7 29.3% 30.2% 23.7% 78.5%
Kyle McGowin 16.0 29.3% 29.2% 23.7% 81.2%
Zack Wheeler 195.3 27.2% 27.9% 23.6% 84.5%
Jon Gray 150.0 29.4% 28.9% 23.6% 81.6%
Derek Law 60.7 29.4% 30.4% 23.5% 77.3%
Daniel Hudson 73.0 26.3% 27.4% 23.4% 85.3%
Drew Smyly 114.0 28.7% 28.7% 23.4% 81.6%
Michael Pineda 146.0 28.4% 28.5% 23.3% 81.8%
Fernando Rodney 47.7 29.3% 28.6% 23.3% 81.5%
Tyler Skaggs 79.7 26.8% 27.1% 23.3% 86.1%
Shawn Armstrong 58.0 27.1% 26.2% 23.3% 88.8%
Ryan Burr 19.7 26.7% 28.7% 23.3% 81.3%
Clay Holmes 50.0 29.9% 30.2% 23.3% 77.2%
Jose Berrios 200.3 28.8% 29.8% 23.2% 78.0%
Marcus Walden 78.0 28.5% 28.6% 23.2% 81.0%
Mike Minor 208.3 28.4% 29.4% 23.2% 78.9%
Cole Hamels 141.7 28.0% 29.0% 23.2% 79.9%
Tyler Mahle 129.7 27.7% 30.5% 23.2% 76.0%
Nestor Cortes 66.7 27.1% 28.8% 23.2% 80.5%
Josh Lucas 15.7 25.0% 25.3% 23.2% 91.8%
Nathan Eovaldi 67.7 28.9% 29.6% 23.2% 78.4%
Dylan Bundy 161.7 29.6% 29.9% 23.1% 77.3%
Zack Greinke 208.7 27.8% 29.8% 23.1% 77.6%
Yusmeiro Petit 83.0 25.7% 28.6% 23.1% 80.8%
Austin Brice 44.7 28.5% 30.6% 23.1% 75.6%
Jeurys Familia 60.0 30.7% 30.3% 23.0% 76.0%
.
.
.
Antonio Senzatela 124.7 23.5% 23.3% 13.1% 56.2%
Reed Garrett 15.3 22.1% 20.7% 13.0% 62.8%
Richard Bleier 55.3 24.2% 24.5% 12.8% 52.2%
Dario Agrazal 73.3 25.5% 26.1% 12.8% 49.0%
Ryan Carpenter 40.7 22.8% 24.8% 12.7% 51.2%
Mike Morin 50.7 26.0% 25.8% 12.4% 48.0%
Erick Fedde 78.0 23.2% 23.5% 12.3% 52.4%
Dan Otero 29.7 22.2% 23.6% 12.2% 51.8%
Brett Anderson 176.0 25.2% 24.6% 12.1% 49.2%
Zac Reininger 28.0 22.9% 23.6% 12.1% 51.2%
Scott Alexander 17.3 24.2% 26.3% 11.8% 44.9%
Brock Burke 26.7 22.4% 22.9% 11.7% 51.1%
Reggie McClain 21.0 25.3% 22.0% 11.6% 52.6%
Pat Neshek 18.0 27.5% 30.6% 11.4% 37.2%
Clayton Richard 45.3 21.5% 22.7% 11.0% 48.5%
James Marvel 17.3 22.8% 23.3% 10.7% 46.0%
Marco Estrada 23.7 23.1% 24.3% 10.4% 42.9%
Odrisamer Despaigne 13.3 17.5% 19.2% 10.3% 53.6%
Matt Koch 20.7 20.7% 20.3% 9.4% 46.2%
Kohl Stewart 25.3 24.5% 25.8% 9.2% 35.7%
Ervin Santana 13.3 21.3% 18.4% 7.8% 42.4%
Josh Rogers 14.3 19.1% 21.3% 7.3% 34.3%
Eric Skoglund 21.0 19.3% 18.2% 4.0% 21.9%
Minimum 250 pitches in 2019.

CSW is clearly positively correlated with K%: a higher CSW typically corresponds to a higher K%. For the 2019 baseball season, the corresponding R-Squared is a respectable 67%. Alex Fast, in his article, CSW Rate: An Intro to an Important New Metric, reported that the R-Squared over the multiple seasons he studied was an even more robust 72%.

We also notice that the K% to CSW ratio rises with the strikeout rate itself. At the top of the K% leaderboard, the K% to CSW ratio is often above 100%. In the middle of the field, the ratio hovers near 80% (which is the average ratio). Towards the very bottom, the K% to CSW ratio is closer to 50%. To note, the average ratio of K% to wPDICSW is a bit closer to unity, sitting at 82%.
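
The ratio column is simply K% divided by CSW. A quick Python check on three rows from the table (one from each tier) illustrates the pattern. Because the inputs here are the rounded table values, the results can differ from the printed column by a few tenths of a point.

```python
# (K%, CSW) pairs copied from the table above, rounded to one decimal.
rows = {
    "Josh Hader": (47.8, 35.1),     # top of the leaderboard
    "Jose Berrios": (23.2, 29.8),   # middle of the field
    "Eric Skoglund": (4.0, 18.2),   # bottom of the leaderboard
}

def k_to_csw(k_pct, csw_pct):
    """Ratio of strikeout rate to CSW, as a fraction."""
    return k_pct / csw_pct

for name, (k_pct, csw_pct) in rows.items():
    print(f"{name}: {k_to_csw(k_pct, csw_pct):.1%}")
```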

Below are the associated graphs of K% vs. CSW, and K% vs. wPDICSW:

As Fast pointed out in his article, CSW is indeed a worthwhile predictor of strikeout rate. The question now becomes, can we do better – and more specifically, could we do better with the wPDI framework?

We have shown that CSW is best modeled using only four of the six possible pitching outcomes: Outcomes A, C, D and F. Removing the two contact outcomes, B (contact out of the zone) and E (contact in the zone), led to a more meaningful wPDI approximation formula of CSW.

Would including contact outcomes lead to better predictability of strikeouts?

Let’s find out …

wPDI & Strikeout Rate (K%)

The first order of statistical business is to take a look at the full regression equation using all six pitching outcomes, A through F. Using all variables to model K%, we obtain the following wPDI indexes:

K% – Regression Indexes for wPDI – First Attempt
Index A Index B Index C Index D Index E Index F
146.6% -28.8% 3.8% 226.2% -19.7% 72.2%

The associated R-Squared of the regression is 70.8%, which is more robust than the 67.0% obtained when using CSW alone. That is a result that I had hoped for!
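
For readers who want to try this kind of fit themselves, below is a rough Python sketch of a least-squares regression of K% on the six outcome shares. Several things in it are assumptions rather than a description of the regression actually run for this article: it fits without an intercept (the six shares always sum to 100%, so a separate intercept would be redundant), it uses randomly generated outcome mixes in place of real 2019 pitcher data, and it plants the first-attempt indexes as the "true" values so we can see whether the fit recovers them. In practice `numpy.linalg.lstsq` would be the usual tool; plain Python is used here only to keep the sketch free of dependencies.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_indexes(shares, k_rates):
    """No-intercept least squares via the normal equations (X'X) w = X'y."""
    n = len(shares[0])
    xtx = [[sum(row[i] * row[j] for row in shares) for j in range(n)]
           for i in range(n)]
    xty = [sum(row[i] * y for row, y in zip(shares, k_rates)) for i in range(n)]
    return solve_linear_system(xtx, xty)

def r_squared(shares, k_rates, indexes):
    """1 - SS_residual / SS_total for the fitted indexes."""
    fitted = [sum(w * s for w, s in zip(indexes, row)) for row in shares]
    mean_k = sum(k_rates) / len(k_rates)
    ss_res = sum((y - f) ** 2 for y, f in zip(k_rates, fitted))
    ss_tot = sum((y - mean_k) ** 2 for y in k_rates)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (NOT real pitchers): random outcome mixes A..F.
rng = random.Random(0)

def random_shares():
    draws = [rng.gammavariate(alpha, 1.0) for alpha in (8, 12, 35, 5, 25, 15)]
    total = sum(draws)
    return [d / total for d in draws]

# First-attempt indexes from the table above, planted as the "true" weights.
ASSUMED_INDEXES = [1.466, -0.288, 0.038, 2.262, -0.197, 0.722]

shares = [random_shares() for _ in range(300)]
k_rates = [sum(w * s for w, s in zip(ASSUMED_INDEXES, row)) + rng.gauss(0, 0.02)
           for row in shares]

fitted_indexes = fit_indexes(shares, k_rates)
r2 = r_squared(shares, k_rates, fitted_indexes)
print([f"{w:.1%}" for w in fitted_indexes], f"R-squared = {r2:.1%}")
```

Dropping a column from `shares` before calling `fit_indexes` is one way to mimic a regression on a subset of the outcomes.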

There are a number of interesting observations that I discern from the calculated indexes (as far as predicting the strikeout rate using wPDI):

1. Contact made (Outcomes B & E) has a negative effect on the strikeout rate.
2. A swing and miss inside the zone (Outcome D) is 54% more valuable than a swing and miss outside the zone (Outcome A).
3. A called strike (generally, Outcome F) is far less valuable than a swinging strike (Outcomes A & D).
4. Not swinging outside of the zone (Outcome C) has a very slightly positive effect on the strikeout rate.
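
The relative values in observations #2 and #3 follow directly from the first-attempt indexes. A short Python check, using only the numbers in the table above:

```python
# First-attempt regression indexes, in percent, from the table above.
first_attempt = {"A": 146.6, "B": -28.8, "C": 3.8, "D": 226.2, "E": -19.7, "F": 72.2}

# Observation #2: premium of an in-zone whiff (D) over an out-of-zone whiff (A).
whiff_premium = first_attempt["D"] / first_attempt["A"] - 1
print(f"{whiff_premium:.0%}")  # 54%

# Observation #3: an in-zone take (F) relative to each type of whiff.
print(f"{first_attempt['F'] / first_attempt['A']:.0%}")  # 49%
print(f"{first_attempt['F'] / first_attempt['D']:.0%}")  # 32%
```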

Observation #1 seems fairly obvious. Aside from foul balls, we expect that contact does not assist with a strikeout. Even for a foul, although it may add to the strike count on a batter without adding to the ball count – it will rarely directly produce a strikeout. A foul tip straight into the catcher’s mitt would be the only scenario.

Observation #2 is interesting and insightful. I originally conjectured that a strike outside of the zone is more deceptive than a strike inside the zone. [The Maddux Plate Discipline Index (mPDI) is based on this very premise.] However, the K% regression analysis indicates that swings and misses inside the zone are more predictive of strikeout rate than those outside of the zone.

Observation #3 is fairly understandable. Though more than half of all strikes arise from called ones (58% in 2019), approximately three-fourths of all strikeouts arise as the result of a swing. It is the final pitch that matters more. To use a football parallel, the first-string running back might gain the bulk of a team’s yardage, and take the team 99 yards to the goal line. However, if the power running back comes in for the final yard, he is the player to score the touchdown and put six points on the scoreboard. The one who finishes the drive often matters more.

Observation #4 seems counter-intuitive. Outcome C, which is taking a pitch outside of the zone, is most often called a ball. Unless the umpire is in conflict with technology, Outcome C should not generate a strikeout, and hence should produce a non-positive index. Yes, 3.8% is a fairly low figure [noise], but I would have imagined that it should have been signed negatively, if anything. As an aside, one should note that our wPDICSW equation did include Outcome C in the called strike portion, due to umpire error.

In attempting to clean up the regression, our next step is to perform the analysis using only the positive coefficients. Ideally, it would be better not to use negatively signed coefficients for baseball equations, so let’s give it a shot. For our second go-around, we will regress on only outcomes A, C, D & F (which correspond exactly to the wPDICSW indexes).

K% – Regression Indexes for wPDI – Second Attempt
Index A Index C Index D Index F
152.4% -10.9% 205.0% 60.4%

For the limited regression, the R-Squared went down to 67.8%, which is not an optimal result. What is also interesting to me is that the sign has now flipped for the Outcome C index. Without the contact indexes (Outcomes B & E) present, the regression aligned its sign with our intuition!

With the low value of the C index, and with the non-zero B & E indexes, my next regression refinement run will include all outcomes other than the “called ball.” The following indexes were generated, which produced a higher R-Squared of 70.7%:

K% – Regression Indexes for wPDI – Final Attempt
Index A Index B Index D Index E Index F
149.9% -25.7% 224.8% -17.2% 75.4%

Yes, we have a few negatively signed indexes, but they are both intuitive and mathematically needed. For the final plate discipline equation, I will round all indexes to the nearest 5%. I will officially define wPDIK as:

wPDIK: Pitching Outcome Indexes for K%
Outcome Description Index
A Out of Zone / Swung On / No Contact 150%
B Out of Zone / Swung On / Contact Made -25%
C Out of Zone / No Swing 0%
D In Zone / Swung On / No Contact 225%
E In Zone / Swung On / Contact Made -20%
F In Zone / No Swing 75%

wPDIK = 150% * A% - 25% * B% + 225% * D% - 20% * E% + 75% * F%
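
As an illustration of the equation in use, the Python sketch below evaluates wPDIK for two outcome mixes. Both mixes are hypothetical and were chosen only to contrast a contact-heavy profile with a whiff-heavy one; the indexes are taken from the wPDIK equation above.

```python
# wPDIK indexes from the equation above, written as fractions.
K_INDEXES = {"A": 1.50, "B": -0.25, "C": 0.0, "D": 2.25, "E": -0.20, "F": 0.75}

def wpdi_k(mix):
    """Apply the wPDIK indexes to a dict of outcome shares (fractions of pitches)."""
    return sum(K_INDEXES[o] * mix[o] for o in K_INDEXES)

# Hypothetical outcome mixes; each sums to 1.
contact_mix = {"A": 0.08, "B": 0.12, "C": 0.35, "D": 0.05, "E": 0.25, "F": 0.15}
whiff_mix = {"A": 0.11, "B": 0.10, "C": 0.33, "D": 0.08, "E": 0.22, "F": 0.16}

print(f"{wpdi_k(contact_mix):.1%}")  # 26.5%
print(f"{wpdi_k(whiff_mix):.1%}")    # 39.6%
```

A few extra points of whiffs (Outcomes A and D) move the estimate far more than the same number of extra called strikes would, which is the pattern the indexes are meant to capture.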

The wPDI approximation of a pitcher’s strikeout rate (K%) yielded an 84% correlation coefficient, which can be seen graphically here:
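
The 84% correlation and the 70.7% R-Squared from the final regression are consistent with each other, assuming the correlation is (approximately) the square root of that R-Squared. The assumption is not exact here, since the final indexes were rounded after the fit. A quick check in Python:

```python
import math

r_squared_final = 0.707                      # R-Squared of the final attempt
correlation = math.sqrt(r_squared_final)     # implied correlation coefficient
print(f"{correlation:.0%}")                  # 84%
```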

This is a nice result, in that we can approximate a pitcher’s strikeout rate fairly well simply by tallying whether each pitch is thrown in the zone, whether it is swung on, and whether contact is made.

Although CSW does a good job of foretelling a pitcher’s strikeout rate, wPDIK is a slightly closer representation.

We still have plenty more work to do with wPDI. In my next article, I will use the plate discipline framework to model a pitcher’s walk rate.

Ariel is the 2019 FSWA Baseball Writer of the Year. He is the creator of the ATC (Average Total Cost) Projection System. Ariel was ranked by FantasyPros as the #1 fantasy baseball expert in 2019. His ATC Projections were ranked as the #1 most accurate projection system in 2019. Ariel also writes for CBS Sports, SportsLine, RotoBaller, and is the host of the Beat the Shift Podcast (@Beat_Shift_Pod). Ariel is a member of the inaugural Tout Wars Draft & Hold league, a member of the inaugural Mixed LABR Auction league and plays high stakes contests in the NFBC. Ariel is the 2020 Tout Wars Head to Head League Champion. Ariel Cohen is a fellow of the Casualty Actuarial Society (CAS) and the Society of Actuaries (SOA). He is a Vice President of Risk Management for a large international insurance and reinsurance company. Follow Ariel on Twitter at @ATCNY.
