Today, I will attempt to develop a simple pitcher metric. The exercise will recap the plate discipline data at our disposal, while also giving us the opportunity to unearth some fascinating pitching tendencies of lesser-known hurlers.
To do this, let’s start with the basic ingredients of plate discipline, from the point of view of the pitcher.
We can break down any pitch into these simple binary events:
- Was the ball thrown in the strike zone?
- Was the ball swung on?
- Did the batter make contact with the ball?
On FanGraphs, we have a multitude of statistics which track these plate discipline possibilities. They are available here for every major league player. [Definitions can be found here in our library.]
Let’s now define a few quantities which enumerate the resulting outcomes (for the binary events).
1) Was the ball thrown in the strike zone?
Zone% = Pitches in the strike zone / Total pitches
- Zone% = The percent of pitches which are thrown in the zone.
- (1-Zone%) = The percent of pitches which are thrown out of the zone.
2) Was the ball swung on?
Z-Swing% = Swings at pitches inside the zone / pitches inside the zone
O-Swing% = Swings at pitches outside the zone / pitches outside the zone
The denominators of these two quantities are a subset of all pitches (i.e. Z-Swing is only for pitches in the zone, and O-Swing is only for pitches outside of the zone). To reflect these outcomes as a percent of ALL pitches, the quantities are:
- (Zone%) * Z-Swing% = The percent of all pitches which are in the zone & that are swung on.
- (Zone%) * (1- Z-Swing%) = The percent of all pitches which are in the zone & that are not swung on.
- (1 – Zone%) * O-Swing% = The percent of all pitches which are out of the zone & that are swung on.
- (1 – Zone%) * (1- O-Swing%) = The percent of all pitches which are out of the zone & that are not swung on.
3) Did the batter make contact with the ball?
Z-Contact% = Swings at pitches inside the zone on which contact was made / Swings at pitches inside the zone
O-Contact% = Swings at pitches outside the zone on which contact was made / Swings at pitches outside the zone
Now onto contact. The denominators once again are a subset of all pitches (i.e. Z-Contact is only for pitches swung on in the zone, and O-Contact is only for pitches swung on outside of the zone). To reflect these outcomes as a percent of ALL pitches, the quantities are:
- (Zone%) * (Z-Swing%) * Z-Contact% = The percent of all pitches which are in the zone, swung on & contact is made.
- (Zone%) * (Z-Swing%) * (1 – Z-Contact%) = The percent of all pitches which are in the zone, swung on & contact is not made.
- (1 – Zone%) * (O-Swing%) * O-Contact% = The percent of all pitches which are out of the zone, swung on & contact is made.
- (1 – Zone%) * (O-Swing%) * (1 – O-Contact%) = The percent of all pitches which are out of the zone, swung on & contact is not made.
We listed three binary events, each with exactly two possible results. By pure multiplication, there would be 2^3 = 8 possible outcomes. However, because a pitch that is not swung on cannot, by definition, result in contact, we may remove 2 of the 8 scenarios. We are left with just 6 possible outcomes, and we have already defined a quantity for each of them.
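The counting argument can be checked with a short enumeration. This is only an illustrative sketch; the function name and the tuple layout are my own choices, not part of any FanGraphs tooling.

```python
from itertools import product

def possible_outcomes():
    """Return the (in_zone, swung, contact) combinations that can actually occur."""
    outcomes = []
    for in_zone, swung, contact in product([True, False], repeat=3):
        # Contact is impossible without a swing, so skip those combinations.
        if contact and not swung:
            continue
        outcomes.append((in_zone, swung, contact))
    return outcomes

all_combos = list(product([True, False], repeat=3))
valid = possible_outcomes()
print(len(all_combos), len(valid))  # prints: 8 6
```

The two discarded combinations are "in zone, no swing, contact" and "out of zone, no swing, contact".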
From a pitcher’s perspective, some of these outcomes are better than others. Let’s talk about each of them for a bit.
|Outcome||A||B||C||D||E||F|
|Zone?||Out of Zone||Out of Zone||Out of Zone||In Zone||In Zone||In Zone|
|Swing?||Swung On||Swung On||No Swing||Swung On||Swung On||No Swing|
|Contact?||No Contact||Contact Made||No Swing||No Contact||Contact Made||No Swing|
Questions: How can we rank these 6 outcomes? Which of these 6 outcomes are good for pitchers? Which are not good? Which are somewhere in the middle?
Outcome A – Out of Zone / Swung On / No Contact – I would classify this as a very desirable outcome. In fact, I think that it is the most desirable. The batter shouldn’t be swinging at a pitch out of the zone in the first place – and on top of that – he didn’t make contact. The pitch will be counted as a strike. It is a very effective and deceptive pitch.
Outcome B – Out of Zone / Swung On / Contact Made – I would classify this as a generally desirable outcome, though it sits in the middle of the pack. Pitchers should want batters swinging at pitches outside the zone, and this batter shouldn’t have swung. It is far from the best outcome, however, because the batter did make contact.
Outcome C – Out of Zone / No Swing – I would classify this as a generally undesirable outcome. Unless you have a particularly good catcher who frames well or unless you have a lousy umpire, 85-90% of these pitches will end up as balls. Obviously, you won’t give up base hits on these pitches – but as far as pitcher effectiveness, I wouldn’t classify this as a positive pitch. Looking at the flip side, it’s a very desirable outcome for a batter.
Outcome D – In Zone / Swung On / No Contact – This outcome is extremely desirable, as it will result in a strike. However, I would rank this outcome lower than A – which was out of the zone. Pitchers should desire swings on pitches outside the zone, rather than inside it.
Outcome E – In Zone / Swung On / Contact – This outcome is the least desirable. The pitch was in the zone, and the ball was struck. This is the outcome most likely to generate hits and runs for the opposing team. Of course, the contact may still produce a foul ball or a weakly hit ball in fair play. Even so, I would rank this outcome below all of the others.
Outcome F – In Zone / No Swing – A highly desirable outcome. The pitch will be called a strike with high probability, unless you have a poor pitch framer behind the plate or a poor umpire.
Ranking the above:
|Outcome||Description||Index|
|A||Out of Zone / Swung On / No Contact||100%|
|D||In Zone / Swung On / No Contact||90%|
|F||In Zone / No Swing||80%|
|B||Out of Zone / Swung On / Contact Made||65%|
|C||Out of Zone / No Swing||10%|
|E||In Zone / Swung On / Contact Made||0%|
The indexes that I provide are from 0% to 100%. The most desirable is at 100%, with the least desirable at a 0%. These aren’t arbitrary, although for the moment, they aren’t formed on a purely mathematical basis. The rough idea is:
- A & E are clearly the top and bottom to set the range – 100% & 0%.
- D is set 10 percentage points below A, at 90%, to reflect that a swinging strike out of the zone is the more desirable outcome.
- In a recent Twitter poll that I conducted (see the additional notes below), respondents ranked F lower than D. F is therefore set a further 10 percentage points lower, at 80%.
- B is a middle outcome, but on the positive side. We need to set it over 50%. It is positive since it generates a swing outside of the strike zone, despite the contact. I have set it at 65%.
- C is set at 10% to reflect a conservative 10% chance of getting a pitch out of the zone called for a strike.
Now let’s put all of this together and define a new statistic!
Outcomes as % of All Pitches
Here are the outcome definitions in terms of the plate discipline metrics. Together they cover every pitch: A% + B% + C% + D% + E% + F% = 100%
A% = Out of Zone / Swung On / No Contact = (1 – Zone%) * (O-Swing%) * (1 – O-Contact%)
B% = Out of Zone / Swung On / Contact Made = (1 – Zone%) * (O-Swing%) * O-Contact%
C% = Out of Zone / No Swing = (1 – Zone%) * (1- O-Swing%)
D% = In Zone / Swung On / No Contact = (Zone%) * (Z-Swing%) * (1 – Z-Contact%)
E% = In Zone / Swung On / Contact Made = (Zone%) * (Z-Swing%) * Z-Contact%
F% = In Zone / No Swing = (Zone%) * (1- Z-Swing%)
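The six definitions above translate directly into a few lines of Python. This is a minimal sketch: the function name `outcome_shares` and the sample rates are illustrative assumptions, not FanGraphs code or real pitcher data. All rates are expressed as fractions between 0 and 1.

```python
def outcome_shares(zone, z_swing, o_swing, z_contact, o_contact):
    """Split all pitches into outcomes A-F, each as a share of total pitches."""
    out_of_zone = 1.0 - zone
    return {
        "A": out_of_zone * o_swing * (1.0 - o_contact),  # out of zone, swing, miss
        "B": out_of_zone * o_swing * o_contact,          # out of zone, swing, contact
        "C": out_of_zone * (1.0 - o_swing),              # out of zone, no swing
        "D": zone * z_swing * (1.0 - z_contact),         # in zone, swing, miss
        "E": zone * z_swing * z_contact,                 # in zone, swing, contact
        "F": zone * (1.0 - z_swing),                     # in zone, no swing
    }

# Hypothetical pitcher, not real data.
shares = outcome_shares(zone=0.45, z_swing=0.65, o_swing=0.30,
                        z_contact=0.85, o_contact=0.60)
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"Total: {sum(shares.values()):.1%}")
```

Whatever rates are passed in, the six shares should sum to 100%, which makes a handy sanity check on the input data.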
Weighted Plate Discipline Index (wPDI) for Pitchers:
The formula for wPDI, the Weighted Plate Discipline Index:
wPDI = IndexA * A% + IndexB * B% + IndexC * C% + IndexD * D% + IndexE * E% + IndexF * F%
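As a rough sketch, the formula is just a weighted sum of the six outcome shares. The index values below are the proposed ones from the ranking table; the function name `wpdi` and the example shares are hypothetical and do not belong to any real pitcher.

```python
# Proposed index values from the ranking table (100% = 1.00).
INDEXES = {"A": 1.00, "B": 0.65, "C": 0.10, "D": 0.90, "E": 0.00, "F": 0.80}

def wpdi(shares, indexes=INDEXES):
    """Weighted Plate Discipline Index: sum of index * outcome share."""
    return sum(indexes[label] * shares[label] for label in indexes)

# Hypothetical outcome shares that sum to 1.0 (not real data).
example_shares = {"A": 0.07, "B": 0.10, "C": 0.38, "D": 0.04, "E": 0.25, "F": 0.16}
print(f"{wpdi(example_shares):.3f}")  # prints: 0.337
```

Passing the indexes as a parameter makes it easy to experiment with alternative weightings without touching the outcome shares.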
Similar to wOBA, this weighted index awards higher values to the better outcomes. It meaningfully aggregates pitcher plate discipline outcomes. It is a way to compare pitchers via one single value.
The indexes are obviously the key. For now, let’s peek at the leaderboards using the proposed indexes. Let’s see if the list of the top pitchers coincides with 2018 surface stats. Let’s also see if we can generate some interesting findings.
The leaderboards below have been generated entirely from 2018 plate discipline data:
Chris Sale sits atop the starting pitcher wPDI leaderboard. Other recognizable names in the top 10 include deGrom, Scherzer, Snell, Carrasco and Verlander. Patrick Corbin makes this list at #2 mostly due to his high A and low E components; he generated a lot of swings and misses outside of the zone and produced little contact inside the zone.
Domingo German is a player who stands out within the top 10. He especially excelled in outcome F, making sure that batters did not even swing at his pitches in the zone. Keep an eye on him to start 2019. Last night, in German’s victory over the Tigers, his wPDI was .409!
Also, interesting to see within the top 25 are Jason Vargas and Marco Gonzales. Vargas excelled at E & F – some of the in-zone outcomes. Gonzales excelled at B & C – some of the out-of-zone outcomes. Gonzales got a lot of batters to make contact on pitches outside of the zone.
Of note is Collin McHugh at #6, although he should technically be classified as a reliever as far as 2018 goes. In his first start of ’19, he produced a .416 wPDI.
Recognizable elite relief pitchers include Diaz, Treinen, Chapman and Betances, who are found within the top ten. Kirby Yates, Josh Hader and Will Smith are other closers found near the top.
Oliver Perez at #4? Well, there must be a reason why he is still employed and still being used in decently high leverage situations. Maybe this explains it.
Atop all relievers, though, was Ryan Pressly, who led all pitchers in 2018 at .401 [min 5 IP]. Pressly exhibited elite A, C and F components. That is, Pressly avoided bats extremely well. He generated lots of swings and misses on out-of-zone pitches, yet when the batter didn’t swing, it was often a strike. If Osuna falters in 2019, it’s clear who should be given the next save opportunity.
Jace Fry excelled in components A, E and F – which is somewhat similar to Pressly. Fry threw a few more balls out of the zone which were not swung on, but he even further limited contact on balls swung on in the zone.
- The maximum wPDI in 2018 was around .400, with the lowest (not shown) around .250. The average wPDI across all pitchers was approximately .325.
- Since A% + B% + C% + D% + E% + F% = 100%, if I had used an index of 100% for each of the outcomes, all pitchers would have exhibited a value of 100%.
- wPDI does not currently consider other possibly useful modifiers such as contact type (hard/medium/soft), or call data (called strikes, called balls), etc. Instead, wPDI contemplates only 3 binary events. wPDI currently goes for simplicity – breaking everything down into only 6 possible outcomes.
- I took to Twitter to help me with the ranking of outcomes (polls conducted here, here and here). I tried to incorporate poll relativity in creating the initial indexes.
- I completely disagreed with Twitter’s ranking of A vs. D. Twitter slightly preferred swinging / no contact while in the zone over out of the zone. If I am a pitcher, I’d much rather induce a swing on a lousy pitch than on a good one.
- I also disagreed somewhat with Twitter’s ranking of B vs. F – which the voters seemed to be evenly split on. To me, not generating a swing on a pitch in the zone is more desirable than getting contact on an outside pitch.
- wPDI is a skills-based metric. At some point into 2019, we will be able to see which pitchers exhibit skills growth and decline.
- Although one single game is still a small sample size, it’s nice that wPDI can produce a “game score.” Theoretically, one can track wPDI game to game, and consider rolling averages. For wOBA, where the denominator is plate appearances – one game is an extremely small sample size. With wPDI – the denominator is pitches, which will converge a lot faster.
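The game-to-game tracking idea from the last note could be sketched as follows. This is one possible approach rather than an established method: the function name `rolling_wpdi`, the three-game window and the game log are all hypothetical. Games are weighted by pitch count because wPDI is a per-pitch quantity, so the pitch-weighted average of single-game values equals the wPDI of all those pitches pooled together.

```python
def rolling_wpdi(games, window=3):
    """Pitch-weighted rolling wPDI over the trailing `window` games.

    `games` is a list of (pitch_count, game_wpdi) tuples in date order.
    """
    results = []
    for i in range(len(games)):
        recent = games[max(0, i - window + 1): i + 1]
        total_pitches = sum(pitches for pitches, _ in recent)
        weighted_sum = sum(pitches * value for pitches, value in recent)
        # Guard against a window with zero pitches thrown.
        results.append(weighted_sum / total_pitches if total_pitches else None)
    return results

# Hypothetical game log, not real data.
game_log = [(95, 0.340), (102, 0.310), (88, 0.365), (99, 0.330)]
for value in rolling_wpdi(game_log):
    print(f"{value:.3f}")
```

Early entries use however many games are available, so the first value is simply the first game's wPDI.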
All in all, this was a very useful exercise. Looking at the individual components of wPDI can tell you a lot about the effectiveness and deceptive characteristics of pitchers.
There is more work to be done on wPDI, starting with the indexes. There may be index values which nicely correlate outcomes to strikeouts, or which correlate outcomes to the limiting of walks, etc. This was a first, but meaningful, attempt. We also need to ask whether to add more complexity to the metric, or to keep it simple. Should we limit wPDI to these 6 outcomes, or should we add in some other binary events and expand?
wPDI is not fully ready for prime time just yet. I first wanted to establish and to demonstrate the concept. You, the collective readers of this website, are the best possible source of feedback.
Ariel was a finalist for two 2018 FSWA Awards - Baseball Article of the Year, and Baseball Writer of the Year. Ariel is the creator of the ATC (Average Total Cost) Projection System. Ariel also writes for CBS Sports and Sportsline, and is the host of the Great Fantasy Baseball Invitational - Beat the Shift Podcast. Ariel and his fantasy partner, Reuven Guy, have used the ATC system projections to finish in the money in several NFBC, RTSports, Doubt Wars and other national leagues, racking up several division titles. Ariel is a member of the inaugural Tout Wars Draft & Hold League. Ariel Cohen is a fellow of the Casualty Actuarial Society (CAS) and the Society of Actuaries (SOA). He is a Vice President of Risk Management for a large international insurance and reinsurance company. Follow Ariel on Twitter at @ATCNY.