Archive for Meta Analysis

2023 Projection Showdown — THE BAT X vs Steamer Home Run Forecasts, Part 1, A Review

Now that the regular season has ended, it’s recap time! Over the next couple of weeks (months?), I’ll be reviewing all my preseason articles. I want to always be held accountable for the advice I provide, but also it’s fun to find out what actually happened and if I was right. We’ll start with the first in my new series this year, the 2023 Projection Showdown, which pitted THE BAT X against Steamer in various hitting categories. We begin with the first category of home runs. In part 1, I identified the hitters who THE BAT X projected for a higher 600 at-bat home run pace than Steamer. Let’s find out which projection system proved closer.

Read the rest of this entry »


Simplifying My Life: Power and Contact Thresholds

There are too many stats (“Welcome to FanGraphs”), so I decided to take a step back and try to remove as much noise as possible when making decisions. I’m not reinventing any concept, just concentrating on the most important factors. The fewer, the better. Today, I’m going to focus on my “new” power factor and mention how I settled on Contact%.

I know several other sources have a focus on keeping their inputs basic, but each one disagrees with the results. I decided to add to the disagreement and pick out the best options for the standard roto game. Read the rest of this entry »


Linking STUFFF Changes to Fantasy Relevant Stats

I have a major love-hate relationship with the STUFFF metrics. After just a few pitches, useful information becomes available to determine if a pitcher has improved or not. On the other hand, my issues with STUFFF are the lack of transparency and the way values change as the dataset grows. With all the STUFFF talk, all I want to know is how changes in it will affect a pitcher’s fantasy-relevant stats. In my first article, I set some ERA baselines for the STUFFF values. The next step is to understand what effect a change in a STUFFF value has on a pitcher. For example, if I hear a pitcher’s Stuff+ jumps from 90 to 110, why should I care? Is the pitcher’s ERA going to drop by 1.00 or by 0.10 or not at all? I decided to just make a major data dump to have a reference when a STUFFF value does move.

Caution: The following values may or may not be predictive. They could just be descriptive. There is just not enough information (two years’ worth) to run any ideal predictive test at this point, especially with STUFFF’s vagueness and ever-changing nature.

Read the rest of this entry »


Upgrading My Individual Pitch Result Metric

On a personal level, the All-Star break can be declared a success as I’ve made major improvements to my pitch result evaluator, pERA. I was supposed to dive into it last season, but I spent most of the time dealing with the league’s new rules, so this update got pushed off until now. I planned on adding Ball Percentage (Ball%), Called Strikes (CStr%), and StatCast batted ball information. I felt each addition would provide a clearer picture of the pitcher’s pitches. I eventually found out I was double counting the same information with Ball% and CStr% and needed to remove one. Read the rest of this entry »


For a Starter to Beat His ERA Estimators …

The “ability” of a pitcher to consistently beat his ERA estimators will always be a discussion topic. Today, I’m going to put context on who has suppressed their ERA for two straight seasons and how they performed in the third season. I’ve been trying to see if I have missed anything while digging into under- and overperforming starters and found that I might have missed the obvious: the starter’s team.

Before getting to the team context, here are the baseline chances for starting pitchers to consistently beat certain ERA benchmarks. Read the rest of this entry »


Fastball Quality Matters …

Last week, I examined if throwing too many four-seam fastballs led to a pitcher being predictable and getting hit around. What I noticed was that I needed to expand out past just four-seamers and include sinkers. Again, I failed to find a connection between fastball quality-and-quantity and weakly hit batted balls. Instead, I was able to determine some benchmarks to find good fastballs.

Through some observations, I believed that throwing too many fastballs, especially if they were of poor quality (e.g. slow, average spin), would get hit harder. I dug through the numbers just hoping for my thoughts to be verified but I found jack squat. Nothing. Read the rest of this entry »


What is Too Many Four-Seamers?

The question came up when I examined David Peterson. I wondered if he was getting hit around because he was throwing a ton of subpar fastballs. Today, I’m back-testing the theory.

I had no idea what I was going to find, but the results, positive or negative, will help to shape future studies. I examined starters from 2021 and 2022 who threw at least 20 innings (n=201). I limited the time frame to those two seasons because the STUFFF metrics have only been around that long. Also, I limited this study to guys who threw their four-seamer more than their sinker. I started with just four-seamers and stayed away from sinkers. The STUFFF metrics are separated based on pitch type so I wanted to stay in one lane.

The narrative behind four-seamers (or any fastball) would be that batters familiarize themselves with these fastballs. I know that bad fastballs won’t generate as many strikeouts, but do they get hit around more, especially if that’s all batters see?

Additionally, I included my pERA values, which are based only on whether the pitch misses (SwStr%) and the direction it is hit (GB%). These values might seem high, but I don’t scale the value based on pitch type, and fastballs generate fewer swings-and-misses than non-fastballs. It’s time to start the journey.

First, I grouped the pitchers by how far their ERA estimator was from their actual ERA. Here are the results.

Four-Seamer Fastball Metrics Depending on ERA-FIP
ERA-FIP | > 1 | Between -1 and 1 | < -1
BABIP | .322 | .286 | .241
HR/9 | 1.5 | 1.2 | 1.3
K% | 18.7% | 21.6% | 22.6%
FF% | 42.5% | 37.8% | 34.4%
FF%/(FF%+SI%) | 79.1% | 78.4% | 71.1%
FFv | 93.1 | 93.1 | 92.9
wFF/C | -1.26 | -0.21 | 0.12
Stuff+ | 86.4 | 91.9 | 94.9
Bot+ | 47.6 | 52.4 | 50.0
pERA | 4.82 | 4.67 | 4.68

Four-Seamer Fastball Metrics Depending on ERA-xFIP
ERA-xFIP | > 1 | Between -1 and 1 | < -1
BABIP | .310 | .287 | .254
HR/9 | 1.8 | 1.2 | 1.0
K% | 18.9% | 21.7% | 22.9%
FF% | 39.4% | 38.2% | 35.1%
FF%/(FF%+SI%) | 77.9% | 78.2% | 76.9%
FFv | 93.0 | 93.2 | 92.9
wFF/C | -1.57 | -0.19 | 0.76
Stuff+ | 87.2 | 91.3 | 99.1
Bot+ | 48.8 | 52.2 | 53.5
pERA | 4.88 | 4.68 | 4.50

Four-Seamer Fastball Metrics Depending on ERA-SIERA
ERA-SIERA | > 1 | Between -1 and 1 | < -1
BABIP | .307 | .287 | .264
HR/9 | 1.9 | 1.2 | 0.9
K% | 18.9% | 21.8% | 21.6%
FF% | 39.7% | 38.0% | 36.6%
FF%/(FF%+SI%) | 79.6% | 77.5% | 79.2%
FFv | 92.8 | 93.2 | 92.7
wFF/C | -1.51 | -0.21 | 0.58
Stuff+ | 87.4 | 92.0 | 93.4
Bot+ | 49.2 | 52.4 | 51.7
pERA | 4.87 | 4.67 | 4.58

Four-Seamer Fastball Metrics Depending on ERA-xERA
ERA-xERA | > 1 | Between -1 and 1 | < -1
BABIP | .309 | .286 | .276
HR/9 | 1.8 | 1.2 | 1.3
K% | 18.9% | 21.9% | 19.8%
FF% | 41.0% | 38.0% | 35.7%
FF%/(FF%+SI%) | 80.1% | 78.8% | 70.8%
FFv | 92.5 | 93.2 | 92.9
wFF/C | -1.61 | -0.13 | -0.39
Stuff+ | 85.2 | 92.6 | 88.6
Bot+ | 47.1 | 52.7 | 49.2
pERA | 4.83 | 4.65 | 4.86
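
The grouping behind the four tables above can be sketched in a few lines. This is only an illustration of the mechanics, not the exact process used for the study: the function names, the dictionary keys, and the four sample pitchers are all made up, and how a gap of exactly 1.00 is handled is an assumption.

```python
from statistics import mean

def gap_bucket(era, estimator, cutoff=1.0):
    """Label a pitcher by how far his ERA sits from an ERA estimator."""
    gap = era - estimator
    if gap > cutoff:
        return "> 1"
    if gap < -cutoff:
        return "< -1"
    return "Between -1 and 1"  # assumption: exact ties land in the middle group

def average_by_bucket(pitchers, estimator_key, metric_keys):
    """Average each metric within the three ERA-minus-estimator buckets."""
    groups = {}
    for p in pitchers:
        label = gap_bucket(p["ERA"], p[estimator_key])
        groups.setdefault(label, []).append(p)
    return {
        label: {m: mean(p[m] for p in rows) for m in metric_keys}
        for label, rows in groups.items()
    }

# Made-up pitchers, only to show the mechanics.
sample = [
    {"ERA": 5.60, "FIP": 4.20, "FF%": 44.0, "Stuff+": 85.0},
    {"ERA": 5.10, "FIP": 4.00, "FF%": 41.0, "Stuff+": 88.0},
    {"ERA": 3.90, "FIP": 4.10, "FF%": 38.0, "Stuff+": 92.0},
    {"ERA": 2.80, "FIP": 4.00, "FF%": 34.0, "Stuff+": 95.0},
]
summary = average_by_bucket(sample, "FIP", ["FF%", "Stuff+"])
```

Swapping "FIP" for another estimator key (xFIP, SIERA, xERA) repeats the same split for each table.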

There is a lot to unpack, but the biggest takeaways for me are:

  • The pitchers with higher-than-expected ERA threw more fastballs on average.
  • The pitchers with higher-than-expected ERA generally had worse STUFFF.
  • The pitchers with lower-than-expected ERA mixed in more sinkers.
  • Fastball velocity didn’t matter. It still remains linked to strikeouts.

Here are two more groupings by HR/9 and BABIP.

Average Four-Seamer Fastball Metrics Depending on HR/9
HR/9 | > 1.7 | Between 0.7 and 1.7 | < 0.7
BABIP | .294 | .285 | .293
HR/9 | 2.2 | 1.2 | .6
K% | 18.1% | 21.8% | 23.8%
FF% | 39.7% | 37.7% | 38.8%
FF%/(FF%+SI%) | 79.3% | 78.2% | 73.8%
FFv | 92.466 | 93.156 | 93.943
wFF/C | -1.72 | -.09 | .40
Stuff+ | 85.7 | 92.5 | 92.3
Bot+ | 49.3 | 52.1 | 53.8
pERA | 4.99 | 4.65 | 4.49

Average Four-Seamer Fastball Metrics Depending on BABIP
BABIP | > .317 | Between .253 and .317 | < .253
BABIP | .334 | .284 | .237
HR/9 | 1.3 | 1.3 | 1.2
K% | 20.1% | 21.4% | 22.9%
FF% | 40.5% | 37.8% | 36.0%
FF%/(FF%+SI%) | 75.5% | 78.9% | 76.9%
FFv | 93.212 | 93.112 | 92.941
wFF/C | -.76 | -.32 | .50
Stuff+ | 85.6 | 92.1 | 96.8
Bot+ | 51.1 | 52.1 | 51.5
pERA | 4.75 | 4.69 | 4.59

The results are a little messier but the conclusions are close to being the same.

  • The pitchers who got hit around threw a few more fastballs on average.
  • The pitchers who got hit around had worse STUFFF.
  • Fastball velocity and the sinker/four-seam mix didn’t matter for over- or underperforming on batted-ball metrics.

The two major factors seem to be the usage rate and the STUFFF metrics.

After eyeballing the above tables, the warning signs seem to be a usage over 40% along with a Stuff+ value under 90 or a Bot Stuff under 50. To see if these benchmarks work, I took the 2023 starters and grouped them.

2023 ERA-ERA Estimators for Starters Throwing Lots of Bad Four Seamers
Four-seam traits | FIP | xFIP | SIERA
Usage >40%, BotStuff <50 | -0.10 | -0.19 | -0.03
Everyone else | 0.06 | 0.07 | 0.04
Usage >40%, Stuff+ <90 | -0.12 | 0.19 | 0.17
Everyone else | 0.06 | 0.06 | 0.04
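
For anyone wanting to rerun this kind of benchmark check, here is a minimal sketch of the split. The dictionary keys, function names, and sample starters are hypothetical, and only the Stuff+ version of the flag is shown. A negative result means the group's ERA beat the estimator.

```python
from statistics import mean

def is_bad_heavy_fastball(p, usage_min=40.0, stuff_max=90.0):
    """Flag a starter who leans on a four-seamer with a poor Stuff+ grade."""
    return p["FF%"] > usage_min and p["Stuff+ FF"] < stuff_max

def mean_gap(pitchers, estimator):
    """Average ERA minus the chosen estimator across a group of pitchers."""
    return mean(p["ERA"] - p[estimator] for p in pitchers)

def compare_groups(pitchers, estimator):
    """Return (flagged group gap, everyone else gap) for one estimator."""
    flagged = [p for p in pitchers if is_bad_heavy_fastball(p)]
    others = [p for p in pitchers if not is_bad_heavy_fastball(p)]
    return mean_gap(flagged, estimator), mean_gap(others, estimator)

# Made-up starters, only to show the mechanics.
starters = [
    {"FF%": 45.0, "Stuff+ FF": 85.0, "ERA": 4.50, "FIP": 4.00},
    {"FF%": 42.0, "Stuff+ FF": 88.0, "ERA": 3.75, "FIP": 4.00},
    {"FF%": 35.0, "Stuff+ FF": 85.0, "ERA": 4.00, "FIP": 3.75},
    {"FF%": 48.0, "Stuff+ FF": 105.0, "ERA": 3.00, "FIP": 3.25},
]
flagged_gap, others_gap = compare_groups(starters, "FIP")
```

Note that `mean` raises an error on an empty group, so a real run would need a guard for seasons where nobody meets the flag.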

The pitchers I expected to perform worse actually performed better. That’s suboptimal. I did find out what possibly didn’t work, but it would be nice if the values were predictive. I ran one last comparison for future reference. Here are the pitchers’ stats based on whether their ERA is above or below their ERA estimators so far this season.

2023 Stats Grouped by ERA-ERA Estimator Above or Below Zero
ERA minus estimator | FF% | wFA/C | BABIP | HR/9 | botStf FF | Stf+ FF
ERA-FIP >0 | 40.2% | -0.53 | .320 | 1.4 | 47.9 | 93.6
ERA-FIP <0 | 42.6% | 0.17 | .268 | 1.3 | 49.7 | 96.6
ERA-xFIP >0 | 41.2% | -0.80 | .318 | 1.6 | 48.0 | 92.7
ERA-xFIP <0 | 41.5% | 0.47 | .270 | 1.0 | 49.5 | 97.6
ERA-SIERA >0 | 40.7% | -0.86 | .318 | 1.6 | 47.3 | 92.2
ERA-SIERA <0 | 42.1% | 0.53 | .270 | 1.0 | 50.3 | 98.2

The usage doesn’t matter this season but the STUFFF values show some signs worth continued investigation.

That’s enough failure for one article. Here is what I see needs to be done next.

  • Sinkers will be included by weighting the results by usage. David Peterson mixes in some (bad) sinkers so maybe the combination brings more clarity.
  • I’m going to attempt a fastball grade that takes into account the predictive values (STUFFF), pitch results (pERA), and batted ball results (pVAL). From some past work, I wasn’t a huge fan of pVALs but I think they might help show the possible disconnects between shape and results (e.g. ability to hide the ball).

While I didn’t come to any groundbreaking information, I found what not to believe and hopefully, I can improve the future results.


Strikeout Rate’s Link to WHIP

I’m still in disbelief from a recent finding I made. It started with this comment in a recent article I wrote about STUFF:

How much WHIP changed in the two “Stuff” models was almost too good to be true. In both cases, the walk rate increased as a pitcher’s stuff got better, but the hit suppression was so large that the WHIP declined.

Well, I was wrong about the hit suppression. I went back and found no link to BABIP. The difference arises because WHIP uses an innings denominator, and a strikeout removes the chance for a hit or a walk. Any other out comes down to the random chance of a batted ball. I know it’s confusing, so here is an example assuming a pitcher with a 9 K/9, 3 BB/9, and .300 BABIP who throws 6 IP/GS. Read the rest of this entry »
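
The worked example is cut off in this excerpt, so what follows is not the article's own math. It is a bare-bones sketch of the same idea under my own simplifying assumptions: every non-strikeout out comes on a ball in play, and home runs, hit batters, double plays, and outs on the bases are ignored. It also works per nine innings rather than per six-inning start. The function name is hypothetical.

```python
def whip_from_rates(k9, bb9, babip):
    """Estimate WHIP from K/9, BB/9, and BABIP under a simplified model.

    Assumption: the 27 outs in nine innings are either strikeouts or
    outs on balls in play; nothing else is modeled.
    """
    outs_on_balls_in_play = 27.0 - k9
    # Each ball in play becomes an out with probability (1 - BABIP).
    balls_in_play = outs_on_balls_in_play / (1.0 - babip)
    hits = balls_in_play * babip
    return (hits + bb9) / 9.0

# Hold walks and BABIP fixed and only move the strikeout rate.
for k9 in (6, 9, 12):
    print(k9, round(whip_from_rates(k9, 3.0, 0.300), 2))
```

With walks and BABIP held constant, the estimate falls from about 1.33 at 6 K/9 to about 1.19 at 9 K/9 and about 1.05 at 12 K/9, because more strikeouts mean fewer balls in play are needed to get through the same innings.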


Ball%: Simple, Underutilized, & Highly Effective

A few days ago, I got into a spat while looking into Tanner Bibee.

My issue was that even though Bibee’s walk rate was good in 2022 (combined minor league rate of 1.8 BB/9), there were signs that his walks could be an issue once this season started. While Bibee’s luck in 3-2 counts is one issue, I’m just going to focus on Ball% (Balls/Pitches). Read the rest of this entry »


Referencing Pitch Quality Models to More Traditional Stats

WARNING: If you are reading this article, some or most of the exact values are out of date. The pitch quality models seem to go through at least a yearly adjustment so I can’t verify if all the numbers will hold up. With that caveat, it’s useful to have an overall idea of what each one means.

Last week, I was looking into Joey Lucchesi and I created this convoluted mess of a table.

Joey Lucchesi’s Pitch Modeling Stats
Model SI CU/CH FF/FC Stuff Overall
Bot 52 46 37 43 49
Stuff+ 89 91 72 86 98
pERA (AAA) 5.54 -0.44 4.79 2.74 3.14
pERA (AAA comps) 4.72 2.89 4.23

To start off with, having three different metrics using three different scales is confusing. Less obvious was that I didn’t know exactly what the two “Stuff” metrics were measuring. I had some idea from listening to their creators and others using them. I decided to take a step back and put some perspective on the two pitch quality models so others and I could correctly reference them and know what other metrics they correlate to.

Note: When I mention stuff metrics, I’m just referring to the Stuff values for Stuff+ and Pitching Bot. I know it can be confusing, especially with one system having the name Stuff+.

To start out with, this article won’t answer two questions. First, I’m not looking into the predictiveness of the stats. While I have done some work on it, I feel that should be its own article. Second, I’m just looking at the combined values, not the individual pitches. Again, a separate article for another day.

Here at FanGraphs, we introduced the pitch modeling metrics, PitchingBot and Stuff+, over a month ago with separate writeups.

Here is a short description of each from the original articles.

PitchingBot

In short, PitchingBot takes inputs such as pitcher handedness, batter handedness, strike zone height, count, velocity, spin rate, movement, release point, extension, and location to determine the quality of a pitch, as well as its possible outcomes. Those outcomes are then aggregated and normalized on a 20-80 scouting scale, which is what is displayed on the leaderboards.

Stuff+

Stuff+ only looks at the physical characteristics of a pitch, including but not limited to: release point, velocity, vertical and horizontal movement, and spin rate.

Stuff+, Location+, and Pitching+ are all on the familiar “+” scale (like wRC+), with 100 being average.

While both supply a reason behind their values, it sucks that they each have their own scale. Personally, I have my pERA values and similar pitches on an ERA scale so there is a readily recognizable reference.

The first item of business was to put both of the metrics on an ERA scale. By lining up the values from 2021 and 2022 (min 40 IP) with the pitcher’s actual ERA, I created the following two formulas.

    • PitchingBot values to an equivalent ERA (r-squared of .992): 22.697*e^(-.035*Bot Metric)
    • Stuff+ values to an equivalent ERA (r-squared of .996): 49.19*e^(-.025*Stuff+ Metric)
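
For quick lookups, the two formulas drop straight into code. This is a small sketch with the coefficients copied exactly as printed above. The function names are mine, and the inverse functions are simply the same equations solved for the grade. The printed coefficients are presumably rounded, so the outputs should be treated as approximate and may not line up with every row of a reference table.

```python
import math

def bot_to_era(bot):
    """ERA equivalent of a PitchingBot grade: 22.697 * e^(-0.035 * bot)."""
    return 22.697 * math.exp(-0.035 * bot)

def stuff_plus_to_era(stuff_plus):
    """ERA equivalent of a Stuff+ value: 49.19 * e^(-0.025 * stuff_plus)."""
    return 49.19 * math.exp(-0.025 * stuff_plus)

def era_to_bot(era):
    """Invert bot_to_era by solving era = 22.697 * e^(-0.035 * bot) for bot."""
    return math.log(22.697 / era) / 0.035

def era_to_stuff_plus(era):
    """Invert stuff_plus_to_era in the same way."""
    return math.log(49.19 / era) / 0.025

print(round(bot_to_era(50), 2))          # PitchingBot grade -> ERA scale
print(round(stuff_plus_to_era(100), 2))  # Stuff+ value -> ERA scale
print(round(era_to_bot(3.00)))           # ERA target -> PitchingBot grade
```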

With the two formulas, here is a quick reference table for stuff values and the ERA equivalent.

Conversion Table for ERA to “Stuff” Equivalents
ERA Equivalent | BotPlus | Stuff+
1.50 | 78 | 135
2.00 | 69 | 124
2.50 | 63 | 115
3.00 | 58 | 108
3.50 | 53 | 101
4.00 | 50 | 96
4.50 | 46 | 92
5.00 | 43 | 87
5.50 | 41 | 83
6.00 | 40 | 80

For an example, say a pitcher has a Stuff+ of exactly 100. We would expect the pitcher to have a BotStuff around 52 and an ERA around 3.60.

The next step was to bucket the three metrics for each of PitchingBot (stuff, command, and overall) and Stuff+ (Stuff+, Location+, and Pitching+) and then compare them to other pitching metrics. To start with, here is a limited comparison (limited table size) with all the values in this Google Doc.

Pitching Bot

Comparison of PitchingBot’s Stuff to Other Metrics
Range botOvr botStf botCmd Pitching+ Stuff+ Location+ ERA K/9 BB/9 WHIP HR/9
>70 62 72 49 106 125 98 3.05 11.9 3.9 1.13 0.8
65-70 62 67 53 106 120 100 3.01 11.1 3.3 1.12 0.8
60-65 58 62 52 104 113 99 3.30 10.5 3.5 1.17 0.9
55-60 56 57 53 102 107 100 3.62 9.8 3.3 1.20 1.0
50-55 54 52 53 101 102 100 3.81 9.0 3.2 1.25 1.1
45-50 52 47 54 99 97 101 4.16 8.4 3.1 1.28 1.2
40-45 49 42 53 97 91 100 4.63 7.7 3.1 1.35 1.4
<40 46 36 54 96 86 101 4.60 6.7 2.9 1.36 1.3

 

Comparison of PitchingBot’s Command to Other Metrics
Range botOvr botStf botCmd Pitching+ Stuff+ Location+ ERA K/9 BB/9 WHIP HR/9
>70 62 44 70 101 88 108 4.06 9.4 1.4 1.11 0.9
65-70 64 50 66 105 104 107 3.75 8.8 2.0 1.16 1.1
60-65 59 50 62 104 101 104 3.59 9.0 2.3 1.16 1.2
55-60 56 50 57 102 101 102 3.85 8.7 2.7 1.21 1.1
50-55 53 52 52 101 103 100 3.83 9.1 3.3 1.26 1.1
45-50 49 51 47 98 99 97 4.37 8.7 3.9 1.36 1.1
40-45 46 53 42 97 102 95 4.28 9.3 4.5 1.35 1.1
<40 43 56 35 96 104 92 4.11 9.6 4.6 1.37 1.0

 

Comparison of PitchingBot’s Overall to Other Metrics
Range botOvr botStf botCmd Pitching+ Stuff+ Location+ ERA K/9 BB/9 WHIP HR/9
>70 72 69 61 112 128 105 2.39 11.9 2.2 0.95 0.8
65-70 67 62 61 109 118 104 3.07 10.7 2.4 1.05 1.0
60-65 61 58 58 105 110 103 3.41 10.2 2.7 1.13 1.0
55-60 57 54 56 102 104 101 3.59 9.3 3.0 1.21 1.0
50-55 52 49 53 100 99 100 4.05 8.5 3.2 1.29 1.2
45-50 47 45 50 97 93 99 4.40 8.1 3.6 1.34 1.2
40-45 42 44 45 95 92 97 4.87 7.8 4.0 1.43 1.3
<40 37 47 38 93 94 93 4.84 8.6 4.8 1.43 1.2

Stuff+

Comparison of Stuff+’s Stuff+ to Other Metrics
Range Pitching+ Stuff+ Location+ botOvr botStf botCmd ERA K/9 BB/9 WHIP HR/9
>130 111 137 102 65 68 54 2.38 12.5 2.6 0.92 0.9
125-130 107 127 100 61 65 52 2.97 12.0 3.4 1.12 1.0
120-125 106 121 99 61 65 52 3.02 10.2 3.3 1.12 0.7
115-120 106 117 101 60 62 54 3.24 10.8 3.3 1.14 0.9
110-115 104 112 100 58 57 54 3.25 10.0 3.2 1.16 1.0
105-110 102 107 100 54 54 52 3.63 9.5 3.2 1.20 1.0
100-105 100 102 100 52 51 52 3.87 9.1 3.4 1.25 1.1
95-100 99 97 100 52 49 53 4.17 8.4 3.2 1.30 1.2
90-95 98 92 100 49 45 53 4.49 8.0 3.2 1.33 1.3
85-90 96 87 100 49 44 53 4.78 7.4 3.3 1.41 1.3
80-85 95 82 100 48 41 53 4.64 6.9 2.9 1.38 1.4
75-80 94 78 101 46 38 54 4.64 6.8 3.0 1.39 1.3
<75 92 70 101 46 37 55 5.97 6.0 2.9 1.54 1.6

 

Comparison of Stuff+’s Location+ to Other Metrics
Range Pitching+ Stuff+ Location+ botOvr botStf botCmd ERA K/9 BB/9 WHIP HR/9
105-110 105 104 106 61 50 63 3.49 8.9 2.0 1.13 1.1
100-105 102 101 102 55 50 56 3.83 8.9 2.8 1.22 1.1
95-100 98 100 97 50 52 48 4.14 8.8 3.8 1.32 1.1
90-95 96 104 93 45 57 41 4.44 9.9 5.1 1.44 1.0
85-90 95 106 87 41 63 33 4.32 11.3 5.3 1.26 0.8

 

Comparison of Stuff+’s Pitching+ to Other Metrics
Range Pitching+ Stuff+ Location+ botOvr botStf botCmd ERA K/9 BB/9 WHIP HR/9
>115 116 140 105 72 69 60 2.35 12.3 1.7 0.83 0.9
110-115 111 125 104 67 64 59 2.84 11.3 2.4 1.00 0.9
105-110 107 114 102 61 58 57 3.16 10.2 2.7 1.10 1.0
100-105 102 104 101 55 53 54 3.65 9.2 3.1 1.22 1.0
95-100 97 94 99 49 47 51 4.30 8.2 3.5 1.33 1.2
<95 93 87 96 43 44 46 5.38 7.5 4.1 1.51 1.4
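
The five-point ranges used in the tables above can be produced with a small helper. This is a sketch under my own assumptions: the function names and dictionary keys are hypothetical, and a value sitting exactly on a boundary is placed in the higher range, since the tables don't say which side it falls on.

```python
from collections import defaultdict
from statistics import mean

def range_label(value, low=40, high=70, width=5):
    """Return a range label such as '50-55', '<40', or '>70'."""
    if value < low:
        return f"<{low}"
    if value >= high:  # assumption: the top boundary joins the open-ended range
        return f">{high}"
    start = low + int((value - low) // width) * width
    return f"{start}-{start + width}"

def average_by_range(rows, grade_key, metric_key, **bins):
    """Average one metric within each range of the chosen grade."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[range_label(row[grade_key], **bins)].append(row[metric_key])
    return {label: mean(values) for label, values in buckets.items()}

# Made-up pitchers, only to show the mechanics.
rows = [
    {"botStf": 52.3, "K/9": 9.1},
    {"botStf": 54.0, "K/9": 8.9},
    {"botStf": 36.0, "K/9": 6.5},
]
by_bot = average_by_range(rows, "botStf", "K/9")
```

The Stuff+ tables use different end points, which the keyword arguments cover, e.g. `range_label(102, low=75, high=130)` for the Stuff+ grade.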

 

Looking over the information, both of the “Stuff” values seem to generally capture what each is trying to describe. The stuff values correlate to strikeouts and the command/location grades point to walk rate.
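
As a rough check on those two relationships, here is a Pearson correlation run on the range averages copied from the PitchingBot Stuff and Command tables above. Treat it as a sketch only: correlating eight range averages will look far stronger than a pitcher-level correlation would, and the helper function is my own.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Range averages from the "PitchingBot's Stuff" table (botStf and K/9 columns).
bot_stf = [72, 67, 62, 57, 52, 47, 42, 36]
k9 = [11.9, 11.1, 10.5, 9.8, 9.0, 8.4, 7.7, 6.7]

# Range averages from the "PitchingBot's Command" table (botCmd and BB/9 columns).
bot_cmd = [70, 66, 62, 57, 52, 47, 42, 35]
bb9 = [1.4, 2.0, 2.3, 2.7, 3.3, 3.9, 4.5, 4.6]

stuff_vs_k = pearson(bot_stf, k9)      # strongly positive
command_vs_bb = pearson(bot_cmd, bb9)  # strongly negative
```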

After reading through the definitions of how the batted ball data is collected, I expected the Bot values to have a larger variance in the StatCast values (linked spreadsheet). That wasn’t the case, and there ended up being almost no correlation between any of the measures and actually limiting hard contact. With hard contact not being predictive, I was surprised when I got to WHIP.

How much WHIP changed in the two “Stuff” models was almost too good to be true. In both cases, the walk rate increased as a pitcher’s stuff got better, but the hit suppression was so large that the WHIP declined.

The ability to detect hit suppression is on another scale than has ever been measured. It’s almost too good to be true.

Overall, I see two major issues with the stuff metrics:

  • The formulas behind the values are a black box so there is no way to back check the results. Also, the calculations are constantly changing so it’s tough to know which formula is being used. It’ll be impossible to incorporate the information if it keeps changing.
  • The pitch model metrics are trained off of just the 2021 and 2022 data. Of course the data is going to line up almost perfectly for now. It’ll be interesting to see how they hold up this season and three to four seasons down the road.

The next step for me will be to dive into the small sample of 2023 data. Does the near-perfect accounting of all batted ball outcomes based on just pitch metrics continue, or were the metrics fit too closely to the actual results I’m examining?

I did get a few questions answered by working through these Pitch Quality Models, but I generated a ton more. As I get time, I’ll keep diving into the subject to see what is usable going forward.