Which Source for Pitching Metrics is Best?

Rob Silver, the 2016 National Fantasy Baseball Championship (NFBC) Main Event winner and high-stakes fantasy baseball extraordinaire, messaged me on Twitter a few days ago to ask a question: Which source of pitching statistics is most accurate? I’m paraphrasing. Also, I could paraphrase the question any number of ways: Which source should we be using? Which most reliably correlates with pitcher performance?

It was a question for which I had no answer. Admittedly, I use a variety of sources, none of which align with one another. I have noticed this before, but all I can do is shrug and accept it as a quirk of being a sabermetrician who works with publicly available data.

The sources cryptically mentioned above include the following:

  • “Plate Discipline,” hosted at FanGraphs; these data are supplied by Baseball Info Solutions (BIS)
  • “Pitch Info,” also hosted at FanGraphs and which displaced PITCHf/x data a couple of years ago; these data are effectively PITCHf/x, but cleaned up and refined
  • PITCHf/x, which populates Baseball Prospectus’ PITCHf/x leaderboards

As mentioned, none of the data align, at least not perfectly. Moreover, Brooks Baseball, which I use frequently for player-specific analyses of more granular pitching data, is allegedly powered by Pitch Info. But, again, to my knowledge, its numbers do not align perfectly with the Pitch Info data on FanGraphs, even though the two are presumably one and the same.

Silver’s question prompted me to finally tackle the issue head-on. What follows are the results.

I pulled five years’ worth of data from each source, split up by season and limited to pitchers who qualified for the ERA title (at least 162 innings recorded) or threw at least 2,500 pitches. This creates a panel of roughly 350 player-seasons.

For FanGraphs data (Plate Discipline and Pitch Info), I used or calculated the following variables:

  • Swing that makes contact on a pitch in the zone: Zone% * Z-Swing% * Z-Contact%
  • Swing that does not make contact on a pitch in the zone: Zone% * Z-Swing% * (1 - Z-Contact%)
  • Swing that makes contact on a pitch outside the zone: (1 - Zone%) * O-Swing% * O-Contact%
  • Swing that does not make contact on a pitch outside the zone: (1 - Zone%) * O-Swing% * (1 - O-Contact%)
  • No swing on a pitch in the zone: Zone% * (1 - Z-Swing%)
  • No swing on a pitch outside the zone: (1 - Zone%) * (1 - O-Swing%)

For each player, these percentages sum to 100%, covering every possible outcome of a pitch at the broadest level, all expressed as percentages/frequencies.
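The six calculations above can be sketched in a few lines of Python. This is only an illustration: the function name and dictionary keys are my own labels, the inputs are assumed to be fractions (0.45 rather than 45%), and the sample numbers are made up rather than taken from any pitcher.

```python
def pitch_outcome_shares(zone, z_swing, z_contact, o_swing, o_contact):
    """Split all pitches into six mutually exclusive outcome shares.

    Arguments mirror the FanGraphs columns Zone%, Z-Swing%, Z-Contact%,
    O-Swing% and O-Contact%, expressed as fractions in [0, 1].
    The function name and the keys below are hypothetical labels.
    """
    out = 1.0 - zone  # share of pitches thrown outside the zone
    return {
        "zone_contact": zone * z_swing * z_contact,
        "zone_whiff": zone * z_swing * (1.0 - z_contact),
        "out_contact": out * o_swing * o_contact,
        "out_whiff": out * o_swing * (1.0 - o_contact),
        "zone_take": zone * (1.0 - z_swing),
        "out_take": out * (1.0 - o_swing),
    }


# Illustrative inputs only, not a real pitcher's line.
shares = pitch_outcome_shares(0.45, 0.65, 0.85, 0.30, 0.60)
for name, share in shares.items():
    print(f"{name}: {share:.4f}")
print("total:", round(sum(shares.values()), 10))
```

The two whiff shares added together give a swinging strike rate per pitch, which is a handy sanity check against the SwStr% column on the same leaderboard.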

For Baseball Prospectus data (PITCHf/x), I took a more circuitous route given the tools (variables) available to me:

  • Swing that makes contact: [Sw Rate] * (1 - Whf/Sw)
  • Swing that does not make contact: [Sw Rate] * Whf/Sw
  • Called strike percentage: [Called S] / Num
    This serves as a proxy for “No swing on a pitch in the zone”
  • Called ball percentage: [Called B] / Num
    This serves as a proxy for “No swing on a pitch outside the zone”
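The same idea for the Baseball Prospectus columns, as a rough sketch. It assumes Sw Rate and Whf/Sw arrive as fractions and that Called S, Called B and Num are raw counts; the function name, keys and sample numbers are hypothetical. Unlike the FanGraphs version, I make no claim that these four shares sum to exactly 100%.

```python
def bp_outcome_shares(sw_rate, whf_per_sw, called_strikes, called_balls, num_pitches):
    """Build the four PITCHf/x-based outcome shares described above.

    sw_rate and whf_per_sw are assumed to be fractions in [0, 1];
    called_strikes, called_balls and num_pitches are assumed to be counts.
    """
    return {
        "contact": sw_rate * (1.0 - whf_per_sw),
        "whiff": sw_rate * whf_per_sw,
        # Proxy for "no swing on a pitch in the zone"
        "called_strike": called_strikes / num_pitches,
        # Proxy for "no swing on a pitch outside the zone"
        "called_ball": called_balls / num_pitches,
    }


# Made-up season line for illustration only.
bp_shares = bp_outcome_shares(0.46, 0.22, 520, 1100, 3000)
for name, share in bp_shares.items():
    print(f"{name}: {share:.4f}")
```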

You might think the less-granular data from the PITCHf/x leaderboards would produce the worst results. You might be wrong (but, also, you might be right — there’s a complication here upon which I’ll expound shortly).

For each set of data, I specified separate regression equations using every outcome listed above as independent variables for each of the following descriptive dependent variables:

  • K% (strikeout rate)
  • BB% (walk rate)
  • ERA (earned run average)
  • FIP (Fielding Independent Pitching)
  • xFIP (Expected FIP)
  • SIERA (what uhhhh what does that stand for)
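For anyone who wants to replicate the setup, here is a minimal sketch of fitting one such equation by ordinary least squares and computing adjusted r². It is not the exact tooling behind the numbers in this post; the helper names and toy data are invented for illustration. One assumption worth flagging: outcome shares that sum to 100% are perfectly collinear with an intercept, so in practice one category would need to be dropped (or the intercept omitted) before calling this.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def adjusted_r2(rows, y):
    """Regress y on the columns of rows (plus an intercept); return adjusted R^2."""
    n, p = len(rows), len(rows[0])
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = p + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(bj * dj for bj, dj in zip(beta, d)) for d in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Made-up toy data, NOT from the article: two predictors, exactly linear target.
toy_rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
toy_y = [2.0 + 3.0 * a - b for a, b in toy_rows]
print(adjusted_r2(toy_rows, toy_y))
```

With real data, each row would hold one player-season's outcome shares and y would hold that season's K%, BB%, ERA and so on, one dependent variable per fit.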

This table summarizes the adjusted r² produced by each equation for every dependent variable and data source:

Goodness of Fit Measurements

Metric    PITCHf/x    Plate Discipline    Pitch Info
K%        0.832       0.784               0.804
BB%       0.633       0.589               0.561
ERA       0.227       0.194               0.195
FIP       0.455       0.401               0.408
xFIP      0.544       0.545               0.525
SIERA     0.631       0.582               0.576

Across the board, PITCHf/x performs better than Plate Discipline and Pitch Info (except for xFIP, which neutralizes the ill effects of home runs and fly balls, thereby inadvertently leveling the playing field). You can describe (not predict, but describe) strikeout and walk rates really, really well with each data set, as you should. ERA, while a more dubious affair, still bears a moderate correlation to the data. Note the increasing correlation with the peripheral pitching metrics (FIP, xFIP, SIERA), which, by no coincidence whatsoever, mirrors the strength with which they describe/predict ERA in-season. (Please do not make me dig up the SIERA over xFIP over FIP diatribe.)

But, ah, yes, the complication: the PITCHf/x data uses not only fewer variables but also potentially more accurate independent variables. Called strikes and balls are inherently more accurate when describing the outcomes of pitches with no swing: they remove all human error that might be associated with an umpire expanding the strike zone (i.e., strike on a pitch typically called a ball) or squeezing the pitcher (ball on a typical strike). This gives PITCHf/x data a slight advantage, although I can’t say by how much. Would it be better than the others without it? Worse? Comparable?

Given the uncertainty here, and given how closely the data sources compare with one another, it’s hard for me to call this anything other than a three-way draw. I can’t, in good faith, declare PITCHf/x has a distinct edge (small as it may be) because of this wrinkle. My best advice to you: use all of them; mentally blend them together and understand that they describe pitcher performance in different ways that ultimately produce similar outcomes. Sorry, I know it’s an underwhelming result. At least it brings me, and hopefully Silver, some peace of mind.

I’ll leave you with this final nugget. Swinging strike rate (SwStr%) is easily the most popular standalone peripheral metric that fantasy analysts use as a shorthand for pitcher effectiveness. Here’s how each data source’s swinging strike rate correlates with strikeout rate, just so the record shows:

Plate Discipline: r² = 0.753
Pitch Info: r² = 0.753
PITCHf/x: r² = 0.771

Just note you have to calculate PITCHf/x swinging strike rate on your own (swing rate multiplied by whiffs-per-swing). But it’s only microscopically better. In other words: you’re on your own, kid.
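If you want to run that comparison yourself, a bare-bones sketch follows. The helper name and every number in it are made up for illustration; with real data each list would hold one value per player-season, and the PITCHf/x swinging strike rate would be built from the Sw Rate and Whf/Sw columns (assumed here to be fractions).

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Invented (swing rate, whiffs per swing, K%) triples, NOT real pitchers.
sample = [
    (0.46, 0.22, 0.215),
    (0.48, 0.27, 0.268),
    (0.44, 0.18, 0.172),
    (0.47, 0.24, 0.251),
    (0.45, 0.20, 0.203),
]
# PITCHf/x swinging strike rate = swing rate * whiffs per swing
swstr = [sw * whf for sw, whf, _ in sample]
k_rate = [k for _, _, k in sample]
print(r_squared(swstr, k_rate))
```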

Two-time FSWA award winner, including 2018 Baseball Writer of the Year, and 8-time award finalist. Featured in Lindy's magazine (2018, 2019), Rotowire magazine (2021), and Baseball Prospectus (2022, 2023, 2024, 2025). Biased toward a nicely rolled baseball pant.

1 Comment
Nicklaus Gaut (Member since 2018)
7 years ago

Thank you for the write-up, as similar things have been on my mind lately. You described “Plate Discipline” as being hosted at FanGraphs, with data being supplied by BIS. Are FG’s “Pitch Values” PITCHf/x data that’s ‘cleaned up and refined’ in-house by FanGraphs, or is that done by BIS as well?