Author Archive

The Truth About Pitch Values

It seems as though each year, fantasy baseball analysts, “professional” and amateur alike, home in on a new (or, if not new, then relatively untouched) metric or data set for their endlessly eager consumption. In 2015, FanGraphs introduced batted ball data to its leaderboards. In 2016, Statcast data was unveiled, although it arguably didn’t become popular until 2017, and before the 2017 season FanGraphs changed the game with its splits leaderboard. Baseball Prospectus has introduced myriad new metrics, too (DRA in 2015, DRC+ last year, etc.), and we began to lean into pitch-specific performance analysis last year. (The last topic is relevant to what follows here.)

I recently joined Christopher Welsh and Scott Bogman of In This League on their podcast. I thought one of the evening’s questions was particularly topical and prescient (and I paraphrase): What will 2019’s it metric be? The question was asked with pitch values, something I’ve seen garner increasing attention on Twitter, in mind.

You can acquaint yourself with pitch values directly from the man who created them:

Read the rest of this entry »

Pitch Type Performance: 2018 Summary

Shortly after the onset of last season, I dug into pitch-level statistics to see how much swinging strike rate (SwStr%), ground ball rate (GB%), and isolated power (ISO) varied by pitch type. I felt inspired after analyzing Madison Bumgarner before the 2018 season and noticing his fastball, once elite, was utterly broken after his dirt bike accident. (See his 2018 player caption and this July post in which I followed up on MadBum’s lack of progress.) I felt encouraged by the praise the post received from readers and fellow analysts alike for the clarity it provided. I’d like to think it helped move the needle, even if only slightly, in terms of how we evaluate pitchers.

I wanted to refresh the guts of that post for the 2018 season with additional metrics. There’s not much else to discuss; this’ll be short and sweet. (I’ll toss in some gratuitous high-level analysis following these tables.)


  • All data is courtesy of PITCHf/x via Baseball Prospectus
  • All tables present average rates for starting pitchers only
  • Because pitch tracking/stringing is not perfectly precise, the numbers below are highly accurate but not completely so and may not align exactly with FanGraphs’ batted ball data (for example, Baseball Info Solutions strings far fewer line drives than does PITCHf/x)
  • Click headers to sort!

Batted ball outcomes by pitch:

Read the rest of this entry »

The Biggest Hitter K% Outliers of 2018

Yesterday, I devised a new expected strikeout rate for pitchers and used it to identify qualified starting pitchers who over- or under-performed in 2018. I’m reluctant to make out the exercise to be more than it is. I simply wanted to take the most intuitive approach to describing a pitcher’s strikeout rate (K%): by using the plate discipline exhibited by opposing hitters. Today, I seek to do the same for hitters. I can tell you now the discussion will be much more qualitative than quantitative.

Read the rest of this entry »

The Biggest Pitcher K% Outliers of 2018

Mike Foltynewicz, a first-ballot Hall of Namer, immediately strikes me as someone who outperformed his strikeout rate (K%) in 2018. I don’t have to look far for confirmation: his 27.2% strikeout rate outstripped his 10.3% swinging strike rate (SwStr%) by a mile. Because whiff rate correlates so strongly with strikeout rate, it serves as a useful proxy for what one could expect of a pitcher’s strikeout ability.

I generally follow this rule of thumb when I’m reluctant to get too into the weeds when assessing peripherals: SwStr% * 2 = K%. It’s imperfect but useful in a pinch. Folty violates this rule of thumb pretty dramatically. Of 13 qualified pitchers who struck out at least 27% of hitters, his 10.3% swinging strike rate falls well short of the shortlist’s second-lowest mark (Charlie Morton, 11.9%). Foltynewicz’s 2018 performance has already wilted under what amounts to very little duress.
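The rule of thumb is simple enough to sketch in a few lines. The function name and the gap calculation below are my own illustration, not part of the original analysis; the only inputs are the two Foltynewicz figures quoted above.

```python
def rule_of_thumb_k_pct(swstr_pct: float) -> float:
    """Rough expected K% from swinging strike rate: SwStr% * 2."""
    return swstr_pct * 2.0


# Foltynewicz's 2018 figures as quoted in the text (percentage points).
folty_swstr = 10.3
folty_k = 27.2

expected_k = rule_of_thumb_k_pct(folty_swstr)
# A positive gap means he struck out more hitters than the rule implies.
gap = folty_k - expected_k

print(f"expected K%: {expected_k:.1f}, actual K%: {folty_k:.1f}, gap: {gap:+.1f}")
```

By this crude measure, Folty's actual strikeout rate sits roughly six and a half percentage points above what his whiff rate alone would suggest.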

Still, I wanted to allow Foltynewicz the opportunity to redeem himself. Whiff rate does not a pitcher make; there are other components to plate discipline allowed such as chase rate (O-Swing%) and zone rate (Zone%), among others, that describe each pitcher in much finer detail. I broke down a pitcher’s plate discipline allowed into its component pitch outcomes:

Read the rest of this entry »

2018 Statcast Park Impacts (Not Quite Factors)

The longer we have Statcast data at our disposal, the more novel uses we find for it. What follows is my proposal to use the difference between expected and actual value segmented by batted ball type and venue to determine park factors (and potentially evaluate defensive value, as described in the footnote). Unfortunately, someone smarter than me was already way ahead of me. I’ll get to that in a second.

A typical park factors grid, such as those produced by ESPN or FanGraphs, commonly relies on outcomes — outcomes of plate appearances (ESPN), batted ball categories, or both (FanGraphs). They describe what actually occurred, the way wOBA describes a hitter’s actual production. Conversely, expected wOBA (xwOBA) describes what should have occurred based on a batted ball’s exit velocity (EV) and launch angle (LA). It strips away everything else, holding constant all other environmental factors in order to deliver an otherwise-context-neutral EV/LA-based value.

The difference between wOBA and xwOBA (“wOBA minus xwOBA,” or wOBA–xwOBA for short), therefore, effectively captures all value amassed or lost by other variables. In other words, if wOBA explains what actually happened in a non-neutral environment, and xwOBA explains what should’ve happened in a neutral environment, then the difference between them characterizes the effect of the environment: the ballpark itself.
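As a rough sketch of the idea, the snippet below averages wOBA minus xwOBA for each venue and batted ball type. Everything in it is an assumption for illustration: the row layout, the function name, and especially the numbers, which are invented placeholders rather than real Statcast data.

```python
from collections import defaultdict

# Placeholder rows: (venue, batted_ball_type, woba_value, xwoba_value).
# These numbers are invented for illustration; they are NOT real Statcast data.
batted_balls = [
    ("Park A", "fly_ball", 0.900, 0.700),
    ("Park A", "fly_ball", 0.000, 0.300),
    ("Park A", "ground_ball", 0.000, 0.200),
    ("Park B", "fly_ball", 0.000, 0.500),
    ("Park B", "ground_ball", 0.900, 0.300),
]


def woba_minus_xwoba(rows):
    """Average (wOBA - xwOBA) for each (venue, batted ball type) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of differences, count]
    for venue, bb_type, woba, xwoba in rows:
        bucket = totals[(venue, bb_type)]
        bucket[0] += woba - xwoba
        bucket[1] += 1
    return {key: diff_sum / n for key, (diff_sum, n) in totals.items()}


impacts = woba_minus_xwoba(batted_balls)
for (venue, bb_type), diff in sorted(impacts.items()):
    print(f"{venue:7s} {bb_type:12s} {diff:+.3f}")
```

A positive average would suggest the venue adds value beyond what exit velocity and launch angle predict for that batted ball type; a negative one would suggest it takes value away. With real data, sample size per bucket would matter a great deal.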

Unfortunately for me (but fortunately for everyone else), Tony Blengino already did this (which is why he’s a former MLB executive and I’m not). In 2017, he used Statcast data to calculate park factors on the basis of expected outcomes relative to actual league-average production. For all intents and purposes, it’s the same idea.

Consider this post a refresher on the topic.

Let me call your attention back to a simpler time. If you search “Miguel Cabrera xwOBA” on Twitter, you’ll find, well, not a multitude, but at least a sampling, of Tweets from the summer of 2017 lamenting Cabrera’s (and his teammates’) bad luck by measure of wOBA–xwOBA:

Read the rest of this entry »

Chris Archer’s Last, Best Hope

Fantasy owners have been chasing, to no avail, Chris Archer’s 2015 season, during which he recorded a 29% strikeout rate with a 3.23 ERA. After finishing just outside the top-50 overall by National Fantasy Baseball Championship (NFBC) average draft position (ADP), Archer averaged the No. 50 pick from 2016 through 2018. Unfortunately, the outcomes annually and in aggregate have been awful…

Chris Archer’s Career Halves
2012-15 564.2 3.33 1.19 24.1% 8.1% 46.3% 3.36 3.47 3.51
2016-18 550.2 4.12 1.28 27.5% 7.5% 44.8% 3.64 3.44 3.54

… even though his peripherals before and after 2015 have been nearly identical. That’s the persistent problem with Archer: he has given us perpetual reason to chase results he may never again achieve.

I’m here to argue Archer’s woes started not in 2016, when his ERA ballooned to 4.02, but in 2015 — yes, his career year. That’s because he stopped throwing his sinker in 2015, opting instead to rely on a pitifully bad four-seam fastball as his primary offering. His 2015 success can be attributed primarily to his slider, which he began to feature much more prominently, but the remaining success was thanks to good luck.

Read the rest of this entry »

How Sprint Speed Relates to Stolen Bases

Yesterday, I wrote about how sprint speed relates to wOBA minus expected wOBA (wOBA–xwOBA). Today, I summarize my investigation into what factors most readily affect a player’s stolen base success rate (SB%).

This invitation from BatFlip Crazy, embedded in this lengthy Twitter exchange, served as the catalyst for the research. In hindsight, I’m not sure I totally answered the question. Manipulating data from multiple sources (in this case, Baseball Reference and Baseball Savant) can be exhausting.

I used my final Frankenstein data set, which contained statistics for all players from 2016-18 with at least 100 stolen base opportunities (SBOs) in a given season, to investigate relationships among the following stolen base metrics:

Read the rest of this entry »

How Sprint Speed Relates to wOBA–xwOBA

Fantasy analysts and enthusiasts alike are still searching for ways to use Statcast’s expected wOBA (xwOBA) metric meaningfully to gain an edge. Unfortunately, beyond leveraging the difference between xwOBA and actual wOBA (what I, and probably countless others, refer to as the “wOBA minus xwOBA differential”), I don’t know yet how else you can use xwOBA effectively. Given already-widespread use of the metric, the minimal edge you can glean will come from interpretation.

I discussed the interpretation of xwOBA multiple times in 2018. In May, I highlighted hitters on whom to buy low because of their extreme/outlier wOBA–xwOBA differentials. In July, I called out xwOBA’s inability to account for what appeared to be the ball becoming un-juiced, thereby overestimating xwOBA across the league. In September, I investigated the predictiveness of xwOBA in-season (that is, the predictiveness of first-half xwOBA on second-half wOBA).

I discussed all of these topics during my presentation at BaseballHQ’s annual First Pitch Arizona forum, especially the first. Basically all of the hitters I tabbed as buy-lows outgained their prior performance by substantial margins; all of them, that is, except for Victor Martinez. Could he be considered a miss? Sure, except he was different from the rest of his fellow underachievers: he perennially underperforms his xwOBA. Perhaps the better question, then, is: Why was he a miss?

Read the rest of this entry »

2019 xADP, New and Improved

About a month ago, I published a post that predicted 2019 ADP (“xADP”) values using eight years’ worth of average draft position (ADP) data from the National Fantasy Baseball Championship (NFBC) and end-of-season (EOS) values from Razzball. The model was pretty dang good: it explained nearly 60 percent of the data’s variance (adjusted R² = 0.59). It felt incomplete, though; it accounted for some players but not others, namely breakout rookies who were completely off the radar the previous season and top prospects who had yet to debut.

I took some time (really, a lot of time) to clean up my data to see how much it would improve my model, if at all:

  1. Originally, my data set did not account for players who were not drafted (aka had no ADP value) but made an impact in 2018 (think Juan Soto). Conversely, my data did account for players who were drafted but made no impact in 2018 (think, uh, Troy Tulowitzki, I guess). It was kind of like addressing a Type I error but ignoring a Type II error (or the other way around? I don’t know). I took painstaking care to fill in these holes.
  2. I took equally painstaking care to ensure all player names were consistent: no “Nick Castellanos”/“Nicholas Castellanos” mismatches that might pollute the analysis. Odds are, there are a couple of players I missed, but having spent hours poring over the data, I feel confident that the issue is no longer pervasive.
  3. I added ages! They make a small impact, most meaningful to players at the extremes, such as the very young (think Ronald Acuna) and the very old (think Nelson Cruz).
  4. Lastly, a theoretical and methodological adjustment: I forced negative ADP values to $0. I wanted the model to reflect an actual draft, in which players are never bought at auction for negative dollars; rather, their values converge on zero. It’s important to note here that a player can still end the season with negative value based on the concept of replacement level. Accordingly, only negative ADP values, and not negative EOS values, were forced to zero.

Fortunately, the extra work was worth it: the model boasts an adjusted R² of 0.75 (with ages; 0.73 without). That’s a massive improvement, and it can be attributed almost entirely to the slight (but profound) change in the model specification.
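For readers who want to see the two mechanical pieces, here is a minimal sketch of the zero floor on draft-day values and the standard adjusted R² formula. It is not the model itself: the function names, the sample dollar values, and the observation and predictor counts are all hypothetical placeholders chosen for illustration.

```python
def clamp_draft_value(dollars: float) -> float:
    """Force negative draft-day (ADP-implied) dollar values up to $0."""
    return max(0.0, dollars)


def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Invented placeholder values, NOT the data used in the post.
adp_values = [32.0, 4.5, -3.0, -11.2]
eos_values = [28.0, -2.0, 6.0, -9.0]  # end-of-season values are left negative

clamped_adp = [clamp_draft_value(v) for v in adp_values]
print(clamped_adp)

# Hypothetical raw R^2, sample size, and predictor count, for illustration only.
print(round(adjusted_r2(0.60, n_obs=101, n_predictors=3), 4))
```

Only the draft-day side gets the floor; the end-of-season list is deliberately untouched, mirroring the asymmetry described in the fourth list item.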

Read the rest of this entry »

Which Source for Pitching Metrics is Best?

Rob Silver, the 2016 National Fantasy Baseball Championship (NFBC) Main Event winner and high-stakes fantasy baseball extraordinaire, messaged me on Twitter a few days ago to ask a question: Which source of pitching statistics is most accurate? I’m paraphrasing. Also, I could paraphrase the question any number of ways: Which source should we be using? Which most reliably correlates with pitcher performance?

It was a question for which I had no answer. Admittedly, I use a variety of sources, none of which align with one another — something I have noticed before but about which I can do nothing but shrug and accept it as a quirk of being a sabermetrician who bears the struggle of dealing with publicly available data.

The sources cryptically mentioned above include the following:

Read the rest of this entry »