I haven’t been this excited to announce something since I first created THE BAT. Today I get to announce a brand new projection system I’m calling THE BAT Experimental (or THE BAT X)! These are now available on the FanGraphs player pages and as sortable projections.
THE BAT X is going to serve as a sort of drawing board for new ideas and innovations that aren’t quite ready to replace the tried-and-true, classic version of THE BAT. Basically, it will be a set of projections that should be the best possible version of THE BAT, but which I’m not 100% certain of yet. The first of these THE BAT X innovations: Statcast data!
I’ve been meaning to fold Statcast into THE BAT since its inception, but I’ve always been too busy to find the time to do it the right way. With the delayed start to the 2020 MLB season, however, I’ve spent the past couple months diving deep into the world of Statcast data, evaluating what stats make hitters effective, what stats are the most stable, what stats are the most predictive, how player aging impacts them, what they can tell us about a player’s intentional approach, and all kinds of fun questions like that.
I evaluated 150+ variables, starting with the basics like average launch angle and exit velocity and then evaluating various deviations and subsets of launch angles, spray angles, exit velocities, hit qualities, and so on. League-wide HR rates spike over 20% between a launch angle of 23° and 34°? I want to see how that predicts future home runs. BABIPs start to plummet once the launch angle goes over 27°? Yup, let’s throw that into the mix. What about a player’s maximum exit velocity? Or the top 5% of his exit velocities? Let’s see if that adds value. Sprint speed? Oh yeah, that’s gotta be in there too.
What I finally arrived at was a brand new projection system based entirely on Statcast data. Stability and regression to the mean, weighting of multiple years of data, aging curves, the works. But because Statcast data gives us better insight into a hitter’s intentional process-level decisions (i.e., hitters essentially have the ability to decide what kind of swing plane they want), I didn’t want to treat this exactly the same as a normal projection. (Normally, we kind of look at what a hitter did last year, look at what he did the years before, give a bit of extra weight to the more recent data but more or less just split the difference.) No, I wanted to know if a hitter who changes his approach is more likely to maintain that approach the following year.
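For readers who haven’t seen the “normal projection” recipe written down, here is a minimal sketch of it: weight recent seasons more heavily, then regress toward the league mean. This is not the code or the weights behind THE BAT; the function name, the 5/4/3 season weights, and the 1,200 plate appearances of regression are all illustrative assumptions, and aging curves are left out entirely.

```python
def baseline_projection(seasons, league_mean, weights=(5, 4, 3), regression_pa=1200):
    """Project a rate stat (e.g. wOBA) from a hitter's past seasons.

    seasons: list of (rate, plate_appearances) tuples, most recent season first.
    weights, regression_pa: illustrative assumptions, NOT THE BAT's actual values.
    """
    weighted_sum = 0.0
    weighted_pa = 0.0
    # More recent seasons get larger weights; seasons beyond len(weights) are ignored.
    for (rate, pa), w in zip(seasons, weights):
        weighted_sum += w * pa * rate
        weighted_pa += w * pa
    # Regress to the mean by mixing in "phantom" plate appearances of
    # league-average performance. The fewer real PA, the stronger the pull.
    return (weighted_sum + regression_pa * league_mean) / (weighted_pa + regression_pa)


if __name__ == "__main__":
    # Made-up hitter: .400 wOBA last year, .300 the year before, 600 PA each.
    print(round(baseline_projection([(0.400, 600), (0.300, 600)], league_mean=0.320), 3))
```

Note that this baseline treats every hitter’s most recent season the same way. The question of whether last year should count extra for a hitter who changed his approach is exactly what a fixed weighting scheme like this cannot answer.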
Are there certain players for whom last year matters more than it otherwise would? I’m not going to go into all of the secret sauce here, but it’s safe to say that there are indeed ways to project the continuance of an approach change, which is pretty damn fantastic. And it makes sense: if a player’s breakout last year was supported by peripherals, it should count more than if it wasn’t, right? After all, he’s going to try to do the same thing again! The only appropriate name for this system, the only thing I could possibly think to call a projection system with THE BAT methodology and Statcast data, is… THE BATcast. I know; it’s pretty badass.
Of course, I didn’t stop there, because as great as Statcast data is, it’s not the whole story. Everything that goes into the classic version of THE BAT also has plenty of value, and so we really want both. That’s where THE BAT X comes in. I ran tests to find the optimal combination of THE BAT and THE BATcast to form a final projection: THE BAT X.
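One simple way to think about “finding the optimal combination” is a back-test that searches for the blend weight with the lowest error against what actually happened. The sketch below is only that, a sketch: the function names are hypothetical, it assumes a single linear weight and RMSE as the error measure, and the actual method used for THE BAT X is not described here and may differ.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def best_blend_weight(proj_a, proj_b, actual, steps=100):
    """Grid-search the weight w in [0, 1] that minimizes the RMSE of
    w * proj_a + (1 - w) * proj_b against the actual results.

    Returns (best_weight, best_error). Assumes one shared weight for all
    players, which is a simplification made for illustration.
    """
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(proj_a, proj_b)]
        err = rmse(blended, actual)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


if __name__ == "__main__":
    # Made-up numbers purely to show the call; these are not real projections.
    system_a = [0.300, 0.360, 0.330]
    system_b = [0.340, 0.320, 0.350]
    results = [0.320, 0.340, 0.340]
    print(best_blend_weight(system_a, system_b, results))
```

In practice the weight should be chosen on past seasons and then checked on seasons it never saw, since a weight tuned and evaluated on the same data will look better than it really is.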
In theory (and according to back-tests I’ve run), THE BAT X is the best of the three systems. It should be, after all, since it starts with THE BAT and layers on valuable new information. We only have a few years of Statcast, though, and early years have some data missing. There’s also the issue of imputed data, which THE BATcast does its best to handle. Overall, given the small-ish sample, I’m not ready to say “This is the new THE BAT” and throw the old projections away, especially when the classic version of THE BAT has performed so well for DFS players and in season-long accuracy contests (Ariel Cohen found it to be the best non-aggregate system he tested over the past two years; at FantasyPros, it was found to be the best non-aggregate system in 2018 and the third-best in 2019 of the dozen-plus systems they tested).
Plus, keeping the original projections around affords us the added benefit of being able to compare projections for players. It will give us the ability to look at what a hitter is projected to do just based on surface-level stats, and what he is projected to be after incorporating more advanced stats. It could help us identify potential breakout players, players that may have fundamentally changed something about their game to make them better… or the reverse.
Take Alex Bregman. Every projection system here on FanGraphs (including THE BAT) pegs him for a wOBA between .390 and .395 and as the fifth-best hitter in MLB. THE BAT X, however, puts him at .381 – one of the largest negative differences for an elite hitter between THE BAT and THE BAT X. Bregman is obviously great, and his surface numbers have been fantastic, but is he a top 5 hitter in baseball? Probably not.
THE BAT X puts him outside the top 10. Why? Because his average exit velocity is only 1 mph above average. The average of his top 5% of exit velocities is 2 mph below average. The percentage of balls he hits above 100 mph? 7% below average. His Barrel rate was 16% above average in 2018, but it was 25% below average in 2019. None of that is good.
Bregman is a fantastic hitter, no doubt. The plate discipline is elite, and Statcast mostly just tells us new information about what happens after contact. But this is a great example of why we need both sides of this coin. The surface numbers are probably inflated given the fairly middling Statcast peripherals, but it also seems really unlikely that Bregman is below-average based solely on those peripherals when his actual results have been so outstanding (the dude hit 41 dingers last year—that didn’t happen by chance).
There are surely things the Statcast data can’t capture that show up in the surface numbers. THE BAT X combines the best of both worlds, and comparing it to the classic version of THE BAT will allow us to see which players may be playing over their heads and which may be underperforming and are primed for a breakout.
Here is a list of the top and bottom players heading into 2019 that THE BAT X either liked or disliked more than THE BAT. As you can see, THE BAT X did an excellent job of identifying both riser and faller candidates.
|Player|THE BAT|THE BAT X|Difference|Actual 2019|
|---|---|---|---|---|
|Jackie Bradley Jr.|.311|.324|.013|.314|

|Player|THE BAT|THE BAT X|Difference|Actual 2019|
|---|---|---|---|---|
Of the top 20 players THE BAT X liked more than THE BAT, 15 finished 2019 with a wOBA higher than their projected wOBA in THE BAT. On the other side, of the players THE BAT X liked less than THE BAT, 13 finished 2019 with a wOBA lower than their projected wOBA in THE BAT.
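The check behind those counts is simple enough to sketch: for each player, ask whether his actual wOBA landed on the same side of THE BAT’s projection as THE BAT X did. The function names below are hypothetical, and the only data row used is the Jackie Bradley Jr. line from the table above. If both lists are 20 players long (an assumption for the second list), the counts work out to 15/20 = 75% for the risers and 13/20 = 65% for the fallers.

```python
def agrees_with_direction(bat, bat_x, actual):
    """True if the actual result fell on the same side of THE BAT's
    projection as THE BAT X's projection did."""
    if bat_x > bat:
        return actual > bat
    if bat_x < bat:
        return actual < bat
    return False  # the two systems agree, so there is no direction to test


def hit_rate(rows):
    """Share of (bat, bat_x, actual) rows where THE BAT X called the direction."""
    hits = sum(agrees_with_direction(bat, bat_x, actual) for bat, bat_x, actual in rows)
    return hits / len(rows)


if __name__ == "__main__":
    # Jackie Bradley Jr., from the table above: THE BAT .311, THE BAT X .324, actual .314.
    print(agrees_with_direction(0.311, 0.324, 0.314))
    # Hit rates implied by the counts in the text (second denominator is assumed).
    print(15 / 20, 13 / 20)
```

A direction-only check like this ignores how far off each system was, so it is a complement to an error measure rather than a replacement for one.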
Here are the lists for 2020. Update your fantasy cheat sheets accordingly.
|Player|THE BAT|THE BAT X|Difference|
|---|---|---|---|
|Vladimir Guerrero Jr.|.350|.362|.012|

|Player|THE BAT|THE BAT X|Difference|
|---|---|---|---|
Now, as with any system, THE BAT X is not perfect. It is likely to have some biases. I’ve already noticed that it seems to be lower on guys like Nolan Arenado, J.D. Martinez, and Eugenio Suarez than THE BAT every year. Maybe these guys truly are performing over their heads a bit and they’ll come down in 2020, or maybe they’re just doing something that the Statcast data can’t capture.
But outliers and biases exist in any system, and the overall results from THE BAT X have been quite fantastic in backtesting. Please check it out, play around with it, and see what you think. If you notice anything or have any questions/comments/suggestions, please feel free to reach out to me. Finding me on Twitter (@DerekCarty) is the easiest way.
If you’re interested in learning more about this stuff, I gave a presentation entitled “Statcast: Fact, Fiction, and Prediction” as part of Pitcher List’s PitchCon online quarantine conference a couple weekends ago, which you can re-watch on their YouTube.
And if you’re interested in seeing more granular data, I’ve put a lot of Statcast data up over at EV Analytics. You can see a bunch of the more predictive Statcast variables that go into THE BAT X… both past data and projected data for each player. You can also see correlations with that data and actual performance, both from a descriptive and predictive standpoint.
I’ve also published the intermediary system, THE BATcast, over there. It’s the least valuable of the three systems, but you’ll be able to compare THE BAT, THE BATcast, and THE BAT X to see exactly how each player rates according to each method and get a better sense of what the player’s ceiling is according to Statcast. I think this stuff is incredibly cool, and I hope you do too!
Finally, I want to give a big thanks to Tom Tango, Alex Chamberlain, and Andrew Perpetua. While there were a lot of people that helped me in various ways who all deserve thanks, these three in particular took a lot of time out to talk things through with me, offer suggestions, and answer questions I had while I played catch-up with Statcast data. Without you, THE BAT X wouldn’t be as good as (I hope) it is, and I am very, very grateful.