# Exit Velocity and xOBA Outliers

Over the past few days something very odd has happened in baseball. Somehow, some way, Zack Cozart has become the active fWAR leader. Okay, this may not be as big a surprise this afternoon as it may have been when you first heard about this, but it is still pretty crazy, right? Or maybe you saw him sneaking up the leaderboards over the past few days or weeks. Either way, it has happened. He’s now number one amongst the active players, and perhaps he’ll soon overtake Trout for first amongst all players.

In terms of fantasy, WAR doesn’t carry much weight, of course. Especially for a guy like Cozart who generates a solid chunk of his value through defensive excellence. Even still, Cozart is posting numbers at a rate that far exceeds his career numbers. He currently stands with a .351 batting average and a 1.059 OPS, up from his career .254 average and .704 OPS.

Zack Cozart has below average exit velocity, only 84 mph. We’re not talking a touch below average, either: he’s more than a full standard deviation below the mean, with a z-score of -1.37. All this made me curious about where exactly he sat on the exit velocity spectrum, whether there are other similar outliers, and if there is anything we can learn from them.
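For anyone unfamiliar with z-scores, the calculation is just the distance from the league mean measured in standard deviations. Here is a minimal sketch; the league mean and standard deviation in it are hypothetical placeholders, not the figures behind the -1.37 quoted above.

```python
def z_score(value, mean, std_dev):
    """Distance of `value` from `mean`, in units of standard deviation."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (value - mean) / std_dev


# Hypothetical league figures, for illustration only (not from the article).
ASSUMED_LEAGUE_MEAN_EV = 87.0  # mph
ASSUMED_LEAGUE_STD_EV = 2.0    # mph

cozart_z = z_score(84.0, ASSUMED_LEAGUE_MEAN_EV, ASSUMED_LEAGUE_STD_EV)
print(f"z-score with the assumed league figures: {cozart_z:.2f}")
```

Anything below -1 means the batter sits more than one standard deviation under the league mean.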

I created a Viz using this data. The top image displays the xOBA versus Exit Velocity, and the bottom compares the xOBA and wOBA.

I’ve limited the default settings to a 200 PA minimum and highlighted the outliers. You can change the plate appearances minimum and highlight any player or team that you’d like.

Notice how, among those with at least 200 PA, Cozart’s exit velocity is quite a bit below average. Only a dozen or so batters have a lower exit velocity, and none of them have a higher xOBA, or wOBA for that matter. I added an EV slider to the wOBA vs xOBA chart, and if you set the slider to a maximum of 86 mph you’ll see the only batter with at least 200 PA with a higher wOBA is…. Joey Votto?!

Huh.

So that’s a bit weird. Or maybe it shouldn’t be, since Votto was highlighted on the xOBA vs EV chart as well. You’d expect a guy like Votto to hit the ball a little harder than that, though, and in the past he’s averaged around 88 mph as opposed to 86.

This season, and I want to stress that we’re only talking about a third of a season here, Joey Votto and Zack Cozart have been oddly similar batters, producing very similar distributions of exit velocities and launch angles. Votto has been superior, generating about a dozen more high value fly balls, which explains their difference in home runs to date. In fact, xStats has given Cozart only 5 expected home runs, compared to the 9 he’s actually hit. For Votto, it has gifted 12 expected home runs, as opposed to the 15 he’s registered. In both cases xStats feels these guys have gotten lucky with their fly balls, and with these modest exit velocities you can see why. Great American Ball Park is a great place to hit, though.

It is remarkable that Cozart could climb to the top of the active WAR leaderboard with such modest exit velocity numbers, but his swing rate, and especially his out of zone swing rate, have reached career lows. His out of zone swing rate is down to 24%, from 29% last season. He’s clearly made an adjustment in approach, which is boosting his walk rate well past his career high. More than double his career high, actually. These sorts of plate discipline changes are likely sustainable, but they have not translated to significantly better batted balls.

Don’t get me wrong, his batted balls have carried value this season, his expected .294/.385/.461 slash line is nothing to sneeze at, and a full season with that level production would be very valuable. But, at the same time, he’s heavily reliant on launch angle, and launch angle is a fickle mistress.

It should be noted that there *could* be a systematic measurement problem in Cincinnati that could be influencing these batted ball results. I strongly emphasize *could*. I don’t know of any evidence pointing towards exit velocity numbers getting impacted in this manner. That said, Cozart has yet to reach league average exit velocity in any of the three seasons we have data for, and Joey Votto appears to be hitting well below his career norm at the moment. In all likelihood, Cozart will continue this pace, and Votto will gain a few miles per hour.

I should note that this 2017 level of production is not unprecedented for Cozart. He put up similar, albeit weaker, numbers last April and May, prior to falling off in June. In 2015 Cozart suffered a terrible knee injury, tearing his ACL, LCL, and biceps tendon. Ever since this event, Cozart has struggled with knee problems, as you may imagine after such a serious injury.

Last April he described an event where his knee buckled under him while lunging to catch a ball hit by Matt Joyce. This issue, perhaps, may have been more mental than physical; he claims to have feared a reinjury to the point where minor twinges in his knee made him uneasy. Which I completely understand, having gone through a similarly serious injury to my hand. The fear of reinjury is very powerful, and it can be overwhelming at times.

It is clear that these knee problems held back Cozart last season, but I don’t know the degree to which it hampered him. I know he complained of knee problems in early May, and he missed time in August for soreness in his patellar and Achilles tendons. However, I’m not sure how much of a hindrance the knee could have been in the intervening months where, truth be told, he didn’t hit very well.

Cozart doesn’t appear to have the bat speed to generate above average exit velocity on a regular basis, so he is forced to rely more on launch angle to generate his value. Launch angle is much less reliable, and as a result the batter has much less control over their success in the long term.

So, what should you expect out of Cozart from here on out? Well, xStats has granted him a .294/.385/.461 slash line to date, which is still far above his career average production. ZiPS has him pegged for a .274/.335/.453 rest of season slash line, while xStats has given him .254/.315/.401 and Steamer .254/.317/.415. All three systems give him 9 to 10 homers.

Personally, I take the under on his rest of season projections. Except home runs, I bet he can hit another 9 or 10 of those, but the batting average may be closer to .240 than .260, and his OPS might be closer to .700 than .800. I’m not convinced his numbers last season were limited entirely due to knee troubles, although I am certain the knee played a role as well.

## Another Outlier

On the xOBA vs wOBA chart, I’ve highlighted Nick Castellanos, as he is just about as far from the trendline as you can get. His wOBA is sitting around .300, and his xOBA is .381. Castellanos represents one of the largest differences between xStats and game production, as his batted ball quality has gone largely unchanged since last season but his production numbers have tanked.

His exit velocity is practically the same, albeit slightly better. His average launch angle is down a bit, but that could be a positive sign as I’ll get into in a moment. Importantly, the average exit velocity on his fly balls and line drives has increased. This, as it turns out, makes a big difference.

I’ve created a Viz showing his spray chart. Note that the dimensions in this Viz, the x and y axis, are in feet.

The default Viz shows his full spray chart. If you wish to follow along please set the game years to 2016 and 2017, the angle 26 to 39, and exit velocity 0 to 100. Doing so will show you a group of batted balls. Of these, there are 5 singles, 4 doubles, 1 triple, 3 home runs, 62 outs, and 2 sac flies. That’s a .173 batting average and .373 slugging. These are not valuable batted balls.
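If you want to check that arithmetic, here is a minimal sketch that rebuilds the batting average and slugging percentage from the counts above. The function name is my own; the only assumption is the standard convention that sacrifice flies do not count as at-bats.

```python
def batting_line(singles, doubles, triples, home_runs, outs, sac_flies=0):
    """Return (AVG, SLG) for a group of batted balls.

    Sacrifice flies are accepted only to make their exclusion explicit:
    they are not at-bats, so they enter neither denominator.
    """
    hits = singles + doubles + triples + home_runs
    at_bats = hits + outs
    if at_bats == 0:
        raise ValueError("no at-bats in this group")
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return hits / at_bats, total_bases / at_bats


avg, slg = batting_line(singles=5, doubles=4, triples=1, home_runs=3,
                        outs=62, sac_flies=2)
print(f"AVG {avg:.3f}  SLG {slg:.3f}")  # AVG 0.173  SLG 0.373
```

That is 13 hits and 28 total bases over 75 at-bats.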

Now, and this works especially well on a desktop computer, flick between 2016 and 2017 on the slider. If you drag the slider back and forth without letting go, you should be able to flip between the seasons very quickly so you can compare them.

Even if you cannot flip quickly, it shouldn’t take long to realize that the vast majority of these poorly hit balls occurred in 2016. To be precise, 70 out of the 94 occurred in 2016. Yes, obviously 2016 was a larger sample size to draw from. My dataset has 241 plate appearances in 2017 versus 447 in 2016. This drop in poorly hit fly balls is very significant. You would think cutting out these balls would have led to better overall batted ball results. But, to date, they haven’t. And that’s weird.
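To put the two seasons on the same footing, you can express those counts as a rate per plate appearance. A quick sketch using only the numbers quoted above; the 2017 count of 24 is simply 94 minus 70, and the helper name is my own.

```python
def rate_per_pa(events, plate_appearances):
    """Share of plate appearances that ended in the given kind of batted ball."""
    return events / plate_appearances


weak_2016 = rate_per_pa(70, 447)
weak_2017 = rate_per_pa(94 - 70, 241)  # 24 balls, inferred from the totals above

print(f"2016: {weak_2016:.1%}  2017: {weak_2017:.1%}")  # 2016: 15.7%  2017: 10.0%
```

So even after adjusting for the smaller 2017 sample, the rate of these poorly hit fly balls has dropped by roughly a third.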

As I said before, Castellanos is hitting the ball on the ground more often this season. Which, normally, wouldn’t be a great thing, but he has replaced poorly hit fly balls with poorly hit ground balls. Ignoring the rare occasions when fly balls result in home runs (~3-4%), the ground balls tend to have higher overall success rates. Neither category is particularly valuable, though.

With all this said, Castellanos is clearly frustrated with the start to his season. Wednesday night, after being lifted for a pinch runner, he threw his helmet, which struck Miguel Cabrera’s face. It was an accident, the helmet bounced off the bat rack, but emotions are charged. He’s frustrated, and judging from these batted ball stats, I can see why. He’s hitting the ball just as well as he did last season, but with dramatically worse results. Where do you turn if you’re hitting the ball hard but it isn’t finding grass?

Castellanos claims to be healthy, and his manager backs it up. His batted ball stats do, too, even if his production is missing. He’s slashed a .220/.302/.372 line to date, while xStats has awarded him an expected .284/.360/.543 line. He’s hit five home runs, while xStats gave him 13 expected home runs. This is a gigantic disparity.

There is a chance that he’s been well shifted against in the outfield. Many of his high value batted balls have been caught, and many of those balls were hit to roughly the same area of the field. Perhaps his batted balls, even if they are valuable in the abstract sense, are predictable and easy to shift against. Or maybe he’s gone up against above average defense, I’m not sure.

So what should you expect for the rest of the season? ZiPS, Steamer, and xStats each roughly agree upon batting average and OBP, projecting .256/.314, .264/.323, and .263/.317 respectively. The slugging percentages differ quite a bit. ZiPS .428, Steamer .446, and xStats .487. Personally, I would take the over on all three of these projections. I expect him to put up numbers equivalent to his 2016 figures from here on out, that would be .285/.331/.496 with 14 home runs.

*This was written on Thursday morning, and I am happy to see Castellanos have a great game on Thursday afternoon.*

Andrew Perpetua is the creator of CitiFieldHR.com and xStats.org, and plays around with Statcast data for fun. Follow him on Twitter @AndrewPerpetua.

I picked up Castellanos a week or so ago so I’ve been studying him quite a bit. I’ve started using Baseball Savant quite a bit more recently, especially the xwOBA-wOBA search. Castellanos is on year 3 of underperforming his xwOBA:

| Season | xwOBA | wOBA |
| --- | --- | --- |
| 2015 | .325 | .313 |
| 2016 | .379 | .354 |
| 2017 | .378 | .311 |
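Expressed as a gap in points of wOBA, those three seasons can be tallied with a few lines. This is only a sketch using the figures listed above; the variable names are my own.

```python
# (xwOBA, wOBA) by season, as listed above.
seasons = {
    2015: (0.325, 0.313),
    2016: (0.379, 0.354),
    2017: (0.378, 0.311),
}

# Gap in "points" (thousandths), rounded to avoid floating point noise.
gaps = {year: round((xwoba - woba) * 1000)
        for year, (xwoba, woba) in seasons.items()}

print(gaps)  # {2015: 12, 2016: 25, 2017: 67}
```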

I think it was you who noted that speed plays a pretty significant factor in xwOBA-wOBA, however Castellanos wouldn’t seem to really have that issue – he’s by no means fast, but he isn’t slow either. Regardless, this seems to be a trend for him and kind of an odd one because his batted ball profile is so pristine. He hits buckets of line drives, never pops up and hits the ball hard regularly. Obviously you’d like more BB and fewer K’s but neither of those numbers are terrible either and he’s actually taking a few more BB this year. Certainly I would expect his numbers to come back quite a bit, but it seems a little like he might never match his xwOBA for some reason.

For the record, I am using xOBA not xwOBA. They are two different stats using different algorithms. Although in this case they match up well so it doesn’t change your argument much.

However, I also have xAVG, xOBP, and xSLG, alongside his actual AVG, OBP, and SLG:

| Season | xAVG/xOBP/xSLG | AVG/OBP/SLG |
| --- | --- | --- |
| 2015 | .265/.311/.463 | .255/.303/.419 |
| 2016 | .289/.334/.581 | .285/.331/.496 |

The slugging is off in both cases, and perhaps that is why the xOBA is off. Maybe there are park factors at work here. His home runs are up by 57% both times, which might be a coincidence. The average and OBP are close enough though.

This year his xAVG/xOBP is .283/.357, and his actual AVG/OBP is .230/.309.

In the coming weeks I’m going to be focusing on hammering out these park effects.

re: xOBA vs. xwOBA, have you done any testing (i.e. cross-validation) to see what effect including the horizontal angle actually has on predictive power?

Also, how exactly are you calculating your averages? I’ve wondered this about these sorts of stats for a while now – the two methods that seem most appropriate are either a kernel regression or a random forest, but both have implementation choices to be made and it’s hard to interpret the results (especially re: whether we should expect the stat to be over-fit or under-fit) without knowing the details.

I’ve done some testing regarding horizontal angle, maybe not enough. I found that when you ignore horizontal angle you place way too much emphasis on the middle of the field: you end up claiming that balls hit to right and left center field are the highest value balls, and balls hit down the line have low value. When you include horizontal angle, balls hit to dead center field that fail to leave the park lose much of their value, and balls down the line become very valuable.

I use kernels, and I believe MLB does as well.

I have no doubt that including horizontal angle gives a better picture of the actual value of the balls – but I could also imagine that it ends up overfitting (balls hit down the line might not be such a repeatable skill), so I’m more interested in the effect ignoring horizontal angle has on the RMSE in a cross-validation.

Mind sharing how you’re determining kernel bandwidth, and whether it’s constant or variable bandwidth? The approach I took last year when I was playing with this stuff was variable bandwidth selected via take-away-one AICc cross-validation, which worked *really well* but was computationally awful (admittedly, I did my analysis in R, which is dreadfully slow, but it took a full 24 hours to calculate the regression for a third of a season of data – and it’s O(n^2), so scaling would have been a huge problem). Constant-bandwidth regressions all seemed to wash out too much detail (though I never got around to testing the predictive power as I mentioned above, so they may well work better in practice – I dunno!).
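For anyone following along, the general idea can be sketched in a few lines. To be clear, this is not the xStats or MLB implementation: it assumes a constant-bandwidth Gaussian kernel (a Nadaraya-Watson estimator) and plain leave-one-out squared error, in place of the variable-bandwidth AICc selection described above. The batted balls, outcomes, and candidate bandwidths are all made up.

```python
import math


def gaussian_weight(a, b, bandwidths):
    """Product Gaussian kernel between two batted balls a and b."""
    return math.exp(-0.5 * sum(((x - y) / h) ** 2
                               for x, y, h in zip(a, b, bandwidths)))


def kernel_estimate(query, points, values, bandwidths, skip=None):
    """Weighted average of `values` around `query`; `skip` omits one index."""
    num = den = 0.0
    for i, (point, value) in enumerate(zip(points, values)):
        if i == skip:
            continue
        w = gaussian_weight(query, point, bandwidths)
        num += w * value
        den += w
    return num / den if den > 0 else 0.0


def loo_rmse(points, values, bandwidths):
    """Leave-one-out RMSE: each ball is predicted from all the others, O(n^2)."""
    squared_error = 0.0
    for i, (point, value) in enumerate(zip(points, values)):
        predicted = kernel_estimate(point, points, values, bandwidths, skip=i)
        squared_error += (predicted - value) ** 2
    return math.sqrt(squared_error / len(points))


def pick_bandwidths(points, values, candidates):
    """Return the candidate bandwidth tuple with the lowest leave-one-out RMSE."""
    return min(candidates, key=lambda bw: loo_rmse(points, values, bw))


# Made-up data: (exit velocity in mph, launch angle in degrees) -> hit (1) or out (0).
balls = [(102, 12), (98, 15), (85, 40), (70, 55),
         (95, -5), (105, 25), (88, 30), (75, -20)]
hits = [1, 1, 0, 0, 0, 1, 0, 0]
candidates = [(2.0, 3.0), (5.0, 8.0), (12.0, 20.0)]  # hypothetical (mph, degrees)

best = pick_bandwidths(balls, hits, candidates)
print("selected bandwidths:", best)
print("leave-one-out RMSE:", round(loo_rmse(balls, hits, best), 3))
```

Because the kernel works on tuples of any length, testing the horizontal angle question would amount to adding a third coordinate and a third bandwidth, then comparing the leave-one-out RMSE with and without it. The nested loop is where the O(n^2) cost comes from.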

I’ve always wondered about the horizontal angle part too. Makes perfect sense that including it would do a better job explaining what a player should have achieved, but less clear how much predictive power it adds. You’d think it would make a difference for heavy pull guys, but it would seem to dangerously overvalue liners into the gap that could easily become liners at an OF, or vice-versa.

These days, I tend to check both Andrew’s xStats and Savant.

The real underlying problem is that the kernel regression approach, even when done correctly, isn’t predicting player performance – it’s predicting ball-in-play outcomes for balls hit on specific trajectories.

The leave-out-one cross validation I mentioned for selecting kernel bandwidths shows this clearly – the criterion optimized for is “how well, on average, is the outcome of any ball in play predicted by the regression from all other balls in play?” This does a good job of removing a fair number of factors that we would call “luck” – happenstance defensive positioning, skill of the defensive players, etc (though it also removes stuff which certainly isn’t “luck,” like defensive shifts).

However, this alone cannot predict player performance – to do that, we also have to estimate the player’s own batted-ball-trajectory distribution. Right now, all of the approaches appear to simply apply the regression to the player’s observed distribution. However, this is clearly not optimal – there’s clearly also going to be some “luck” involved in how “well” (as measured by the regression) the player has hit the ball.

So, with this approach, you have the seemingly paradoxical possibility that doing a *worse* job at estimating the kernel regression (i.e. finding the true ‘expected value’ of a ball put in play on a certain trajectory, averaged over all situations) might actually result in doing a *better* job at predicting player performance, because by “blurring” the ball-in-play regression you might wash out some detail in the batted ball distributions that isn’t necessarily repeatable (i.e., what we would call “luck”).

Looking forward, while I think doing things like ignoring horizontal angle (or, more generally, artificially broadening the bandwidths in the kernel regression) might work as a practical hack to improve predictive utility and is worth looking into, the “proper” solution is probably going to involve directly attacking the second half of the problem (i.e. accurately estimating player batted ball distributions) with more rigor. Some sort of Bayesian approach (along the lines of using the observed batted ball distribution to update from some prior “league-average” distribution) might be fruitful here, though I admit the details are likely beyond my (current) competence.
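One very simple version of that idea, offered only as a sketch: bin the batted balls, treat the league-wide bin shares as a Dirichlet prior, and shrink the player’s observed shares toward it before applying the per-bin values from the regression. The bin names, league shares, bin values, and prior strength below are all hypothetical, and a real treatment would work on the continuous trajectory distribution.

```python
def shrink_distribution(player_counts, league_shares, prior_strength):
    """Posterior-mean bin shares under a Dirichlet prior centred on the league.

    `prior_strength` is how many pseudo batted balls the prior is worth.
    """
    n = sum(player_counts.values())
    return {
        name: (player_counts.get(name, 0) + prior_strength * share)
              / (n + prior_strength)
        for name, share in league_shares.items()
    }


def expected_value(shares, bin_values):
    """Average value per batted ball given bin shares and per-bin values."""
    return sum(shares[name] * bin_values[name] for name in shares)


# Everything below is made up for illustration.
LEAGUE_SHARES = {"weak": 0.5, "medium": 0.3, "hard": 0.2}
BIN_VALUES = {"weak": 0.2, "medium": 0.4, "hard": 0.9}  # e.g. from the regression
player_counts = {"weak": 20, "medium": 30, "hard": 50}  # 100 observed balls

shrunk = shrink_distribution(player_counts, LEAGUE_SHARES, prior_strength=100)
raw = {name: count / 100 for name, count in player_counts.items()}

print("raw estimate:   ", round(expected_value(raw, BIN_VALUES), 3))
print("shrunk estimate:", round(expected_value(shrunk, BIN_VALUES), 3))
```

With a prior worth as many balls as the player has actually hit, the shrunk estimate lands halfway between the player’s raw figure and the league average. Choosing that prior strength sensibly is the hard part.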

But those differences are significantly smaller than this year’s gap (12…25…67). Could be park factors. Detroit is 2nd worst for hitters in MLB so far this year per this article: http://www.fangraphs.com/blogs/park-factors-and-other-early-season-statcast-fun/