One Hitter, Two Hitter, Red Hitter, Blue Hitter
How would you define Jeff McNeil as a hitter in just a few words? If you had to place him in his own “group” of hitters, who else would you place him with? Last week, I used a cluster analysis to find a player who might compare to Luis Arraez and, in turn, help provide some approach recommendations for increasing his power. This week, I’ll use that same cluster analysis, with just a few tweaks, to determine which combination of Statcast and plate discipline metrics increases roto value on average. Let’s start with a refresher on my process.
First, I downloaded all data from the Statcast and plate discipline leaderboards for qualified hitters in 2022. Second, I conducted two Principal Component Analyses, one on the Statcast metrics and one on the plate discipline metrics. This gave me one nice number per player that encompassed all of their Statcast numbers and one nice number per player that encompassed all of their plate discipline metrics. With those numbers I created this visual:
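If you want to try something similar yourself, here is a rough sketch of the two-PCA-then-cluster idea. It is not my exact code. It assumes the "one nice number" is each player's score on the first principal component, that metrics are standardized first, and that the grouping is done with a simple k-means set to six clusters. The data is random placeholder numbers, and the names `first_component_scores`, `assign`, and `kmeans` are made up for this example.

```python
import numpy as np

def first_component_scores(X):
    """Standardize each metric, then score every player on the first principal component."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means (an assumed choice; the clustering method is not specified above)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        # Move each center to the mean of its members; keep it in place if it has none.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers

# Placeholder data only: random numbers standing in for the two leaderboards.
rng = np.random.default_rng(42)
n_players = 120  # hypothetical row count
statcast = rng.normal(size=(n_players, 11))    # stand-in for EV, maxEV, LA, ...
discipline = rng.normal(size=(n_players, 11))  # stand-in for O-Swing%, Z-Swing%, ...

# One number per player from each PCA gives a 2-D point per player.
coords = np.column_stack([
    first_component_scores(statcast),
    first_component_scores(discipline),
])
labels, centers = kmeans(coords, k=6)
print(np.bincount(labels, minlength=6))  # players per cluster
```

Swapping the random arrays for real leaderboard exports gives one (Statcast score, plate discipline score) point per hitter, which is what a scatter plot colored by cluster would show.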
Now, you don’t know what’s good and what’s bad, do you? That’s the trouble with the PCA. It’s a great way to summarize but you lose some of the interpretability of the metrics. So, let me break down each cluster by showing you the averages across all players in each cluster:
Cluster | EV | maxEV | LA | Barrel% | HardHit% | AVG | xBA | SLG | xSLG | wOBA | xwOBA |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 90.8 | 113.9 | 13.0 | 10.9 | 46.1 | 0.279 | 0.273 | 0.485 | 0.477 | 0.357 | 0.354 |
2 | 86.4 | 108.0 | 13.2 | 3.3 | 29.1 | 0.261 | 0.254 | 0.382 | 0.357 | 0.314 | 0.304 |
3 | 89.3 | 111.5 | 15.5 | 9.9 | 40.8 | 0.232 | 0.236 | 0.406 | 0.410 | 0.315 | 0.321 |
4 | 89.9 | 112.2 | 13.3 | 8.8 | 42.1 | 0.268 | 0.258 | 0.456 | 0.432 | 0.347 | 0.337 |
5 | 88.3 | 110.4 | 10.3 | 5.5 | 37.3 | 0.261 | 0.256 | 0.394 | 0.383 | 0.312 | 0.309 |
6 | 90.3 | 112.8 | 13.3 | 11.0 | 44.7 | 0.256 | 0.253 | 0.449 | 0.447 | 0.338 | 0.339 |
Blue – Min
Yellow – Max
–
Cluster | O-Swing% | Z-Swing% | Swing% | O-Contact% | Z-Contact% | Contact% | Zone% | F-Strike% | SwStr% | CStr% | CSW% |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 33.9 | 73.5 | 49.7 | 64.6 | 87.0 | 77.9 | 39.8 | 60.8 | 11.0 | 14.1 | 25.1 |
2 | 30.9 | 66.1 | 45.8 | 76.7 | 91.6 | 85.8 | 42.3 | 60.6 | 6.6 | 18.1 | 24.7 |
3 | 30.9 | 68.9 | 46.5 | 61.7 | 83.5 | 74.9 | 41.0 | 60.7 | 11.8 | 16.5 | 28.2 |
4 | 31.0 | 69.9 | 46.9 | 67.0 | 87.2 | 79.4 | 40.9 | 59.8 | 9.7 | 16.0 | 25.7 |
5 | 35.2 | 70.0 | 49.4 | 69.4 | 88.9 | 80.6 | 41.0 | 62.3 | 9.8 | 16.0 | 25.8 |
6 | 31.6 | 69.0 | 47.0 | 62.9 | 85.2 | 76.3 | 41.0 | 60.7 | 11.2 | 16.4 | 27.6 |
Blue – Min
Yellow – Max
Now, let’s try to classify each of these clusters:
Cluster 1 – Let the big dogs eat. This group rules in nearly all Statcast metrics and is just barely beaten out for the Barrel% prize by Cluster 6. They swing often and don’t see many pitches in the zone. Ex: Aaron Judge.
Cluster 2 – Contact. These hitters are not being fooled, swinging less often but making contact when they do. They have good averages but could benefit from increased power. Ex: Jeff McNeil.
Cluster 3 – Whiffs. These players are not making as much contact, swinging and missing a lot, and have low averages/expected averages. Ex: Cody Bellinger.
Cluster 4 – Good, not great. Don’t let the colorless cells fool you; these hitters are good. They make good, hard contact, get on base often and rarely get fooled. Ex: Juan Soto.
Cluster 5 – Aggressive. These hitters are swinging outside of the zone often, at the first pitch often, and getting on base less often. Ex: Nick Castellanos.
Cluster 6 – Expected. These hitters are accomplishing what x-stats say they should. They hit the ball hard, but could perhaps benefit from a little more patience. Ex: Ryan Mountcastle.
–
Now, to put the cherry on top, which cluster provided the most hitting value (mSB excluded from calculation) in roto dollars? Well, we can use the YTD 2022 auction calculator to find that, as you would guess, Cluster 1 wins because they had Aaron Judge. But what about the others? Here’s the average value produced from each cluster in 2022:
Cluster 1 $23.61
Cluster 2 $4.69
Cluster 3 $4.71
Cluster 4 $16.39
Cluster 5 $4.46
Cluster 6 $13.60
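The arithmetic behind a list like this is just a group-and-average. Here is a minimal sketch, assuming you have each player's cluster and auction calculator dollar value in hand. The rows are invented placeholders, not real 2022 values, and `average_value_by_cluster` is a name made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (player, cluster, roto dollars). Placeholder values only.
rows = [
    ("Hitter A", 1, 30.0),
    ("Hitter B", 1, 18.0),
    ("Hitter C", 2, 6.0),
    ("Hitter D", 2, 2.0),
    ("Hitter E", 4, 15.0),
]

def average_value_by_cluster(rows):
    """Collect dollar values per cluster, then average each group."""
    by_cluster = defaultdict(list)
    for _player, cluster, dollars in rows:
        by_cluster[cluster].append(dollars)
    return {cluster: mean(values) for cluster, values in sorted(by_cluster.items())}

averages = average_value_by_cluster(rows)
for cluster, avg in averages.items():
    print(f"Cluster {cluster} ${avg:.2f}")
```

Keep in mind that a plain average like this can be pulled up by a single outlier season, which is worth remembering for any cluster with one monster year in it.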
–
If you’re wondering what all of this means for your 2023 season draft, you’re not alone. One thing that is very interesting, however, is that Cluster 4 has no Statcast max-level yellow highlighting in its row, yet these are the second most valuable players. Cluster 6 is doing what we expect them to do: they are valuable and they own the Barrel% category. While everyone is looking at maxEVs and Baseball Savant profiles, these clusters may be helpful in finding value at discounts. Now, here’s a very long table of each player and their respective cluster. Enjoy.
Great work.
I think there are always too many categories. It would be interesting to see which ones are correlated and either combine or remove them.
MaxEV, AvgEV, Barrel%, HardHit%, SLG, xSLG, and a useful xISO (xSLG-xAVG). Get them down to one metric.
It’s the same with plate discipline numbers.
My guess is you’ll end up with a power value, a LA value, a contact value, and an OOZ/chase value.