Archive for Statcast

One Hitter, Two Hitter, Red Hitter, Blue Hitter

Brad Penner-USA TODAY Sports

How would you define Jeff McNeil as a hitter in just a few words? If you had to place him in his own “group” of hitters, who else would you place him with? Last week, I used a cluster analysis to find a player that might compare to Luis Arraez and in turn, help provide some approach recommendations for increasing his power. This week, I’ll use that same cluster analysis, with just a few tweaks, to determine what combination of Statcast and plate discipline metrics increases roto value on average. Let’s start with a refresher on my process.

Read the rest of this entry »


Luis Arraez Needs To Swing and Miss More Often

Jay Biggerstaff-USA TODAY Sports

The 2022 American League batting title was won with a .316 average, the lowest to earn the award in the American League since Carl Yastrzemski hit .301 in 1968. Rod Carew led the AL in 1972 with .318, and Tony Gwynn hit .313 in 1988 to earn the NL award. But typically, the batting title is awarded for a higher average. The average batting average of players winning the batting title in both the AL and NL over the past 50 seasons has been .345. Arraez’s .316 average was impressive, but it probably won’t benefit your fantasy team quite enough when it brings only 8 home runs along with it.

Is there room for more power in Arraez’s approach? “Don’t tinker with a good thing” is my immediate thought, but then again, will .316, and probably something slightly below it (Arraez Steamer 2022: .305 AVG), continue to top leaderboards? Furthermore, Arraez is up for arbitration prior to the 2023 season and won’t be a free agent until 2026. He has plenty of room to work for a few extra dollars in the power department. Shoot, he even said he wanted to add power himself when speaking with two of the game’s most powerful hitters, Giancarlo Stanton and Aaron Judge, at the All-Star game (0:36):

So, what can he do? How can Luis Arraez add a little more power without changing who he is? I’m not a swing expert, but I did stay at a Holiday Inn last night and I know how to run a clustering model on high-dimensional data. But we’ll get to that in a minute.

Let’s start with who he is. First, he’s a man who does not strike out. His 7.1% K% was the lowest among qualified hitters in 2022. He also almost never swings and misses. His 2.5% SwStr% was also the lowest among qualified hitters, ahead of new kid on the block Steven Kwan’s second-place 3.1%. Second, he doesn’t steal bases. Four bags in 2022 and two bags in 2021 didn’t accentuate Arraez’s ability to get on base. Lastly, he doesn’t hit for power. His .104 ISO ranked 12th from the bottom among qualified hitters in 2022. From a fantasy perspective, Arraez is not necessarily a one-sided player, but he’s close. He got on base often enough to score plenty of runs, and both his mR and mAVG returned positive value according to our auction calculator:

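Luis Arraez, 2022 YTD Value

| Name | PA | mAVG | mRBI | mR | mSB | mHR | PTS | aPOS | Dollars |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Luis Arraez | 603 | $6.93 | -$2.79 | $3.07 | -$1.53 | -$3.61 | $2.07 | $9.51 | $12.59 |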
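As a quick check on how the row above fits together, here is a minimal sketch that sums the five category values and compares the result with the PTS and Dollars columns. The numbers are copied from the table. Reading the leftover gap of roughly one dollar as a minimum bid is my assumption, not something the table states.

```python
# Category values for Luis Arraez, copied from the table above.
category_values = {
    "mAVG": 6.93,
    "mRBI": -2.79,
    "mR": 3.07,
    "mSB": -1.53,
    "mHR": -3.61,
}
positional_adjustment = 9.51  # aPOS column
listed_dollars = 12.59        # Dollars column

# The five category values should add up to the PTS column.
points = round(sum(category_values.values()), 2)

# Whatever is left after PTS + aPOS; assumed (not confirmed) to be a
# minimum bid of about one dollar, give or take rounding.
gap = round(listed_dollars - (points + positional_adjustment), 2)

print(f"PTS: {points:.2f}")
print(f"PTS + aPOS: {points + positional_adjustment:.2f}")
print(f"Gap to listed Dollars: {gap:.2f}")
```

The sum reproduces the $2.07 in the PTS column, which makes the point in the text concrete: average and runs carry the value, while home runs, RBI, and steals all subtract from it.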

So where does this profile place him amongst his peers? Well, looking at a lot of columns in a spreadsheet can make it difficult to put a single label on a player. There’s just too much to sway your opinion. In order to combat this and help us create a more summarized view of many metrics, I’ll use a Principal Component Analysis (PCA) to “increas[e] the interpretability of data while preserving the maximum amount of information, and enabling the visualization of multidimensional data”. I created two sets of variables: one made up mostly of batted ball and plate discipline metrics, and the other of Statcast metrics plus a few non-Statcast metrics that more or less define power. Here they are:

Batted ball and plate discipline metrics
LD%, GB%, FB%, Pull%, Cent%, Oppo%, Swing%, Contact%, Zone%, SwStr%, CStr%

Statcast/power metrics
HR/FB, EV, maxEV, LA, Barrels, Barrel%, HardHit, SLG, xSLG

With a PCA I’m able to reduce these lists to two numbers which can then be passed through a k-means cluster analysis, grouping players into nice segments for visualization. Typically, a cluster analysis is used to gather insights on unlabeled data and it is a type of unsupervised learning. In this case, we’re using it to make comparisons we otherwise wouldn’t have:

Cluster Diagram 1
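The PCA-then-k-means process described above can be sketched roughly as follows. The article does not say which tools were used, so this is a from-scratch illustration using only the Python standard library (in practice a library such as scikit-learn would be the more likely choice). The six hitters, their numbers, the three-metric subsets, and the choice of two clusters are all hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def first_component_scores(rows, iterations=200):
    """Score each row on the first principal component (power iteration)."""
    z = standardize(rows)
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    vec = [1.0] * d  # starting guess, assumed not orthogonal to the answer
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return [sum(a * b for a, b in zip(row, vec)) for row in z]

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=25):
    """Lloyd's algorithm with a deterministic farthest-first start."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(mean(axis) for axis in zip(*members))
    return labels

# Hypothetical hitters. Discipline columns: Contact%, SwStr%, Pull%.
names = ["Contact A", "Contact B", "Contact C", "Power D", "Power E", "Power F"]
discipline = [
    [93.0, 3.0, 31.0], [91.5, 3.8, 32.5], [92.2, 3.3, 30.0],
    [78.0, 11.0, 40.0], [80.5, 9.5, 38.5], [76.5, 12.2, 42.0],
]
# Power columns: EV, Barrel%, HR/FB.
power = [
    [87.0, 3.0, 5.0], [86.2, 2.5, 4.0], [88.0, 3.8, 6.0],
    [91.0, 10.0, 15.0], [90.2, 8.5, 13.0], [92.0, 12.0, 18.0],
]

# One number per metric group, then cluster the resulting 2-D points.
points = list(zip(first_component_scores(discipline), first_component_scores(power)))
labels = kmeans(points, k=2)

for name, (x, y), label in zip(names, points, labels):
    print(f"{name}: discipline PC={x:+.2f}, power PC={y:+.2f}, cluster={label}")
```

Each metric list collapses to a single score per hitter, so every player becomes one point on a two-axis chart like the cluster diagram, and k-means then groups nearby points.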

Arraez finds himself, surprisingly, in the high-power end of cluster 1. To better understand why that is, we can compare his Statcast/power metrics with the averages from cluster 1. In addition, I’ll throw in that player all the way to the left, Tony Kemp, to help us compare Arraez with his cluster-mates:

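Cluster 1 Metrics

| Name | HR/FB | EV | maxEV | LA | Barrels | Barrel% | HardHit | SLG | xSLG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Tony Kemp | 4.3 | 84.4 | 103.2 | 15.0 | 7 | 1.6 | 65 | 0.334 | 0.291 |
| Luis Arraez | 4.8 | 88.9 | 107.3 | 12.9 | 18 | 3.6 | 153 | 0.420 | 0.408 |
| Cluster 1 Average | 7.7 | 86.7 | 108.8 | 12.2 | 17 | 4.2 | 126 | 0.383 | 0.365 |

SOURCE: Statcast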
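To see why Arraez lands on the high-power end of his cluster, one simple approach is to subtract the cluster average from his line, metric by metric. The sketch below uses only the numbers in the table above; the variable names are mine.

```python
metrics = ["HR/FB", "EV", "maxEV", "LA", "Barrels", "Barrel%", "HardHit", "SLG", "xSLG"]
arraez = [4.8, 88.9, 107.3, 12.9, 18, 3.6, 153, 0.420, 0.408]
cluster_avg = [7.7, 86.7, 108.8, 12.2, 17, 4.2, 126, 0.383, 0.365]

# Positive delta: Arraez is above the Cluster 1 average for that metric.
deltas = {m: round(a - c, 3) for m, a, c in zip(metrics, arraez, cluster_avg)}
above = [m for m, d in deltas.items() if d > 0]
below = [m for m, d in deltas.items() if d < 0]

print("Above cluster average:", above)
print("Below cluster average:", below)
```

He clears the cluster average in six of the nine metrics, including EV, HardHit, SLG, and xSLG, and trails only in HR/FB, maxEV, and Barrel%.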

Now we have a group for Arraez that makes sense. Next, let’s look at a few players who are higher up on the power scale, but aren’t changing too much in the batted ball/plate discipline area. Here’s our cluster image from before but with two new names identified that might be able to help Arraez inch over to the next cluster:

Cluster Diagram 2

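Shifting into Cluster 3

| Name | HR/FB | EV | maxEV | LA | Barrels | Barrel% | HardHit | SLG | xSLG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Josh Bell | 12.1 | 88.9 | 112.2 | 8.3 | 33 | 7.2 | 186 | 0.422 | 0.424 |
| Brandon Nimmo | 10.9 | 89.4 | 111.9 | 6.1 | 33 | 7.0 | 187 | 0.433 | 0.409 |
| Luis Arraez | 4.8 | 88.9 | 107.3 | 12.9 | 18 | 3.6 | 153 | 0.420 | 0.408 |

SOURCE: Statcast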

I am not saying that Luis Arraez should just go up there and try to be more like Josh. But I am using him as an example to determine what makes his profile more powerful. Josh Bell, 6′ 4″ / 255, and Luis Arraez, 5′ 10″ / 175, are different. While I don’t expect Luis Arraez to just suddenly increase his exit velocity, I am certain he has the skills to change his approach. One place to start would be adding more pull.

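Shifting into Cluster 3

| Name | LD% | GB% | FB% | Pull% | Cent% | Oppo% | Swing% | Contact% | SwStr% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Josh Bell | 18.6 | 50.4 | 30.9 | 38.4 | 36.0 | 25.7 | 45.3 | 80.6 | 8.8 |
| Brandon Nimmo | 17.7 | 50.5 | 31.7 | 32.1 | 38.5 | 29.4 | 43.7 | 82.6 | 7.6 |
| Luis Arraez | 25.8 | 41.2 | 32.9 | 31.6 | 37.9 | 30.6 | 42.7 | 94.1 | 2.5 |

SOURCE: Statcast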

Josh pulls the ball more. Josh also swings and misses more often. But while not swinging and missing is really impressive in this day and age, how valuable is it from both a fantasy perspective and a real-life perspective? Increasing his swinging-strike percentage while also increasing his slugging percentage would benefit everyone involved. Arraez already slugs decently on inside pitches, though he could improve on pitches up and in, and all of his 2022 home runs came on pulled balls:

Arraez SLG/BIP Heatmap

Luis Arraez 2022 Home Run Spray

While watching a player who can spray the ball all over the field is fun, Arraez’s numbers aren’t great when going oppo. He slugged .638 when pulling the ball, but when he slapped the ball the other way in 2022, he had mediocre results and his slugging percentage dropped to .364. Just look at how many outs he hit to the opposite field:

Arraez Field Out Spray

In 2022, Arraez’s HardHit% increased from 27.8% to 30.6% when he pulled the ball. When he was ahead in the count and pulled the ball, it jumped to 32.5%. Given a little more freedom from the worry of striking out, he added more power. But here’s where things get a little odd. Arraez put the ball in the air more often than Bell and Nimmo in 2022, and his average launch angle was higher as well. If we look at his Baseball Savant radial chart isolated to singles, doubles, and home runs (he only hit one triple in 2022), he clearly knows how to elevate the ball to hit for power:

Luis Arraez Radial
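The directional splits quoted above (slugging and HardHit% on pulled versus opposite-field contact) come down to grouping batted balls by direction and counting. Here is a minimal sketch on hypothetical batted balls. The 95 MPH cutoff is Statcast's hard-hit threshold; treating every batted ball as an at-bat for slugging is a simplifying assumption on my part.

```python
from collections import defaultdict

HARD_HIT_MPH = 95.0  # Statcast's hard-hit threshold
TOTAL_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}

# Hypothetical batted balls: (direction, exit velocity in MPH, outcome).
batted_balls = [
    ("pull", 101.3, "home_run"),
    ("pull", 96.0, "double"),
    ("pull", 88.4, "field_out"),
    ("pull", 92.1, "single"),
    ("oppo", 84.0, "single"),
    ("oppo", 97.2, "field_out"),
    ("oppo", 80.5, "field_out"),
    ("oppo", 86.7, "field_out"),
]

def directional_splits(balls):
    """Return HardHit% and slugging on contact for each spray direction."""
    groups = defaultdict(list)
    for direction, ev, outcome in balls:
        groups[direction].append((ev, outcome))
    splits = {}
    for direction, rows in groups.items():
        n = len(rows)
        hard = sum(1 for ev, _ in rows if ev >= HARD_HIT_MPH)
        bases = sum(TOTAL_BASES.get(outcome, 0) for _, outcome in rows)
        splits[direction] = {"hard_hit_pct": 100 * hard / n, "slg": bases / n}
    return splits

splits = directional_splits(batted_balls)
for direction, values in splits.items():
    print(direction, values)
```

Adding a count filter (for example, only balls hit while ahead in the count) would be one more condition before the grouping step.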

But, without the exit velocity to take the ball out, he ends up with a lot of fly ball outs. Looking at the table above, he’s putting the ball in the air more often than Nimmo and Bell but with a significantly lower HR/FB rate.

Arraez Field Outs

Let’s summarize. Luis Arraez could be more valuable if he hit with a little more power. One way he might add power is to start pulling the ball more and leveling out his swing ever-so-slightly. This may cause him to swing and miss more often, but he can afford it. Arraez earned nearly $13 in 2022 and we should expect that to increase if he can adjust. It may seem nuts, but Luis Arraez needs to start swinging and missing more often.


Limitation of Baseball Savant’s Graphic Snapshot

John E. Sokolowski-USA TODAY Sports

I’m sure everyone has seen this graphic on Baseball Savant but if not, go take a look.

This may be the most trusted but misleading graphic used in (fantasy) baseball analysis. It was all over Twitter today with the Teoscar Hernández trade to show off his greatness. Read the rest of this entry »


Lay Off the High Ones

It’s like Dottie said, “Lay off the high ones.” But, it ain’t so easy. Just ask Kit. I feel like I’ve been seeing more swings and misses on pitches up and out of the zone. Here’s an example, courtesy of Pitching Ninja:

Read the rest of this entry »


Who Hit It Harder? Round 2

Round 1 | Round 2

Have you ever been to a circus or county fair, and they have that game where random people hit a spot with a sledgehammer and try to ring the bell at the top? With enough force, it can be done, but contestants must be strong! The game, according to Wikipedia, is called the high striker. You can hear the game being played from afar, a crack of a hammer, a crowd cheering, and every once in a while, a bell ringing. You can hear the shouts too, “Step right up, step right up! See if you have the strength to ring the bell! You sir! You look like a strong man who can impress all these people. Just five bucks a whack! Step right up and show us how strong you really are!”

Part of the reason this is so fun and entertaining is because it’s one of those cases where all else really is equal. In baseball, that rarely, if ever, truly happens. Take for example two hitters who have struck the same pitch type with the same launch angle. How would you determine which one was hit harder?

Guess That EV

| Player | Launch Angle | 2022 MaxEV | 2022 HardHit% | Exit Velocity |
| --- | --- | --- | --- | --- |
| Player A | 23 | 118.4 | 43.4% | ? |
| Player B | 23 | 117.4 | 61.2% | ? |

What other information would you like? The count? The pitcher? Whether or not runners were on base? Now, we’re adding in variation. We’re giving one person in our analogy a heavier hammer or maybe one of our contestants is somehow stronger when other people are watching. Ok, enough with the analogy, let’s add some variation to our baseball data points:

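Guess That EV, Added Info

| Player | Launch Angle | 2022 MaxEV | 2022 HardHit% | Count | Pitcher | Runners On | Exit Velocity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Player A | 23 | 118.4 | 43.4% | 0-0 | Brent Suter | Third | ? |
| Player B | 23 | 117.4 | 61.2% | 2-1 | Brock Burke | | ? |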

As we know, these two batted balls are not, and could not be, totally equal. They were hit in different cities, in different weather, against different pitchers, and with different runners on base. Both, however, did leave the yard for home runs:

Player A – Oneil Cruz (Video Link) – 113.9 MPH

Player B – Yordan Alvarez (Video Link) – 114.6 MPH

So, what makes these two batted balls unique? Well, a lot actually. But hit an in-the-zone pitch that hard and it’s going to go a long way. It all just depends on how hard you can swing the hammer. Without further ado, let’s play another round of “Who Hit It Harder?”

Who Hit It Harder? – Round 2

3-2 count, sinkers in Statcast Gameday zone 8.

Sinker Pitch Chart, Gameday Zone 8

In this exercise, I’ll give you three batted balls under somewhat similar conditions and your job is to determine which batter hit the ball harder. Here are our hitters along with some data points:

Batted Ball Data: Round 2

| Batter | Pitch Type | Pitch Velocity | Batter Stands | Pitcher Throws | Count |
| --- | --- | --- | --- | --- | --- |
| Keston Hiura | SI | 92.5 | R | R | 3-2 |
| Ronald Acuña Jr. | SI | 93.4 | R | L | 3-2 |
| Anthony Santander | SI | 95.3 | R | L | 3-2 |

SOURCE: Statcast
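The conditions for this round (a sinker, Gameday zone 8, a 3-2 count, ball put in play) amount to a simple row filter. The sketch below runs it on hypothetical rows. The field names mirror Statcast-style export columns as I understand them, so treat them as assumptions rather than a guaranteed schema.

```python
# Hypothetical pitch-level rows with Statcast-style field names (assumed).
pitches = [
    {"batter": "Hitter A", "pitch_type": "SI", "zone": 8, "balls": 3, "strikes": 2,
     "launch_speed": 104.2, "launch_angle": 12},
    {"batter": "Hitter B", "pitch_type": "SI", "zone": 8, "balls": 1, "strikes": 2,
     "launch_speed": 99.0, "launch_angle": 20},   # wrong count
    {"batter": "Hitter C", "pitch_type": "FF", "zone": 8, "balls": 3, "strikes": 2,
     "launch_speed": 101.5, "launch_angle": 25},  # not a sinker
    {"batter": "Hitter D", "pitch_type": "SI", "zone": 5, "balls": 3, "strikes": 2,
     "launch_speed": 95.1, "launch_angle": 8},    # wrong zone
    {"batter": "Hitter E", "pitch_type": "SI", "zone": 8, "balls": 3, "strikes": 2,
     "launch_speed": None, "launch_angle": None}, # no contact
    {"batter": "Hitter F", "pitch_type": "SI", "zone": 8, "balls": 3, "strikes": 2,
     "launch_speed": 98.7, "launch_angle": -6},
]

def full_count_zone8_sinkers(rows):
    """Keep batted balls on sinkers in zone 8 thrown in a 3-2 count."""
    return [
        r for r in rows
        if r["pitch_type"] == "SI"
        and r["zone"] == 8
        and r["balls"] == 3
        and r["strikes"] == 2
        and r["launch_speed"] is not None
    ]

selected = full_count_zone8_sinkers(pitches)
print([r["batter"] for r in selected])
```

Holding pitch type, location, and count fixed this way is what makes the comparison "somewhat similar" while leaving pitcher, velocity, and handedness free to vary.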

Here are three heavy hitters who have stepped up to the plate and worked their way into a full count. A sinker, low in the zone comes at them and they each put the ball in play. There’s not a whole lot of differentiation here. Santander certainly had a faster pitch to hit, but both he and Acuña benefited from a righty-lefty matchup. Here’s some more information for you to use to determine who hit it harder:

2022 Averages: Round 2

| Batter | PA | maxEV | 2022 Average EV | HardHit% | Barrel% |
| --- | --- | --- | --- | --- | --- |
| Keston Hiura | 156 | 112.4 | 93.3 | 50% | 18.2% |
| Ronald Acuña Jr. | 368 | 117.9 | 91.1 | 52.3% | 12.7% |
| Anthony Santander | 460 | 113.2 | 90.1 | 42.7% | 10.5% |
| 2022 MLB Averages | | | 88.6 | 38.3% | 7.6% |

SOURCE: Statcast

If you simply use HardHit% to help you decide, then you can play the percentages and choose Ronald Acuña. But what about that perfect combination of exit velocity and launch angle? The Barrel% column tells you that Hiura finds the sweet spot more often, but that’s misleading because he’s only had 156 plate appearances. So, what do we do? How about adding in what is probably the most important metric for putting this puzzle together, launch angle:

Hint 1: Round 2

| Batter | Launch Angle |
| --- | --- |
| Keston Hiura | 35 |
| Ronald Acuña Jr. | -12 |
| Anthony Santander | 3 |

SOURCE: Statcast
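One way to read the launch angles above is to bucket them into batted-ball types. The sketch below uses the commonly cited Statcast ranges (ground ball under 10 degrees, line drive 10 to 25, fly ball 25 to 50, pop up above 50). How the exact boundary values are assigned is my assumption.

```python
def batted_ball_type(launch_angle):
    """Bucket a launch angle (degrees) into a batted-ball type."""
    if launch_angle < 10:
        return "ground ball"
    if launch_angle <= 25:   # boundary handling is an assumption
        return "line drive"
    if launch_angle <= 50:
        return "fly ball"
    return "pop up"

# Launch angles from the Hint 1 table.
launch_angles = {
    "Keston Hiura": 35,
    "Ronald Acuña Jr.": -12,
    "Anthony Santander": 3,
}

types = {name: batted_ball_type(angle) for name, angle in launch_angles.items()}
for name, kind in types.items():
    print(f"{name}: {kind}")
```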

Remember that what we’re after is exit velocity. We know that each of these hitters can hit the ball hard, but under these conditions, who hit it harder? We can probably assume that Acuña’s ball goes into the ground. Does a sinker hit into the ground have a lower EV than one that is put in the air? And how does the pitcher influence your decision?

Hint 2: Round 2
Batter Pitcher IP EV maxEV Barrel% HardHit% ERA xERA
Keston Hiura Adrian Sampson 53.1 86.8 115.2 6.0% 32.1% 3.88 3.88
Ronald Acuña Jr. Ranger Suárez 107.1 86.9 115.8 6.5% 30.9% 3.52 3.64
Anthony Santander Aaron Ashby 91.1 87.8 112.3 6.3% 34.4% 4.32
SOURCE: Statcast

Here’s one graph that will show you it’s really anybody’s guess. Balls can be hit with high exit velocity at most launch angles, but balls hit almost straight into the ground, at launch angles of -40 degrees or below, typically have a hard time getting above 100 MPH.

Scatter Plot, LA vs. EV (Zone 8 Sinkers)

Now, it’s time to guess. Decide which hitter had the higher EV and cross your fingers. Want to see for yourself? Here are the links to each individual at-bat.

Keston Hiura video

Ronald Acuña Jr. video

Anthony Santander video

ANSWER:

Round 2: Answer

| Batter | Events | Hit Distance | Launch Speed |
| --- | --- | --- | --- |
| Keston Hiura | home_run | 416 | 110.5 |
| Ronald Acuña Jr. | field_out | 349 | 110.8 |
| Anthony Santander | single | 389 | 111.2 |

SOURCE: Statcast

It may seem strange to have a groundball single take the cake by only 0.4 MPH. But if this were a leaderboard, Santander would be on top. It goes to show that a high exit velocity doesn’t always translate to a home run. But exit velocity and launch angle together do. When a sinker low in the zone just doesn’t sink enough, it can go a long way. However, these three outcomes show us that context is key. A ball hit with a proper angle and force can make good things happen. But that’s also why a sinker low in the zone in a 3-2 count can make a monster hitter like Ronald Acuña Jr. head back to the dugout. Now, we just need to get him, Santander, and Hiura to swing by the high striker the next time the circus is in town.


Can a Baseball Make it to the Moon?

Short answer: No.

Read the rest of this entry »


Who Hit It Harder? Round 1

John E. Sokolowski-USA TODAY Sports

Stop me if you’ve heard this one before: Aaron Judge hits the ball hard.

On June 6th, 2022, Dylan Bundy left a changeup right over the heart of the plate for none other than current 2022 MVP candidate Aaron Judge. If you’ve never seen Aaron Judge before, he’s big. He’s not the type of guy you want to leave one over the heart of the plate for. Can you guess what happened? It was smoked. The ball was scorched 116 MPH to left field, and while it doesn’t look like much of an issue for the left fielder in the video below, I can guarantee that if it were me (and probably you too) in left field, there would be more avoiding the ball than intentionally getting in front of it.

Let’s take a look:

Read the rest of this entry »


Starting Pitcher xERA Underperformers — Jun 23, 2022

I haven’t done a lot of research on Statcast’s xERA metric, but it’s similar to batter xwOBA in that it uses a pitcher’s actual batted balls against to compute what a pitcher’s ERA “should” be. That means for all those who love justifying a pitcher’s low BABIP being the result of allowing soft contact, xERA should theoretically account for this. Now, this doesn’t mean the pitcher will continue to allow the types of batted balls that have resulted in a suppressed or inflated xERA, but it does suggest that what they have already allowed should yield the calculated xERA. So let’s review the pitchers who have most underperformed their xERA marks.
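Finding the pitchers who have most underperformed their xERA is just a matter of taking ERA minus xERA and sorting by the gap. Here is a minimal sketch on made-up pitchers. The names and numbers are hypothetical, and "underperformer" is taken to mean an ERA higher than the xERA.

```python
# Hypothetical pitchers: (name, ERA, xERA).
pitchers = [
    ("Pitcher A", 4.85, 3.40),
    ("Pitcher B", 3.10, 3.60),  # ERA below xERA, so not an underperformer
    ("Pitcher C", 4.20, 3.95),
]

def xera_underperformers(rows):
    """Return (name, ERA - xERA) for ERA > xERA, largest gap first."""
    gaps = [(name, round(era - xera, 2)) for name, era, xera in rows if era > xera]
    return sorted(gaps, key=lambda item: item[1], reverse=True)

underperformers = xera_underperformers(pitchers)
for name, gap in underperformers:
    print(f"{name}: ERA exceeds xERA by {gap:.2f}")
```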

Read the rest of this entry »


Early Season ISO Leaps

This early in the season it’s easy to jump all over players who are putting up big numbers. But just remember, you drafted a team while considering their projected season-long stats, and while it may be tempting to drop a poor-performing player at the start of the season for a hot starter, take caution. Players who hit doubles, triples, and home runs early in the season can make a lot of noise, as they’re likely to put up strong category stats. Isolated power is a nice statistic that allows us to see “how often a player hits for extra bases”. But ISO does not stabilize until around 160 at-bats, and it will probably take another two weeks or so before we can really call this a good sample size. Let’s take a look at players who have shown strong early season ISO, how it compares to the ISO they showed at the start of last year and how that compares to their career average. Read the rest of this entry »
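For reference, isolated power is slugging percentage minus batting average, which is the same as extra bases per at-bat. The sketch below shows both forms. The first example uses the .420 SLG and .316 AVG quoted for Luis Arraez elsewhere on this page; the early-season stat line in the second is hypothetical.

```python
def iso_from_rates(slg, avg):
    """ISO as slugging percentage minus batting average."""
    return slg - avg

def iso_from_counts(doubles, triples, home_runs, at_bats):
    """ISO as extra bases per at-bat: (2B + 2*3B + 3*HR) / AB."""
    return (doubles + 2 * triples + 3 * home_runs) / at_bats

# Arraez's 2022 line as quoted on this page: .420 SLG, .316 AVG.
arraez_iso = iso_from_rates(0.420, 0.316)
print(f"Arraez 2022 ISO: {arraez_iso:.3f}")

# Hypothetical hitter around the 160 at-bat mark.
sample_iso = iso_from_counts(doubles=10, triples=1, home_runs=5, at_bats=160)
print(f"Sample ISO: {sample_iso:.3f}")
```

Because the numerator counts only extra bases, a handful of early doubles and home runs can swing the number a lot in a small sample, which is why the at-bat threshold matters.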


Creating Synthetic Data In a Data-less World

What will we do without the zeros and ones of spring training? The underground, black market .csv file that comes from the person who knows the person who operates a Rapsodo in a mini-camp? How will we go on without knowing spin rates or the depth of clay infield impression drilled by various brands of signature spikes? I have an idea, let’s make it up.

Read the rest of this entry »