# Can a Baseball Make it to the Moon?

Earth’s escape velocity is just under 7 miles per second, or roughly 25,000 miles per hour, and even though Miguel Sanó did hit one at ~117 MPH in 2021, he needed just a little more juice to leave the atmosphere. No, “moon blasts” is really just a catchy name given to a contest hosted by FTX and MLB where contestants guess who will hit the longest home run the rest of the season (after August 10th). The same contest was held last year and, going by Statcast’s measurements, Sanó’s 495-foot bomb at Fenway Park on August 25th was the winner.

Take a look:

Ho-ly-smokes! What more is there to write? That was 2021’s contest winner, but who will win in 2022? Not only are contestants expected to guess the batter, but they must also guess the distance and the type of home run (solo, two-run, three-run, etc.). Let’s start with the first portion: who will hit it and how far will it go? Before we go there, let’s just take a look at the distribution of hit distances so far this season:

Anything above 450 feet is a bomb. OK, got it. We have a feel for how far it should go, now who is going to hit it the farthest? To help answer this question, I built a simple multiple linear regression model on all home runs hit between August 10th and the end of the 2021 season. Since I’m trying to predict a home run in those months, I need to train my model on home runs hit in those months. While the above histogram shows 2022’s home runs so far, this histogram shows all home run distances hit after August 10th, 2021:

My hope is that training on 2021’s end-of-season distances will account for some, but not all, seasonal patterns. In order to keep it very simple, I’ve only used exit velocity, launch angle, whether the batter was a lefty or a righty, and the park in which the home run was hit in order to “predict” the distance. The R-squared of the model is only 0.49, meaning the model explains only 49% of the variance in home run distance. I certainly could experiment further with added features, but I will need to deploy this model on a dataset to make predictions and I can’t know ahead of time something like which way the wind will be blowing in San Diego on August 28th. I can’t even know for sure who will be hitting on a regular basis. That’s what makes using a model so tough. I’ve “taught” the model a few things about what influences the distance of a home run, but now I need to produce something it can make predictions from.
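For readers who want to see the shape of such a model, here is a minimal sketch of a multiple linear regression on the same four inputs, written in plain Python. The data are randomly generated, so the coefficients and R-squared it prints are made up and say nothing about real home runs. The helper names (`design_row`, `fit_ols`, `r_squared`) are my own for illustration and are not the code behind this article.

```python
import random

def one_hot(value, levels):
    """Dummy-code a category, dropping the first level as the baseline."""
    return [1.0 if value == lvl else 0.0 for lvl in levels[1:]]

def design_row(ev, la, stand, park, parks):
    # Intercept, two numeric features, then dummy-coded handedness and park.
    return [1.0, ev, la] + one_hot(stand, ["R", "L"]) + one_hot(park, parks)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(X, y):
    """Least squares via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, row):
    return sum(b * v for b, v in zip(beta, row))

def r_squared(y, y_hat):
    """Share of variance explained: 1 - SS_residual / SS_total."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Made-up training data: NOT real Statcast home runs.
random.seed(0)
PARKS = ["BOS", "CLE", "LAA"]
rows, y = [], []
for _ in range(200):
    ev = random.uniform(95, 118)   # exit velocity, mph
    la = random.uniform(18, 38)    # launch angle, degrees
    stand = random.choice(["R", "L"])
    park = random.choice(PARKS)
    rows.append(design_row(ev, la, stand, park, PARKS))
    # Invented generating process plus noise, purely for illustration.
    y.append(3.5 * ev + 1.0 * la + 8.0 * (park == "CLE") + random.gauss(0, 20))

beta = fit_ols(rows, y)
r2 = r_squared(y, [predict(beta, r) for r in rows])
print(f"R-squared on the toy data: {r2:.2f}")
```

The categorical inputs (handedness and park) are dummy-coded with one level dropped, which is one common way to feed them to a linear model; whether the actual model encoded them this way is an assumption.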

It does help that I know the rest of the season’s schedule, and I used this as a jumping-off point for creating some simulated data. I went over to our Statcast leaderboards and selected the top 100 hitters by maximum exit velocity (maxEV), and I’ve simulated five plate appearances per stadium after August 10th (the contest’s deadline to submit). Take Giancarlo Stanton for example. I know that the Yankees will play in Boston on Wednesday, September 14th and I can simulate a few different exit velocities for Stanton. I can also attach those exit velocities to his 2022 average launch angle. The simulated data looks like this:

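Deploy Data – Stanton @ Fenway

| Name | EV (mph) | LA (°) | Park | Stand |
| --- | --- | --- | --- | --- |
| Giancarlo Stanton | 95.1 | 9.1 | BOS | R |
| | 70.4 | 9.1 | | |
| | 82.8 | 9.1 | | |
| | 119.8 | 9.1 | | |
| | 107.5 | 9.1 | | |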

It’s a really simple simulation. Certainly, this could be improved upon. I could vary his launch angle, I could add more variance to his exit velocities. I could even extend beyond his max, and test how far he could hit a dinger in Boston after really eating his Wheaties. But, hey, I have a day job too. Now that I have this simulated data, I can pass it through my model to see how far each variant could potentially travel.
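The five exit velocities in the Stanton example look evenly spaced between 70.4 and 119.8 MPH, so one way to sketch this kind of simulation is a simple grid from a floor up to the hitter's maxEV. The function names and the choice of 70.4 as a default floor are assumptions taken from that one table, not a description of the exact procedure used here.

```python
def simulate_exit_velocities(max_ev, floor_ev=70.4, n=5):
    """Return n evenly spaced exit velocities from floor_ev up to max_ev."""
    step = (max_ev - floor_ev) / (n - 1)
    return [round(floor_ev + i * step, 1) for i in range(n)]

def build_deploy_rows(name, max_ev, avg_la, stand, parks):
    """One simulated plate appearance per (park, exit velocity) pair.

    Launch angle is held fixed at the hitter's season average, as in the text.
    """
    return [
        {"name": name, "ev": ev, "la": avg_la, "park": park, "stand": stand}
        for park in parks
        for ev in simulate_exit_velocities(max_ev)
    ]

# Stanton's maxEV and average launch angle come from the table above.
stanton_rows = build_deploy_rows("Giancarlo Stanton", 119.8, 9.1, "R", ["BOS"])
for row in stanton_rows:
    print(row)
```

Passing more parks from the remaining schedule into `parks` would extend the same grid to every stadium a hitter still visits.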

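Predictions – Stanton @ Fenway

| Name | EV (mph) | LA (°) | Park | Stand | Predicted Distance (Feet) |
| --- | --- | --- | --- | --- | --- |
| Giancarlo Stanton | 95.1 | 9.1 | BOS | R | 361.0 |
| | 70.4 | 9.1 | | | 270.2 |
| | 82.8 | 9.1 | | | 315.6 |
| | 119.8 | 9.1 | | | 451.7 |
| | 107.5 | 9.1 | | | 406.3 |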
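Because launch angle, park, and handedness are identical across these five rows, the predictions differ only through exit velocity, and a linear model should put them on a straight line. A quick check, using nothing but the table's own numbers, recovers the slope that line implies. The `fit_line` helper is a hypothetical name for a one-predictor least squares fit.

```python
# (exit velocity in mph, predicted distance in feet), copied from the table above
STANTON_AT_FENWAY = [
    (95.1, 361.0),
    (70.4, 270.2),
    (82.8, 315.6),
    (119.8, 451.7),
    (107.5, 406.3),
]

def fit_line(points):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(STANTON_AT_FENWAY)
print(f"Implied feet per mph of exit velocity: {slope:.2f}")
```

On these rows the slope works out to roughly 3.7 feet of predicted distance per additional mph. The intercept from this fit bundles together the park, handedness, and launch angle terms, so it should not be read as the model's own intercept.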

My simple model tells us that if Stanton hits a ball at Fenway at his maxEV on the year, 119.8 MPH, then he could potentially hit beyond 450 feet! That’s a bomb. We’ve already addressed this. But is this possible Stantonian blast in Fenway the farthest hit ball predicted by my model? No. He’s not predicted to hit the farthest home run, but he is in the top 10, twice:

Farthest Hit Home Run Predictions, 2022 ROS

| Name | EV (mph) | LA (°) | Park | Stand | Predicted Hit Distance (Feet) |
| --- | --- | --- | --- | --- | --- |
| Shohei Ohtani | 119.1 | 12.7 | CLE | L | 481.6 |
| Amed Rosario | 115.8 | 5.2 | CLE | R | 468.5 |
| Mike Zunino | 115.0 | 20.8 | CLE | R | 467.4 |
| Gary Sánchez | 114.7 | 14.3 | CLE | R | 465.4 |
| Luis Robert | 114.8 | 9.5 | CLE | R | 465.2 |
| Carlos Correa | 114.6 | 10.3 | CLE | R | 464.5 |
| Giancarlo Stanton | 119.8 | 9.1 | LAA | R | 464.1 |
| Cal Raleigh | 114.0 | 23.6 | CLE | R | 463.9 |
| Giancarlo Stanton | 119.8 | 9.1 | MIL | R | 463.7 |
| Cal Raleigh | 114.0 | 23.6 | CLE | L | 463.2 |

First things first, why does Cleveland show up so many times? Remember, my model is learning from the end of the 2021 season. Cleveland was home run friendly, at least for lefties, with a 2021 park factor of 103 for home runs by left-handed hitters and 99 for right-handed hitters (100 is average). In addition, Progressive Field hosted the fourth-most home runs in the back end of 2021 with 71. Finally, the top 100 hitters by maxEV so far this year are headed to Cleveland a few more times in 2022. My simulation created 228 plate appearances in Cleveland, the fifth most. That measure is not equivalent to actual, expected plate appearances because I didn’t duplicate games. In other words, Giancarlo Stanton will likely get more than five plate appearances in Fenway by the end of the season, but I’m only simulating five. Again, I have a day job.
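For anyone who wants to check the top 10 by hand, a quick tally by park and by hitter looks like this. The rows are copied from the table above; nothing here comes from the model itself.

```python
from collections import Counter

# (name, park, predicted distance in feet), copied from the top-10 table
TOP_10 = [
    ("Shohei Ohtani", "CLE", 481.6),
    ("Amed Rosario", "CLE", 468.5),
    ("Mike Zunino", "CLE", 467.4),
    ("Gary Sánchez", "CLE", 465.4),
    ("Luis Robert", "CLE", 465.2),
    ("Carlos Correa", "CLE", 464.5),
    ("Giancarlo Stanton", "LAA", 464.1),
    ("Cal Raleigh", "CLE", 463.9),
    ("Giancarlo Stanton", "MIL", 463.7),
    ("Cal Raleigh", "CLE", 463.2),
]

park_counts = Counter(park for _, park, _ in TOP_10)
name_counts = Counter(name for name, _, _ in TOP_10)

print("Rows per park:", dict(park_counts))
print("Hitters appearing more than once:",
      [name for name, count in name_counts.items() if count > 1])
```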

So now, it’s up to what we believe. Can Shohei hit a 482-foot bomb at Progressive Field? I think so. He’s certainly hit a few home runs there before:

I’m going to stay true to an analytical approach here and choose Ohtani. There are certainly things that could be tweaked when it comes to the modeling process. Maybe you can build a better one. Or, maybe you can just take the advice of a commenter in last year’s version of this article and “…just go with Giancarlo Stanton.” Either way, you have until tomorrow at midnight EST to submit your pick and you can do so here. Good luck!