We are officially in the second half of the season. I know, I know, people like to call the All Star Break the midpoint. I can understand why; it is a natural (if artificial) divider in the season, in the same way a river or a mountain range may divide countries. Personally, I prefer to look at the game totals, and there are fewer games ahead of us than there are behind us.
Players who got off to hot starts have mostly come back to Earth. Mostly. Aaron Judge seems to have enough delta-v to perform a Hohmann transfer, so he could be leaving Earth at any moment. Beyond that, players should be settling into their normal, expected production for the season.
Speaking of expected production, the halfway point seems to be a good time to take a look at power numbers, particularly home runs. For the past few years I have run a little side project called Citi Field Homeruns, where I meticulously track the home runs hit at Citi Field, aiming to measure the impact of changing ballpark dimensions. Okay, so this explanation is largely irrelevant to my goal here, other than to establish a few small pieces of information:
First, I have observed that home run rates in April through June are lower than in July through September.
Second, you can accurately predict season home run totals using only one month’s worth of data, although more months generate a better prediction.
Third, and most importantly, I don’t know whether either of these points generalizes to all of Major League Baseball or applies only to Citi Field.
That last piece is key, and I don’t want to gloss over it. First, though, I want to briefly touch on the first point: the first half of the season has fewer home runs than the second half. Below is a table of home run rates by month at Citi Field through the 2016 season, covering eight seasons of data. Again, I don’t *know* that this can be generalized to other parks, but I am almost certain it can be.
|Month||Percent of Total|
Remember, not every month has the same number of games. April and June tend to have fewer home games at Citi Field. In April, I believe, this is mostly due to the frequent off days during that month; you generally have 2 or 3 off days in the first week, for example. The schedule can also get pretty wonky if a team in your division is opening a new stadium, which happened this year in Atlanta. I am not sure why June has fewer home games. It is genuinely a mystery to me. If you have an idea, I’ll gladly listen.
July has fewer home games as well, due to the All Star Break, but the home run rate during that month is so high that it more than offsets the reduced number of games.
So what is the point of this? It is twofold. First, notice that the first half of the season accounts for 44% of the home runs. Second, temperature and humidity each play a large role, and you always need to keep them in mind when looking at predictive home run stats, which are really the purpose of this piece.
Expected Home Runs
I have a stat called xHR, expected home runs. I take each batted ball and draw a series of concentric spheres around it, where the three dimensions are exit velocity, horizontal launch angle, and vertical launch angle. The smallest sphere contains balls that are practically identical to the given batted ball, while each larger sphere contains balls that are increasingly different. The very largest sphere contains balls that can be radically different, and mostly serves to smooth out the data so it isn’t as lumpy. I find the average success rate within each sphere (in this case, HR/BIP) and weight each sphere based upon its size: the smallest sphere is weighted heavily, and the largest barely at all.
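To make the sphere idea concrete, here is a minimal sketch of a concentric-sphere estimate in Python. It is not the actual xHR code: the radii, the weights, the nested (rather than ring-shaped) spheres, and the choice to treat mph and degrees as the same unit in a plain Euclidean distance are all assumptions made for illustration. The helper names `xhr_probability` and `expected_home_runs` are hypothetical.

```python
import math
from typing import Sequence, Tuple

# A batted ball: (exit velocity in mph, horizontal angle in degrees, vertical angle in degrees).
BattedBall = Tuple[float, float, float]

# Hypothetical radii and weights: the smallest sphere gets the largest weight.
RADII = (2.0, 5.0, 10.0, 20.0)
WEIGHTS = (8.0, 4.0, 2.0, 1.0)


def distance(a: BattedBall, b: BattedBall) -> float:
    """Plain Euclidean distance; mph and degrees are treated as one unit (an assumption)."""
    return math.dist(a, b)


def xhr_probability(
    target: BattedBall,
    history: Sequence[Tuple[BattedBall, bool]],
    radii: Sequence[float] = RADII,
    weights: Sequence[float] = WEIGHTS,
) -> float:
    """Weighted average of HR/BIP inside concentric spheres around `target`."""
    weighted_rate = 0.0
    total_weight = 0.0
    for radius, weight in zip(radii, weights):
        # Spheres are nested here: each larger sphere also contains the smaller ones.
        outcomes = [is_hr for ball, is_hr in history if distance(target, ball) <= radius]
        if not outcomes:
            continue  # an empty sphere contributes nothing
        rate = sum(outcomes) / len(outcomes)  # HR/BIP within this sphere
        weighted_rate += weight * rate
        total_weight += weight
    return weighted_rate / total_weight if total_weight else 0.0


def expected_home_runs(
    batted_balls: Sequence[BattedBall],
    history: Sequence[Tuple[BattedBall, bool]],
) -> float:
    """A player's xHR: the sum of per-ball home run probabilities."""
    return sum(xhr_probability(ball, history) for ball in batted_balls)
```

With a toy history of two near-identical home runs and one ball about 13 units away that stayed in the park, the three smaller spheres each report a 100% HR rate and the largest reports 2/3, so the weighted estimate lands just under 1. In real use you would probably exclude the ball being evaluated from its own history; this sketch does not.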
The exit velocities I use are adjusted for game-time temperature, which is supplied by Major League Baseball. I am not sure exactly how this is measured; I believe it is taken by a nearby weather station at some point prior to first pitch. Hopefully in the very near future I will move to a better system, and I am currently working on adding both humidity and perhaps a more granular temperature reading, such as the temperature at the start of each inning.
Exit velocity is also adjusted for what you could call ‘ballpark bias.’ This is a controversial step, and I am not entirely sure it is necessary, but it seems to improve the stat’s predictive power, so I will continue doing it.
These two exit velocity adjustments can range from -2.5 to +2.5 mph, depending on how extreme the conditions may be. In general, velocity is changed by roughly 1 mph or less.
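The two adjustments can be pictured as small additive corrections to each exit velocity, each capped at the ±2.5 mph range described above. The sketch below is hypothetical throughout: the reference temperature, the per-degree slope, the sign conventions, and the idea of representing ballpark bias as a per-park mph offset are all assumptions, not the coefficients behind xHR.

```python
# Hypothetical constants for illustration; the article does not publish its coefficients.
REFERENCE_TEMP_F = 70.0   # assumed "neutral" game-time temperature
MPH_PER_DEGREE_F = 0.05   # assumed slope of the temperature adjustment
MAX_ADJUSTMENT_MPH = 2.5  # matches the -2.5 to +2.5 mph range described above


def clamp(value: float, limit: float = MAX_ADJUSTMENT_MPH) -> float:
    """Limit an adjustment to the range [-limit, +limit]."""
    return max(-limit, min(limit, value))


def adjusted_exit_velocity(raw_mph: float, game_temp_f: float, park_bias_mph: float) -> float:
    """Apply a temperature adjustment and a ballpark-bias correction to one exit velocity.

    Sign conventions are assumptions: warmer-than-reference games raise the adjusted
    velocity, and a positive park_bias_mph means that park's readings run high.
    """
    temp_adj = clamp((game_temp_f - REFERENCE_TEMP_F) * MPH_PER_DEGREE_F)
    park_adj = clamp(-park_bias_mph)
    return raw_mph + temp_adj + park_adj
```

Capping each correction separately keeps an extreme reading, such as a 100-degree afternoon, from moving a batted ball by more than the stated maximum.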
Finally, the home run rates are compared to a home run park effect. For example, Citi Field has a home run park effect of 1.036 for left-handed batters and 1.029 for right-handed batters.
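The text does not spell out exactly how the park effect enters the calculation. One common reading, assumed here, is that a park effect scales a home run rate multiplicatively, so a neutral rate is multiplied by it and an observed rate is divided by it. The Citi Field values below come from the text; the function names are hypothetical.

```python
# Citi Field home run park effects quoted above, keyed by the side the batter hits from.
CITI_FIELD_HR_PARK_EFFECT = {"L": 1.036, "R": 1.029}


def to_park(neutral_hr_rate: float, bats: str, park_effect: dict = CITI_FIELD_HR_PARK_EFFECT) -> float:
    """Scale a park-neutral HR rate to this park (assumed multiplicative)."""
    return neutral_hr_rate * park_effect[bats]


def to_neutral(observed_hr_rate: float, bats: str, park_effect: dict = CITI_FIELD_HR_PARK_EFFECT) -> float:
    """Remove the park effect from an observed HR rate (inverse of to_park)."""
    return observed_hr_rate / park_effect[bats]
```

Under this reading, a left-handed batter with a neutral 5% HR/BIP would be expected to run about 5.18% at Citi Field.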
Players with large differences between HR and xHR
Alrighty, everything up to this point hopefully sets the backdrop for how I want you to look at the following data. Keep in mind that we are roughly 53 or 54% of the way through the season but, according to my Citi Field Homeruns data, only about 47% of the way to the season home run total.
Temperature also plays a large role in home runs, and I do my best to account for it. Humidity is important, too, but I currently cannot account for that.
Okay, let’s look at a few batters who have the most extreme differences between their HR and xHR totals. To keep the table to a reasonable size, I limited this to batters with at least 200 PA, a season total of at least 12 home runs, and a difference between HR and xHR of at least 3.
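The selection rule above can be written as a simple filter. This is a sketch with hypothetical names (`Batter`, `extreme_differences`); it assumes the 12-homer cutoff applies to actual home runs to date and that the delta cutoff applies in both directions, so both overperformers and underperformers make the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batter:
    name: str
    pa: int     # plate appearances
    hr: int     # actual home runs to date
    xhr: float  # expected home runs to date

    @property
    def delta(self) -> float:
        # Positive: more home runs than expected; negative: fewer.
        return self.hr - self.xhr


def extreme_differences(
    batters: List[Batter], min_pa: int = 200, min_hr: int = 12, min_delta: float = 3.0
) -> List[Batter]:
    """Apply the PA, HR, and |HR - xHR| cutoffs, largest overperformers first."""
    qualified = [
        b for b in batters
        if b.pa >= min_pa and b.hr >= min_hr and abs(b.delta) >= min_delta
    ]
    return sorted(qualified, key=lambda b: b.delta, reverse=True)
```

Sorting by the signed delta puts the biggest overperformers at the top and the biggest underperformers at the bottom, which matches how the two groups are discussed below.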
The names at the top of this list probably aren’t big surprises. Moustakas, Morrison, Gonzalez, and Alonso have all already set career-high home run totals at the halfway point of this season, and their xHR totals are 6, 4.9, 4.3, and 3.7 lower than their actual home run totals, respectively. On an intuitive level, this probably makes sense. They have likely played above their heads, even when you consider changes in approach. However, I am sure you’re wondering what exactly this might mean going forward.
Well, conservatively, you should assume their home run rates will regress. However, considering that home run rates tend to spike in July and August, what exactly might a regression look like?
One option is to divide the expected home run total by .47 (the share of season home runs hit to date) and subtract the expected number to date, which leaves an expected rest-of-season total. More simply, you can multiply xHR by 1.13 to achieve the same effect. I have done so in the table below.
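The shortcut follows directly from the division. Writing $s$ for the share of season home runs hit to date:

$$
\text{ROS xHR} = \frac{\text{xHR}}{s} - \text{xHR} = \text{xHR}\cdot\frac{1 - s}{s},
\qquad s = 0.47 \;\Rightarrow\; \frac{0.53}{0.47} \approx 1.128 \approx 1.13
$$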
At first glance you might assume several of these guys are on pace for 50-homer seasons, and if they keep up their current pace, some of them could exceed that mark. However, if you assume their home run rate will fall back to their expected rate from here on out, none of them appears likely to hit many more than 45 or 46 homers on the season. That is certainly still a great season, don’t get me wrong.
On the flip side, Moreland, Goldschmidt, Machado, Arenado, and Dozier each seem to be lining up for big second halves. Machado’s all-or-nothing power numbers to this point in the season might have you sick of trading batting average and OPS for dingers. I totally understand that; I’m in the same boat. As for the others, this should bring a smile to your face if you own Moreland, Goldy, or Dozier. Each of them is on pace for a 30+ homer season. For Goldy and Dozier that isn’t a surprise, but it is always nice to see. For Moreland, though, that could be a big boom, especially when you look at Steamer and ZiPS, which each project 8-9 home runs for the remainder of the season.
Granted, these monthly home run percentages may only represent your standard temperate environment, and as a result shouldn’t be applied to places like Arizona or San Diego. Maybe you’re better off dividing Goldschmidt’s xHR by .54 (the share of games played) instead of .47 (the share of home runs to date). This drops his rest-of-season total to only 19 and his expected season total to 38, more in line with his career mark.
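A small helper makes it easy to compare the two choices of divisor. This sketch assumes, as the Goldschmidt example suggests, that the expected season total is actual home runs to date plus the rest-of-season xHR projection. The function names and the player numbers in the loop are hypothetical, chosen only for illustration.

```python
def rest_of_season_hr(xhr_to_date: float, share_to_date: float) -> float:
    """Expected remaining home runs if the rest of the season tracks xHR."""
    if not 0.0 < share_to_date < 1.0:
        raise ValueError("share_to_date must be strictly between 0 and 1")
    return xhr_to_date * (1.0 - share_to_date) / share_to_date


def projected_season_total(hr_to_date: int, xhr_to_date: float, share_to_date: float) -> float:
    """Actual home runs so far plus the expected rest-of-season total."""
    return hr_to_date + rest_of_season_hr(xhr_to_date, share_to_date)


# Hypothetical player: 20 HR and 22.0 xHR at the break (illustrative numbers only).
for label, share in (("HR share to date", 0.47), ("games played", 0.54)):
    ros = rest_of_season_hr(22.0, share)
    total = projected_season_total(20, 22.0, share)
    print(f"{label:>16}: {ros:.1f} more, {total:.1f} total")
```

Switching the divisor from .47 to .54 always lowers the rest-of-season projection, because a larger share of the season is treated as already played.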