Reverse Engineering the Sharks: Intro

Gambling and baseball go way back. Everyone, casinos and bettors alike, looks to gain an edge in the hope of free money. Today, I’m starting a series diving into the lines in the hope of finding what information, fantasy or otherwise, can be extracted from the sharks and books, using the FanGraphs win rate as a baseline.

To start the study, I used the game projections available here at FanGraphs and the historic Vegas lines. While the FanGraphs projections were right more often than wrong, I quickly found a systematic error. The home-field advantage was set to 8%: home teams were expected to win 54% of the time and lose 46% (math!), for an 8% spread. I looked back at the league-wide home-field advantage over the past dozen seasons and found that the margin has been smaller in recent years.

Year: Home Field Advantage
2019: 5.8%
2018: 5.6%
2017: 8.0%
2016: 6.0%
2015: 8.2%
2014: 6.0%
2013: 7.6%
2012: 6.6%
2011: 5.2%
2010: 11.8%
2009: 9.8%
2008: 11.2%

The spread used to range from 10% to 12%, but now it’s closer to 6%. I informed Mr. Appelman of this discrepancy, and he will get around to changing it once he’s done changing diapers for his first baby.
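To make the arithmetic concrete, here is a minimal Python sketch that converts a home win rate into the spread used above and averages the yearly figures from the table. The function names and the split into an early group (2008 to 2010) and a later group (2011 to 2019) are my own choices for illustration, not anything taken from the FanGraphs code.

```python
# Home-field spread = home win rate minus road win rate.
def home_field_spread(home_win_rate):
    return home_win_rate - (1.0 - home_win_rate)

# Yearly league-wide spreads (percentage points), copied from the table above.
spreads = {
    2019: 5.8, 2018: 5.6, 2017: 8.0, 2016: 6.0, 2015: 8.2, 2014: 6.0,
    2013: 7.6, 2012: 6.6, 2011: 5.2, 2010: 11.8, 2009: 9.8, 2008: 11.2,
}

def average_spread(first_year, last_year):
    """Mean spread over an inclusive range of seasons."""
    values = [v for y, v in spreads.items() if first_year <= y <= last_year]
    return sum(values) / len(values)

print(round(home_field_spread(0.54) * 100, 1))   # 8.0
print(round(average_spread(2008, 2010), 1))      # 10.9
print(round(average_spread(2011, 2019), 1))      # 6.6
```

A 54% home win rate gives the 8% setting, while the nine most recent seasons in the table average about 6.6%.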

So how much does the smaller home-field adjustment improve the odds of just picking the favorite? Not a bunch. And in 2017, when the actual home-field advantage jumped to 8%, the 8% setting unsurprisingly did better.

Note: The historic Vegas odds spreadsheet I have doesn’t differentiate between the first and second game of a doubleheader, so they were removed from the analysis.

Win Rate of FanGraphs & Vegas
Year 8% 6% Vegas
2014 55.9% 55.6% 56.3%
2015 55.6% 55.4% 57.3%
2016 56.4% 56.8% 58.6%
2017 56.9% 56.8% 58.0%
2018 57.5% 58.0% 57.6%
Overall 56.4% 56.5% 57.6%

Now the difference isn’t huge, but I didn’t expect it to be. Some games are just not competitive, and the correct rate will be dominated by these expected blowouts.

The other difference is that Vegas has more than a 1% advantage over our win rate. In gambling, this percentage is huge. Picking winners 1% less often than Vegas means the vig(orish)/rake/take can’t be overcome. It’s time to explain a little about the vig. For those who know all the ins and outs of gambling, skip the next two paragraphs. For those not as familiar, keep reading.

Just because Vegas favors one team over another, it doesn’t mean they come close to giving 50/50 odds. For baseball, Vegas usually gives itself a 3% cut of all bets. Gamblers must overcome this difference to turn a profit over the long run. Right now, the projections are making that breakeven point even larger.

Vegas’s Breakeven Win% is the percentage of the time a team needs to win for the bet’s payout to break even. For example, say Team 1 has odds of +100 and Team 2 has odds of -110. If a bettor puts $100 on Team 1 and the team wins, they’ll win $100. So, in this instance, the bettor needs Team 1 to win just 50% of the time to break even. To bet Team 2, the bettor needs to put down $110 to win $100. When betting Team 2, the bettor needs to win about 52.4% of the time (110 / 210) to break even. The extra percentage needed to break even is how Vegas sportsbooks make money. For further reading, here is another explanation of determining the breakeven win rate and a calculator/table to quickly determine the odds.
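The breakeven calculation above is short enough to write out. This is a minimal sketch of the standard conversion from American odds to a breakeven win rate, using the +100 and -110 example from the paragraph; the function name is my own, and it does no validation of odd inputs such as 0.

```python
def breakeven_win_pct(american_odds):
    """Win rate at which a bet at these American odds breaks even."""
    if american_odds > 0:
        # Plus price: risk 100 to win `american_odds`.
        return 100.0 / (american_odds + 100.0)
    # Minus price: risk `-american_odds` to win 100.
    risk = -american_odds
    return risk / (risk + 100.0)

team1 = breakeven_win_pct(100)    # 0.5
team2 = breakeven_win_pct(-110)   # about 0.5238

# The two breakeven rates add up to more than 100%.
# The excess is the extra percentage the book builds into the line.
excess = team1 + team2 - 1.0

print(f"Team 1 breakeven: {team1:.1%}")
print(f"Team 2 breakeven: {team2:.1%}")
print(f"Excess over 100%: {excess:.1%}")
```

In this example the excess comes entirely from the -110 side; real lines spread it across both teams in different ways.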

With the win rate out of the way, it’s time to dive into a weekend’s worth of games to find the biggest disagreements and why they exist. I collected the lines from Friday to Sunday with some games missing. I was slow on Saturday for the early games and Vegas wasn’t offering lines on some others.

To break down the differences, I’m going to go through the games with the biggest differences and look for a possible cause. For reference, the FanGraphs win projections use the Depth Chart projections (an average of ZiPS and Steamer). Also, I don’t care one bit right now about the results of these games. I’m more interested in finding how the inputs work than in the actual results.
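As a rough sketch of how the games can be ordered, the snippet below ranks Friday’s five games by the gap between the Vegas breakeven rate and the FanGraphs win rate for the same team. The numbers are the ones quoted in the write-ups that follow; the tuple layout and the `gap` helper are just my own illustration.

```python
# (game, Vegas breakeven %, FanGraphs win %) for the same team in each game,
# taken from the Friday write-ups in this article.
games = [
    ("Tigers at Athletics", 75.2, 62.7),
    ("Cubs at Brewers", 51.2, 43.0),
    ("Angels at White Sox", 60.9, 53.0),
    ("Rangers at Orioles", 58.5, 53.4),
    ("Yankees at Red Sox", 55.6, 48.7),
]

def gap(game):
    """Vegas breakeven minus FanGraphs win rate, in percentage points."""
    _, vegas, fangraphs = game
    return round(vegas - fangraphs, 1)

# Largest disagreement first.
ranked = sorted(games, key=gap, reverse=True)
for game in ranked:
    print(f"{game[0]:<22}{gap(game):>5.1f}")
```

Keep in mind that the breakeven rate includes the book’s cut, so part of every gap is vig rather than a true disagreement about the teams.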

Game 1 Friday: Tigers at Athletics
Favorite: Athletics (Both)
Vegas Breakeven % (Athletics): 75.2%
FanGraphs Win%: 62.7%

Detroit’s Spencer Turnbull started the season decently, but our projections don’t like him, giving him an ERA over 4.50. He’s gotten worse in the second half, with a BB/9 over 5.0 and an ERA and estimators over 5.00. Also, Homer Bailey has quit walking batters in Oakland (3.8 BB/9 to 1.8 BB/9). The starting pitcher projections explain some of the difference.

All three of the Tigers-A’s games had a huge spread. I wonder if bullpen quality is correctly being taken into account by FanGraphs. The Tigers this season have a bullpen ERA of 4.93 while the Athletics have it at 4.08. I need to talk to Appelman later to find out what is going on here.

Game 2 Friday: Cubs at Brewers
Favorite: Cubs (FanGraphs), Brewers (Vegas)
Vegas Breakeven % (Brewers): 51.2%
FanGraphs Win%: 43%

The projection we use for Cole Hamels had him right above a 4.00 ERA. Since he came off the IL in early August, his velocity is down 1 mph, his ERA is over 7.00, and his ERA estimators are over 5.00. FanGraphs seems to be slow at adjusting his projection.

Game 3 Friday: Angels at White Sox
Favorite: White Sox (both)
Vegas Breakeven % (White Sox): 60.9%
FanGraphs Win%: 53%

I’m not surprised by this movement. FanGraphs has Lucas Giolito’s talent at over a 4.25 ERA, while his actual ERA and estimators are near 3.50. It’s a huge difference, and it can be seen in the line.

Game 4 Friday: Rangers at Orioles
Favorite: Orioles (both)
Vegas Breakeven % (Orioles): 58.5%
FanGraphs Win%: 53.4%

I think a name game was happening, where people had heard of the Orioles’ Dylan Bundy but not of the Rangers’ Brock Burke. The FanGraphs projections have the two starters at similar levels (4.89 ERA vs. 5.10 ERA), and the lineups are similar too, so the only difference was the home-field advantage. Vegas had more faith in Bundy.

Game 5 Friday: Yankees at Red Sox
Favorite: Yankees (Vegas), Red Sox (FanGraphs)
Vegas Breakeven % (Yankees): 55.6%
FanGraphs Win%: 48.7%

The Red Sox started Jhoulys Chacín (4.50 to 5.50 ERA talent) and the Yankees went with Domingo Germán. Our projections had Chacín with an ERA just under 5.00, and his xFIP dropped from 5.33 in the first half to 3.80 in the second half. Just on the surface, the game feels like an edge for the Yankees, but the home-field advantage and Chacín not being complete garbage moved the FanGraphs line to the Red Sox.

Game 6 Saturday: Blue Jays at Rays
Favorite: Rays (Both)
Vegas Breakeven % (Rays): 73%
FanGraphs Win%: 62%

I’d love to point the finger at rookie Anthony Kay (MLB debut) being the difference, but the FanGraphs projection had Kay at a 5.50 ERA. It’s tough to think Vegas expected him to be worse.

Game 7 Saturday: Cardinals at Pirates
Favorite: Cardinals (Both)
Vegas Breakeven % (Cardinals): 60.3%
FanGraphs Win%: 51.2%

I couldn’t find any discrepancies in the projections of the two starters, Adam Wainwright and Steven Brault. All the hitters were in the lineup. No idea.

Game 8 Saturday: Tigers at Athletics
Favorite: Athletics (both)
Vegas Breakeven % (Athletics): 75.7%
FanGraphs Win%: 68.7%

Vegas is just punishing the Tigers, and in this instance, they may have made their move because of Chris Bassitt. Bassitt has cut his walk rate from 3.7 in the first half to 1.9 in the second half. Jordan Zimmermann has a projection near a 5.00 ERA and that should be his talent level in my opinion.

Game 9 Saturday: Giants at Dodgers
Favorite: Dodgers (both)
Vegas Breakeven % (Dodgers): 68.9%
FanGraphs Win%: 62.6%

Both starters entered the game with similar FanGraphs ERA talents with Tony Gonsolin at ~4.75 and Tyler Beede at ~5.00. On the other hand, the pair’s actual ERA has more of a spread with Gonsolin at 2.81 and Beede at 5.33. The difference might be in the bullpen with the Giants at a 4.81 ERA and the Dodgers at 3.15.

Game 10 Saturday: Mariners at Astros
Favorite: Astros (both)
Vegas Breakeven % (Astros): 80.7%
FanGraphs Win%: 74.7%

The advantage for the Astros was huge with Justin Verlander facing Yusei Kikuchi. I think the difference may be reasonable with my evaluation of Kikuchi’s talent nearly a half run over this projection. Also, there is a huge difference in bullpens with the Astros team ERA at 3.74 and the Mariners at 5.00.

Game 11 Sunday: Cardinals at Pirates
Favorite: Cardinals (both)
Vegas Breakeven % (Cardinals): 68.5%
FanGraphs Win%: 51.5%

Damn, what a difference … again. The gambling public did not have much faith in James Marvel’s major league debut against Jack Flaherty. Marvel came into the game with a 5.00 ERA projection, which seems fine to me. The difference would be with Flaherty, who had an ERA projection near 3.75, but with some second-half changes, it’s closer to maybe 3.25. The Cards did have some advantage in the bullpen, 4.35 vs. 4.81. Vegas seems to punish starters making their MLB debut (see Anthony Kay above), so maybe it’s something I need to investigate.

Game 12 Sunday: Detroit at Oakland
Favorite: Oakland (both)
Vegas Breakeven % (Oakland): 75.7%
FanGraphs Win%: 68.9%

This game was Sean Manaea’s second start, so FanGraphs had his talent near a 4.50 ERA. His SIERA and xFIP have him near a 4.00 ERA, but bettors may be focused on the 0.75 ERA and 11.3 K/9. Also, there is the 0.85 difference in bullpen ERA.

Game 13 Sunday: Phillies at Mets
Favorite: Mets (both)
Vegas Breakeven % (Phillies): 40.8%
FanGraphs Win%: 34.6%

It’s easy to see why the Phillies were projected to lose with the Mets at home starting Noah Syndergaard. Let me start with Phillies starter, Vince Velasquez. His FanGraphs projection and my evaluation of his talent have him just under 5.00 ERA. If anything, his projection should be worse with declining velocity and results in the season’s second half. With Syndergaard, he’s seen his second-half results improve (4.26 xFIP to 3.32 xFIP). The Mets bullpen had performed better with a 4.09 ERA with the Phillies at 4.60. For me, I can’t see why the Phillies got some additional love.

Those are the unlucky 13 games which had the biggest spreads between the Vegas breakeven rate and the FanGraphs rate. From just these games, I have four items to investigate going forward.

  1. Dig into the FanGraphs code a bit and find out how much a difference in ERA (projected versus assumed) changes our winning percentage.
  2. Determine how bullpens are accounted for in our projections.
  3. Find how pitchers perform in their major league debut.
  4. Is familiarity bias causing some spreads to widen?

With just a first stab at three days of data, I have created hours’ worth of research for myself. As always, I would love any feedback on any of the individual games or overall conclusions. With this background, I’m going to continue going over the biggest spreads and figure out why differences exist.

We hoped you liked reading Reverse Engineering the Sharks: Intro by Jeff Zimmerman!

Jeff, one of the authors of the fantasy baseball guide, The Process, writes for RotoGraphs, The Hardball Times, Rotowire, Baseball America, and BaseballHQ. He has been nominated for two SABR Analytics Research Awards for Contemporary Analysis and won it in 2013 in tandem with Bill Petti. He has won three FSWA Awards, including one for his MASH series. In his first two seasons in Tout Wars, he's won the H2H league and mixed auction league. Follow him on Twitter @jeffwzimmerman.

Would love to see more gambling content on here!