Disclaimer: This is just for fun. I am, by no means, claiming that the predicted average draft positions (ADPs) described below will happen. Obviously! I’m no prophet. Also, I’m not claiming these predictions are anything more than educated guesses. In fact, these aren’t even my predictions — they’re yours. Or, well, they’re not your predictions — they’re my computer’s predictions, generated by fitting your behavior to observed events.
That’s a complicated way of saying: by using historical ADP data and end-of-season (EOS) values, we can model future ADP values. (xADP, if you will.) Namely, with 2018 EOS, 2018 ADP, and 2017 EOS, we can predict 2019 ADP — and explain almost 60 percent of its variance (adjusted r2 = 0.59).
Over the years, I’ve compiled seven years’ worth of ADP data from the National Fantasy Baseball Championship (NFBC) and EOS values courtesy of Razzball — almost 5,500 player-seasons, although for these purposes I’ve narrowed my n to 2,315.
The problem with ADP (and EOS) ranks is they’re perfectly linear, whereas the values they represent are not. For example, the difference in value between the 1st and 2nd picks in a draft is not the same as the difference in value between the 399th and 400th picks. To resolve this issue, I converted the linear values (1, 2, 3 … 99, 100, 101 … 398, 399, 400) into logarithmic dollar equivalents ($58, $52, $48 … $13.70, $13.60, $13.50 … $0.17, $0.15, $0.12). In order to do this, though, I needed to construct a hypothetical league in which these values would be applied. Somewhat arbitrarily, I chose a 15-team league with 27-player rosters, such that the draft pool would be 405 players deep. I made this decision partly due to limitations with my older data. However, I’m not convinced that changing the league size — not dramatically, at least — would make much difference anyway.
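For the curious, the example dollar values above are consistent with a simple logarithmic curve. Here’s a rough Python sketch of the rank-to-dollars conversion; the constants (a $58 top value and a slope of roughly 9.66) are my own back-fit to the example numbers, not necessarily the exact curve used.

```python
import math

def rank_to_dollars(rank, top=58.0, slope=9.66):
    """Map a linear draft rank (1..405) to a logarithmic dollar value.

    The constants are illustrative: they reproduce the article's example
    values ($58 for pick 1, ~$13.50 for pick 100, ~$0.12 for pick 400),
    but the author's actual curve may differ.
    """
    return top - slope * math.log(rank)

# Picks near the top are far apart in dollars; picks near the bottom
# are nearly indistinguishable, which is the whole point.
print(rank_to_dollars(1))    # $58.00
print(rank_to_dollars(100))  # ~$13.51
print(rank_to_dollars(400))  # ~$0.12
```

Note how the gap between picks 1 and 2 (about $6.70 on this curve) dwarfs the gap between picks 399 and 400 (about two cents), matching the intuition in the paragraph above.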
(Here’s some messy math. Feel free to skip to the next paragraph if it’s not your jam.)
Because the data effectively operate as a panel — hundreds of players whose actual and perceived values are observed over time — and because we cannot dissociate ourselves from our knowledge, perceptions, and biases of these players, there exists serial correlation among the values. This correlation — the influence of 2017 EOS on 2018 ADP, and so on — artificially inflates the fit of the model. A standard linear regression of the same data produces an adjusted r2 of 0.85, which is incredible but misleading. To correct for this, I specified a first-differences regression, which subtracts the previous period’s value from the current period’s (for example, 2018 ADP minus 2017 EOS).
(OK, stop skipping.)
Here’s the final model:
ADP_t = α·(EOS_{t−1} − ADP_{t−1}) + β·(ADP_{t−1} − EOS_{t−2}) + (ε_t − ε_{t−1}) + EOS_{t−1}
(…maybe you could’ve skipped that part, too.)
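For anyone who didn’t skip: here’s a rough Python sketch of what fitting that first-differences specification looks like. Everything below — the data, the coefficient values, the variable names — is synthetic and invented for illustration; it stands in for the real NFBC/Razzball panel, not a reproduction of it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for log-dollar values across three periods
# (hypothetical data, not the real player pool).
eos_t2 = rng.uniform(0, 58, n)             # 2017 EOS
adp_t1 = eos_t2 + rng.normal(0, 5, n)      # 2018 ADP
eos_t1 = adp_t1 + rng.normal(0, 8, n)      # 2018 EOS
adp_t = (eos_t1
         + 0.5 * (eos_t1 - adp_t1)         # "true" alpha = 0.5 (made up)
         + 0.2 * (adp_t1 - eos_t2)         # "true" beta = 0.2 (made up)
         + rng.normal(0, 3, n))            # 2019 ADP, the target

# First-differences design: regress the differenced target
# (ADP_t minus EOS_{t-1}) on the two lagged gaps.
y = adp_t - eos_t1
X = np.column_stack([eos_t1 - adp_t1, adp_t1 - eos_t2])
(alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)

# Predicted 2019 ADP ("xADP") for each player:
xadp = eos_t1 + alpha * (eos_t1 - adp_t1) + beta * (adp_t1 - eos_t2)
```

Differencing out EOS_{t−1} is what removes the serial correlation described above: the model explains only the *gap* between how a player finished and where he gets drafted, rather than re-explaining the player’s overall value every year.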
I can tell you, without even really looking at the numbers, that the model will probably struggle in some areas. Obviously, it won’t know if a player has retired. It’s unaware of seasons truncated by injury or midseason debut. It has no concept of prospect hype and will absolutely whiff on Vladimir Guerrero Jr., who has yet to debut but will be the prospect gem with tons of helium in 2019 fantasy drafts. It is blissfully unaware of the out-of-control Adalberto Mondesi hype train. You can yell at me all you want about how stupid that is, but you’d be yelling at a computer that can’t think for itself.
It’ll be months before we see this work bear fruit, but we can at least compare the model’s predictions to the results of the Too Early Mock Drafts that ran in September, the data for which were compiled by the enigmatic Smada.
If there are any glaring omissions, it’s probably because the player made an impact in 2018 despite being drafted outside the top-405 players. Let me know who’s missing and I can give you the ADP value!
I’m also curious to hear your thoughts — on the first round, on perceived accuracy or inaccuracy, everything. My computer wants your feedback. At the end of the day, though, this isn’t anything more than a stupid data trick. It’ll be fun (for me, at least) to see how close this gets.