The Great Valuation System Test: The Process
by Mike Podhorzer
March 16, 2015

It all began with a comment by Jason Bulay, aka The Stranger, on a post I published in January pitting two popular snake draft strategies against each other — best player available and position scarcity:

"I posted this as a comment to Cwik's article yesterday, but I really want to see everybody put their money where their mouth is with all the draft strategy/player valuation theory. Do a draft (or multiple drafts, because small sample size) where everybody will be scored using 2014 stats (or 2015 projections if you prefer). After the draft, add up everybody's roster using standard 5×5 scoring. See who wins, and what kind of draft strategy really gets the best roster. See who loses, and mock them without mercy. Not that I disagree with you – your points make sense. But I think this is something we can actually test and I'd love to read about the results, so why not give it a shot?"

His idea was intriguing. We're all very familiar with the various projection systems and know that the masterminds behind them continually strive to improve them. They are also tested every year, and we learn which performed best. But valuation systems get none of this treatment; there has seemingly been little to no progress made on properly valuing players since the original systems were developed and shared.

Jason eventually followed up with an email, and after many back-and-forth messages, we finally settled on exactly what we wanted to test and how we would accomplish our goal. Initially, the idea was to try to determine the best draft strategy, as that was essentially what my article was about. But with so many factors influencing what happens during a draft and what the optimal strategy is for your next pick, I figured that would be impossible to test. So we eventually decided to keep things simple and test the various valuation systems instead. Which is the most accurate?
Is it possible to determine the best system, and how would we do so? Is there even a meaningful difference between the values the systems spit out, or should we only concern ourselves with using the best set of projections?

As a super nerd who both forecasts player performance and then calculates dollar values, I wanted to know if the system I use is actually any good. Even if the projections were 100% accurate, if the system overvalues a category or inflates the values of middle infielders too dramatically, my auction performance would suffer. And of course, we all want to know if Billy Hamilton could truly be worth a second round pick.

The Process

From the beginning, we quickly agreed that it was only worth testing hitters. It made things much easier that way, plus I am operating under the assumption that the valuation systems are in closer agreement on pitcher values than on hitter values.

Once we figured out what we were testing, the next step was to discuss how the testing would be performed. After a spirited discussion, I eventually convinced Jason to do things my way. We would first put together or collect 2014 fantasy team rosters from as many leagues as we could. We would then use 2014 stats to generate the standings. Last, we would gather as many final 2014 dollar values as possible, or calculate them ourselves, ensuring that the league format these values were derived for was the same.

Once we had all our rosters and dollar values, the actual testing could take place. The testing would simply consist of correlating the dollar values earned with the standings points achieved by each team, in addition to correlating the ranking of dollars earned with the standings rank (e.g., 1st place, 2nd place, etc.) for each team.

With the plan in place, we now had to gather team rosters from 2014 leagues. At first, we thought we might have to conduct mock drafts asking owners to assume 2014 stats. This would have been a nightmare.
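As an aside, the test itself is simple to express in code. Here is a minimal sketch of the two correlations described above, with invented totals for a hypothetical 12-team league (the numbers are not from the actual test):

```python
def pearson(xs, ys):
    # Pearson correlation between two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    # Rank 1 = highest value (the team that earned the most dollars).
    order = sorted(range(len(values)), key=lambda i: -values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

# Hypothetical league: total dollars earned by each roster and the
# standings points each roster actually scored.
dollars = [312, 298, 275, 260, 255, 240, 231, 220, 215, 205, 190, 180]
points  = [98, 91, 85, 88, 72, 75, 70, 61, 55, 58, 44, 40]

print(pearson(dollars, points))                # dollars vs. standings points
print(pearson(ranks(dollars), ranks(points)))  # rank vs. rank (Spearman-style)
```

The closer a system's correlations sit to 1.0 across all 50 leagues, the better its values predicted the standings.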
We soon came to our senses, and I got in touch with Rudy Gamble of Razzball, who was awesome enough to provide the team rosters for every 2014 Razzball Commenter League they ran. In total, there were 84 leagues in his file, though we only used 50 of them. Jason's head would have exploded if I asked him to do any more than that.

The teams were composed of 13 active hitters: the standard 14, minus one catcher, making each a one-catcher league. I wanted to test leagues with two catchers, since the position is given the greatest position scarcity boost, which seemingly differs drastically depending on the valuation system. But we had to work with what we were given.

The Valuation Systems Tested

In total, we tested a whopping 13 sets of values. What follows is a short description of each:

Replacement Level System (REP System, developed by Todd Zola of Mastersball) — This is the system I have used for over 10 years. I calculated the values used in the test myself, but given the nature of the system, two people calculating values using the same stats could still get different results. I explained a little bit about how this system works in my Hamilton post linked above. It adjusts player stats based on a mythical "replacement" player at each position and then calculates the player's statistical contribution as a percentage of the category total from the positively valued player pool. It converts that contribution into categorical dollars earned, repeats the process for each category, and sums the results into the player's total worth. All categories are allocated an equal dollar amount, and total player values will always equal the amount of the total league budget you choose to allocate to hitters.

Z-Scores — I used two sets of z-score values. First was Zach Sanders' version, taken directly from his end-of-season values post, which includes a link to his explanation of how the system works. Second was a version that my project partner Jason generated.
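To illustrate the z-score family of systems: standardize each category across the player pool, sum the z-scores, and scale the positively valued pool to the hitter budget. This is a minimal sketch with made-up stats for five hypothetical hitters and a $100 budget; it skips the positional adjustment a full implementation would apply, so it is not Zach's or Jason's exact method:

```python
# Toy z-score valuation: five hypothetical hitters, two categories.
players = {
    "A": {"HR": 35, "SB": 5},
    "B": {"HR": 25, "SB": 15},
    "C": {"HR": 20, "SB": 20},
    "D": {"HR": 10, "SB": 40},
    "E": {"HR": 5,  "SB": 2},
}

def zscores(stats, cat):
    # Standardize one category across the pool: (x - mean) / std dev.
    vals = [s[cat] for s in stats.values()]
    mean = sum(vals) / len(vals)
    sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return {name: (s[cat] - mean) / sd for name, s in stats.items()}

cats = ["HR", "SB"]
z_by_cat = {c: zscores(players, c) for c in cats}
totals = {name: sum(z_by_cat[c][name] for c in cats) for name in players}

# Scale only the positively valued players to a hypothetical $100 budget.
budget = 100.0
pos_sum = sum(t for t in totals.values() if t > 0)
dollars = {name: round(budget * t / pos_sum, 1) if t > 0 else 0.0
           for name, t in totals.items()}
print(dollars)
```

Note how the balanced speed/power profiles and the one-category stars all end up on the same dollar scale, which is the whole point of standardizing before summing.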
Last Player Picked — Values were generated at DraftBuddy, which currently hosts the LPP dollar value calculator. You can read a full explanation of how values are calculated here. In browsing through the various parts of Mays Copeland's explanation articles, it seems that the system factors in the stats of the average player, rather than the replacement player as in the REP system above, and then accounts for the standard deviation before adjusting for position. Like REP, though, it also involves a series of iterations to determine the pool of positively valued players.

ESPN — The only system without actual dollar values. We used the numbers from their Player Rater, which are essentially some sort of value, but on a different scale. The methodology is a black box.

Razzball Point Shares — Rudy explained the system to me as follows:

"I use an SGP-based process where I subtract the average hitter or pitcher value (vs. replacement hitter or pitcher value) for each player. I sum these up and then add the replacement level value to each hitter or pitcher. So if an average hitter had 0.0 Point Shares (i.e., SGPs) and the replacement level had -2.9 Point Shares, that average player is boosted to 2.9 Point Shares. The Point Shares for the modeled rostered universe are summed and then divided into the hitter and pitcher budgets respectively.

"The position adjustment happens at the beginning. A position factor between 0 and 1 is set (it was at 75%, now set to 0). The initial calculation for 'average hitter' is PosFact * (Player Category Total – Average Category Total For All Roster-Worthy Players At That Position) + (1 – PosFact) * (Player Category Total – Average Category Total For All Roster-Worthy Players). So a SS with power and no speed (e.g., Jhonny Peralta) would have higher $HR contributions because SS hit for less power BUT would take a bigger penalty on $SB since SS steal more than the average hitter."

We actually tested six different values from Razzball.
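Rudy's position-factor blend translates almost verbatim into code. The numbers below are invented for illustration (they are not Razzball's actual category averages), but the formula is the one he describes:

```python
def blended_category_value(player_total, pos_avg, overall_avg, pos_fact):
    """Rudy's blend: weight the positional comparison by PosFact and
    the overall comparison by (1 - PosFact)."""
    return (pos_fact * (player_total - pos_avg)
            + (1 - pos_fact) * (player_total - overall_avg))

# Hypothetical shortstop with power and no speed (the Peralta example):
# SS average HR is below the overall average, SS average SB is above it.
hr = blended_category_value(player_total=21, pos_avg=12, overall_avg=18, pos_fact=0.75)
sb = blended_category_value(player_total=3,  pos_avg=10, overall_avg=7,  pos_fact=0.75)
print(hr)  # → 7.5  (boosted: SS hit for less power)
print(sb)  # → -6.25  (penalized harder: SS steal more than average)
```

At PosFact = 0 the positional term drops out entirely and every player is compared only to the overall average, which matches Rudy's note that the factor is now set to 0.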
We first included their posted values from their Player Rater. Then Rudy altered his system (explained above) and sent me values from his new methodology with no positional adjustment, and then with 25%, 50%, 75% and 100% adjustments. Values from all six of those variations were included in the test.

SGP — This is the popular Standings Gain Points system explained in Art McGee's How to Value Players for Rotisserie Baseball. This is the first system I ever used, back around 2001, before stumbling upon the REP system and switching. The idea is to value players based on how much of a statistic is needed to gain a point in the standings in your league; those amounts are termed SGP denominators, and they will vary by league. Adjustments for position can be made in this system as well.

We tested two sets of SGP values, both calculated by Tony Fox (@tfoxy83), author for Shandler Park, who got in touch with me via Twitter and volunteered to do the work. One set used the denominators presented in Larry Schechter's Winning Fantasy Baseball; however, those denominators were based on 14-man rosters with two catchers, which may differ from denominators calculated in leagues with 13-man rosters. The second set used significantly different denominators, taken from Tony's own league, which uses a smaller roster size.

The Caveats

Position eligibility — Something I hadn't thought about when we started the process ended up being a real issue. Heading into your draft/auction, it's easy to determine a player's position eligibility, since you have your league rules and know exactly how many games must have been played the previous year to qualify. But what about when calculating end-of-season values? If a player gained eligibility at a shallower position, say catcher, does he now get valued as a catcher? What if he didn't gain eligibility until the last day of the season?
Is there a cut-off date by which you should ignore new position eligibility? Even if you decide on a date, who wants to go through game logs to determine exactly when the new eligibility was gained?! I decided to make things simple for myself and stuck with pre-season eligibility only. Unfortunately, I'm not sure how the other systems handled this situation, which means that a player could be getting valued at different positions. However, Tony Fox confirmed to me that he used pre-season position eligibility as well, so at the very least, we know the SGP values use the same eligibility as the Jason z-scores and REP systems.

User quirks — In my explanation of the REP system, I mentioned that given how the system works, two people trying to run values using the same stats and position eligibility could still calculate different values. It is also possible that both the z-score and SGP methods could yield different values if the user doesn't follow an identical process. Both Zach and Jason used the z-score method, but I can tell you right now that they did not yield identical values.

Sample size — Our test consisted of 50 leagues. A sample of 50 plate appearances or innings pitched is tiny. Is 50 leagues too small? I don't know; perhaps it is. We also only tested 2014 leagues. Maybe something quirky happened last season that benefited one valuation system over another. Probably not, but who knows.

A Big Thank You To…

Jason Bulay, who did the heavy lifting by assembling lineups from all hitters drafted for a whopping 600 teams, putting league standings together, and matching the dollar values from 13 systems to the players on each team. He'll be hiking the Pacific Crest Trail for five months beginning in mid-April, and he and his wife will be blogging about their journey.

Rudy Gamble of the always informative and entertaining Razzball, for providing the team rosters, without which we could not have done this test.
Tony Fox, for calculating two sets of SGP values and being just as wonderful in person when I met him at the Baseball HQ First Pitch Forum in NJ.

Tomorrow, I will unveil the results.