WAR and Team Success

Hank Scorpio

Member
SoSH Member
Apr 1, 2013
6,918
Salem, NH
With all I've been hearing about Mike Trout's contract situation lately, along with the cost of a win, I wanted to know exactly how good of a barometer WAR is when translating to team success. Would replacing a replacement level player with Mike Trout, a 10 WAR player, turn a middling 80 win team into a 90 win contender? A 90 win contender into a 100 win juggernaut? Hell, for all the talent on the Angels, are they really a 68 win team without Mike Trout?
 
If a win is worth $5M, is Mike Trout worth $50M a season? And was Shane Victorino's (6.2 WAR) 2013 regular season worth $31M? And is Shane Victorino really only one win less valuable than Miguel Cabrera (7.2 WAR)?
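The arithmetic behind those questions can be written out as a rough sketch. It assumes the linear model the questions imply: a flat $5M per win, and wins that simply add to a team's total. The function names are made up for illustration.

```python
DOLLARS_PER_WIN_M = 5.0  # $5M per win, the figure used in the post

def dollar_value_m(war, dollars_per_win_m=DOLLARS_PER_WIN_M):
    """Implied salary in $M if every win above replacement costs the same."""
    return war * dollars_per_win_m

def wins_with_player(team_wins_without, player_war):
    """Naive additive model: a player's WAR is added straight onto team wins."""
    return team_wins_without + player_war

# WAR figures quoted in the post
for name, war in [("Trout", 10.0), ("Victorino", 6.2), ("Cabrera", 7.2)]:
    print(f"{name}: {war} WAR -> ${dollar_value_m(war):.0f}M")

# The "80 win team becomes a 90 win team" question
print(wins_with_player(80, 10.0))
```

Whether wins really stack linearly like this is exactly what the rest of the post tries to check.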
 
And who is this replacement player? How bad is a team of 25 replacement players?
 
Using last season as a barometer, a "replacement team" winds up with a record of approximately 47-115.
 
I wanted to see if I could come up with a Pythag W-L-esque measurement of how a team's WAR correlates with their final results, so I did the following:
 
- Calculated each team's total WAR by adding the WAR of all position players (including offensive and defensive considerations) and the WAR of all pitchers.
 
- Determined that an average team gains about 20 wins from its offense and roughly 13.7 wins from its pitching. A replacement team should win 47.4 games... these three figures add up to about 81 wins, a .500 team.
 
- Using pythag W-L records for all 30 teams, again the replacement team came out to 47.5 wins.
 
- The range for replacement teams, using 1 team sample sizes, was 39.5 to 55.6. Using pythag W-L records, the range was 42.4 to 59.6 (St. Louis seemed to be an outlier). Without St. Louis, the high end of the range dropped to 52.6.
 
- Theoretically, adding a team's total WAR to 47.5 should produce a number in line with their actual and pythag win totals.
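The steps above can be sketched roughly as follows. The three teams and their numbers are hypothetical placeholders, not real 2013 data, and the function names are made up for illustration. Each team's implied replacement level is its wins minus its team WAR; the league baseline is the average of those.

```python
from statistics import mean

# Hypothetical (wins, team WAR) pairs for illustration only; not real 2013 data.
teams = {
    "Team A": (97, 50.0),
    "Team B": (81, 33.5),
    "Team C": (66, 18.0),
}

def implied_replacement_wins(wins, team_war):
    """What a one-team sample says a replacement team would win (W minus WAR)."""
    return wins - team_war

def war_expected_wins(team_war, baseline):
    """Wins a team 'should have' produced: team WAR plus the replacement baseline."""
    return team_war + baseline

implied = {name: implied_replacement_wins(w, war) for name, (w, war) in teams.items()}
baseline = mean(implied.values())                      # league-wide replacement level
low, high = min(implied.values()), max(implied.values())  # one-team sample range

print(f"baseline={baseline:.1f}, range={low:.1f} to {high:.1f}")
for name, (wins, war) in teams.items():
    print(name, wins, round(war_expected_wins(war, baseline), 1))

# Sanity check using the post's own figures: offense + pitching + replacement
average_team_wins = 20 + 13.7 + 47.4
print(round(average_team_wins, 1))
```

The same functions work with pythag wins in place of actual wins, which is how the second replacement estimate in the list would be produced.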
 
In the chart below:
 
Off.WAR - Total WAR generated by offensive and defensive plays (including those by pitchers).
Pit.WAR - Total WAR generated by pitching
Team WAR - Off. WAR + Pit. WAR
W - Actual 2013 Wins
W-WAR: Wins minus Team WAR (replacement team based only on the team in question, sample size of 1)
P - Pythag Wins
P - WAR: Pythag Wins minus Team WAR (pythag replacement team based only on the team in question, sample size of 1)
WAR + 47.5: How many wins a team should have produced based on their Team WAR
 

 
And below, the standings based on what a team "should have" won, based on WAR. Also included their actual and pythag W-L records:
 

 
For the most part, it remains true to what the standings actually were, but there are a few interesting things.
 
- The Yankees should have been a last place team.
 
- St. Louis under-performed their Pythag by four games, but over-performed their WAR by eight games. By WAR alone, they should have been the second wild card.
 
- Detroit under-performed significantly, while Cleveland over-performed modestly. The AL Central was close, but shouldn't have been.
 
- San Francisco had the worst pitching in baseball, based on WAR.
 
- The offense and defense combination put out by the Red Sox blew every other team away. The next closest team was the Dodgers, and they were 9.5 WAR behind.
 
- Overall, Detroit was the only team close to Boston in total WAR, but largely for different reasons. The Red Sox were very offense heavy, while the Tigers were more balanced, although heavier in pitching.
 
- Two of the three worst teams in baseball had pitching that was well above average, but their offense was absolutely putrid.
 
*WAR values are taken from baseball-reference.com - and again, when I say "Off. WAR", it's actually a combination of offensive and defensive WAR. Sorry if that is misleading.
 

Snodgrass'Muff

oppresses WARmongers
SoSH Member
Mar 11, 2008
27,644
Roanoke, VA
One area where a look like this is going to get a little messy, and this is likely contributing to the gaps we see between Pythag and WAR wins, is that the defensive component of WAR still requires around three seasons of data to stabilize and we're still only getting one. So there is bound to be some skewing going on that is affecting the total wins each team theoretically earned. For example, Shane Victorino was worth 2.2 wins defensively according to b-r.com. Fangraphs had him at 2.4. That's 22 and 24 runs respectively in the field. He's an excellent defender, but there's a pretty good chance he wasn't actually worth an average of 23 runs out there.
 

absintheofmalaise

too many flowers
Dope
SoSH Member
Mar 16, 2005
23,335
The gran facenda
You also need to look at how FG and B-ref calculate WAR. Especially for pitchers. FG uses FIP and innings pitched. B-Ref uses Runs Allowed, so they have team defense in their calculations. Here is a pretty detailed explanation of B-Ref pitcher WAR.
 
How do your numbers using BRef compare to the FG numbers?
 

Hank Scorpio

Member
SoSH Member
Apr 1, 2013
6,918
Salem, NH
For FanGraphs:
 

 
Team Standings
 

 
*I'll edit this and post them in order by wins later - the numbers are right, but the team orders reflect WAR projections for 2014. Oops.
 
Quick graphs to show correlation between WAR and Actual Wins, for the hell of it.
 

 
 

Sampo Gida

Member
SoSH Member
Aug 7, 2010
5,044
Snodgrass'Muff said:
One area where a look like this is going to get a little messy, and this is likely contributing to the gaps we see between Pythag and WAR wins, is that the defensive component of WAR still requires around three seasons of data to stabilize and we're still only getting one. So there is bound to be some skewing going on that is affecting the total wins each team theoretically earned. For example, Shane Victorino was worth 2.2 wins defensively according to b-r.com. Fangraphs had him at 2.4. That's 22 and 24 runs respectively in the field. He's an excellent defender, but there's a pretty good chance he wasn't actually worth an average of 23 runs out there.
 
 
Technically, by looking at team WAR you are getting about 9 player seasons of defense.  And while a given player like Victorino may be on the high side of any SSS variance (BTW, I think there is a good chance he was as good as the stats say), other players could conceivably be on the low side and in theory it should average out to some extent.   If not, at least any needed regression should be less than the 50% regression recommended by MGL. I believe WAR assigns about 15% of position players' total WAR to defense, so depending on how much you feel needs to be regressed for SSS defense, if any, one could estimate the variance from that.
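To put a number on the regression idea, here is a minimal sketch. It assumes the 10 runs per win conversion implied by the 2.2 wins / 22 runs figures quoted above, and it assumes the mean being regressed toward is an average defender at 0 runs. The function name is hypothetical.

```python
RUNS_PER_WIN = 10.0  # implied by the thread's 2.2 wins ~ 22 runs conversion

def regress_to_mean(observed_runs, regression=0.5, league_mean=0.0):
    """Pull a one-season defensive figure toward the mean by the given fraction.

    regression=0.5 is the 50% figure attributed to MGL in the post;
    league_mean=0.0 (an average defender) is an assumption for this sketch.
    """
    return league_mean + (1.0 - regression) * (observed_runs - league_mean)

observed = 22.0                      # B-Ref defensive runs quoted for Victorino
regressed = regress_to_mean(observed)
wins_removed = (observed - regressed) / RUNS_PER_WIN

print(regressed, wins_removed)
```

Passing a smaller `regression` value corresponds to the argument that a team total over about 9 player seasons needs less than the full 50%.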
 
The other drawback to WAR on a team level is it is park adjusted, yet a team plays 50% of its games at its home park.  Of course, that does not explain WAR variance with a team like the Red Sox since one would expect the park adjustment to have reduced the total wins compared to actual, given the team was rather well constructed to take advantage of its home park.
 
In fact, according to a similar study for 2012 the Red Sox sum of player WAR was also about 9 wins more than actual, making me wonder if there is something quirky going on in the park adjustments. I guess it could be random, but 2 years in a row suggests something more systematic.
 
http://shutdowninning.com/7/post/2013/09/how-accurate-is-war-at-predicting-team-wins.html
 

SumnerH

Malt Liquor Picker
Dope
SoSH Member
Jul 18, 2005
31,893
Alexandria, VA
Hank Scorpio said:
 
Off.WAR - Total WAR generated by offensive and defensive plays (including those by pitchers).
Pit.WAR - Total WAR generated by pitching
 
 
*WAR values are taken from baseball-reference.com - and again, when I say "Off. WAR", it's actually a combination of offensive and defensive WAR. Sorry if that is misleading.
 
Can you add the numbers up in a more raw manner?  As a basic sanity check, isn't the league's total offensive WAR plus their total defensive WAR (including defense from position players and pitchers, as well as pitching) supposed to sum to zero?
 
http://www.baseball-reference.com/about/war_explained.shtml certainly seems to indicate that WAR should be zero-sum: "After the positional adjustment was applied we forced the major league average to be zero across the league."
 

SumnerH

Malt Liquor Picker
Dope
SoSH Member
Jul 18, 2005
31,893
Alexandria, VA
Sampo Gida said:
In fact, according to a similar study for 2012 the Red Sox sum of player WAR was also about 9 wins more than actual, making me wonder if there is something quirky going on in the park adjustments. I guess it could be random, but 2 years in a row suggests something more systematic.
 
http://shutdowninning.com/7/post/2013/09/how-accurate-is-war-at-predicting-team-wins.html
 
My expectation in general would be that teams with a weird park, especially one that heavily favors lefties or righties, where you can construct a team biased to the park, would tend to overachieve in both WAR and actual wins year after year.  You bias your team toward the home park and still are close to 50-50 on the road.  Conversely a home park that's very neutral should be an overall disadvantage since you can't get an edge at home and are splitting away games among too many parks to get a serious edge there (maybe there are some divisions that have enough skew to get a small advantage even when home is neutral, but it shouldn't be nearly as big as the edge for an odd home park).
 
Obviously payroll or other concerns might dominate this factor, but in a vacuum it should exist.
 
This doesn't necessarily explain the WAR-vs-actual discrepancy, but the winning factor should be "sticky"--if WAR's scaled incorrectly compared to relative wins, then you'd see a sticky discrepancy between the two as well.
 

derekson

Member
SoSH Member
Jun 26, 2010
6,224
*WAR values are taken from baseball-reference.com - and again, when I say "Off. WAR", it's actually a combination of offensive and defensive WAR. Sorry if that is misleading.
 
 
 
 
If you're adding oWAR and dWAR then you're double counting position adjustments.
 

OttoC

Member
SoSH Member
Dec 2, 2003
7,353
Hank Scorpio said:
For FanGraphs:
 

 
Team Standings
 

 

...
I used the WAR + 47.7 numbers to develop WAR Win% for each team and I also used each team's RS and RA to calculate its Pythag Win% (exponent = 1.83). Finally, I calculated each team's actual winning percentage, then used Excel's CORREL function to determine how well WAR+47.7 win% and Pythag win% correlated with the actual winning percentage. Hands down, the Pythag performed better (correl = 0.9873 to 0.1986).

*I didn't use values rounded to integers.
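For anyone who wants to repeat this outside Excel, a rough sketch of the same comparison follows. CORREL is the Pearson correlation coefficient, written out by hand here. The four teams are hypothetical placeholders, not real 2013 data, and the function names are made up for illustration.

```python
from math import sqrt

GAMES = 162

def pythag_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def war_win_pct(team_war, baseline=47.7, games=GAMES):
    """Win% implied by team WAR plus the replacement baseline."""
    return (team_war + baseline) / games

def pearson(xs, ys):
    """Pearson correlation coefficient, the same statistic Excel's CORREL returns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical rows: (runs scored, runs allowed, team WAR, actual wins)
rows = [
    (850, 650, 52.0, 97),
    (720, 700, 35.0, 84),
    (650, 690, 30.0, 76),
    (600, 800, 15.0, 63),
]

actual = [wins / GAMES for _, _, _, wins in rows]
pythag = [pythag_win_pct(rs, ra) for rs, ra, _, _ in rows]
by_war = [war_win_pct(war) for _, _, war, _ in rows]

print("Pythag vs actual:", pearson(pythag, actual))
print("WAR vs actual:   ", pearson(by_war, actual))
```

With real team data in `rows`, the two printed values are the pair of correlations being compared above.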
 

crystalline

Member
SoSH Member
Oct 12, 2009
5,771
JP
Sumner- Interesting point. Though you are assuming that a team built for a specific park would be 50-50 on the road. It could easily be the case that the changes made to improve at home will cost you on the road. I.e. if a team with a very short right field porch builds up righty power hitters who don't get on base much, they might find themselves losing more away games than average as their fly balls fall in for outs.

As you say, it's all dependent on numbers. To me it seems like the very existence of such an effect depends on running the numbers.

Edit: Further, it seems that there are game theoretic factors in play. E.g. one team accumulates righty power hitters. So teams in their division desire righty starters. So then a team in the division builds up lefty power hitters. Or sinkerballers to combat power followed by weak-stick infield defense, followed by strikeout pitchers to limit slap hitting etc. You can imagine how these factors would either amplify or attenuate the original effect. It all depends on the equilibrium point. Not to mention there are lags at each step.
 

Snodgrass'Muff

oppresses WARmongers
SoSH Member
Mar 11, 2008
27,644
Roanoke, VA
Sampo Gida said:
 
 
Technically, by looking at team WAR you are getting about 9 player seasons of defense.  And while a given player like Victorino may be on the high side of any SSS variance (BTW, I think there is a good chance he was as good as the stats say), other players could conceivably be on the low side and in theory it should average out to some extent.
 
While I don't disagree with the rest of your post, I think this feels a little like assuming the next flip of a coin has to be tails because it came up heads the last three times.  In a league as small as 30 teams, I'm not sure there is enough of a sample size to assume that a team with outliers on the high side will have enough on the low to balance out each season.  The potential for skewing still feels pretty high to me.
 

mauf

Anderson Cooper × Mr. Rogers
Moderator
SoSH Member
I compared HS's hypothetical WAR standings in the OP to Baseball Prospectus's third-order wins standings. Here are the teams where the difference between the two methods was greater than three games -- a positive number means the team was better using the OP's WAR method; a negative number means the team was better using BP third-order wins:

Kansas City (+11)
Colorado (+10)
Boston (+7)
NY Mets (+6)
NY Yankees (+5)
Arizona (+4)

Texas (-4)
Pittsburgh (-5)
St. Louis (-6)
Cincinnati (-8)

I'm not sure what to make of any of this, except that both systems agree that the Tigers were ridiculously good.