I don't like to speak for others, but I think Iayork and Crystalline are referring to the use of CIs in a way that many are not familiar with.
Confidence intervals are used in many different ways. Simply put, a confidence interval defines the range that represents our confidence that the value of a metric is true: we are confident that X is between Y and Z. For example, say we are interested in the career OBP of MLB players. We can measure career OBP for a large sample of players and use the distribution of career OBPs to construct confidence intervals for the mean. A 95% confidence interval reflects the range of OBP where we are 95% confident that the mean OBP is between ___ and ___. Confidence intervals here describe the population of MLB players. If, in a small sample, a given player falls outside this range, and if the metric is distributed normally across the population, it is likely that the player's future samples will fall within this range, i.e., closer to the mean. This is known as regression to the mean.
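To make the population case concrete, here is a minimal sketch in Python using made-up OBP numbers (the 0.320 average and 0.030 spread are illustrative assumptions, not real league data) and the standard normal-approximation interval for a mean:

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical sample of career OBPs for 500 MLB players
# (drawn from a normal distribution for illustration; real
# data would come from an actual database of players).
obps = [random.gauss(0.320, 0.030) for _ in range(500)]

mean = statistics.mean(obps)
sem = statistics.stdev(obps) / math.sqrt(len(obps))  # standard error of the mean

# 95% CI for the population mean OBP, normal approximation (z = 1.96)
lower, upper = mean - 1.96 * sem, mean + 1.96 * sem
print(f"Mean OBP: {mean:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```

Note how narrow the interval is with 500 players: a CI for the *mean* tightens as the sample grows, which is a different animal from the spread of individual players around that mean.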
You can also construct confidence intervals for an individual player. For example, if we want to know whether an individual player is likely to post a given OBP in a season, we can use the distribution of his season OBPs to construct confidence intervals for that player. Here, a 95% confidence interval reflects the range of OBP within which we are 95% confident that any full-season OBP for the given player will fall. In other words, confidence intervals here describe the individual MLB player. It is important to know whether the sample size used to calculate OBP is reliable; unreliable sample sizes reduce our ability to interpret these confidence intervals. To answer that question, one can perform (and people have performed) reliability analyses to determine the minimum number of plate appearances needed to calculate the metric.
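A resampling sketch of the individual-player case, assuming a hypothetical player whose true rate of reaching base is 0.350 over 600-PA seasons (both numbers invented for illustration). We simulate many full seasons and take the middle 95% of the resulting season OBPs:

```python
import random

random.seed(2)

# Hypothetical player: true on-base rate 0.350, seasons of 600 PA.
TRUE_RATE, PA_PER_SEASON, N_RESAMPLES = 0.350, 600, 10_000

# Simulate many full seasons and record each season's observed OBP.
season_obps = sorted(
    sum(random.random() < TRUE_RATE for _ in range(PA_PER_SEASON)) / PA_PER_SEASON
    for _ in range(N_RESAMPLES)
)

# 95% interval: the 2.5th and 97.5th percentiles of the resampled seasons.
lower = season_obps[int(0.025 * N_RESAMPLES)]
upper = season_obps[int(0.975 * N_RESAMPLES)]
print(f"95% of this player's full seasons fall between {lower:.3f} and {upper:.3f}")
```

Even with a full 600-PA season, the interval spans several dozen points of OBP, which is why season-to-season swings for one player are much larger than the uncertainty in a league-wide mean.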
What Iayork and Crystalline are referring to is constructing confidence intervals for the metric itself. WAR is not a measured metric like OBP. With OBP, you count the number of plate appearances and the number of non-outs and take the ratio. With WAR, you are using a regression model (presumably, or some other modelling procedure if not regression) to infer the relationship between events on the field and runs, and between runs and wins. All models contain some degree of error, and you can use this error to construct confidence intervals for the statistic itself. Here, a 95% confidence interval means that for a given WAR we are 95% confident that the true number of wins is between ___ and ___. In other words, confidence intervals here reflect the likelihood that the metric means what we think it means. If this range is really big (e.g., a full win), then our ability to interpret what the metric means is limited. As an aside, this sounds like Eric Van's biggest problem; he assumed that his regression models were perfect instead of accounting for the error present in them.
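A toy version of the model-error case, with everything invented for illustration: fake team seasons where wins are roughly linear in run differential, an ordinary least-squares fit, and the residual standard error used to put a rough interval around a prediction. (Real WAR models are far more elaborate; this only shows where model-based uncertainty comes from.)

```python
import math
import random
import statistics

random.seed(3)

# Hypothetical team seasons: wins are roughly linear in run differential,
# plus noise the model cannot explain (the source of model error).
run_diff = [random.uniform(-150, 150) for _ in range(120)]
wins = [81 + 0.1 * rd + random.gauss(0, 4) for rd in run_diff]

# Ordinary least-squares fit of wins on run differential.
mx, my = statistics.mean(run_diff), statistics.mean(wins)
slope = sum((x - mx) * (y - my) for x, y in zip(run_diff, wins)) / sum(
    (x - mx) ** 2 for x in run_diff
)
intercept = my - slope * mx

# Residual standard error: how far real outcomes scatter around the model.
residuals = [y - (intercept + slope * x) for x, y in zip(run_diff, wins)]
rse = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))

# Rough 95% interval around a prediction (ignores parameter uncertainty).
pred = intercept + slope * 50
print(f"Predicted wins at +50 run diff: {pred:.1f} +/- {1.96 * rse:.1f}")
```

The point is the last line: even a well-fit model carries a residual error, and honest reporting of a model-derived metric attaches that error to every number it produces.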
There are different methods for constructing confidence intervals, depending on whether you want to measure a population (e.g., using standard deviations on normally distributed data), an individual (e.g., using resampling approaches), or a metric derived from a model (e.g., via regression error). These methods are not always specific to one type of question (e.g., resampling can be used to assess either an individual or a population).
Savin Hillbilly said:
I just went to the
Wikipedia page for "Confidence interval." Then I went to the
Talk page for "Confidence interval." All I can say after that little adventure is that if the bolded is true, there sure are a lot of stupid people in the world, apparently including some statisticians.
Can you recommend an internet source that explains this concept for non-statisticians in language that is both reasonably comprehensible and technically accurate? 'Cause it sure ain't Wikipedia. I feel like I have a rough intuitive understanding of it, but that feeling could be wildly wrong, and if we're going to start using CI's around here routinely I'd like to know what the hell they mean.
I've given an introductory workshop class on working with distributions. The material is really big, though; if you want me to send it to you, just PM me.
EDIT: It is important to separate the two primary issues of whether WAR/UZR are reliable and whether they are valid. These questions are not dependent on one another: you can have a reliable metric that is invalid, and vice versa.