Personal findings of Improving Baseball Savant: Where to go with my findings?

Brand Name

make hers mark
Moderator
SoSH Member
Oct 6, 2010
4,397
Moving the Line
Hi everyone, kind of an odd way to start, but this topic is kind of odd, if awfully fun, too. So here goes.

I've been looking at some Baseball Savant stuff over the past six months or so, which sounds pretty normal, right? Thing is, after a fairly in-depth look, I've realized just how basic some of its coverage is, nice as it is to have. I started with rotations per minute (RPM), which is harmless enough until you realize that someone who led the league in RPM didn't necessarily lead it in Rotations Per Pitch (RPP; or per appearance, if that's your preference, but I much prefer per pitch), due to the Bauer unit, so named after Cleveland's Trevor. Velocity plays way too big a role in this stat, and it also has a practicality issue. From my findings, someone like Chapman would have to throw roughly 150 fastballs (I have the exact number, don't want to share just yet) for those fastballs to add up to the one minute in the air that Statcast's unit is built on, which, well, he's not going to do in one outing, nor would any starter. After all, Bumgarner's league-leading 3791 pitches last season only work out to an average of 111.5 pitches across his 34 starts.
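For anyone who wants to play with the RPM vs. per-pitch idea, here's a rough Python sketch. To be clear, this is not the exact formula mentioned above (that hasn't been shared). It simply assumes a pitch travels at constant speed over a hypothetical 55 feet from release to the plate, and the function names and the 2400 RPM example are made up for illustration.

```python
MPH_TO_FPS = 5280 / 3600  # feet per second in one mile per hour


def flight_time_s(velo_mph, distance_ft=55.0):
    """Seconds in the air, assuming constant speed (ignores drag).

    distance_ft is an assumed release-to-plate distance, not a Statcast value.
    """
    return distance_ft / (velo_mph * MPH_TO_FPS)


def rotations_per_pitch(rpm, velo_mph, distance_ft=55.0):
    """Spins the ball completes during one pitch's flight."""
    return rpm / 60 * flight_time_s(velo_mph, distance_ft)


def pitches_per_minute_of_flight(velo_mph, distance_ft=55.0):
    """How many pitches it takes to accumulate 60 seconds in the air."""
    return 60 / flight_time_s(velo_mph, distance_ft)


if __name__ == "__main__":
    # Same hypothetical spin rate at two velocities
    for velo in (90, 100):
        print(
            velo,
            round(rotations_per_pitch(2400, velo), 2),
            round(pitches_per_minute_of_flight(velo)),
        )
```

Under those assumptions, the same RPM yields fewer rotations per pitch at higher velocity because the ball spends less time in the air, and a 100 mph fastball needs about 160 pitches to fill a minute, which is in the same ballpark as the rough 150 figure above.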

What I found interesting is the case of Jeff Manship, who had a really strong case about his story on The Ringer, and somehow is now with the NC Dinos of the KBO league instead of the MLB. Three pitches isn't a remotely great sample, but as I tweeted out to author Ben Lindbergh, the difference between this guy and Chapman in RPP is incredible if it's over a longer duration:


This isn't where RPP stops, since we've seen RPM become a big organizational component of what certain teams (Philadelphia, Houston) value in given pitchers, when a better sample of information exists on what is happening each time a guy actually throws a pitch.

On the other side of things, what about when the bat actually connects with the ball? Given batters control their BABIP more than pitchers, this applies to them materially more so. Anyway, I was looking to improve wRC+ tonight, realizing it gets its roots from wRAA, and ultimately wOBA, with seasonal and park weights in various places of the respective equations. I was looking at the fact that, say, a single last season was weighted at 0.878. Brief aside: that's a new major league low from 1871-present, breaking last year's .881, with all of the top (bottom) 10 single weights in wOBA coming in the last 14 seasons. But as we know, not all singles are created equal. Some are absolute rockets off the Green Monster by a slow runner that nobody would catch; others could be a bunt single where maybe the third baseman waited to see if it rolled foul, but that had a decent chance of being fielded to throw out, say, the opposing pitcher. And yet those are weighted exactly the same in the backbone of how we view baseball stats, even good ones like wRC+. It makes ZERO sense to me, as it should to you.

I haven't remotely completed this yet, having just thought of it in the past 12 hours, but what if we look at this in terms of exit velocity? Barrels aren't going to be the whole equation because they are by far the minority of plate appearances. More specifically, among batters with 190+ Batted Ball Events (BBEs), even MLB's 2016 leader in barrels, Khris Davis, was only at 10.7% of his plate appearances. What about the other 89.3% of his PAs? Let alone these three specific players: Billy Burns (220 BBEs), Dee Gordon (234), and Jed Lowrie (256). What do they have in common? They were the only three to have 190+ BBEs, yet zero barrels. Given this page on Savant, we have the exit velocity and launch angle for literally every batter, which is outstanding but not enough, in light of the fact that we can compare those two items on a graphs page with any exit velocity and launch angle of our choosing, giving us a batting average and spray map for literally every kind of batted ball and letting us track 100% of PAs instead of a fraction of them. Basically, I think we should rewrite how we view the fundamentals of basic batting stats, working them into the more complex sabermetrics once the core/basic/fantasy stats like BA are recorded, at least to us. Unfortunately, the natural flaw here is that we can only go back as far as Savant has been around, but thankfully I think it's here to stay as an outstanding foundational resource for better statistics. There's also the issue of shifts, to name a prominent example, since they affect how many of a given batter's balls in play turn into outs, and I had no luck finding specifics on what percentage of the time a batter faces a specific defense (infield in, shifts, no doubles, OF shaded a certain direction, etc.), which would be wonderful to have, both for the sake of the batters and for manager evaluation. If someone could steer me toward further info on player (not league) specifics and percentages, I would be indescribably thankful.
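To make the exit velocity / launch angle idea a bit more concrete, here's a minimal Python sketch of one way it could work: bucket every batted ball by exit velocity and launch angle, take the league hit rate in each bucket, and credit a batter with the hit rate of the bucket each of his balls landed in. This is only an illustration, not the equation mentioned in this post. The bucket sizes, the function names, and the handful of batted balls in the example are all invented, and none of it is real Statcast data.

```python
from collections import defaultdict


def bucket(ev_mph, la_deg, ev_step=5, la_step=10):
    """Map a batted ball to a (exit velocity, launch angle) bin by its lower edge."""
    return (int(ev_mph // ev_step) * ev_step, int(la_deg // la_step) * la_step)


def hit_rates(league_bbe):
    """league_bbe: iterable of (exit_velo, launch_angle, is_hit) tuples."""
    totals = defaultdict(lambda: [0, 0])  # bin -> [hits, batted balls]
    for ev, la, is_hit in league_bbe:
        counts = totals[bucket(ev, la)]
        counts[0] += int(is_hit)
        counts[1] += 1
    return {b: hits / n for b, (hits, n) in totals.items()}


def expected_hits(player_bbe, rates, default=0.0):
    """Sum the league hit rate of each bin the player's batted balls fell in.

    Bins with no league data fall back to `default`.
    """
    return sum(rates.get(bucket(ev, la), default) for ev, la in player_bbe)


if __name__ == "__main__":
    # Invented league sample: three hard line drives, four weak grounders
    league = [
        (102, 12, True), (103, 15, True), (101, 18, False),
        (71, -5, False), (72, -8, True), (73, -2, False), (74, -3, False),
    ]
    rates = hit_rates(league)
    # Invented player: one rocket, one dribbler
    print(round(expected_hits([(104, 11), (70, -9)], rates), 3))
```

With a real season of data the bins would be much finer and the samples per bin much larger, and something like shifts or batter speed would still be missing from the picture.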

I only work three hours a week on Thursday, and only have one major project to finish up, some writing for Inside the Pylon for this Friday, so the goal is to get this done sooner rather than later. Given the Statcast pages I've listed only cover the 2016 season by its results, to my knowledge, I'd like to run this project next week (12th-18th) for literally every single batter from last season, break it into XBH and types of batted ball (LD, FB, GB), and rewrite the honestly flawed (if very solid in conceptualization) metrics we run/use today and take too much for granted. BABIP especially came to mind here and was the basis for this project. Naturally, I don't think one season of player statistics would be enough, but it would serve as a baseline to lay longer-term groundwork, as supported by BP's Russell A. Carleton, who noted various specific stabilization rates while also avoiding (and thus helping me avoid) the specific flaws Pizza Cutter ran into. This, in turn, should produce actual findings of undervalued players as early as next winter, using Carleton's sample size minimums. I've solved for the equation I want to run on this, so that won't be an issue. The only potential problem is just how dang long this is going to take, but given the impact it should have on the entirety of MLB specifically, that's a positive, even if it's something only I find interesting. Knowing how we usually operate here, kinda doubt I'll be the only one.

Where do I go with these findings, which I hope to have done by month's end? I think it dramatically impacts how accurately we evaluate certain players. Far as I know, this isn't being done elsewhere, and I haven't the slightest thing resembling a connection in this sport, be it my local SABR (no transportation), FO folks, or otherwise. Thanks. This is my new pet.
 

charlieoscar

Member
Sep 28, 2014
1,339
I can't say that I am following everything here. You obviously don't want to lump all the pitches together that a pitcher throws to come up with an average RPM as different pitches have different spin rates and pitchers may also have a couple of types of fastballs or curves, etc. that they use. I think what you would want to do is track the RPM rates for the various pitches over periods of time to see if there is an indication of the pitcher tiring, whether it be by the number of pitches in a game or over the season.

Also, you talk about a major-league low for wRC+ from 1871 to present. Isn't some of this stuff meaningless given the rules in early baseball? For example, in addition to pitchers throwing underhand from 50 feet, there was a while when batters could ask for the height of the pitch. Until 1877 batted balls that hit in fair territory then rolled foul were counted as fair. Some players, most noticeably Ross Barnes, became expert at bunting balls that rolled foul. There are games from the 1970s for which no play-by-play data is available. PitchF/X and StatCast are revolutionizing the way baseball can be examined but you can only take those technologies back to the point when they were introduced.
 

Brand Name

make hers mark
Moderator
SoSH Member
Oct 6, 2010
4,397
Moving the Line
I can't say that I am following everything here. You obviously don't want to lump all the pitches together that a pitcher throws to come up with an average RPM as different pitches have different spin rates and pitchers may also have a couple of types of fastballs or curves, etc. that they use. I think what you would want to do is track the RPM rates for the various pitches over periods of time to see if there is an indication of the pitcher tiring, whether it be by the number of pitches in a game or over the season.
Yep, my bad for not explaining this better, though I completely agree. Like RPM does, RPP differentiates between pitches, not just a player on the whole which would be flawed on a number of different mechanical levels, let alone statistical. This especially applies to a guy like Syndergaard who had a top 10 year in RPM with both his four and two-seamers (need to check my RPP) last season. I like your idea and application of RPP though, thank you!

Also, you talk about a major-league low for wRC+ from 1871 to present. Isn't some of this stuff meaningless given the rules in early baseball? For example, in addition to pitchers throwing underhand from 50 feet, there was a while when batters could ask for the height of the pitch. Until 1877 batted balls that hit in fair territory then rolled foul were counted as fair. Some players, most noticeably Ross Barnes, became expert at bunting balls that rolled foul. There are games from the 1970s for which no play-by-play data is available. PitchF/X and StatCast are revolutionizing the way baseball can be examined but you can only take those technologies back to the point when they were introduced.
The 1871 bit is with respect to the weight of a single in wOBA; wOBA is used to calculate wRAA, and then wRAA is used to calculate wRC+. A lot of it is meaningless, yeah, certainly, but even if you bump it up to the modern era, like 1903-present, the single has never been less valuable. It doesn't mean anything in terms of how I'd re-calculate these stats, as these variables would not be reweighted, but I did find it personally fascinating as I was searching through Fangraphs' Guts! portion of the page, only reason I put that in that great wall of text. Now you've got me curious on Ross Barnes though, appreciate the piquing of my curiosity.
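For reference, the first link in that chain, wOBA to wRAA, is the standard FanGraphs formula wRAA = ((wOBA - league wOBA) / wOBA scale) * PA. A quick Python sketch follows. The inputs in the example are made-up round numbers, not the real 2016 constants from the Guts! page, and the function name is just for illustration.

```python
def wraa(woba, league_woba, woba_scale, pa):
    """Weighted runs above average from wOBA (standard FanGraphs formula)."""
    return (woba - league_woba) / woba_scale * pa


if __name__ == "__main__":
    # Hypothetical hitter: .400 wOBA in a .320 league, scale 1.25, 600 PA
    print(round(wraa(0.400, 0.320, 1.25, 600), 1))
```

wRC+ then layers park and league adjustments on top of this, which is why changing how a single is weighted in wOBA would flow all the way through.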

Agree on your second point though, yeah, that's a drawback. I knew I was forgetting something in my post, this must have been it. Like Statcast, given it is based on it for its adjustments of batters, this can only go back as far as Statcast's existence which of course is fairly recent. Far as I know, the earliest season this data could be actually started, never mind anything conclusive, was last season, though I'm not sure how far this link goes back, and could be quite important. As such, these angle/velocity adjusted stats would need to be prospective, rather than retrospective, at least initially (eventually today's data will be historical, after all), with respect to what and/or when they can actually analyze.

Not sure I know where you're going with this either, but the .com just put out a call for article submissions.

http://sonsofsamhorn.net/index.php?threads/win-sox-tickets.18124/
Sweet! I was just looking at Sox tickets the other day, realized I probably won't have enough for a dream (against the Cubs) game this season all considered, but this is more than enough motivation to get cracking next week. Got just the idea to work in these stats and make it an article based on these adjusted numbers, although even with a full season, the sample size is still probably going to be less than ideal in many respects. The only question is which player to write it about, though I have it down to two (Mookie or Ortiz, leaning the former).
 

charlieoscar

Member
Sep 28, 2014
1,339
though I'm not sure how far this link goes back, and could be quite important.
The link takes one to a chart of "Hit Probability Breakdown Based on Exit Velocity & Launch Angle." Leaving out the fact that exit velocity has probably risen over the years because of changes in pitching, bats, and player conditioning, one could simply consider that if the exit velocity were faster than the speed of light, fielders would not see the batted ball and would only catch it if the ball somehow went right into their glove.

An exit velocity of 60 mph (ignoring outside factors such as gravity, friction, distance) is equivalent to 88 feet per second; 90 mph is 132 fps; 120 mph is 176 fps. At the latter velocity a corner infielder playing 100 feet from the plate would have ~568 milliseconds to react (this is assuming a launch angle of zero). Usain Bolt averaged about 2.45 meters per stride in his record-setting 100-meter races, at about 9.63 seconds. This means his average stride was about 8 feet, and he would be able to take about 2.4 of them in 568 msecs, for a range of just over 19.3 feet.
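The arithmetic above can be reproduced with a few lines of Python. This is just a sketch using the same simplifications (constant ball speed, launch angle of zero, and a runner already at his average race speed with no reaction delay); the function names are made up for illustration.

```python
MPH_TO_FPS = 5280 / 3600   # feet per second in one mile per hour
FT_PER_M = 3.28084         # feet in one meter


def reaction_time_s(exit_velo_mph, fielder_distance_ft):
    """Time for the ball to reach the fielder at constant speed."""
    return fielder_distance_ft / (exit_velo_mph * MPH_TO_FPS)


def strides_taken(time_s, stride_m=2.45, race_m=100.0, race_s=9.63):
    """Strides covered in time_s at the runner's average stride rate."""
    strides_per_second = (race_m / stride_m) / race_s
    return strides_per_second * time_s


def range_ft(time_s, stride_m=2.45, race_m=100.0, race_s=9.63):
    """Ground covered in time_s, in feet."""
    return strides_taken(time_s, stride_m, race_m, race_s) * stride_m * FT_PER_M


if __name__ == "__main__":
    for mph in (60, 90, 120):
        print(mph, "mph =", round(mph * MPH_TO_FPS), "fps")
    t = reaction_time_s(120, 100)
    print(round(t * 1000), "ms,", round(strides_taken(t), 1), "strides,",
          round(range_ft(t), 1), "ft")
```

A real infielder starts from a standstill and needs time to read the ball, so the true range would be well short of this upper bound.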

You are not going to have infielders who can compete with Bolt; you will have some with quicker reaction times than others, which should translate into more range for given exit velocities. But I think the question that needs to be asked is about pitchers and which ones have a tendency to give up fewer balls hit with high exit velocity, especially those with launch angles that would result in a smash at the feet or a line drive to an infielder.

the single has never been less valuable
Quite coincidentally I started looking at the value of basehits from an RBI perspective the other day but I haven't finished the work. What I am doing is looking at what percentage of all RBI are accounted for by singles, doubles, triples, and home runs. For example,

1950 (play-by-play data missing for 184 games)
1B -- 41.24% of RBI
2B -- 17.48%
3B --- 5.53%
HR -- 35.75%

2015 (haven't added 2016 to my database yet)
1B -- 38.70% of RBI
2B -- 19.96%
3B --- 3.06%
HR -- 38.28%

As you can see, singles still account for more of the RBI than other hits, although doubles and home runs are increasing (the decrease in RBI from triples is due at least in part to smaller stadia).
Note: data from Project Retrosheet
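To put numbers on the shift between those two seasons, here is a small Python sketch that takes the percentages listed above and computes the change in each hit type's share of RBI, in percentage points. The dictionary layout and function name are just one way to set it up.

```python
# RBI shares by hit type (percent of all RBI), copied from the figures above
rbi_share = {
    1950: {"1B": 41.24, "2B": 17.48, "3B": 5.53, "HR": 35.75},
    2015: {"1B": 38.70, "2B": 19.96, "3B": 3.06, "HR": 38.28},
}


def share_change(shares, start, end):
    """Percentage-point change in each hit type's RBI share from start to end."""
    return {
        hit: round(shares[end][hit] - shares[start][hit], 2)
        for hit in shares[start]
    }


change = share_change(rbi_share, 1950, 2015)
for hit, delta in change.items():
    print(f"{hit}: {delta:+.2f} percentage points")
```

Singles and triples each give up roughly two and a half points of share, and doubles and home runs pick up about the same amount.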