Do pitchers actually want to aim for a donut hole?

CSteinhardt

"Steiny"
Lifetime Member
SoSH Member
Dec 18, 2003
3,202
Cambridge
I've been playing around with some of the statcast data and trying to figure out how well we can predict the success of a pitch from the tracking data alone. More broadly, the idea behind FIP was to recognize that pitchers do not control the defenses behind them, so that any attempt to measure pitcher skill should take the "luck" of whether their defense was successful out of the metric. However, there are other forms of luck as well. Sometimes a hanging breaking ball doesn't get punished. And sometimes an excellent pitch ends up over the fence.

So, the idea is to train models to predict what the typical outcome of a pitch should be. I've ended up taking an approach similar to pitching+, but with a few modifications because I wanted to arrive at the same idea independently. In particular, I feel strongly that the release point cannot be used in trained algorithms like this, because the release point ends up being too identifiable to a specific pitcher in many cases. Thus, you end up using the actual results of some of a pitcher's pitches to predict the others, which is a much different problem than using the results of similar ones. Or, to put it another way, pitches thrown with precisely deGrom's release point are really, really good, because many of those are thrown by deGrom, who is really, really good. But if you took another pitcher and adjusted their release point to match deGrom's, they wouldn't be expected to have similar success.
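To make that concrete, here is a minimal sketch of the kind of model being described, trained on synthetic stand-ins for the tracking features. All feature names, distributions, and coefficients below are invented for illustration (this is not the actual model), and there is deliberately no release-point column, for the leakage reason just described:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Synthetic stand-ins for per-pitch tracking features.
# Deliberately NO release-point columns.
X = np.column_stack([
    rng.normal(93, 4, n),     # release_speed (mph)
    rng.normal(0, 8, n),      # horizontal break (in)
    rng.normal(0, 8, n),      # induced vertical break (in)
    rng.normal(0, 0.8, n),    # plate_x (ft)
    rng.normal(2.5, 0.8, n),  # plate_z (ft)
])
# Fake "runs saved" target with a mild dependence on velocity and location
y = 0.01 * (X[:, 0] - 93) - 0.02 * np.abs(X[:, 3]) + rng.normal(0, 0.1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print("held-out R^2:", round(model.score(X_te, y_te), 2))
```

With real Statcast data you would swap in the actual per-pitch columns and a run-value target; the point of the sketch is only the shape of the setup, not the numbers.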

I thought it might be fun to put a thread together with some of the stuff I've been learning, as well as some of the things which I'm still trying to understand. Feel free to either move the thread or tell me to add it to existing threads someplace if there's a better fit.

I thought I'd start out with something which I really didn't expect to find here and which goes against what I learned about pitching as a kid: the whole donut hole thing isn't actually quite right. What I thought I knew about pitching was that the couple of inches around the border of the strike zone belonged to the pitcher and everything else belonged to the hitter. That is, the value of a pitch might look something like this (produced from the model I've been working on):

78618

Here, the pitch score is approximately in runs saved compared with an average MLB pitch in 2023, and a positive score is good for the pitcher. This is from the catcher's point of view to a right-handed hitter, and zone_z is the actual height of the pitch scaled to the hitter's strike zone. Something similar is shown in the Fangraphs version of pitching+; here is their equivalent location map for a 3-2 sinker (they break this up by count and pitch type):

1708873161420.png

Very few actual MLB pitchers, including the excellent ones, actually make a donut hole like this. For example, here are the locations of all MLB pitches in 2023 vR along with Kevin Gausman and Aroldis Chapman (just a couple of examples):
2023location_pitch_countALL_vR.jpg 78621 78620
Chapman in particular just takes incredible stuff and tries to hit the zone with it. But even good MLB pitchers seem to be mostly capable only of hitting the zone, not also of consistently hitting the corners. The model still thinks he's an excellent pitcher, basically rating his stuff as being so valuable that even when he throws the ball down the middle, it's about an average MLB pitch in terms of expected result.

Anyway, a donut hole appears to be the goal when we assume, like FIP does, that the pitcher has no control over balls in play. It's still important to miss the center of the zone, because balls that are thrown middle-middle tend to end up disproportionately over the fence. But if we look at what happens based on the tracking after a ball is put in play, something different pops up:
78623
At first I was a bit puzzled by this, but in retrospect it actually seems obvious what is happening. The barrel of the bat is a fixed distance from the hitter's shoulders, so when an MLB hitter makes contact, the distance from the shoulders is essential. There's a velocity dependence as well, because of how quickly a hitter can turn on a pitch. For example, this is the same thing for fastballs and for changeups:
78624 78625
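The shoulder-distance idea can be illustrated with a toy calculation. The shoulder pivot location and zone coordinates below are round numbers I invented for illustration (catcher's view, feet, right-handed hitter), not measured values:

```python
import numpy as np

# Guessed shoulder pivot for a RHH, catcher's-view (plate_x, plate_z) in feet
shoulder = np.array([-2.2, 4.2])

locations = {
    "middle-middle": (0.0, 2.5),
    "up-and-away":   (0.7, 3.4),
    "down-and-in":   (-0.7, 1.6),
    "up-and-in":     (-0.7, 3.4),
    "down-and-away": (0.7, 1.6),
}

for name, (x, z) in locations.items():
    r = np.hypot(x - shoulder[0], z - shoulder[1])
    print(f"{name:14s} {r:.2f} ft from the shoulder")
```

With these made-up numbers, middle-middle, up-and-away, and down-and-in all land roughly 2.8-3.0 ft from the pivot (near the barrel), while up-and-in is much closer (about 1.7 ft, jamming the hitter) and down-and-away much farther (about 3.9 ft, off the end of the bat). That's the arc the heat maps seem to trace.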
But the idea is the same. And if we look at the same thing for lefties, of course, it's just a mirror image:
78626
And I think realizing this also helps to explain why a pitcher like Blake Snell is so effective. Here's his pitch map vR in 2023:
78627
When I tried to project his success using the donut hole idea, it thought Snell was around the 50th best SP in MLB in 2023 on expected results. Including the effect of command on pBABIP, Snell ends up 8th. So whether you want to sign Snell is also something of a test of whether you treat pBABIP as a skill or purely as luck. This same metric also really likes Pivetta, by the way.

Anyway, I'd love to get some feedback on things I could try with this model, both in terms of improving it and also what might be useful questions to ask. This post is a bit long as is, so one thing I'll leave to a future post is that I found the first 10 or so pitches of a SP's outing are potentially predictive of the rest of the outing in terms of pitch quality. So, I wonder whether teams could use real-time tracking data to occasionally decide that a pitcher just doesn't have his best stuff that day, turn the outing into a side session, and bring him back in a few days.

Oh, and the Devers HR above is one of the best pitches by this metric to have been hit for a HR in 2023. And here is the pitch with the highest pitch score (since the count is included in predicted runs gained/lost, it's really the nastiest 3-2 pitch).
 

zenax

Member
SoSH Member
Apr 12, 2023
360
A problem with calling balls and strikes is the plate. The strike zone is an imaginary volume 17 inches wide, but the actual plate is only a full 17 inches wide from its front edge to points 8.5 inches back along each side. Over its final 8.5 inches, it tapers to a point. So a late-breaking pitch could be visibly outside the plate for the first half of its path, then move into the invisible part of the zone over the final part. Or, a pitch over the top of the zone could dip into it in that invisible part. A 90-mph pitch would only be in the invisible part for about 5.4 milliseconds.
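That 5.4-millisecond figure checks out with a quick back-of-the-envelope calculation:

```python
# How long a 90 mph pitch spends over the tapered back 8.5" of the plate
in_per_s = 90 * 5280 * 12 / 3600   # 90 mph in inches per second (1584)
t_ms = 8.5 / in_per_s * 1000       # time to cover 8.5 inches, in ms
print(f"{t_ms:.1f} ms")            # → 5.4 ms
```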

Another problem with calling balls and strikes is that umpires cannot get an accurate view of most pitches because they are looking at them at an angle. For example, an umpire looks down to track a low pitch; how well can he tell whether the top of the ball clips the zone at the front edge of the plate?
 

CSteinhardt

"Steiny"
Lifetime Member
SoSH Member
Dec 18, 2003
3,202
Cambridge
Not sure how much interest there is, but thought I'd share another conclusion I was drawing from playing around with this. Please feel free to let me know if I should move this somewhere else / take to another board.

I was looking at an effect which has been known for a couple of decades, where pitchers get less effective the third time through the batting order. There has been considerable discussion and debate over which of two possible explanations is the primary cause:

Explanation 1: Hitters learn from their previous PAs and pitchers only have a limited number of tricks, so by the third time through the order, the hitters are better at handling that particular pitcher. If this is correct, then there is a big difference between the 18th and 19th hitters, and removing the pitcher after 18 makes a significant difference. This feels a bit counterintuitive to me at the major league level, since the hitters get so much scouting information and even have been using pitching machines that try to replicate individual pitchers. But it seems a lot more intuitive at lower levels.

Explanation 2: This isn't really about the hitters. It's just that the pitcher has thrown another 30-45 pitches following their previous time through the order, so the pitcher is a lot more tired. If this is correct, then pitch count is very important, but there's nothing magical about the third time through the order, just that the third time through the order is later in the game than the second time.

My understanding is that the early results on this seemed to favor the first explanation. However, reading more recent stuff seems to favor the second explanation. The author of the previous article I linked has a followup a few years later arguing that it's entirely pitcher fatigue. And there's a more academic study from last year which draws the same conclusion.

However, a big part of the problem is being able to separate out pitch quality from pitch outcomes which can depend upon luck and small samples. So, I tried attacking this with the pitch score metric I described in the first post of the thread.

I looked at all outings of at least 90 pitches, to try to eliminate the survivor bias that comes from the fact that data on, e.g., a 25th hitter only exists for outings which were good enough to face that many hitters.
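In pandas terms, that selection looks something like the sketch below. The column names are my guesses at a generic per-pitch schema (not actual Statcast field names):

```python
import pandas as pd

def tto_curve(df: pd.DataFrame, min_pitches: int = 90) -> pd.Series:
    """Mean pitch score by batter faced, restricted to long outings.

    Only outings of at least `min_pitches` pitches are kept, so the curve
    for late batters isn't built solely from starts that went well enough
    to last that long (survivor bias).
    """
    outing_len = df.groupby(["game_pk", "pitcher"])["pitch_number"].transform("max")
    long_outings = df[outing_len >= min_pitches]
    return long_outings.groupby("batter_seq")["pitch_score"].mean()
```

Here `batter_seq` would be 1 for the first hitter the starter faces, so 1-9, 10-18, and 19-27 correspond to the first, second, and third times through the order.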

78957

Like before, this pitch score is runs saved per pitch, so a positive number is better for the pitcher. I should probably do a more careful version of this and try to get the uncertainties right at some point as well. But the point is clear -- pitchers do tire, and over the course of an appearance, they get worse.

However, unlike those recent results, looking at it in terms of pitch quality finds that the hitter effect is more important. Here's the same thing (note the change in y-axis) with the difference between actual and expected runs saved per pitch overlaid in red:
78958

The hitter effect here is far more significant (these are the same blue values as above). As you might imagine, the top of the order is generally harder for pitchers to face and they benefit from facing the bottom of the order. But if we compare the times through the order, we can also see that there indeed seems to be a significant penalty. And it's not just the third time through the order -- it's the second time as well.

Moreover, it's not for every hitter. The top of the order gets significantly better, much more so than the dropoff in pitch quality. But the bottom of the order is pretty much just always bad, and doesn't really improve with extra at-bats. Or, to put it another way, it seems that good hitters make good adjustments and bad hitters don't. Which does make sense, I guess. But that adjustment is much larger than the difference in pitch quality, so that it's really the hitter who is most responsible.

That only some hitters seem to make the adjustment might also explain why different versions of the analysis seem to give different answers. What this suggests is actually a bit complex. Perhaps the answer is that the right strategy is to use an opener to get through the top of the order, but that the follower can go through the bottom of the order three times if they're effective. Or, maybe even the second time through the order penalty means that the optimal usage of a pitching staff is something much different. Maybe the ideal setup is an opener for around 5 hitters, then a follower 1.5 times through the order who can throw every 3rd/4th day, and then a third pitcher who faces around 9 hitters?

Anyway, hoping to get some thoughts on both whether this makes sense / what I might be missing and also other questions worth exploring with a metric like this.
 

Njal

New Member
Apr 23, 2010
14
This is super interesting, thanks! One comment and some questions/suggestions.

1. I initially found the analysis confusing because the wOBA graphs are colored oppositely from the rest -- there, the red corresponds to pitcher failure, whereas in the others, red corresponds to pitcher success. It might be clearer to recolor, even at the expense of changing the red/large linkage.

2. The Snell map seems to suggest that at least some pitchers can hit a corner pretty consistently. If they can do that, it's not crazy to think they can make a donut hole, it's just that we might need to look a bit harder to find it in the data.

3. The general analysis seems to suggest that for a given batter handedness and pitch type, there are 1-2 corners that are good places for a pitcher to put the ball. What if you made a dataset consisting of (a) good pitchers (e.g. FIP < some cutoff), (b) throwing a particular pitch to (c) batters of a given handedness? If you got Snell-type graphs that would be cool, particularly if the hot corner moved around according to your model.
 

BaseballJones

ivanvamp
SoSH Member
Oct 1, 2015
24,787
Very interesting stuff. I'm going to offer this hypothesis to @CSteinhardt 's question: It is due to pitcher fatigue and, specifically, the style of pitching we now see in MLB (guys throwing with greater spin and with much higher average velocity). Starting pitchers used to throw slower than they do now, and that was partially because they weren't as good athletes in general as they are now, but partly because starters knew they were expected to go deep into games, which meant they couldn't max effort each pitch, which meant that they "pitched to contact" more, hoping to induce weak contact rather than just strike guys out all the time, which meant less stress on their arms, which meant that they had more control over their pitches late in games.

Guys today max effort seemingly every pitch, and that wears them down in a shorter time frame, which means that even if they can hump it up to max velocity in the 6th inning, they generally don't have as much control, and balls get hit harder when there is worse location.

Maybe all this pitching adjustment is due to the quality of hitting - they feel like they HAVE to get strikeouts. Or maybe it's just how they want to pitch. But clearly pitchers are throwing more pitches over fewer numbers of innings than they used to, and when you look at a bunch of starters in the 70s and 80s, you don't see the radical differences pitching the third time through the order compared with the first.
 

SirPsychoSquints

Member
SoSH Member
Jul 13, 2005
5,146
Pittsburgh, PA
Very interesting stuff. I'm going to offer this hypothesis to @CSteinhardt 's question: It is due to pitcher fatigue and, specifically, the style of pitching we now see in MLB (guys throwing with greater spin and with much higher average velocity). Starting pitchers used to throw slower than they do now, and that was partially because they weren't as good athletes in general as they are now, but partly because starters knew they were expected to go deep into games, which meant they couldn't max effort each pitch, which meant that they "pitched to contact" more, hoping to induce weak contact rather than just strike guys out all the time, which meant less stress on their arms, which meant that they had more control over their pitches late in games.

Guys today max effort seemingly every pitch, and that wears them down in a shorter time frame, which means that even if they can hump it up to max velocity in the 6th inning, they generally don't have as much control, and balls get hit harder when there is worse location.

Maybe all this pitching adjustment is due to the quality of hitting - they feel like they HAVE to get strikeouts. Or maybe it's just how they want to pitch. But clearly pitchers are throwing more pitches over fewer numbers of innings than they used to, and when you look at a bunch of starters in the 70s and 80s, you don't see the radical differences pitching the third time through the order compared with the first.
In 2023, the league had a tOPS+ of 113 facing a pitcher the third time through the order.
2013: 113
2003: 111
1993: 109
1983: 110
1973: 106
1963: 109
1953: 107

https://www.baseball-reference.com/leagues/split.cgi?t=b&lg=MLB&year=2023#all_times

So there has been an exaggeration over the years, but it's always been there.
 

BaseballJones

ivanvamp
SoSH Member
Oct 1, 2015
24,787
In 2023, the league had a tOPS+ of 113 facing a pitcher the third time through the order.
2013: 113
2003: 111
1993: 109
1983: 110
1973: 106
1963: 109
1953: 107

https://www.baseball-reference.com/leagues/split.cgi?t=b&lg=MLB&year=2023#all_times

So there has been an exaggeration over the years, but it's always been there.
Yes and even back in the day, starters still got tired. But the exaggeration you mention here is what I would have expected. So this checks out to me.
 

jarules1185

New Member
Jul 14, 2005
577
But if we look at what happens based on the tracking after a ball is put in play, something different pops up:
View attachment 78623
At first I was a bit puzzled by this, but in retrospect it actually seems obvious what is happening. The barrel of the bat is a fixed distance from the hitter's shoulders, so when a MLB hitter makes contact, the distance from the shoulders is essential. There's a velocity dependence as well, because of how quickly a hitter can turn on a pitch. For example, this is the same thing for fastballs and for changeups:
So this is essentially saying that if a pitch has been hit (disregarding swinging strikes, which I assume are still more frequent on the red corners here than middle-middle), that up-and-away and down-and-in pitches are very nearly as bad outcome-wise (from a pitcher's perspective) as middle-middle ones, right?

Essentially that all possible swing sweet spots form an arc with a midpoint of the hitter's shoulder, and that arc intersects the middle of the zone, the up-and-away corner of the zone, and down-and-in corner. Meaning up-and-in and down-and-away pitches are quantitatively objectively superior to up-and-away and down-and-in pitches, in a very general sense. Obviously pitch type and a million other variables come into play on any given pitch.

Definitely has an intuitive physiological feel, and I've never thought about it this way before.
 

CSteinhardt

"Steiny"
Lifetime Member
SoSH Member
Dec 18, 2003
3,202
Cambridge
In 2023, the league had a tOPS+ of 113 facing a pitcher the third time through the order.
2013: 113
2003: 111
1993: 109
1983: 110
1973: 106
1963: 109
1953: 107

https://www.baseball-reference.com/leagues/split.cgi?t=b&lg=MLB&year=2023#all_times

So there has been an exaggeration over the years, but it's always been there.
Thanks for sharing that. It's interesting that it seems to have sped up a bit recently.

This makes a lot of sense if the primary reason is pitcher fatigue, since pitchers now tire a bit sooner than in earlier eras. But if my pitch score metric is correct in suggesting that the main effect is hitter familiarity, then this doesn't fit as well. Maybe if the effect is about 80% hitter and 20% pitcher, it's only the pitcher side which has grown? Or perhaps it's actually still on the hitter side, and the change is the ability of hitters to go back to the dugout and watch video between at-bats? It would be interesting to try to grab statcast data from the minors and look for something similar at a level where the hitters don't quite have the same tools - anybody know how to grab that?


So this is essentially saying that if a pitch has been hit (disregarding swinging strikes, which I assume are still more frequent on the red corners here than middle-middle), that up-and-away and down-and-in pitches are very nearly as bad outcome-wise (from a pitcher's perspective) as middle-middle ones, right?

Essentially that all possible swing sweet spots form an arc with a midpoint of the hitter's shoulder, and that arc intersects the middle of the zone, the up-and-away corner of the zone, and down-and-in corner. Meaning up-and-in and down-and-away pitches are quantitatively objectively superior to up-and-away and down-and-in pitches, in a very general sense. Obviously pitch type and a million other variables come into play on any given pitch.

Definitely has an intuitive physiological feel, and I've never thought about it this way before.
Yeah, I think that's the conclusion. Pretty much just that it comes down to the distance from the shoulders to the barrel of the bat, I guess? So it's really hard to get the good part of the bat on a ball that's up and in (typically jams the hitter) or low and away (hits the end of the bat) even if you hit it, while a pitch thrown on the other two corners is still hard to reach, but if you do reach it, you usually get good wood on it. Definitely not the way I was taught as a kid, though!
 

simplicio

Member
SoSH Member
Apr 11, 2012
5,318
I'm curious about how the projected pitch score graph would map against a pitch characteristics metric like Stuff+.

Also: why are we calling it a donut hole? Isn't it just a donut cause the hole is the middle-middle we're hoping to avoid?
 

CSteinhardt

"Steiny"
Lifetime Member
SoSH Member
Dec 18, 2003
3,202
Cambridge
I'm curious about how the projected pitch score graph would map against a pitch characteristics metric like Stuff+.

Also: why are we calling it a donut hole? Isn't it just a donut cause the hole is the middle-middle we're hoping to avoid?
That's a good question. In many ways it should be similar to Stuff+, since it's working similarly. Or, I guess to be more specific, it should be similar to Pitching+, since that combines both Stuff+ and Location+. However, because I did this independently, there are going to be several differences in methodology which might end up being important. There's one which I know of that ends up being very important - Stuff+ includes information about the release point.

This is actually a complex problem, so let me be a bit more (but hopefully not too) technical for a moment. The way that a model like this is put together involves a training set, where you get to see both pitch characteristics and the outcome, and then a test set, where you get to see characteristics but don't get to see the outcome. A successful model involves learning enough from the training set that you can do a good job of predicting the outcomes for the test set.

As you allow the model to be more complex, it can do a better and better job of describing the training data. So, for example, it might "learn" that the further a pitch is from the strike zone, the more likely it is that the outcome will be a ball, and that the closer a pitch is to the center of the strike zone, the more likely that the batter will swing at it. As the model learns this, it should become better both at the training set and the test set, since it's actually learning something about baseball.

However, there is a danger: if I allow the model to be too specific, it actually starts to overdescribe the training set in a way that no longer looks like baseball. For example, "if the pitch is closer to the center of the strike zone, it is less likely to be called a ball" is useful. But "if the pitch was thrown on May 6 by Adam Wainwright to Javier Baez on a 0-2 count in the 4th inning, it is likely to have been hit into play", while it does a better job of describing the training set, is very unlikely to help you model the test set. Often, this sort of overtraining actually makes the description of the test set worse, since you're replacing learning about baseball with learning about the details of the dataset.

The problem is that separating the two can be particularly tricky. For example, clearly it makes sense to evaluate knuckleballs differently than other pitches given their characteristics. However, in 2023 nearly all knuckleballs were thrown by Matt Waldron. And as with any pitcher, variation in the biomechanics is bad. So, if Matt Waldron misses his release point, it is a good predictor of a poor knuckleball. But if another knuckleball pitcher misses Matt Waldron's release point (since their best release point is likely different), that isn't an indication of a poor outcome. Trying to figure out how to separate these sorts of things is a real pain, and it makes up much of the actual work in machine learning.

From the description given of Stuff+, release point is a big part of their algorithm. I experimented with using it, and found that for what I was doing, it typically resulted in overtraining. That is to say, every pitch thrown with Tim Hill's release point was thrown by Tim Hill. And more generally, the release points are specific enough that this is true for more typical pitchers as well. I already know that 100% of the pitches with deGrom's release point are excellent, because deGrom is excellent. But if you took another MLB pitcher and adjusted their release point to make it more deGrom-like, it presumably wouldn't make them better. Being able to reproduce the combination of speed, movement, and command on his slider, on the other hand...
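One generic way to catch this kind of identity leakage (my sketch, not necessarily how Stuff+ or the model above is validated) is to cross-validate with the pitcher as the grouping variable, so no pitcher appears in both the train and test folds. The toy data below has a "release point" feature that is nearly constant per pitcher, with the outcome driven entirely by per-pitcher skill:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(1)
n = 2000
pitcher_id = rng.integers(0, 40, n)           # 40 fake pitchers
# "Release point" that effectively encodes pitcher identity
release_x = pitcher_id * 0.1 + rng.normal(0, 0.01, n)
X = np.column_stack([release_x, rng.normal(0, 1, n)])
# Outcome depends only on who threw it, not on the features themselves
skill = rng.normal(0, 1, 40)
y = skill[pitcher_id] + rng.normal(0, 1, n)

model = GradientBoostingRegressor()
naive = cross_val_score(model, X, y, cv=5).mean()
grouped = cross_val_score(model, X, y, cv=GroupKFold(n_splits=5),
                          groups=pitcher_id).mean()
print(f"random folds R^2:  {naive:.2f}")   # looks good: release_x leaks identity
print(f"grouped folds R^2: {grouped:.2f}") # collapses: no identity leakage allowed
```

With random folds the model looks like it has predictive power, because it is really memorizing which pitcher threw each pitch; with pitcher-grouped folds that apparent skill disappears.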

So for those reasons I do end up with something which is similar to Pitching+ but very much not identical. I threw some of the info up here if you want to take a look: 2023 data for SP and for min. 300 pitches thrown. I'm still playing around with some of the training methodology and also very much with the presentation, so I apologize in advance for making a mess of the website and layout. But feel free to take a look and I'm happy to discuss possible improvements to the model (and presentation).

As for the donut hole, I did some looking and what I thought was a standard term is apparently something specific to the coaches I had as a kid. So perhaps if somebody knows how, we could change the thread title?
 

ShawnDingle

New Member
Apr 10, 2023
6
That's a good question. In many ways it should be similar to Stuff+, since it's working similarly. ...
Wow, thanks for the info.