Websites for baseball data, analysis and visualization

Sprowl

mikey lowell of the sandbox
Dope
SoSH Member
Jun 27, 2006
34,625
Haiku
Baseball data have proliferated since the last version of this thread in 2007, so I'd like to update the SoSH portfolio of websites for baseball information. What are your favorite sites for data and visualization? Include a link to the site, and describe a highlight or two -- does the site offer something unique not available elsewhere? Does it give you useful options for displaying comparisons or trends? Does it get you access to scarce data?
 
For example, in his thread on Hanley Ramirez, pantsparty included a link to Baseball Savant, which gives access to batted ball velocities (by the way, welcome new member pantsparty to SoSH):
 
pantsparty said:
According to this year's PitchFX data, which now includes batted ball velocity, Hanley Ramirez destroys baseballs. He also has a .346 ISO, which is currently 4th best in baseball.

It was reported that during this offseason, knowing that he would no longer need to play shortstop, he worked at putting on muscle. Are these results something we can expect to continue going forward, or is it just a random blip and he'll return to his career average?

Related: is there a correlation between having your helmet fall off and hitting the ball hard?
 
This thread is on the main board to get more views and participation -- this is first and foremost a Red Sox site, and the main board is where the action is. It might eventually migrate to the MLB forum.
 

OnWisc

Microcosmic
SoSH Member
Apr 16, 2006
6,912
Chicago, IL
Couple more comments on Baseball Savant (apologies if this is all common knowledge):

For the Excel-minded, the PitchFX search feature ( http://baseballsavant.com/pitchfx_search.php ) lets you download spreadsheets containing the data for every pitch thrown by a pitcher in a particular season. Using the "Batter Stands" option when setting up the search allows for analysis of vL and vR splits. The data include, among many other things, pitch type, location (by x,y coordinates as well as a simplified zone classification, with the strike zone broken down into zones 1-9 and four zones for high/low/inside/outside balls numbered 11-14), situation and result, and can be useful for determining what a pitcher throws when and what the result tends to be.
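
As a rough sketch of the kind of analysis those spreadsheets allow, the Python below tallies pitch types by batter handedness and computes an in-zone rate using the 1-9 versus 11-14 zone split described above. The column names `pitch_type`, `stand`, and `zone` and the sample rows are assumptions for illustration; check them against the header row of the file you actually download.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded spreadsheet saved as CSV.
# The column names (pitch_type, stand, zone) are assumptions, not a documented schema.
SAMPLE_CSV = """pitch_type,stand,zone
FF,L,5
FF,R,11
SL,R,14
CH,L,13
FF,R,2
SL,R,9
"""

def in_strike_zone(zone):
    """Zones 1-9 are inside the strike zone; 11-14 are the four ball regions."""
    return 1 <= zone <= 9

def summarize(rows):
    """Count pitch types by batter side and compute the in-zone rate per side."""
    by_side = Counter()
    in_zone = Counter()
    totals = Counter()
    for row in rows:
        side = row["stand"]
        by_side[(side, row["pitch_type"])] += 1
        totals[side] += 1
        if in_strike_zone(int(row["zone"])):
            in_zone[side] += 1
    zone_rate = {side: in_zone[side] / totals[side] for side in totals}
    return by_side, zone_rate

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_side, zone_rate = summarize(rows)
for (side, pitch), n in sorted(by_side.items()):
    print(f"v{side} {pitch}: {n}")
for side, rate in sorted(zone_rate.items()):
    print(f"v{side} in-zone rate: {rate:.2f}")
```

A real file would be read with `open(path, newline="")` instead of the in-memory sample, and rows with a blank zone value would need to be skipped before the `int()` conversion.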

The pitch classifications are raw, so they may not match other sites that scrub the data and reclassify certain pitches. Additionally, the time coding on some pitches is off (at least in prior years; I haven't looked at any 2015 data yet), so using it to sequence pitches can occasionally put pitches out of order. I also found at least one case where an entire game was missing for one pitcher.
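
Given those caveats, one cautious approach is to sequence pitches by game, at-bat, and pitch counters rather than by timestamp, flag rows where the two orderings disagree, and compare the games present against a game log. This is only a sketch: the field names `game_id`, `ab_num`, `pitch_num`, and `time` are hypothetical, and the sample rows are made up.

```python
from datetime import datetime

# Made-up rows; the field names are hypothetical stand-ins for spreadsheet columns.
pitches = [
    {"game_id": "g1", "ab_num": 1, "pitch_num": 1, "time": "2014-04-01 19:05:10"},
    {"game_id": "g1", "ab_num": 1, "pitch_num": 2, "time": "2014-04-01 19:05:02"},  # bad stamp
    {"game_id": "g1", "ab_num": 1, "pitch_num": 3, "time": "2014-04-01 19:05:55"},
    {"game_id": "g1", "ab_num": 2, "pitch_num": 1, "time": "2014-04-01 19:07:20"},
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def sequence_key(pitch):
    """Order pitches by counters instead of the unreliable timestamp."""
    return (pitch["game_id"], pitch["ab_num"], pitch["pitch_num"])

def flag_time_conflicts(rows):
    """Return rows in counter order plus keys whose timestamp runs backwards."""
    ordered = sorted(rows, key=sequence_key)
    conflicts = []
    for prev, cur in zip(ordered, ordered[1:]):
        if prev["game_id"] != cur["game_id"]:
            continue  # only compare pitches within the same game
        t_prev = datetime.strptime(prev["time"], TIME_FORMAT)
        t_cur = datetime.strptime(cur["time"], TIME_FORMAT)
        if t_cur < t_prev:
            conflicts.append(sequence_key(cur))
    return ordered, conflicts

def missing_games(expected_ids, rows):
    """Games listed in a game log (supplied separately) but absent from the data."""
    return sorted(set(expected_ids) - {p["game_id"] for p in rows})

ordered, conflicts = flag_time_conflicts(pitches)
print("timestamp conflicts:", conflicts)
print("missing games:", missing_games(["g1", "g2"], pitches))
```

The list of expected games has to come from somewhere else, such as the pitcher's game log on another site, since the download itself cannot reveal what it is missing.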

While the query functions on both the Savant site and others such as Brooks Baseball allow for quicker analysis, these spreadsheets are invaluable for anyone who prefers to build their own analysis from raw data.
 

Sampo Gida

Member
SoSH Member
Aug 7, 2010
5,044
Other than Brooks Baseball, Baseball Savant, and Hittracker, there does not seem to be much change in the stats landscape since 2007. Both B-Ref and FanGraphs have expanded on what they had then, and I would bet 90% of us get most of our non-PITCHf/x stats from those two sites. I use Baseball Savant mainly for matchups and, now, for the batted ball velocity data that's available.
 
I would like to know where to get Statcast raw data, or whether it's even available (it seems not, from what I can gather). Right now it seems more of a marketing tool to drive people to watch MLB Network. Baseball Savant's data seems to be the only data available, unless you consider the selected individual video clips out there good data, but where are they pulling their data from?
 

Snodgrass'Muff

oppresses WARmongers
SoSH Member
Mar 11, 2008
27,644
Roanoke, VA
Sampo Gida said:
I would like to know where to get Statcast raw data, or whether it's even available (it seems not, from what I can gather). Right now it seems more of a marketing tool to drive people to watch MLB Network. Baseball Savant's data seems to be the only data available, unless you consider the selected individual video clips out there good data, but where are they pulling their data from?
 
It's derived from HITf/x, so it's not publicly available at this point, sadly. I'd love to have access to it as well.
 
And not to toot our own horn, but www.sonsofsamhorn.com has some fantastic statistical and visual analysis work being done on some fairly cutting edge topics, like catcher framing.
 

Jnai

is not worried about sex with goats
SoSH Member
Sep 15, 2007
16,143
Baseball Savant is scraping from the text strings that are being pushed to Gameday. We aren't doing it yet because they're very likely to release an API in the near future once they feel comfortable with the state of the product. The system is still spitting out some funny numbers, and only for a subset of batted balls. While those numbers can be cleaned, it's hard to know yet what is a good representation of reality.
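
To make the coverage point concrete, here is a small sketch of the kind of check one might run on scraped values: what share of batted balls have a velocity at all, and which of those fall in a plausible range. The sample values and the 20-125 mph bounds are assumptions chosen for illustration, not anything defined by Statcast or Baseball Savant.

```python
# Made-up velocities (mph) for batted balls; None means no reading was published.
velocities = [101.3, None, 88.0, None, 4.2, 95.6, 187.0, None]

# Illustrative plausibility bounds; these are assumptions, not an official rule.
MIN_MPH, MAX_MPH = 20.0, 125.0

def coverage_and_clean(values, lo=MIN_MPH, hi=MAX_MPH):
    """Return the share of batted balls with a reading and the plausible readings."""
    reported = [v for v in values if v is not None]
    plausible = [v for v in reported if lo <= v <= hi]
    coverage = len(reported) / len(values) if values else 0.0
    return coverage, plausible

coverage, plausible = coverage_and_clean(velocities)
print(f"share of batted balls with a reading: {coverage:.3f}")
print(f"plausible readings: {plausible}")
```

Filtering out the odd values is the easy part. It does nothing about the batted balls with no reading, which is why the cleaned subset may still not represent reality.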
 
As an aside, none of that data is HITf/x, to the best of my knowledge: it's all being pulled from the Trackman system that's part of Statcast. My understanding is that HITf/x has never been a live product because of postprocessing that Sportvision needs to do.