About the Book
". . . A week after graduation in 2008, I landed a job at The Hartford, where I've been working my way up the ladder, all the while using what I learned in your classes to help me stand out from the rest. If it had not been for all of your extreme formulas derived from baseball stats and me having to learn how to formulate them in Excel, I wouldn't be where I am right now . . . "
Read Gregory's entire letter in the Testimonials section.
The overall goal of the book, Sandlot Stats, is to provide students with a Liberal Arts course in the discipline of mathematics. Specifically, the area of mathematics presented in this book is called Statistics. What skills should a student obtain from a Liberal Arts course in mathematics? A partial list includes:
- Critical Thinking Skills to:
- Define and formulate problems.
- Interpret and evaluate results.
- Understand patterns and structures.
- Develop quantitative thinking skills.
- Communication Skills to:
- Write concisely.
- Explain complex concepts.
- Develop precise and logical arguments.
- Computer Skills to:
- Use spreadsheets to analyze and summarize data.
- Create tables and their corresponding graphs.
- Create documents which include text, graphs and tables.
- Use PowerPoint software to create oral presentations.
- Research Skills to:
- Use original sources.
- Formulate research problems.
- Find valid information on the Internet.
- Apply theoretical approaches to research problems.
- Mathematical Skills to:
- Demonstrate a basic understanding of percentages, ratios, rates, probability, and statistics.
- Translate real-life problems into mathematical terms.
- Extract information from tables and graphs.
- Introduce problem solving skills.
- Develop fluency with numbers and an ability to judge the reasonableness of a numerical calculation.
- Apply problem solving techniques to real life problems.
- Apply physical and theoretical models to real world situations.
The goal of this course is to teach a traditional one-semester statistics course using data acquired from baseball. Throughout the course, the students will also learn critical thinking skills, communication skills, computer skills, research skills, and of course, mathematical skills.
The student, with the help of a calculator, will be required to perform each of the statistical processes by hand (without the aid of any statistical software package). By performing these tasks by hand, the student will gain an understanding and appreciation of statistical concepts. However, we don’t want to exclude statistical software from this course. Microsoft Excel will be the statistical software package of choice for this course. It is not the strongest statistical package; however, it is available on almost all computers and has enough statistical features to handle most personal and job related tasks.
For a description of what each chapter in Sandlot Stats contains, look at the Chapter By Chapter section.
Sandlot Stats provides an introductory course in Statistics. The underlying data for the study of Statistics in this course comes from baseball records, which is what makes it different from other introductory statistics courses. Business statistics courses use data from the world of business and economics. Scientific statistics courses, often called biostatistics, use data from the world of biology, pharmacology, and the health sciences.
Suppose a person is asked to analyze the following data: 28, 42, 34, 37, 10, 19, 33, and 44. This data could be the number of products sold in each of the last eight months, the number of patients in eight studies, the number of home runs hit by a player in each of eight years, the number of people answering “yes” to eight questions in a survey, etc. The point is that even if you were not told the meaning of the numbers, the same basic techniques from Statistics could be used. Since all areas of study are involved with data, all areas need the tools from Statistics to analyze the data.
Many students who take an introductory course in business statistics or biostatistics:
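As a small illustration, the eight values can be summarized without knowing what they measure. The sketch below assumes Python and its standard statistics module; that choice and the variable names belong to this example only, since the course itself relies on hand calculation and Excel.

```python
import statistics

# The eight values from the text; their real-world meaning is deliberately unknown.
data = [28, 42, 34, 37, 10, 19, 33, 44]

mean = statistics.mean(data)          # arithmetic average
median = statistics.median(data)      # middle of the sorted values
sample_sd = statistics.stdev(data)    # sample standard deviation (n - 1 divisor)
data_range = max(data) - min(data)    # largest value minus smallest value

print(f"mean = {mean}")
print(f"median = {median}")
print(f"sample standard deviation = {sample_sd:.2f}")
print(f"range = {data_range}")
```

The same four summaries apply whether the numbers are monthly sales, patient counts, or home runs.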
- Have difficulty understanding the concepts of statistics applied to their discipline.
- Are not familiar with the underlying subject matter which produces the data.
- Are not interested in the underlying subject matter.
- Take the course because it is required and not as an elective.
It is postulated by many educators that learning a new discipline is enhanced when a student understands and is interested in the underlying subject matter.
Therefore, applying statistics to the world of baseball has these advantages:
- Many students, both male and female, are familiar with the terminology and rules of baseball.
- Students who would not choose to take a statistics course are motivated to take this course because of their love of baseball.
- A great amount of baseball data is easily obtained from many websites.
- Since the game of baseball spans over 150 years, a great amount of data can be connected to the history of the United States.
- Students see how mathematicians design quantitative models which are specific to a certain field of study.
- Students who are interested in baseball are already familiar with such statistics as mean, proportion, and weighted mean through their knowledge of such baseball statistics as batting average, on-base percentage, and slugging percentage.
- Using statistics to analyze baseball issues illustrates the value of statistics.
- There is an easy transference of statistical techniques from baseball to the job environment, to political and social issues, and to consumer decisions.
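The link between familiar baseball statistics and the ideas of proportion and weighted mean can be sketched in a few lines. The season line below is hypothetical, invented only for illustration, and Python is an assumption of this example. The formulas are the standard definitions of batting average, on-base percentage, and slugging percentage.

```python
# Hypothetical season line (not a real player's record).
at_bats = 500
hits = 150
doubles = 30
triples = 5
home_runs = 20
walks = 60
hit_by_pitch = 5
sacrifice_flies = 5

singles = hits - doubles - triples - home_runs

# Batting average is a proportion: hits per at-bat.
batting_average = hits / at_bats

# On-base percentage is also a proportion: times on base per opportunity.
on_base_percentage = (hits + walks + hit_by_pitch) / (
    at_bats + walks + hit_by_pitch + sacrifice_flies
)

# Slugging percentage is a weighted mean: each hit is weighted by its bases
# (1, 2, 3, or 4) and each hitless at-bat carries weight 0.
total_bases = 1 * singles + 2 * doubles + 3 * triples + 4 * home_runs
slugging_percentage = total_bases / at_bats

print(f"AVG = {batting_average:.3f}")
print(f"OBP = {on_base_percentage:.3f}")
print(f"SLG = {slugging_percentage:.3f}")
```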
Baseball fans debate many issues in baseball. Some of the issues debated are:
- Which player was the best offensive player of all time?
- Which baseball team was the best team of all time?
- Which league was better in a given year?
- Which offensive baseball statistics are the most important for the production of runs scored by a player or by a team?
- At what age does a Major League player reach his peak as a batter?
- Will a player ever break Joe DiMaggio’s 56-game hitting streak?
- What does it take for a player to hit .400 in today’s game?
- Which baseball strategies do not work?
- Does a certain player belong in the Hall of Fame?
- Is there a home field advantage in baseball?
- How do we go about comparing two players?
- Which batting feat is the toughest to duplicate today?
- Which batting statistics can be used to best predict a won-lost record for a team?
Throughout this book, the techniques of statistics will be used to address some of the questions above along with many other issues in the world of baseball. John Thorn and Pete Palmer, in their book The Hidden Game of Baseball, write: “Baseball may be loved without statistics, but it cannot be understood without them.”
There are several books on the market that use statistics to analyze topics in baseball. However, these books do not teach the subject matter of Statistics. Instead, they apply statistics to baseball issues.
The philosopher Fan Li said, “Give a fish to a man; he has food for a day. Teach a man to fish; he learns a skill for life.” Sandlot Stats follows that philosophy. By learning the subject matter of Statistics, a student will be able to apply the concepts of statistics to his job, to his health decisions, and to his consumer decisions. The chapters of this book will include most of the topics covered in any introductory statistics course. What makes this statistics course different from other statistics courses is that the data comes from the world of baseball.
In the script of the movie “Bull Durham,” down-on-his-luck catcher Crash Davis asks, “Know what the difference between hitting .250 and .300 is? It is 25 hits. Twenty-five hits, in 500 at-bats, is 50 points. OK? There is just six months in a season; that is about 25 weeks. That means if you get just one extra flare a week, a ground ball, a dying quail… you’re in Yankee Stadium.”
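Crash Davis’s arithmetic can be checked directly. The sketch below is written in Python (an assumption of this example) and uses only the figures from the quote: 500 at-bats, a 25-week season, and averages of .250 and .300. Averages are kept in points (thousandths) so the arithmetic stays exact.

```python
at_bats = 500
weeks_in_season = 25

# Batting averages expressed in points (thousandths).
low_average_points = 250     # a .250 hitter
high_average_points = 300    # a .300 hitter

hits_low = low_average_points * at_bats / 1000     # hits needed to bat .250
hits_high = high_average_points * at_bats / 1000   # hits needed to bat .300

extra_hits = hits_high - hits_low
extra_hits_per_week = extra_hits / weeks_in_season

print(f"hits at .250: {hits_low:.0f}")
print(f"hits at .300: {hits_high:.0f}")
print(f"extra hits over the season: {extra_hits:.0f}")
print(f"extra hits per week: {extra_hits_per_week:.0f}")
```

The difference of 50 points over 500 at-bats is 25 hits, or one extra hit per week, as the quote says.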
The answers to the following questions are important to a general manager. Which batting statistics are most important in determining a player’s ability as a hitter? Which team batting statistics best predict the number of runs scored by a team? Based on a player’s current age, how many years should be included in a player’s new contract?
Many areas outside of baseball are also looking to separate chance outcomes from meaningful outcomes caused by ability. For example, in finance, when comparing two money managers, can we conclude one manager has more ability based on their performances? In the study of cancers, researchers are looking at which genes are altered in cancer patients. Again, genes could be different simply by chance. To be successful, we wish to separate the genes that are different by chance from those that cause the cancer. Many other fields of study are interested in the same question. Can we separate a chance outcome from an outcome due to the ability of a person or a product?
Assuming certain conditions are true, a sample result is termed statistically significant if the probability of its occurring strictly by chance is very small (usually less than 5%).
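One way to make this definition concrete is a binomial calculation. The sketch below, in Python, assumes a hypothetical hitter whose true chance of a hit is .250 in every at-bat and whose at-bats are independent. These assumptions play the role of the “certain conditions” and are not taken from the book. It asks how likely such a hitter is to collect 150 or more hits in 500 at-bats by chance alone.

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """Probability of k or more successes in n independent trials,
    each with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical scenario: a true .250 hitter with 500 at-bats.
at_bats = 500
true_ability = 0.250
observed_hits = 150   # a .300 season

p_value = binomial_upper_tail(at_bats, observed_hits, true_ability)

print(f"chance of {observed_hits} or more hits: {p_value:.4f}")
if p_value < 0.05:
    print("statistically significant at the 5% level")
else:
    print("not statistically significant at the 5% level")
```

Under these assumptions the probability falls below 5%, so a .300 season would be termed statistically significant evidence that the hitter is better than .250.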
One difficulty in writing this book is keeping the length of the book to a reasonable number of pages. Introducing all the necessary topics presented in any statistics course requires an entire book. An entire book is also necessary to cover several topics in baseball research. My goal is to incorporate both these topics into one book.
A second difficulty in writing a book about baseball is that baseball records are continually changing. Some of the records and player data mentioned in this book are no longer current. Of course, this is one of the beautiful things about baseball. As I say to my wife many times, “What makes baseball so fascinating to me is that no matter how many baseball games I have seen in over 50 years, I never know when I will see an event I have never seen before.”
Descriptive Vs. Inferential
The term statistics can have two meanings. It can refer to the discipline itself or to various descriptive measures such as batting average, on-base percentage, and slugging percentage.
Of all sports, baseball has the most statistics (descriptive measures) applied to it. While watching a game you are exposed to all sorts of baseball statistics. As soon as a player approaches home plate, his batting average, on-base percentage, number of home runs, and number of runs-batted-in are displayed. The delay of approximately 45 seconds between pitches gives the commentators an opportunity to bombard you with such statistics as his batting average against this pitcher, his batting average against this team, his batting average during the last week, and so on.
For a baseball player to be successful as a hitter, he must make contact with the ball and then direct the ball to a place on the field where it becomes a hit. Many sports people believe the toughest thing to do in any sport is to hit a baseball. If a player is successful 3 out of every 10 times, there is a good chance that player will be elected into the Hall of Fame. There aren’t many pursuits where being successful 30% of the time makes you an elite performer. If you make contact with the ball and get a hit or make contact and make an out, how much of that should be attributed to your skill as a hitter and how much should be attributed to chance or luck? A ball could be hit “right on the screws” but directly at a player for an out; on the other hand, a ball could be mishit and travel 10 feet and become a hit. How is a player’s skill as a hitter separated from chance or luck?
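The role of chance can be seen in a simple simulation. The sketch below, in Python, assumes a hypothetical hitter whose ability never changes: every at-bat is an independent trial with a .300 chance of a hit. The ability, the number of at-bats, the number of seasons, and the random seed are all choices made for this illustration.

```python
import random

random.seed(2024)  # fixed seed so the sketch is repeatable

TRUE_ABILITY = 0.300   # hypothetical chance of a hit in each at-bat
AT_BATS = 500          # hypothetical at-bats per season
SEASONS = 1000         # number of simulated seasons

def simulate_season(ability, at_bats):
    """Return the batting average for one simulated season."""
    hits = sum(1 for _ in range(at_bats) if random.random() < ability)
    return hits / at_bats

averages = [simulate_season(TRUE_ABILITY, AT_BATS) for _ in range(SEASONS)]

print(f"lowest simulated average:   {min(averages):.3f}")
print(f"highest simulated average:  {max(averages):.3f}")
print(f"mean of simulated averages: {sum(averages) / SEASONS:.3f}")
```

Every simulated season comes from the same ability, so the spread between the lowest and highest averages is due to chance alone.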
Every plate appearance affects a baseball player’s statistics and can be affected by chance. A player’s plate appearances are summarized by using certain baseball statistics such as batting average, on-base percentage, slugging percentage, number of home runs, and number of runs-batted-in. These baseball statistics, applied to the player’s actual plate appearances, are used to measure his performance as a hitter. Applying statistics to actual baseball data falls under the area of Statistics called Descriptive Statistics.
Inferential Statistics is the area of Statistics that is used to measure a player’s true batting ability. A player’s batting ability is defined by the special characteristics he either was born with or developed through training. Some of these characteristics include his vision, his hand-eye coordination, his height and weight, his strength, and his work ethic. We estimate a player’s batting ability by using the player’s actual batting performance. The techniques used to estimate a player’s batting ability from his batting performance fall under the area of Statistics called Inferential Statistics. Inferential Statistics attempts to decide how much of the variation of baseball data is due to chance and how much is due to the ability of the player. Inferential Statistics is used to compare the batting ability between two or more players, two or more teams, and two or more leagues.
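One common inferential technique for estimating an ability from a performance is a confidence interval for a proportion. The sketch below, in Python, uses the normal approximation and a hypothetical season of 150 hits in 500 at-bats. It assumes independent at-bats with a constant chance of a hit, and it is one standard textbook approach rather than necessarily the method used in the book.

```python
from math import sqrt

# Hypothetical observed performance.
hits = 150
at_bats = 500

observed_average = hits / at_bats   # the player's batting performance

# Standard error of a sample proportion.
standard_error = sqrt(observed_average * (1 - observed_average) / at_bats)

z = 1.96  # approximate multiplier for 95% confidence (standard normal)

lower = observed_average - z * standard_error
upper = observed_average + z * standard_error

print(f"observed batting average: {observed_average:.3f}")
print(f"approximate 95% interval for batting ability: {lower:.3f} to {upper:.3f}")
```

With these figures the interval runs from roughly .260 to .340, so even a full season leaves considerable uncertainty about a player’s true ability.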
The study of Statistics is thus divided into two areas, Descriptive Statistics and Inferential Statistics.
Descriptive Statistics deals with the collection, organization, summarization and graphical presentation of data. After applying the techniques of descriptive statistics to the collected data, we are in the position to draw subjective conclusions about the data. The conclusions made are limited to only the collected data.
Inferential statistics attempts to use the collected observed data to predict (infer) results for a much larger group of uncollected data. The collected data is called the sample data and the larger group of uncollected data is called the population data. Applying the results of a small sample to make decisions about a population is used in many disciplines. Some examples of this are:
- the testing of a new drug on a small sample before approving it for use
- the use of surveys, administered to a small number of households, to estimate public opinion
- the use of exit polls to predict the winner of an election.
Both areas of Statistics attempt to make sense out of data.
Sabermetrics is defined as the mathematical and statistical analysis of baseball records.
Baseball Data Sources
The major sources of our baseball data are websites. The five most used websites throughout this book are: