In my book “Sandlot Stats: Learning Statistics with Baseball”, I do exactly what Professor Ellenberg preaches. My book focuses on high school and college students learning statistics through games. The game I use to coach my students to understand the subject of statistics is also baseball. But there is no reason why adults can’t also use my book to explore the important world of statistics through the game of baseball.
In Part 1, I talked about the board game All-Star Baseball, which introduced me, as a child, to the fun and excitement mathematics offers. As I explained in Part 1, the game consists of disks whose sectors simulate a player’s real-life statistics. The areas of the sectors represent the probabilities of that player hitting a single, double, triple, or home run, getting a BB, getting HBP, hitting a SF, or making an out.
So how do I use the All-Star Baseball game in my book? In the first chapter, the students are instructed to pick one pair of players from a list I give them (of course, they can choose their own pair). Each pair contains one player in the Hall of Fame (HOF) and one player who is a future candidate. They will compare their two players, using what they learn in statistics, throughout the course. One of the methods used for comparison in the book is to have each student create an All-Star Baseball disk for each of their chosen players. Fortunately, there is free software on the internet that allows a student to create a disk after they calculate the needed probabilities. For example, a student can make a free spinner on this website: http://illuminations.nctm.org/Activity.aspx?id=3537.
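To make the "calculate the needed probabilities" step concrete, here is a minimal Python sketch. The career totals are hypothetical numbers chosen for illustration, not any real player's record, and the function names `disk_sectors` and `spin` are my own. Each outcome's probability is its share of all plate appearances, and its sector angle is that share of 360 degrees.

```python
import random

# Hypothetical career totals for an illustrative player (not real data).
# Keys follow the outcomes listed for the disk: 1B, 2B, 3B, HR, BB, HBP, SF, OUT.
career_counts = {
    "1B": 1500, "2B": 400, "3B": 60, "HR": 350,
    "BB": 900, "HBP": 40, "SF": 70, "OUT": 5680,
}

def disk_sectors(counts):
    """Return {outcome: (probability, sector angle in degrees)}."""
    total = sum(counts.values())  # total plate appearances on the disk
    return {name: (n / total, 360.0 * n / total) for name, n in counts.items()}

def spin(counts, rng=random):
    """Simulate one spin: pick an outcome with probability proportional to its count."""
    names = list(counts)
    return rng.choices(names, weights=[counts[n] for n in names], k=1)[0]

sectors = disk_sectors(career_counts)
for name, (prob, angle) in sectors.items():
    print(f"{name:>3}: p = {prob:.3f}, sector = {angle:5.1f} degrees")

print("One spin:", spin(career_counts))
```

The probabilities printed here are what a student would enter into the online spinner; the `spin` function is only a stand-in for the spinner itself.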
Once the disks are made, the student will play a 9-inning game between their two players. Each player will occupy all nine positions in the batting order for his team. The internet software supplies the spinner and off they go. At the end of the course, each student will present a PowerPoint presentation on whether the HOF candidate should or should not be admitted to the HOF. The class will act as a jury and decide whether they agree or disagree with the student’s argument.
Professor Ellenberg ends his NYT article with these statements about coaching math. “There are many things we’d like to coach our kids to do. And we can’t help playing favorites to some extent. I’ll admit, I’d rather C. J. aimed to be a mathematician than a shortstop. I tried to open his eyes to some more realistic careers that could still satisfy his hunger for the major leagues. “You know,” I told him, “you really like math, and all the teams now have people who work for them analyzing the players’ statistics. You’d probably enjoy that!” At this suggestion he became agreeably eager. “Daddy, that’s a really good idea,” he said. “Because almost all major league players have to retire by the time they’re 40 — so then I could get a job analyzing the statistics!”
If you have an interesting story about coaching your child in mathematics through games, please share it with us.
]]>Professor Ellenberg began his article with a question many parents have asked him: How can I get my kids excited about mathematics? Then he presented an example of what not to do. His example involves the child prodigy Norbert Wiener, who got a Ph.D. from Harvard at the age of 18. Describing the process his father used to develop his mathematical skills, Norbert said, “He would begin the discussion in an easy, conversational tone. This lasted exactly until I made the first mathematical mistake. Then the gentle and loving father was replaced by the avenger of the blood. … Father was raging, I was weeping, and my mother did her best to defend me.”
If the above sounds familiar from helping your child with math, the approach suggested by Ellenberg will be both more successful and more humane. So, how can we present mathematics in such a way that children find it enjoyable?
In his article Ellenberg says, “I found an answer in something my 8-year-old son, C. J., likes even better than math: baseball. C. J. is a baseball fanatic. He lives and dies with the Milwaukee Brewers. He plays Little League with a fierce concentration I seldom see at home. And I’ve learned a lot about what kind of math parent I want to be from an unexpected source — his coaches. Baseball is a game. And math, for kids, is a game, too. Everything for them is a game. That’s the great thing about being a kid. In Little League, you play hard and you play to win, but it doesn’t actually matter who wins. And good coaches get this. They don’t get mad and they don’t throw you off the team. They don’t tell you that you stink at baseball, even if you do — they tell you what you need to do to get better.”
He then gives an example of what it means to coach math instead of teaching it. He says, “For C. J., it means I give him a mystery number to think about before bed. I’m thinking of a mystery number, and when I multiply it by 2 and add 7, I get 29; what’s the mystery number? And already you’re doing not just arithmetic but algebra.” As I am writing this blog, my 5-year-old granddaughter is watching my wife put fish sticks on a plate. She put 6 fish sticks on the plate and asked my granddaughter how many more fish sticks we need to have 8 of them. My wife is coaching math.
Ellenberg cites many games that are math related. The older games include chess, which builds the ability to follow a series of logical steps, and Monopoly, which requires basic arithmetic and probability reasoning. He also suggests newer games, which include Rush Hour, a board game about search algorithms; Set, a study in higher-dimensional geometry in the form of a viciously competitive card game; and DragonBox, an app for phone or tablet that teaches the formalisms of algebra. If you research online, you will be able to find many more games that have mathematical concepts built into them.
My own personal favorite as a child was the board game All-Star Baseball, a spinner game with player disks divided into sectors. The area of each sector was based on the probability of the player’s batting outcome using his real-life stats. For example, the area of the sector numbered 1 represented the probability of that player hitting a home run. Of course, the Babe had the largest sector 1. As a GM, I drafted the teams for my league. Then I played the games, keeping the team standings and calculating the players’ statistics. Without realizing it, I was doing mathematics and enjoying it. This game exposed me to probability, statistical measures, and basic arithmetic.
To be continued:
]]>I wanted to compare the ESPN top ten players to my list of top ten players, as they appeared in Chapter 18 of my book Sandlot Stats. ESPN’s top ten list was based on the subjective opinion of their chosen committee of experts. However, they were encouraged to use advanced metrics. My top 10 was based on nine quantitative statistics, which included AVG (batting average), OBP (on-base pct.), SLG (slugging pct.), OPS (on-base plus slugging), BRA (OBP*SLG), HRA (home run average), H (number of hits), HR (number of home runs), and Runs Created for their team, [(H+BB)*TB]/[AB+BB]. Also, credit was given for winning a Triple Crown, winning a Career Triple Crown, and ranking in the top 10 in either Bill James’ Black-Ink or Gray-Ink Test. In Chapter 18, you can read about my 26 finalists and their total points. Like the ESPN list, I only looked at what the players did between the lines. Since my list only considered positional players, I only chose ESPN’s top ten positional players. There was one other difference. The ESPN list considered hitting, fielding, and baserunning, whereas my list only considered hitting. Therefore, Rickey Henderson, who finished number 11 on the ESPN list, did not make my list of 26 finalists. My list was based on player accomplishments before 2009 (when Chapter 18 was written).
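As a rough illustration of how several of these rate statistics come out of a player's counting stats, here is a small Python sketch. The stat line is hypothetical, the function name `hitting_metrics` is my own, and the OBP formula is the standard one that includes HBP and SF, which is an assumption about the exact definition used in the book. BRA and Runs Created follow the formulas given above.

```python
def hitting_metrics(ab, h, bb, tb, hbp=0, sf=0):
    """Compute several of the rate statistics named in the text from counting stats."""
    avg = h / ab                                   # batting average
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)    # on-base pct. (standard form, assumed)
    slg = tb / ab                                  # slugging pct.
    return {
        "AVG": avg,
        "OBP": obp,
        "SLG": slg,
        "OPS": obp + slg,                          # on-base plus slugging
        "BRA": obp * slg,                          # OBP*SLG, as defined in the text
        "RC": (h + bb) * tb / (ab + bb),           # Runs Created: [(H+BB)*TB]/[AB+BB]
    }

# Hypothetical season line, for illustration only.
metrics = hitting_metrics(ab=500, h=150, bb=50, tb=250)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```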
Notice how similar the two lists are. This shows how important hitting is in the evaluation of positional players. Of course, both lists have “The Babe” as number 1. I can understand the difference in rank 2 between the two lists. Willie Mays was a five-tool player in the important position of center field, whereas Ted Williams was an adequate left fielder. In fact, the Yankees turned down a proposed trade of Ted Williams for Joe DiMaggio because they considered the center field position much more valuable than the left field position. Taking into account fielding and running, I can see why ESPN put Mays in front of Williams.
Except for the order of the 21 players on the two lists, the only five players not on both lists are Nap Lajoie, Honus Wagner, Rogers Hornsby, Mickey Mantle, and Albert Pujols. Wagner and Mantle are on the ESPN top 10 list but not on my list. However, Wagner ranks 12th and Mantle 13th on my list of 26 players. The actual difference between the 8th-ranked Lajoie and the 13th-ranked Mantle is a total of five points in my scoring system. Excluding pitchers, Hornsby ranked 12th, Pujols ranked 15th, and Lajoie ranked 34th on the ESPN list. My major beef with the ESPN list is the extreme difference in rank between Lajoie (rank 34) and Wagner (rank 9). Both players played in the same era (1896-1917) and both were infielders. Lajoie’s career AVG was .338 compared to Wagner’s .327. I gave the edge to Lajoie because in 1901 he was a Triple Crown winner. I guess ESPN liked the fact that Wagner played shortstop while Lajoie played second base. I could have easily called it a tie between the two players. As I mentioned before, my all-time favorite player was Mantle. In my opinion, if it wasn’t for his reckless lifestyle and unfortunate knee injury, he would have been in the top 5 on both lists.
]]>
In Chapter 17 of my book Sandlot Stats, titled “Mission Impossible: Batting .400 for a Season”, I use sabermetrics to analyze what I believe is necessary for a player to bat .400 for a season today. Using regression analysis, based on those players who have batted either .400 or very close to .400 for a season, a typical .400 hitter has an in-play batting average IPBA > .427 and a strikeout average SOA < .066. The IPBA is H/(AB-SO). The SOA is SO/AB. A player’s batting average is BA = H/AB. It can be shown that BA = IPBA*(1-SOA). Each increase of .010 (10 points) in his IPBA raises his BA by approximately 7 points and each decrease of 10 points in his SOA increases his BA by approximately 3 points.
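The identity relating BA, IPBA, and SOA follows in one line from the three definitions. Here is a short derivation using the same symbols (H for hits, AB for at-bats, SO for strikeouts):

```latex
\[
BA \;=\; \frac{H}{AB}
   \;=\; \frac{H}{AB - SO}\cdot\frac{AB - SO}{AB}
   \;=\; IPBA\cdot\left(1 - \frac{SO}{AB}\right)
   \;=\; IPBA\cdot(1 - SOA)
\]
```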
Since 1913, the .400 hitters club includes Harry Heilmann (.403 in 1923), Rogers Hornsby (.401 in 1922, .424 in 1924, .403 in 1925), George Sisler (.407 in 1920, .420 in 1922), Ty Cobb (.401 in 1922), Bill Terry (.401 in 1930) and Ted Williams (.406 in 1941). This elite club, since 1913, includes just six players, since Hornsby did it three times and Sisler did it twice. What these players had in common was the ability to make contact with the ball and not strike out. Their SOA ranged from a low of .024 to a high of .080. Only two SOA were above .066. In 1922, Hornsby had a SOA of .080 but his IPBA of .436 gave him a BA = .401. In 1923, Heilmann had a SOA = .076 but his IPBA of .436 gave him a BA = .403. Their IPBA ranged between .420 and .450, with only two less than .427. In 1920, Sisler had an IPBA of .420 but his SOA of .030 gave him a BA = .407. In 1922, Cobb had an IPBA of .420 but his SOA of .046 gave him a BA = .401.
Since 1913, of the nine times a player was able to hit .400 for a season, all but Ted Williams’ season came between 1920 and 1930. The era from 1920 to 1930 was called the “lively ball era” because a new, more tightly wound ball led to higher batting averages. Also, before 1930, since pitchers were expected to pitch the entire game, the strikeout was de-emphasized. Like marathon runners, pitchers wanted to pace themselves by throwing fewer pitches and letting their fielders create their outs.
The four players, since 1941, who came the closest to the magic .400 average were Ted Williams (BA = .388 in 1957), Rod Carew (BA = .388 in 1977), George Brett (BA = .390 in 1980) and Tony Gwynn (BA = .394 in 1994). Williams, Carew, Brett and Gwynn had corresponding IPBA of .432, .426, .410 and .413. Their corresponding SOA were .102, .089, .049 and .045. Williams and Carew failed to bat .400 because their SOA were too high. Brett and Gwynn failed because their IPBA were too low.
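The seasons quoted above can be checked against the identity BA = IPBA*(1-SOA). The following Python sketch uses only the IPBA, SOA, and BA figures given in the text; the function names are my own. It also shows, for each strikeout average, the IPBA that the identity would require to reach .400, which is one way to see why a higher SOA is so costly.

```python
def batting_average(ipba, soa):
    """BA implied by the identity BA = IPBA * (1 - SOA)."""
    return ipba * (1.0 - soa)

def ipba_needed(target_ba, soa):
    """IPBA a hitter would need to reach target_ba at a given strikeout average."""
    return target_ba / (1.0 - soa)

# (label, IPBA, SOA, reported BA), all taken from the text above.
seasons = [
    ("Hornsby 1922",  0.436, 0.080, 0.401),
    ("Heilmann 1923", 0.436, 0.076, 0.403),
    ("Sisler 1920",   0.420, 0.030, 0.407),
    ("Cobb 1922",     0.420, 0.046, 0.401),
    ("Williams 1957", 0.432, 0.102, 0.388),
    ("Carew 1977",    0.426, 0.089, 0.388),
    ("Brett 1980",    0.410, 0.049, 0.390),
    ("Gwynn 1994",    0.413, 0.045, 0.394),
]

for label, ipba, soa, reported in seasons:
    implied = batting_average(ipba, soa)
    print(f"{label:<14} implied BA = {implied:.3f} (reported {reported:.3f}), "
          f"IPBA needed for .400 = {ipba_needed(0.400, soa):.3f}")
```

Because the published IPBA and SOA values are rounded to three decimals, the implied BA can differ from the reported BA by a fraction of a point.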
Since 1913, the Triple Crown winners include Rogers Hornsby (1922 and 1925), Jimmie Foxx (1933), Chuck Klein (1933), Lou Gehrig (1934), Joe Medwick (1937), Ted Williams (1942 and 1947), Mickey Mantle (1956), Frank Robinson (1966), Carl Yastrzemski (1967) and the newest member Miguel Cabrera (2012). The exclusive Triple Crown Club, since 1913, includes just 10 members.
Please read my next posting, where I examine what it will take for Miguel Cabrera to accomplish both of these feats in 2013 and whether I think Cabrera can and will do it. News Flash: Cabrera is now on pace for 198 RBI, which would break Hack Wilson’s single-season record of 191.
]]>The numbers 714, 755, and 762 are instantly recognizable to many Americans as the lifetime home run totals hit by Babe Ruth, Hank Aaron, and Barry Bonds. Point 406 or just 406 evokes the name Ted Williams, the last player to average more than four hits in every 10 at-bats over a full season. Even a rather ordinary number like 56 has baseball significance: Joe DiMaggio’s 1941 hitting streak, a 71-year-old record that no one in the major leagues has ever come close to breaking. The bestselling book, 56: Joe DiMaggio and the Last Magic Number in Sports by Kostya Kennedy, provides a day-by-day account of Joe’s streak with the buildup to WW II in the background. In the history of baseball (from 1876 to today) only six men (three college players, two minor league players, and one ML player) have hit safely in at least 56 consecutive games. Of course, Joe was the only ML player, but he also had a 61-game hitting streak in the minors. The two closest ML players to Joe’s streak were Pete Rose (1978) and Willie Keeler (1897). Both had 44-game hitting streaks.
Using my probability formula, the odds of any player duplicating DiMaggio’s 56-game hitting streak can be calculated. In 1941, the odds of Joe achieving his streak were 1 in 9545. In spite of batting .406 in 1941, Williams’ odds of duplicating Joe’s streak were 1 in 50,000. The main reason why Joe was 5 times more likely is their difference in walks. In 1941, Joe had 76 walks and Ted had 147 walks. Unfortunately, every walk hurts your chances of getting a hit, because it uses up a plate appearance without giving you an at-bat. Pete Rose’s odds in 1978 were 1 in 100,000 and Willie Keeler’s odds in 1897 were 1 in 40. Keeler had an AVG of .424 in 1897, whereas Rose’s AVG in 1978 was .302. The modern-day player with the best odds was Ichiro Suzuki, who in 2004 had a 1 in 274 chance of duplicating the streak. After the success of the movie 42, the time has come for a new movie called 56.
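The effect of walks can be illustrated with a deliberately simplified model. This is not the probability formula from my research, and it will not reproduce the odds quoted above. It assumes every at-bat is an independent trial with success probability equal to the batting average, and it only gives the chance of hitting safely in one specified stretch of 56 games. The inputs are hypothetical round numbers: a .357 hitter who gets 3.9 at-bats per game versus a .406 hitter who, because of walks, gets only 3.2.

```python
def game_hit_probability(avg, at_bats_per_game):
    """Chance of at least one hit in a game, treating at-bats as independent trials."""
    return 1.0 - (1.0 - avg) ** at_bats_per_game

def streak_probability(avg, at_bats_per_game, streak_len=56):
    """Chance of hitting safely in every game of one specified stretch of games."""
    return game_hit_probability(avg, at_bats_per_game) ** streak_len

# Hypothetical inputs for illustration: fewer at-bats per game stands in for more walks.
few_walks = streak_probability(avg=0.357, at_bats_per_game=3.9)
many_walks = streak_probability(avg=0.406, at_bats_per_game=3.2)

print(f"Lower average, more at-bats:  {few_walks:.2e}")
print(f"Higher average, fewer at-bats: {many_walks:.2e}")
```

Even in this naive model, the hitter with the higher average comes out less likely to sustain the streak, because losing at-bats to walks lowers his chance of getting a hit in any single game.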
]]>My wake-up call at 5 AM sounded like this: “It’s great to be a soldier”, “Be sharp”. Father Costa picked Tara and me up at the hotel at 6:15 AM. Upon arriving at the classroom building, I went to the men’s room to change into my baseball uniform. I then proceeded to Father Costa’s office but made a mistake. I entered the wrong office and encountered a colonel putting on his boots. He looked at me and was shocked to see a man in an old baseball uniform at 7 AM. He later told Father Costa that he thought he had died. I said to Father Costa he should have said “I guess I was right. God does play baseball.” While in his office we took several pictures. One special picture included Father Costa and me with a full-size cutout of Babe Ruth. Father Costa and I agree the greatest baseball player of all time was Babe Ruth. We are both huge Yankee fans. Father Costa and I then proceeded to his class, which starts at 7:30. As I expected, all the cadets arrived on time to class. One cadet then announced to Father Costa, “all the cadets are accounted for.” Father Costa then introduced me to the class.
My talk was on my published research on various batting streaks in baseball. Of course, the most famous and well-known streak is Joe DiMaggio’s 56-game hitting streak. I developed a new probability formula to predict the probability of any player duplicating this and other batting streaks. For example, Dale Long (whom I talked about in a previous posting), Don Mattingly, and Ken Griffey Jr. share the record of eight consecutive games with at least one home run. At the end of class, two of the cadets presented Tara and me with gifts. Later, I sat in on Father Costa’s differential equations course.
Father Costa took Tara and me for a tour of West Point. One of the most interesting buildings was the dining hall, where all the cadets eat at the same time. The dining hall seats 4200 cadets. It is enormous. There was a board listing all the activities. One activity that surprised me was ballroom dancing (which Tara and I do). We then had a super lunch at the Officers Club. At noon, Tara and I left West Point and headed back to Quinnipiac University so Tara could teach her class at 3. As we traveled up the Merritt Parkway, crews were cutting trees and we encountered several delays. At 2:30 we arrived at the QU campus and my 27 hours with two special people ended.
]]>
Dale Long was born in 1926 and died in 1991. He played major league baseball for 10 years. He was 6’4” tall and weighed just over 200 lb. He batted left-handed and threw left-handed. He played a couple of games in the outfield but was predominantly a first baseman. However, he played two games as a left-handed catcher, one of a handful of major league players to do so. He played for the Pittsburgh Pirates, Chicago Cubs, New York Yankees, and others. His grandson told me an interesting story, which occurred before Long signed his contract with the Yankees in 1960. At that time Casey Stengel was the manager of the Yankees. Long was told by Casey he had three jobs with the Yankees. He would substitute at first base, be used as a pinch-hitter, and finally he would accompany Mickey Mantle and Whitey Ford whenever they went out at night to keep them out of trouble. From 1944 to 1954, apart from a brief stint with the Pirates in 1951, Long bounced around the minor leagues. He played 131 games with Pittsburgh in 1955 and his .291 batting average was the team’s second best. He tied Willie Mays with a league-leading 13 triples.
Why is Dale Long mentioned in my book? Chapter 16 in my book details my research on batting streaks. In that chapter, I present a probability formula I developed which uses a player’s batting statistics to estimate his probability of duplicating any batting streak. Of course, Joe DiMaggio’s 56-game hitting streak is the most notable. However, many other special batting streaks are also mentioned. One of those batting streaks deals with hitting a home run in each of eight consecutive games. Dale Long enters the discussion because he was the first player to accomplish this streak. On May 26, 1956, Dale Long tied the existing record of a home run in six consecutive games. This record was held by five other major leaguers, including Willie Mays and Lou Gehrig. On May 27, Dale Long established a new major league record by hitting a home run in his seventh consecutive game. This home run was hit in his last at-bat after swinging and missing on two pitches. The home run ball was sent to Cooperstown. On May 28, Dale hit number eight off Carl Erskine of the Dodgers. The fans did not stop cheering until Dale came out for a curtain call. This record was later tied by Don Mattingly (1987) and Ken Griffey Jr. (1993). For 1956, Dale Long played in 148 games, hitting a total of 27 home runs. Using my probability formula, the probability of Dale accomplishing his streak was 0.00008. By comparison, the probability of Joe DiMaggio accomplishing his 56-game hitting streak was 0.00010. Observe that Dale’s streak was less likely to occur than Joe’s streak. Finally, I wish to thank the Long family for providing me with their memories of their father and grandfather.
Original Comments:
4 Comment(s):
Stan “The Stats Man” said…
Dear MKR: You raise a very good point. When I said Long’s streak was less likely than Joe’s streak, I was relying on my probability formula, which is based on Dale Long’s statistics versus Joe’s statistics. So what I am saying is that, based on Dale’s batting statistics for 1956, the probability of him achieving his streak is less than the probability of Joe achieving his streak, based on his 1941 statistics. This is explained in Chapter 16 of my book. Mattingly’s probability is .00023 and Griffey’s probability is .00214, compared to Long’s probability of .00008 and DiMaggio’s probability of .0001. I hope this answers your question. Again, thank you for asking that excellent question. December 4, 2012 03:50:52

MKR said…
Interesting that you conclude that Dale’s consecutive home run streak was actually less likely to occur than that of the great Joe DiMaggio’s hitting streak. But how do you explain the fact that this streak has actually been accomplished 3 times in the past 56 years (and twice in a 6-year span for that matter) while no one has even come close to tying Joe D.’s 56-game hitting streak in the 71 years since he accomplished it (with the closest being Pete Rose’s 44-game hitting streak in 1978)? December 4, 2012 12:45:24

Martin E. Cobern said…
As Yogi said, “It’s deja vu all over again!” Where have I heard that story? Oh yeah, at your book signing party yesterday! Still a great story! … and a great party! December 2, 2012 01:31:23

Neal Meyer said…
Great stuff Stan…I especially like the left-handed catcher memory. December 2, 2012 10:27:29
]]>
I now wish to use James’ Pythagorean Theorem to look at the years 2001 and 2002 for the Oakland Athletics, who were featured in the book Moneyball. At this point it might be worthwhile for the reader to review my last posting, called Moneyball Revisited. For the year 2001, the exponent turned out to be 2.113 and for the year 2002 the exponent was 1.901. In 2001, Oakland’s actual record was 102-60. Applying the Pythagorean Theorem to their actual runs scored of 884 and runs allowed of 645, Oakland’s expected record would have been 107-55. In 2002, Oakland’s actual record was 103-59. For 2002, their actual runs scored were 800 and runs allowed were 654. Again, applying the Pythagorean Theorem, we would have expected their record to be 96-66. The loss of Jason Giambi and Johnny Damon for the 2002 season had a minor effect on their runs scored. The loss of Jason Isringhausen as their closer had no effect on their runs allowed. Using the results of the Pythagorean Theorem, Oakland would have finished second to Seattle in 2001 and fourth behind Anaheim, Boston, and New York in 2002. Again, using the results of the Pythagorean Theorem, the 2002 team would have finished 11 games behind the 2001 team.
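For readers who want to check these expected records, here is a short Python sketch. It assumes the usual form of the Pythagorean expectation, winning percentage = RS^x / (RS^x + RA^x), applied to a 162-game season with the exponents quoted above; the function name `pythagorean_record` is my own.

```python
def pythagorean_record(runs_scored, runs_allowed, exponent, games=162):
    """Expected (wins, losses) from the Pythagorean expectation with a given exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    win_pct = rs / (rs + ra)
    wins = round(win_pct * games)
    return wins, games - wins

# Runs scored, runs allowed, and exponents as given in the text.
record_2001 = pythagorean_record(884, 645, exponent=2.113)
record_2002 = pythagorean_record(800, 654, exponent=1.901)

print("2001 expected record:", record_2001)
print("2002 expected record:", record_2002)
```

With these inputs the sketch returns 107-55 for 2001 and 96-66 for 2002, matching the expected records stated above.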
Comparing their actual won and loss records for the years 2001 and 2002, the 2002 team would have finished one game ahead of the 2001 team. The 2001 Oakland team had the second best record in the AL behind the Seattle Mariners. The 2002 Oakland team had the second best record in the AL behind the New York Yankees. The side-by-side bar graph below compares the statistics batting average (BA), on-base percentage (OBP), slugging percentage (SLG), and on-base plus slugging (OPS) for 2001 and 2002. The win-loss records and the graph below show that replacing Jason Giambi with Scott Hatteberg and replacing Johnny Damon with David Justice, along with other changes, allowed the 2002 Oakland team to be as successful as the 2001 team. Billy Beane accomplished this successful metamorphosis without spending $33 million on Giambi, Damon, and Isringhausen. In fact, the Oakland payroll for 2002 was only $39 million. Billy Beane’s use of sabermetrics works and keeps on working today. Just look at Oakland’s record for 2012.
]]>
In reading this book for a second time, it became clear to me that even though this book involved baseball, there are lessons to be learned that can be used in almost any business. Yes, baseball is a business. Before the 2002 season, Oakland could not afford to re-sign three of their best players. These included 2000 AL MVP Jason Giambi, outfielder Johnny Damon, and closer Jason Isringhausen. The three of them signed with other teams for a total of $33 million. This was $6 million less than Oakland’s entire payroll for 2002. Giambi alone would have cost the A’s over $16 million.
Because of his low payroll, Beane realized that he could not compete with teams like the Yankees and Red Sox in the traditional way of signing expensive free agents. He hired a Harvard economics major named Paul DePodesta, who used his computer to find players based on certain baseball statistics. These baseball statistics came from his reading of articles written by Bill James. For example, in the 2002 amateur draft the computer flushed out Kevin Youkilis. Scouts from the other major league teams classified Youkilis as a fat third baseman who couldn’t run, throw, or field. Paul did not care about anything but Youkilis’ high on-base percentage, OBP.
How was Beane going to replace the three stars they lost? The loss of Jason Giambi was the most serious. Jason had a very high OBP. Beane focused on this particular statistic. Using Bill James’ theories, he concluded a high OBP was the best way to produce runs and runs were the best way to produce wins. So he went out and signed Scott Hatteberg, a catcher for the Red Sox with a damaged throwing arm, and made him into a first baseman, despite the fact that he had never played first base before. During the spring training of 2002, Ron Washington was given the task of making Hatteberg into a first baseman. What Beane loved about Hatteberg was his ability to get on base and see a lot of pitches in each of his at-bats. Because he was damaged goods, he was cheap. His high OBP replaced Giambi’s high OBP. Hatteberg’s salary for 2002 was $900,000. Another example of Beane’s thinking was that he did not believe in paying a high price for a closer. Instead, he would find a young pitcher who had good control and the ability to throw in the low 90s. He would then, for almost the minimum salary, turn him into a closer.
This leads me to talk about my original premise. What lesson can we learn from Billy Beane? The answer is: Do not follow the herd. Suppose you were a small retailer and Walmart opened a store nearby. Clearly, you could not compete with Walmart on price. However, you can emphasize a different statistic: customer service. In the same way that the statistic OBP led to runs, which led to wins, customer service can lead to a happy customer followed by repeat business from that customer.
My book Sandlot Stats not only teaches the concepts of descriptive and inferential statistics but also shows the mathematics Bill James used to make baseball decisions on strategy and player personnel. Part 2 explores some of Bill James’ theories.
]]>