Resurrecting Signal/Noise


It’s been over two years since anything new has appeared in this space. And while I find myself even busier than I was at that time, I’ve decided to breathe some new life into the site.

At this point, I would be shocked if anyone is still lurking. (It’s not as if the entire world was beating a path to my virtual door two years ago.) But for those of you still willing to spend some of your valuable time reading (and for those who might stop by in the future), I’ll be using this space to post my quick thoughts and reactions to things I come across. Most will probably involve news or research.

Like when I started, this will really serve as a place for me to quickly reflect on things that interest me. Whether they interest others is another question entirely. For those who choose to follow along, I thank you and hope you get something useful for your troubles.

New Year, New Writing Gig



Just wanted to let everyone know that starting January 4th I will be writing a weekly baseball column (sometimes twice weekly if I am feeling especially opinionated) at Beyond the Box Score.

Beyond the Box Score is a fantastic site, examining baseball from an analytical perspective.  The authors definitely embrace sabermetrics, but they don’t beat readers over the head with complex statistics.  As with most things that I do, the subject of my columns will vary quite a bit.  Generally speaking I’ll likely focus on team performance, player valuation, and lots of exploratory questions about the game.  Oh, and you can be sure there will be lots of pretty visuals and laments about the NY Mets.

Be sure to stop by if you are interested.  You can read and subscribe to my entries here, but I encourage you to subscribe to the site as a whole (RSS feed here).


Visualizing Major League Baseball: 2001-2010



(This article originally appeared at Beyond the Box Score, where I am now a regular contributor)

2010 marks the end of the “ought” decade for Major League Baseball.  I thought I would take the opportunity to analyze the last 10 years by visualizing team data.  I used Tableau Public to create the visualization and pulled team data from (on-field statistics) and USA Today (team payroll).

The data is visualized through three dashboards.  The first visualizes the relationship between run differential (RunDiff) and OPS differential (OPSDiff) as well as the cost per win for teams.  The second visualization looks at expected wins and actual wins through a scatter plot.  The size of each team’s bubble represents the absolute difference between their actual and expected wins.  Teams lying above the trend line were less lucky than their counterparts below the trend line.  The final tab in the visualization presents relevant data in table form and can be sorted and filtered along a number of dimensions.
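The post doesn’t say how expected wins were computed; a common choice in sabermetrics is Bill James’s Pythagorean expectation, sketched here in Python with purely hypothetical run totals:

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2):
    """Bill James's Pythagorean expectation: estimate wins from run totals."""
    rs, ra = runs_scored**exponent, runs_allowed**exponent
    return games * rs / (rs + ra)

# Hypothetical team: 800 runs scored, 700 allowed over a 162-game season
print(round(pythagorean_wins(800, 700)))  # -> 92
```

A team whose actual wins exceed this estimate sits on the “lucky” side of the trend line; the gap between the two numbers is what the bubble sizes in the scatter plot capture.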

The first visualization lists all 30 teams and provides their RunDiff, OPSDiff, wins, and cost per win for 2001-2010.  The default view lists the averages per team over the past 10 years, but you can select a single year or range of years to examine averages over that time frame.  The visualization also allows users to filter by whether teams made the playoffs, were division winners or wild card qualifiers, won a championship, or were in the AL or NL.  The height of the bars corresponds to a team’s wins (or average wins over a range of years).  The color of the bars corresponds to a team’s cost per win: the darker green the bar, the more costly a win was for that team.  Total wins (or the average for a range of years) is listed at the end of each bar.  In order to create the bar graph I normalized the run and OPS differentials data (added the absolute value of each score + 20) to make sure there were no negative values.  For the decade, run differential explained about 88% of the variation in wins, and OPS differential explained about 89% of the variation in run differential.
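One reading of that normalization step, sketched in Python with hypothetical run differentials (the +20 offset is arbitrary; it simply guarantees every bar has positive length):

```python
# Hypothetical run differentials for three teams
run_diffs = {"NYY": 168, "PIT": -120, "KC": -83}

# One reading of the normalization described above: shift every score by
# the absolute value of the most negative differential, plus 20, so that
# no bar in the chart has a negative length.
offset = abs(min(run_diffs.values())) + 20
normalized = {team: rd + offset for team, rd in run_diffs.items()}

print(normalized)  # the most negative team ends up at exactly 20
```

The shift preserves the ordering and spacing of the teams, so the bars still tell the same story; only the zero point moves.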

The visualization illustrates the tight correlation between RunDiff and OPSDiff, as the respective bars for each team are generally equidistant from the center line, creating an inverted V shape when sorted by RunDiff.  In terms of average wins over the decade, there are few surprises, as the Yankees, Red Sox, Cardinals, Angels, and Braves make up the top 5.  However, St. Louis did a much better job of winning efficiently, paying less per win than the other winningest teams (<$1M per win).



The viz also illustrates the success of small-market teams such as Oakland and Minnesota, who both averaged roughly 88 wins while spending the 3rd- and 4th-least per win, respectively.  If you filter the visualization for teams that averaged over 85 wins during the decade, it really drives home how impressive those two teams’ front offices have been at assembling winning ball clubs with lower payrolls.  No other team that averaged >85 wins paid less than $975K per win.  Oakland looks even more impressive when you isolate the data for years that teams qualified for the playoffs.  Oakland averaged 98.5 wins during seasons they made it to the playoffs, and did so spending only $478K per win.

The Accomplishments of Bob Feller


I am sure many people will be writing and speaking about Bob Feller this morning, as the baseball Hall of Famer passed away last night at the age of 92.  (Here is some great old black-and-white footage of Feller.)  Feller was blessed with arguably the greatest fastball in major league history, breaking into the big leagues as a 17-year-old phenom with the Cleveland Indians.  In his first start (7th appearance overall) he struck out 15 batters in a complete-game six-hitter.  A 17-year-old striking out 15 men–not just men, but major league hitters (granted, the Browns weren’t that good in 1936, but they had four players with an OPS over .800 in the lineup that day).  Think about that for a moment.  He also lost roughly four seasons during his prime (ages 23-26) fighting in World War II (he was the first major leaguer to enlist after Pearl Harbor).

While Feller was a Hall of Fame player, his performance relative to other greats can be debated.  Certainly, Feller had impressive traditional statistics.  He averaged close to 15 wins a season over 18 years and had a .621 winning percentage.  Had he not lost those prime years to the war he very well could have amassed between 340 and 350 wins for his career.  He also averaged 143 strikeouts per season, finishing with almost 2600 for his career, leading the league seven times and striking out an amazing 348 batters in 1946.  However, he also had the 5th most walks allowed in history (1764), walking an astounding 208 batters in 1938.  His penchant for walks earned him a career WHIP (Walks + Hits per Inning Pitched) of 1.32, good for 527th all time, easily one of the worst for a Hall of Fame pitcher.  Despite having a remarkably powerful arm, he finished his career with a strikeout-to-walk ratio of only 1.46, 650th all time, again one of the worst for a Hall of Famer.
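Both rate stats are simple ratios; here is a quick Python sketch using Feller’s career totals as commonly listed on the standard references (3827 innings, 3271 hits, 1764 walks, 2581 strikeouts; treat them as approximate):

```python
def whip(walks, hits, innings):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings

def k_per_bb(strikeouts, walks):
    """Strikeout-to-walk ratio."""
    return strikeouts / walks

# Feller's career totals (approximate, per standard references)
print(round(whip(1764, 3271, 3827), 2))   # -> 1.32
print(round(k_per_bb(2581, 1764), 2))     # -> 1.46
```

The walks drive both numbers: strip even a quarter of them away and Feller’s WHIP and K-to-BB ratio move well up the all-time lists.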

But despite a high WHIP and less than stellar K-to-BB ratio, Feller managed to accumulate 66 Wins-Above-Replacement (WAR) over his career, good for 31st all time.

Feller’s accomplishments, however, cannot be summed up by any statistical analysis of his performance on the diamond (traditional, sabermetric, or otherwise).  Feller was an innovator, a game-changer in the business of baseball.

Do Hedge Funds Create Criminals?



Lynn Stout, a law professor at UCLA, says yes:

Why does a large slice of the hedge fund industry seem to have succumbed to illegal behavior?

I would argue that it’s not so much about misaligned incentives, as we might guess from standard economic theory, but rather because, from a behavioral perspective, hedge funds are “criminogenic” environments that can turn even ethical people into conscienceless sociopaths.

What does this environment look like?  Stout highlights three environmental features of Hedge Funds:

  1. Authority Doesn’t Care About Ethics
  2. Perception that Other Traders Aren’t Acting Ethically
  3. Perception that Unethical Behavior Isn’t Harmful

To some extent I can buy all three of these as creating criminogenic environments, but what I fail to see is how any of them apply disproportionately to hedge funds versus other investment houses.  The emphasis placed on profit and returns; the perception that other traders are acting unethically by beating the market through aggressive information gathering; the distance between traders and the investors harmed by insider trading: all of these exist across the trading world.  Stout fails to demonstrate that these things take place disproportionately at hedge funds versus the trader community at large.  If there is little variation between hedge funds and other investment firms on these three variables, something else must be responsible for the rash of illegal behavior.  It’s also possible that the clustering of hedge fund troubles is random, or a function of the Department of Justice’s selection of cases to pursue.  It isn’t that other dogs didn’t bark; we just haven’t heard them yet.

Mimicking Predators



A while back I stumbled on the video below about Thaumoctopus mimicus, or the Mimic Octopus.  Discovered in 1998, the Mimic Octopus is unique in that it doesn’t simply manipulate its physical features to blend into its surroundings in order to escape predators.  Instead, the Mimic Octopus manipulates its physical appearance in order to look like its predators’ predators.

We tend to think of physical appearance as a reliable signal, particularly in the animal world.  Humans can manipulate their physical appearance quite readily, either through cosmetics or surgery, but animals are generally more restricted.  Some have evolved with physical markers that are difficult to manipulate, and I am not aware of many animals that can mimic the physical appearance of their predators’ predators.  It’s amazing to think that the octopus not only has this incredible physical ability, but also the mental capacity to think strategically: to match its appearance to a specific predator based on which adversary it is dealing with.

Fascinating stuff.

Ignorance = Innovation?



Bob Sutton says the answer can be yes:

[…] radical innovations do often come from people who don’t know what has been or can’t be done. I once had a student who worked as an early employee at Invisalign (those clear braces that replace the ugly wire things), and he told me that none of the members of the original design team had any background in traditional braces or dentistry.

He goes on to mention specific benefits of ignorance, particularly when you are dealing with a well-worn domain of knowledge.

I am generally sympathetic to this argument, given the importance of “social bumping” (the unintentional exposure to diverse ideas and perspectives) to problem solving and creativity.  Think about the radical innovation in the music and mobile communications industries brought about by Apple.  Radical change did not come about by sticking a bunch of industry veterans in a room and asking them to rethink the very foundation of their business.  It came because smart, talented people on the outside reconceptualized those industries.

I especially like Sutton’s suggestion that companies think of problems in terms of their general type instead of the specific industry they are in.  So rather than assemble a team of seasoned experts in retail apparel to solve the problem of declining market share in the face of lower priced competitors, you would assemble individuals who have grappled with the general problem of lower priced competition in multiple industries and domains.  The idea here is that other approaches may have been successful outside of the retail industry that are nevertheless applicable.  Since these solutions come from outside of retail they could represent a “radical innovation” once imported, giving a company a significant advantage (at least in the short term).

This isn’t to say that domain expertise is worthless or counterproductive.  I think the distinction can be made in terms of incremental change and radical change (which Sutton makes in his Weird Ideas That Work), similar to Kuhn’s distinction between normal science and revolutionary science.  In the incremental area, domain expertise is quite helpful, and the distribution of domain expertise to ignorance should be weighted towards the former.  When it comes to radical change (or what Sutton might term radical innovation), however, that distribution needs to shift to at least 50/50, if not skew more heavily towards ignorance and outsiders.  Oftentimes companies manage this by creating separate work streams for normal and innovative operations, with Research & Development fitting into the latter area.  The trick is not to wall off the normal and innovative folks entirely; complete separation means you will miss opportunities to mix the two knowledge bases together.

Right-sizing the Use of Data



John Kotter over at HBR argues that we should use less data and evidence in our presentations and Q&A:

[M]ost people respond to a critical question by arguing against the reasoning of whoever asked the question. They offer all of the evidence they can think of, hoping to make their case overwhelming. They shoot at an attack sixteen times with bullets of data to make sure it is dead. But in so doing, they are arguing not on their own but on the naysayer’s territory, opening themselves up to counter-attacks with each piece of evidence they dispense — and simultaneously putting other listeners to sleep!

I have seen far more success when people offer a quick, direct, common sense answer that shows respect for the naysayer but moves the discussion along. It is important to strike a balance between addressing a naysayer’s concern and keeping each question-and-answer brief in order to hold your audience’s full attention. To use economics terms, there are diminishing marginal returns to data-dumping in your answers. Great leaders throughout history, from Gandhi to Sam Walton, have always employed this principle to maximum effect. They knew the power of clarity and simplicity. And they found that using it allowed them to connect with more people and win more hearts and minds.

The next time you present an idea on an important new marketing campaign, for example, and someone rebuts it by citing five previous times that your company tried a new marketing campaign and it was unsuccessful, you have two options. You could go through each of the five examples, explain their flaws in detail, and demonstrate how each of those flaws does not apply to your idea. Or you could say, “There are always examples of failed attempts to do anything of real importance, and we did indeed learn from the experiences you cite. But we cannot allow these past failures to keep us from adapting to a changing world or else we would never move forward on anything.”

I am sympathetic to the notion of using less data in presentations as well as being less verbose, but Kotter seems to be conflating the two.  Short, concise statements that lack adequate evidence and data are just as likely to get shot down as long, laboring statements that include reams of data.

Kotter is right that data dumps are a bad thing, but his example has its risks.  If your audience includes folks with great BS detectors, they won’t let you get away with that statement.  Sure, it’s short.  But it lacks any rationale for why the campaign being pitched shouldn’t be judged by those previous failures.  How hard would it be to briefly state that the factors that led to the previous failures do not apply to the current case?  You don’t need to get into the weeds on each factor, but you have to give people a reason to buy into your current proposal other than “we need to keep trying new things and taking risks”.  Statements devoid of evidence are just as useless as data without purpose or context.

For me, it’s about right-sizing and being selective with the data you use, not banishing data and evidence in favor of simple statements or platitudes out of a fear of alienating your audience.

Most Viewed Posts: November 2010

Here are the posts that garnered the most views during November.

Remember, you can follow Signal/Noise by RSS feed, email, or by liking the Facebook page.

As always, thanks for reading!

  1. “Statistics is the New Grammar”
  2. Counter-signaling in the Luxury Brand Market: Snookie edition
  3. Book Review: The Bottom Billion
  4. Open-ended vs. Scale Questions: A note on survey methodology
  5. In Praise of Falsification
  6. Has revenue sharing impacted the competitive balance in Major League Baseball?
  7. Evaluating Human Capital Investments Through the Prism of Baseball
  8. Leveraging Social Networks in the Workplace
  9. Structural explanations are not always sexy or gratifying, but they typically explain a lot
  10. Book Review: Codes of the Underworld

Book Review: The Numbers Game



Alan Schwarz’s The Numbers Game is an indispensable history of how the numbers that define the game of baseball came to be.  The book is less about the hallowed numbers that even casual fans can identify: Aaron’s 755 home runs, DiMaggio’s 56-game hit streak, Nolan Ryan’s 5714 strikeouts, Cy Young’s 511 wins, Pete Rose’s 4256 hits, Rickey Henderson’s 1406 stolen bases, etc.  Instead, Schwarz looks back over time to reconstruct how specific statistics were created and how those statistics were subsequently accepted as the definitive measurements of player performance.  The book will definitely appeal to diehard fans of baseball and those who love to analyze the game.  However, much like its contemporary, Michael Lewis’ Moneyball, Schwarz’s book provides insights into the management and analysis of any organization.

Schwarz traces the history of baseball’s obsession with statistics to Henry Chadwick, a journalist and baseball writer widely acknowledged as the grandfather of baseball statistics.  Chadwick’s work in the mid- to late-19th century laid the foundation for much of the statistical framework through which we appreciate the game today.  Chadwick was adamant that the new game of baseball required a fair accounting of player performance:

In order to obtain an accurate estimate of a player’s skill, an analysis, both of his play at bat and in the field, should be made, inclusive of the way in which he was put out; and that this may be done, it is requisite that all…contests should be recorded in a uniform manner.

Anyone who has paid even scant attention to the debate regarding traditional and sabermetric approaches to the game of baseball will recognize Chadwick’s logic–not only do position players contribute to their team’s success by creating runs through their offense; they can also prevent runs by the other team through their defense.  Traditionally, players came to be valued and compensated based largely on their offensive production.  When it came time to arbitrate salaries or negotiate free agent contracts, offensive statistics carried the most weight.  (Whether or not the offensive statistics being used were the most accurate is a much larger debate, and Schwarz gives ample space to this history as well).

As the analysis of baseball became more sophisticated, analysts were finally able to measure a player’s total value by incorporating runs produced through offense as well as those saved by defense.  Rather than relying on the traditional fielding percentage (which simply measures the proportion of a fielder’s chances converted without an error), more sophisticated measures allowed talent evaluators to look at how many runs a player saved in the field.  Power-hitting shortstops surely contributed to their team’s success by creating runs, but their light-hitting counterparts could conceivably contribute just as much by saving runs.
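Fielding percentage itself is a one-line calculation; here is a minimal Python sketch (the shortstop’s chance totals are hypothetical).  Its well-known blind spot is that a fielder is never charged an error on a ball he fails to reach, which is exactly the gap the runs-saved measures try to close:

```python
def fielding_percentage(putouts, assists, errors):
    """Traditional measure: share of total chances handled without an error."""
    chances = putouts + assists + errors
    return (putouts + assists) / chances

# Hypothetical shortstop season: 250 putouts, 420 assists, 12 errors
print(round(fielding_percentage(250, 420, 12), 3))  # -> 0.982
```

A statue-like shortstop who reaches few balls can post a sparkling percentage, while a rangy one who attempts the hard plays gets penalized; that is why the total-value measures Schwarz describes count runs saved rather than errors avoided.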

Chadwick’s quote highlights two critical issues for any organization, and it is a theme that runs through The Numbers Game: ensuring that the metrics you rely on account for all of the ways a person contributes to success, and that the data used to calculate those metrics is collected in a consistent, uniform manner.