Businesses Need 'Superforecasters,' Not Crystal Balls

Having watched the volume of technology-forecasting literature expand over the years, we were naturally intrigued when we read Barbara Mellers’ claim that companies can still make better predictions. “Companies can do better at forecasting the future by creating their own set of superforecasters,” Mellers, a marketing professor at the Wharton School, told Knowledge@Wharton.

Mellers was being interviewed about her experience leading a research team with Philip Tetlock, also a professor at Wharton, in four years of forecasting tournaments sponsored by the U.S. Intelligence Advanced Research Projects Activity (IARPA). Learning that her team had won the competition four years straight, CIMS IMR invited her to discuss how businesses in a variety of sectors, from finance to retail to R&D, can make better forecasts. Here is her account, which appears in the January/February 2016 issue of the IMR.

The IARPA competition involved five university-industry teams competing to elicit probability judgments about geopolitical events from diverse crowds around the world and to aggregate those judgments as accurately as possible. Our team was called the Good Judgment Project. We had no silver bullets, so we ran experiments to find out what worked best. We recruited thousands of forecasters through professional societies, research centers and word of mouth, and randomly assigned them to different conditions (or what economists call “treatments”).

The author with fellow Wharton professor Philip Tetlock. Photo courtesy University of Pennsylvania

Forecasters were given questions on a website roughly every two weeks for about nine months. Questions covered a wide range of topics, including military conflicts, international treaties, financial markets, pandemics, elections, and refugee flows. They were written clearly enough that outcomes were easy to resolve. For example, one question asked, “Will Saddam Hussein still be in power on December 31, 2014?” With questions like these, we could find out who was right and who was wrong, and by how much.
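
The article does not say how “by how much” was measured, but forecasting tournaments of this kind typically score probability judgments with a proper scoring rule such as the Brier score. Here is a minimal sketch in Python, assuming binary yes/no questions; the scoring rule and function names are our illustrative assumptions, not the project’s actual code.

```python
# Minimal sketch: scoring probability forecasts with the two-category Brier score.
# Assumption for illustration -- the article does not name the scoring rule used.

def brier_score(forecast_prob, outcome):
    """Brier score for one binary question.

    forecast_prob: probability assigned to the event occurring (between 0 and 1)
    outcome: 1 if the event occurred, 0 if it did not
    Lower is better: 0.0 is a perfect forecast, 2.0 is the worst possible.
    """
    # Squared error summed over both categories ("occurs" and "does not occur").
    return (forecast_prob - outcome) ** 2 + ((1 - forecast_prob) - (1 - outcome)) ** 2


def mean_brier(forecasts, outcomes):
    """Average Brier score across a forecaster's resolved questions."""
    return sum(brier_score(p, o) for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Example: 0.8 on an event that happened (0.08) and 0.3 on one that did not (0.18)
# average to roughly 0.13, rewarding forecasts that are both right and well calibrated.
print(mean_brier([0.8, 0.3], [1, 0]))
```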

The Good Judgment Project found that three factors had an enormous effect on accuracy. The first was training. People can be taught to reason more probabilistically and make better predictions. Accuracy improved by about 10 percent relative to an unweighted average of crowd forecasts. We developed a training module that was administered once at the beginning of the tournament and took about 45 minutes; forecasters could refer back to it if they wished. It was very surprising to us, but this relatively simple training had an effect on accuracy. Businesses must keep score if they are serious about improving forecasts; that is the only route to learning.

The second factor was teaming. We randomly assigned people to conditions in which they worked alone or in groups of about 10 to 15 team members. Teams interacted over the Internet; they never worked in the same room together. Putting people to work in collaborative environments turned out to be superior to having them work alone. We originally thought that independent forecasters would do better: independent errors would cancel out and the crowd would become wiser. Instead, teaming improved accuracy by approximately 20 percent. Allowing forecasters to share information and discuss their rationales was much more powerful than the cancellation of independent errors.

The third factor was tracking. We got incredible mileage from this intervention; accuracy improved by 30 to 40 percent. This method was the equivalent of putting the smarter kids together in the same classroom. We selected the top 2 percent of performers each year, placed them together in elite teams of 10 to 15 members, and gave them the title of superforecasters. They were phenomenal! They showed no regression to the mean in follow-up years; in fact, they got even better by working in such an enriched environment. Tracking turned out to be our most successful intervention. The superforecasters were more accurate than analysts inside the intelligence community who were making predictions on the same questions. The combination of these interventions, plus excellent aggregation algorithms, allowed us to win the tournaments four years in a row.

We now know much more about creating more accurate forecasts, and we believe it applies to businesses. We have better rules for combining multiple forecasts. We know more about which people are more likely to succeed. We know more about the conditions that bring out the best in forecasters. Furthermore, we understand when survey formats (i.e., Gallup-like prediction polls) do better than prediction markets and when they do worse. Interaction is essential. In survey formats, team members cooperate with each other and compete against other teams. In prediction markets, competition is the driving factor. However, when we created a market with teams, predictions were more accurate than in the market with individuals. We suspect the combination of cooperation and competition is an excellent environment for generating better forecasts.
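
The article does not spell out those combining rules, so the sketch below shows one widely discussed recipe: weighting forecasters and then extremizing the pooled probability. The function name, example weights, and exponent are assumptions for illustration, not the Good Judgment Project’s exact algorithm.

```python
# Illustrative sketch: pooling a crowd's probability forecasts by taking a
# (possibly weighted) average and then extremizing it. This is an assumed
# recipe, not the project's published algorithm.

def aggregate(probs, weights=None, a=2.0):
    """Pool individual probabilities into a single crowd forecast.

    probs:   each forecaster's probability for the event (strictly between 0 and 1)
    weights: optional per-forecaster weights (e.g., based on past accuracy);
             defaults to an unweighted average
    a:       extremizing exponent; a > 1 pushes the pooled probability away from
             0.5 to offset the under-confidence that plain averaging produces
    """
    if weights is None:
        weights = [1.0] * len(probs)
    pooled = sum(w * p for w, p in zip(weights, probs)) / sum(weights)  # weighted mean
    # Extremize on the odds scale, then convert back to a probability.
    return pooled ** a / (pooled ** a + (1 - pooled) ** a)


crowd = [0.6, 0.7, 0.65, 0.55]
print(aggregate(crowd))                        # unweighted mean 0.625, extremized to about 0.735
print(aggregate(crowd, weights=[3, 1, 1, 1]))  # trust the first forecaster's track record more
```

With no weights this reduces to the unweighted crowd average mentioned earlier as the baseline; the extremizing step is what pushes a timid pooled probability toward 0 or 1.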

The financial industry is one domain that could greatly benefit from superforecasters. Financial analysts understand the benefits of having greater precision in their predictions. If analysts think an event is 60 percent likely to occur, they might buy (or sell) fewer shares than if they believe the event’s likelihood is 80 percent. The difference in probabilities translates directly into money. In the world of politics, the translation is not as clear. Probabilities are important, but would a change from 60 to 80 percent mean a different decision? It’s hard to say.

Retail is another area where better forecasting could be extremely helpful. Companies could target customer segments and ask them which products they will like most. Companies can also ask employees to help forecast manufacturing and distribution questions, possible delays, and eventual sales. There is a great deal of information that could be obtained from both employees and customers, and that information could be gathered with surveys or prediction markets that incentivize the right people. Forecasting is also essential in companies with intensive R&D units, such as pharmaceuticals. Experts and other knowledgeable people could forecast the success of different drugs, and those forecasts could be aggregated to make even more accurate predictions. Forecasts will be much improved by polling multiple informed opinions rather than relying on the advice of one or two alleged gurus.

Overwhelming evidence from at least 50 years of research shows that when intuitive predictions (human judgments) are compared to statistical predictions based on simple algorithms with identical information, there is a clear winner. The predictions of simple algorithms outperform people virtually every time. This claim applies to many domains. Who should be let out on parole? Who should be hired for the job? Who should be accepted to graduate school? Which students are likely to become violent in high school? What is the survival time for a leukemia patient? These are tough questions because the world is complex. Even so, intuitive predictions are reliably worse than those of simple algorithms.

When it comes to geopolitical events, the types of questions we presented to our forecasters were far too varied and complex for us to build simple, effective models. That is why we used human forecasts. And if we use humans, we need behavioral interventions (e.g., training, teaming and tracking) to make them better, as well as effective statistical aggregation rules. It is the combination of psychology and statistics that seems to be most effective. In the future, it is likely that human predictions and statistical predictions will be used together. We need to better understand what people do well and what role they should play, and we need to understand what role computers should play. We aren’t there yet, but we will be. It will take many efforts involving trial and error in which we keep careful score.

Our best participants—superforecasters—turned out to be people with open-minded cognitive styles and a strong desire to take on challenges and solve problems. They were also less likely to fall prey to the confirmation bias; that is, they were more willing to seek out information that was contrary to their favored beliefs. Superforecasters also tended to think about probabilities in a more nuanced or granular way. They used many more numbers to express their uncertainties. They were, of course, intelligent. They were extremely knowledgeable about political and world events. They were mostly men, but there were some women in the mix. And finally, they were very ambitious; they wanted to win. You can learn more about these fascinating people in a new book by Philip Tetlock and Dan Gardner called Superforecasting: The Art and Science of Prediction. Their book tells the story of how these talented volunteers learned to be extraordinary.

The most important message for businesses is that if they are serious about improving forecasts, they have to keep score. Questions must be clearly resolvable without arguments about who is right or wrong. Then they need to experiment with what works best and who is capable of putting the right skills to work. That is the only route to learning; unfortunately, there is no royal road to better forecasts.

–Barbara Mellers, I. George Heyman University Professor, The Wharton School, University of Pennsylvania
