Having watched the volume of technology-forecasting literature expand over the years, we were naturally intrigued when we read Barbara Mellers’ claim that companies can still make better predictions. “Companies can do better at forecasting the future by creating their own set of superforecasters,” Mellers, a Wharton School of Business marketing professor, told Knowledge@Wharton.
Mellers was being interviewed about her experience leading a research team with Philip Tetlock, also a professor at Wharton, in four years of forecasting tournaments sponsored by the U.S. Intelligence Advanced Research Projects Activity (IARPA). Learning that her team had won the competition four years straight, CIMS IMR invited her to discuss how businesses in a variety of sectors, from finance to retail to R&D, can make better forecasts. Here is her account, which appears in the January/February 2016 issue of the IMR.
The IARPA competition involved five university-industry teams competing to elicit probability judgments about geopolitical events from diverse crowds around the world and to aggregate those judgments as accurately as possible. Our team was called the Good Judgment Project. We had no silver bullets, so we ran experiments to find out what worked best. We recruited thousands of forecasters through professional societies, research centers, and word of mouth, and randomly assigned them to different conditions (what economists call “treatments”).
Forecasters were given questions on a website roughly every two weeks for about nine months. Questions covered a wide range of topics, including military conflicts, international treaties, financial markets, pandemics, elections, and refugee flows. They were written clearly enough that outcomes were easy to resolve. For example, one question asked, “Will Bashar al-Assad still be in power in Syria on December 31, 2014?” With questions like these, we could find out who was right and who was wrong, and by how much.
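The project’s published papers report scoring forecasts with Brier scores, which measure exactly this “by how much.” The sketch below is our own illustration of the one-sided version for a binary event, not the project’s code (the original Brier formulation sums squared errors over both outcomes, doubling these numbers):

```python
def brier_score(forecast_prob, outcome):
    """Squared error between a probability forecast and what happened
    (outcome = 1 if the event occurred, 0 if it did not).
    0.0 is a perfect forecast; 1.0 is maximally wrong."""
    return (forecast_prob - outcome) ** 2

# A forecaster who said 80 percent for an event that occurred:
print(brier_score(0.80, 1))  # 0.04 -- near perfect
# The same 80 percent forecast for an event that did not occur:
print(brier_score(0.80, 0))  # 0.64 -- heavily penalized
```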
The Good Judgment Project found that three factors had an enormous effect on accuracy. The first was training. People can be taught to reason more probabilistically and make better predictions. We developed a training module that was administered once at the beginning of the tournament and took about 45 minutes; forecasters could refer back to it if they wished. To our surprise, this relatively simple training improved accuracy by about 10 percent relative to an unweighted average of crowd forecasts. Businesses must keep score if they are serious about improving forecasts. That’s the only route to learning.
The second factor was teaming. We randomly assigned people to conditions in which they worked alone or in groups of about 10 to 15 members. Teams interacted over the Internet; they never worked in the same room together. Putting people to work in collaborative environments turned out to be superior to having them work alone. We originally thought that independent forecasters would do better: independent errors would cancel out and the crowd would become wiser. Instead, teaming improved accuracy by approximately 20 percent. Allowing forecasters to share information and discuss their rationales was much more powerful than the cancellation of independent errors.
The third factor was tracking. We got incredible mileage from this intervention; accuracy improved by 30 to 40 percent. This method was the equivalent of putting the smarter kids together in the same classroom. We selected the top 2 percent of performers each year, placed them together in elite teams of 10 to 15 members, and gave them the title of superforecasters. They were phenomenal! They showed no regression to the mean in subsequent years; in fact, they got even better by working in such an enriched environment. Tracking turned out to be our most successful intervention. Indeed, the superforecasters were more accurate than analysts inside the intelligence community who were making predictions on the same questions. The combination of these interventions, plus excellent aggregation algorithms, allowed us to win the tournaments four years in a row.
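As a minimal sketch of the tracking step, assuming forecasters are ranked by mean Brier score as above (lower is better; the names and numbers here are hypothetical):

```python
def top_performers(mean_briers, frac=0.02):
    """Rank forecasters by mean Brier score (lower is better) and
    return the top fraction -- the pool from which superforecaster
    teams were formed."""
    ranked = sorted(mean_briers, key=mean_briers.get)
    cutoff = max(1, round(len(ranked) * frac))
    return ranked[:cutoff]

scores = {"ana": 0.12, "ben": 0.31, "eli": 0.18, "mia": 0.09, "raj": 0.27}
print(top_performers(scores))  # ['mia'] -- the best of this tiny pool
```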
We now know much more about creating accurate forecasts, and we believe the lessons apply to businesses. We have better rules for combining multiple forecasts. We know more about which people are likely to succeed and about the conditions that bring out the best in forecasters. Furthermore, we understand when survey formats (i.e., Gallup-like prediction polls) do better and when they do worse than prediction markets. Interaction is essential. In survey formats, team members cooperate with each other and compete against other teams. In prediction markets, competition is the driving factor. However, when we created a market with teams, its predictions were more accurate than those of a market with individuals. We suspect the combination of cooperation and competition is an excellent environment for generating better forecasts.
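One of the project’s published rules for combining forecasts is to average the crowd’s probabilities and then “extremize” the average, pushing it toward 0 or 1 to undo the damping that averaging introduces. The sketch below is our illustration of that idea; the exponent and the example numbers are hypothetical, not the project’s tuned values:

```python
def aggregate(probs, weights=None, a=2.5):
    """Combine individual probability forecasts for a binary event:
    (1) take a (weighted) mean, (2) extremize it with exponent a > 1,
    pushing the aggregate toward 0 or 1."""
    if weights is None:
        weights = [1.0] * len(probs)  # unweighted crowd average
    p = sum(w * q for w, q in zip(weights, probs)) / sum(weights)
    return p ** a / (p ** a + (1 - p) ** a)

crowd = [0.60, 0.70, 0.65, 0.55, 0.75]  # hypothetical individual forecasts
print(aggregate(crowd))  # ~0.82, more extreme than the 0.65 raw mean
```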
The financial industry is one domain that could benefit greatly from superforecasters. Financial analysts understand the value of greater precision in their predictions. If analysts think an event is 60 percent likely to occur, they might buy (or sell) less stock than if they believe the event’s likelihood is 80 percent. The difference in probabilities translates directly into money. In the world of politics, the translation is not as clear. Probabilities are important, but would a change from 60 to 80 percent mean a different decision? It’s hard to say.
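To make “translates directly into money” concrete, here is a toy expected-value calculation; the payoffs are invented for illustration:

```python
def expected_value(prob, payoff_if_yes, payoff_if_no):
    """Expected value of a position given the probability the event occurs."""
    return prob * payoff_if_yes + (1 - prob) * payoff_if_no

# A hypothetical position that pays $100 if the event occurs, loses $50 if not:
print(expected_value(0.60, 100, -50))  # $40
print(expected_value(0.80, 100, -50))  # $70 -- the 20-point shift changes the bet
```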
Retail is another area where better forecasting could be extremely helpful. Companies could target customer segments and ask which products they will like most. Companies could also ask employees to help forecast manufacturing and distribution questions, possible delays, and eventual sales. A great deal of information could be obtained from both employees and customers, and it could be gathered with surveys or prediction markets that incentivize the right people. Forecasting is also essential in companies with intensive R&D units, such as pharmaceuticals. Experts and other knowledgeable people could forecast the success of different drugs, and those forecasts could be aggregated to make even more accurate predictions. Forecasts will be much improved by polling multiple informed opinions rather than relying on the advice of one or two alleged gurus.