Forecasting Future Innovation Pathways with Big Data
Alan Porter of Georgia Tech has for years believed that a form of Big Data analysis called tech mining, when applied to R&D literature, patent and business intelligence, can pay off in anticipating future pathways for innovation in order to make better decisions. Porter is a Professor Emeritus of Industrial & Systems Engineering and Public Policy at Georgia Tech, where he continues to co-direct the Technology Policy & Assessment Center. In an article in the July/August issue of CIMS’ newsletter, the Innovation Management Report (IMR), Porter and co-author Ying Huang, a Ph.D. candidate at Beijing Institute of Technology, describe their progress to date. Here is an excerpt:
Why address Big Data? In increasingly complex socio-technical-economic environments, access to vast amounts of data can help organizations make better policies, predictions and decisions. Across diverse fields, enhanced data compilation, processing, access, analytics, and visualization are opening vast opportunities. Understanding prospective Big Data developmental paths can help private and public organizations realize those opportunities.
The Forecasting Innovation Pathways (FIP) approach seeks to anticipate potential applications of an emerging technology by combining data mining with the opinions and ideas of experts and stakeholders in the related fields. The approach entails four stages:
- Understand the emerging technology and its technology delivery system for pursuing applications.
- Perform tech mining to analyze R&D activities and associated contextual information (e.g., commercial activity), identify key actors and topical interests, and anticipate potential applications.
- Forecast likely innovation paths with expert/stakeholder assistance, identifying leverage points to promote innovation and potential impact of new applications.
- Synthesize and report to technology management and/or policy makers.
With National Science Foundation support, our Georgia Tech group set out to advance FIP by working through Big Data development prospects. Getting underway in Spring 2015, we sought to complement the U.S. Government Accountability Office (GAO) technology assessment of “21st Century Data.” The intent was to apply FIP to generating policy-relevant business and technology intelligence.
To understand the technology (Big Data), we analyzed 249 reviews and forecasts identified in our Web of Science search. We used our collegial networks to help us identify 18 Big Data innovation target applications and, in Stage 2, to review our empirical analyses.
These are the target applications:
- Cloud manufacturing
- Mining e-medical records (“EHR” health records)
- Autonomous transport
- Guiding financial markets
- Precise agricultural field micromanagement
- Sharing police data
- Energy management
- Environmental monitoring
- IoT — Internet of Things
- Smart city
- Real estate
- Government oversight
- Synthetic biology
- Big Brother’s Video Family
To prepare for a September 2015 conference, we devised a general Technology Delivery System (TDS) model for Big Data innovation. During the conference workshops, we used the TDS to identify key players, issues and other important contextual forces for two of our 18 target applications: mining electronic health records (EHR) and cloud manufacturing. We were able to identify a rich set of important actors and interests.
For cloud manufacturing, we examined the supply chain all the way to consumers. For EHR, we identified major stakeholders: doctors, medical associations, industry associations, patients, hospitals, insurance companies, nurses, regulators. We also identified “boundary personnel” who potentially play key roles, including IT consultants, patient advocates, employers, Medicare, the Centers for Disease Control, medical boards, and pharmacies.
In Stage 2, we tech-mined databases covering various aspects of Big Data development. These included R&D publications (Web of Science, INSPEC), R&D funding (National Science Foundation awards), patents (Derwent Innovation Index), and commercial activities (ABI Inform).
Not surprisingly, Big Data research concentrates in the computer science neighborhood. But what is amazing is the breadth of papers addressing Big Data in other fields. Social scientists and engineers, management and environmental science scholars, biomedical and physical scientists are all pursuing Big Data.
Another insight from the WoS records is that two countries dominate Big Data-related research: the U.S. and China. Of the 7,186 papers we retrieved, 32 percent have a U.S.-based author and 25 percent have at least one Chinese author; Germany and the U.K. each accounted for five percent of the research. If one were seeking particular Big Data expertise to tap, 14 of the leading 30 research-publishing organizations are American and 14 are Chinese.
It is instructive to compare the Big Data R&D funding by those two leading countries. Both have ramped up support sharply. The NSF allocated some $2.95 million to such projects for 2009-2011, increasing to $374.66 million for 2012-2015. During the same periods, China’s NSFC allocated $0.12 million initially, increasing funding to $66.83 million. Clearly, both China and the U.S. have made Big Data a research priority.
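To put the scale of that ramp-up in numbers, here is a small arithmetic sketch using only the four funding figures quoted above. It assumes the two periods cover three and four calendar years respectively (inclusive), which matters for the per-year comparison; the function and variable names are illustrative.

```python
# Funding in millions of U.S. dollars, as quoted in the text.
FUNDING = {
    "NSF": {"2009-2011": 2.95, "2012-2015": 374.66},
    "NSFC": {"2009-2011": 0.12, "2012-2015": 66.83},
}
# Assumption: each period is counted inclusively (3 and 4 years).
PERIOD_YEARS = {"2009-2011": 3, "2012-2015": 4}


def fold_increase(agency, per_year=False):
    """Ratio of later-period to earlier-period funding for one agency.

    With per_year=True, each total is first divided by the number of
    years in its period, so the unequal period lengths do not inflate
    the comparison.
    """
    early, late = "2009-2011", "2012-2015"
    early_amount = FUNDING[agency][early]
    late_amount = FUNDING[agency][late]
    if per_year:
        early_amount /= PERIOD_YEARS[early]
        late_amount /= PERIOD_YEARS[late]
    return late_amount / early_amount


if __name__ == "__main__":
    for agency in FUNDING:
        total = fold_increase(agency)
        annual = fold_increase(agency, per_year=True)
        print(f"{agency}: {total:.0f}-fold in total, {annual:.0f}-fold per year")
```

On period totals the increase is roughly 127-fold for the NSF and 557-fold for the NSFC; on a per-year basis it is closer to 95-fold and 418-fold. China's growth is steeper in relative terms, while U.S. funding remains several times larger in absolute terms.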
ABI Inform consolidates information from trade journals (47 percent of our Big Data set) and wire feeds (42 percent), enriched by content from newspapers (6 percent), magazines (4 percent) and reports (1 percent), thereby offering a window on commercial interests. Our Big Data search yielded 9,696 records as of Spring 2015. This shows enormous growth: we found just 52 records in 2010. IBM, Microsoft, Google, and Facebook are the companies named most frequently in these records.
We went deeper to analyze Big Data topical themes. Using VantagePoint desktop software, we extracted noun phrases from WoS titles and abstracts; cleaned and consolidated those; then clustered the top 60 phrases. We grouped these into six main themes: social media, healthcare, business intelligence, cloud-based services, web services, and customer relationship related applications.
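The clean, consolidate and cluster sequence can be illustrated with a rough Python sketch. This is not VantagePoint's procedure: it assumes noun phrases have already been extracted for each record, uses a hypothetical hand-made variant map for consolidation, and groups phrases by how often they appear in the same records (Jaccard overlap), which is only one of many possible clustering choices.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical variant map: spelling variants mapped to one preferred form.
VARIANTS = {
    "ehr": "electronic health record",
    "electronic health records": "electronic health record",
}


def normalize(phrase):
    """Lowercase, collapse hyphens and whitespace, then apply the variant map."""
    key = re.sub(r"[-\s]+", " ", phrase.strip().lower())
    return VARIANTS.get(key, key)


def phrase_index(records):
    """Map each consolidated phrase to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, phrases in enumerate(records):
        for phrase in phrases:
            index[normalize(phrase)].add(rec_id)
    return index


def cluster_top_phrases(records, top_n=60, min_jaccard=0.5):
    """Group the top_n most frequent phrases by record co-occurrence."""
    index = phrase_index(records)
    top = sorted(index, key=lambda p: (-len(index[p]), p))[:top_n]
    parent = {p: p for p in top}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(top, 2):
        shared = len(index[a] & index[b])
        total = len(index[a] | index[b])
        if total and shared / total >= min_jaccard:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for p in top:
        groups[find(p)].append(p)
    # Largest groups first; ties broken alphabetically for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    demo = [
        ["Social media", "Twitter"],
        ["social-media", "twitter"],
        ["EHR", "health care"],
        ["electronic health records", "health care"],
        ["cloud computing"],
    ]
    for group in cluster_top_phrases(demo):
        print(group)
```

In practice, naming the resulting groups as themes remains a judgment call for the analyst.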
Building on the tech mining data, we are currently striving to forecast innovation pathways. This stage integrates empirical and qualitative information. As we mentioned, our September 2015 workshop modeled Technology Delivery Systems for two of our target applications, considering 10 policy-relevant factors: description, key actors, Big Data roles & issues, benefits, standards & regulations, privacy, security, external forces, impacts, and potential policies to advance development. Our process continues to span the 18 target “apps,” seeking to identify potentially effective policy actions federal officials might consider.
Applying the Emergence Indicator
We’d like to share our “emergence indicator.” Its aim is to identify key players operating at the frontier of an emerging technology. Our indicator is generated using a VantagePoint macro. The macro calculates the occurrence of topical terms over time, seeking steep growth. These terms may be garnered from any one or a combination of the database search sets.
To illustrate, we considered research activity from Web of Science. The macro first seeks terms whose prevalence has accelerated since a base period (allowing the user control over several parameters). In this case, it nominated 72 “hot topics” in Big Data research; these included phrases such as “Big Data analytics” and “extreme learning machine” as well as applications (“health care”) and issues (“proliferation”).
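The idea of flagging terms whose prevalence has grown since a base period can be sketched in a few lines of Python. This is a simplified stand-in, not the VantagePoint macro: it compares a term's share of records in a recent period with its share in a base period, and the thresholds (`min_recent`, `min_ratio`) and the add-one smoothing are assumptions chosen for illustration. It measures growth in share rather than acceleration in the strict sense.

```python
from collections import Counter


def emerging_terms(records, base_years, recent_years, min_recent=3, min_ratio=2.0):
    """Return {term: growth ratio} for terms that grew sharply.

    records: iterable of (year, terms) pairs, one pair per publication.
    A term qualifies if it appears in at least min_recent recent records
    and its share of recent records is at least min_ratio times its
    (smoothed) share of base-period records.
    """
    base_years, recent_years = set(base_years), set(recent_years)
    base_counts, recent_counts = Counter(), Counter()
    n_base = n_recent = 0
    for year, terms in records:
        if year in base_years:
            n_base += 1
            base_counts.update(set(terms))  # count each term once per record
        elif year in recent_years:
            n_recent += 1
            recent_counts.update(set(terms))
    if n_base == 0 or n_recent == 0:
        return {}

    hot = {}
    for term, count in recent_counts.items():
        if count < min_recent:
            continue
        recent_share = count / n_recent
        # Add-one smoothing so brand-new terms do not divide by zero.
        base_share = (base_counts[term] + 1) / (n_base + 1)
        ratio = recent_share / base_share
        if ratio >= min_ratio:
            hot[term] = round(ratio, 2)
    return dict(sorted(hot.items(), key=lambda kv: (-kv[1], kv[0])))


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    demo = [
        (2010, ["data mining"]),
        (2010, ["data mining"]),
        (2011, ["data mining", "hadoop"]),
        (2011, ["data mining"]),
        (2014, ["big data analytics", "data mining"]),
        (2014, ["big data analytics"]),
        (2015, ["big data analytics", "extreme learning machine"]),
        (2015, ["extreme learning machine", "data mining"]),
    ]
    print(emerging_terms(demo, {2010, 2011}, {2014, 2015}, min_recent=2))
```

In the toy data, "data mining" is frequent but flat, so it is not flagged; the two terms that first appear in the recent period are.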
The macro goes on to identify which organizations (or authors or countries) have been addressing those emerging topics most strongly; that is its real focus. Consequently, a company seeking to tap such research frontiers might seek out researchers at these top organizations: Tsinghua University, MIT, Harvard, Stanford, University of Washington, Microsoft, or UCLA.
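Once a set of emerging terms is in hand, ranking the organizations that address them is a simple tally. The sketch below is an assumption about how such a ranking could work, not a description of the macro: it scores each organization by the number of its records that mention at least one emerging term. The organization names and records are placeholders.

```python
from collections import Counter


def rank_organizations(records, hot_terms, top_n=10):
    """Rank organizations by how many of their records touch a hot term.

    records: iterable of (organizations, terms) pairs, one per publication.
    Returns a list of (organization, score) tuples, highest score first.
    """
    hot = set(hot_terms)
    scores = Counter()
    for orgs, terms in records:
        if hot & set(terms):
            scores.update(set(orgs))  # credit each organization once per record
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    # Placeholder data, invented for illustration only.
    demo = [
        (["Org A", "Org B"], ["big data analytics"]),
        (["Org A"], ["extreme learning machine", "data mining"]),
        (["Org C"], ["data mining"]),
        (["Org B"], ["hadoop"]),
    ]
    hot = {"big data analytics", "extreme learning machine"}
    print(rank_organizations(demo, hot))
```

The same tally works for authors or countries by swapping the first field of each record.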
Two issues, privacy and security, are front and center when analyzing Big Data applications. We are familiar with government efforts to protect our information in electronic health records, but we still need to figure out how to secure companies’ intellectual property when information is shared openly along key supply chains to ensure interoperability for cloud manufacturing.
Unintended, indirect and delayed data uses present additional challenges. Imagine the potential good in tracing the movements of a terrorist suspect via visual recognition from public camera images compiled at street intersections, hotel lobbies, and the like when combined with other stored data on plane and train ticketing, and credit card purchases. The combination of such discrete chunks of data becomes a potent source of intelligence, albeit a dramatic affront to our privacy.
Alan L. Porter and Ying Huang