FORECASTING FUTURE INNOVATION PATHWAYS WITH BIG DATA ANALYTICS
Tech mining of R&D literature, patents and business intelligence (1) can pay off in anticipating future pathways for technological innovation, and so support better business decisions, wrote Alan L. Porter and Nils C. Newman in the Spring 2011 CIMS Technology Management Report (2). After relating several business decision success stories, they announced the start of research at Georgia Tech to develop data analytics that exploit the potential of Big Data to forecast innovation pathways. The article below highlights progress to date in this “Forecasting Innovation Pathways” (FIP) project.
Why address Big Data? In increasingly complex socio-technical-economic environments, access to vast amounts of data can help organizations make better policies, predictions and decisions (3). Across diverse fields, enhanced data compilation, processing, access, analytics, and visualization are opening vast opportunities. Understanding prospective Big Data developmental paths can help private and public organizations realize those opportunities.
The Forecasting Innovation Pathways (FIP) approach (4) seeks to anticipate potential applications of an emerging technology by combining data mining with the opinions and ideas of experts and stakeholders in the related fields. The approach entails four stages:
- Understand the emerging technology and its technology delivery system for pursuing applications.
- Perform tech mining to analyze R&D activities and associated contextual information (e.g., commercial activity), identify key actors and topical thrusts, and anticipate potential applications.
- Forecast likely innovation paths with expert/stakeholder assistance, and identify leverage points to promote innovation as well as the potential impacts of new applications.
- Synthesize and report to technology management and/or policy makers.
Reference 5 elaborates on the FIP process for the more complicated case of hybrid and electric vehicles.
The Case of Big Data
With National Science Foundation support, our Georgia Tech group set out to advance FIP by working through Big Data development prospects. Getting underway in Spring, 2015, we sought to complement the U.S. Government Accountability Office (GAO) technology assessment of “21st Century Data.” The intent was to apply FIP to generate policy-relevant business and technology intelligence.
Stage 1—To understand the technology (Big Data), we examined 249 review articles identified in our Web of Science search, together with published forecasts (6, 7). We used our collegial networks to help us identify 18 Big Data innovation target applications (see Table 1) and, in Stage 2, to review our empirical analyses.
Table 1— Big Data Target Applications
Mining e-Medical Records (“EHR” Health Records)
Guiding Financial Markets
Precise Agricultural Field Micromanagement
Sharing Police Data
IoT -- Internet of Things
Big Brother’s Video Family
In preparation for a September 2015 conference, we devised a general Technology Delivery System (TDS) model for Big Data innovation. During the conference workshops, we used the TDS to identify key players, issues and other important contextual forces for two of our 18 target applications: mining electronic health records (EHR) and “cloud manufacturing.” We were able to identify a rich set of important actors and interests.
For cloud manufacturing, we examined the complete supply chain, all the way to consumers. For EHR, we identified the major stakeholders as doctors, medical associations, industry associations, patients, hospitals, insurance companies, nurses, and regulators. “Boundary personnel,” such as IT consultants and patient advocates, can also play key roles, as can employers, Medicare, the CDC, medical boards (standards), and pharmacies.
Stage 2—We tech-mined databases covering various aspects of Big Data development. These include: R&D publications (Web of Science, INSPEC), R&D funding (National Science Foundation awards), patents (Derwent Innovation Index), and commercial activities (ABI Inform).
Attention to Big Data has exploded in multiple domains since a Nature special issue in 2008. Unlike the pattern for most emerging technologies, hyper-exponential growth appears almost concurrently in R&D, business and general-interest compilations. We devised an intricate search algorithm (8) and applied variants of it to retrieve abstracts from each of these databases. The number of Big Data papers indexed by Web of Science (WoS) rose astoundingly, from 29 published in 2011 to 1,544 published in 2014 (6).
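To put the WoS growth figures in perspective, the short Python sketch below converts the two counts quoted above into an overall growth factor and an implied average annual growth rate. The function name and the choice of a compound-growth measure are illustrative assumptions; the FIP project may have characterized growth differently.

```python
def compound_annual_growth(start_count: float, end_count: float, years: int) -> float:
    """Return the constant yearly growth rate that takes start_count to end_count."""
    if start_count <= 0 or years <= 0:
        raise ValueError("start_count and years must be positive")
    return (end_count / start_count) ** (1 / years) - 1


# WoS Big Data paper counts quoted in the text: 29 (2011) and 1,544 (2014).
wos_factor = 1544 / 29
wos_rate = compound_annual_growth(29, 1544, 2014 - 2011)

print(f"Overall growth factor: {wos_factor:.1f}x")
print(f"Implied average annual growth: {wos_rate:.0%}")
```

On these figures the paper count grew roughly 53-fold in three years, which corresponds to an average growth rate of well over 200% per year.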
The accompanying illustration overlays the Big Data papers indexed in WoS on a base map that locates some 224 WoS categories as nodes, based on one year of WoS publications. More closely related categories (based on cross-citations among the respective papers) appear closer together.
Not surprisingly, Big Data research concentrates in the Computer Science neighborhood. But what is amazing is the breadth of papers addressing Big Data in other fields across this map of science. Social scientists and engineers, management and environmental science scholars, biomedical and physical scientists are all pursuing Big Data! Researchers are excited about exploring how to advance their research domains through use of Big Data and targeted analyses thereof.
Another insight from the WoS records is that two countries dominate Big Data-related research: the U.S. and China. Of the 7,186 papers retrieved, 32% have a U.S.-based author and 25% have at least one Chinese author; Germany and the UK trail at 5%. If one were seeking particular Big Data expertise to tap, 14 of the leading 30 research-publishing organizations are American and 14 are Chinese (6).
It is instructive to compare the Big Data R&D funding by those two leading countries (9). Both have ramped up support sharply. Table 2 illustrates this in terms of the number of projects funded. In fiscal terms, the U.S. NSF allocated some $2.95 million for 2009-11, rising to $374.66 million for 2012-2015; China’s NSFC allocated $0.12 million, rising to $66.83 million. Clearly, both China and the U.S. have made Big Data a research priority.
Table 2— Number of Big Data Research Projects Funded by NSF and NSFC
Table 3 tallies 2013-2014 patenting as found by a Big Data search. The computing focus is strong. Patenting is not overly concentrated in a few companies. The right-hand column lists leading corporate patent assignees by topic.
Table 3— Big Data Patenting Concentration
ABI Inform consolidates information from trade journals (47% of our Big Data set) and wire feeds (42%), enriched by content from newspapers (6%), magazines (4%) and reports (1%), thereby offering a window on commercial interests. Our Big Data search yielded 9,696 records as of Spring, 2015. These show enormous growth, from 52 records in 2010 to 3,791 in 2014. IBM, Microsoft, Google, and Facebook are the companies named most often.
We went deeper to analyze Big Data topical themes. Using VantagePoint desktop software (www.theVantagePoint.com), we extracted noun phrases from WoS titles and abstracts; cleaned and consolidated those; then clustered the top 60 phrases. These grouped into six main themes: social media, healthcare, business intelligence, cloud-based services, web services, and customer relationship related applications (10).
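The general idea of grouping frequent phrases into themes can be sketched in a few lines of Python. This is not the VantagePoint procedure, whose internals the article does not describe; it is a minimal illustration that assumes noun phrases have already been extracted and cleaned, and that links two phrases when the sets of records containing them overlap strongly (Jaccard similarity). The function name, the threshold and the sample records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cluster_phrases(records, top_n=60, min_jaccard=0.3):
    """Group the top_n most frequent phrases by overlap of the records they occur in."""
    # Count the number of records in which each phrase appears.
    freq = Counter(p for rec in records for p in set(rec))
    top = [p for p, _ in freq.most_common(top_n)]

    # Map each retained phrase to the indices of the records containing it.
    where = {p: set() for p in top}
    for i, rec in enumerate(records):
        for p in set(rec):
            if p in where:
                where[p].add(i)

    # Union-find: merge phrases whose record sets are similar enough.
    parent = {p: p for p in top}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(top, 2):
        union = len(where[a] | where[b])
        if union and len(where[a] & where[b]) / union >= min_jaccard:
            parent[find(a)] = find(b)

    groups = {}
    for p in top:
        groups.setdefault(find(p), []).append(p)
    # Largest clusters first; phrases sorted alphabetically within a cluster.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical records: each is the list of cleaned phrases from one abstract.
sample_records = [
    ["social media", "twitter"],
    ["social media", "twitter", "sentiment"],
    ["healthcare", "electronic health records"],
    ["healthcare", "electronic health records"],
    ["cloud computing"],
]
clusters = cluster_phrases(sample_records)
for cluster in clusters:
    print(cluster)
```

With real data, the threshold would need tuning, and an analyst would still have to name each resulting cluster, much as the six themes above were labeled.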
Stage 3—Building on the tech mining data, we are currently striving to “Forecast Innovation Pathways.” This stage integrates empirical and qualitative information. As noted, our September 2015 workshop modeled Technology Delivery Systems for two of our target applications, considering 10 policy-relevant factors: description, key actors, Big Data roles & issues, benefits, standards & regulations, privacy, security, external forces, impacts, and potential policies to advance development. Our process continues across the 18 target “apps,” seeking to identify potentially effective policy actions the U.S. Government might consider.
Applying the Emergence Indicator
We conclude by sharing our “emergence indicator.” The aim is to identify key players operating at the frontier of an emerging technology. Our indicator is generated using a VantagePoint macro. The macro calculates the occurrence of topical terms over time, seeking steep growth. The terms can be drawn from any one, or a combination, of the database search sets.
To illustrate, we consider the research activity from Web of Science. The macro first seeks terms whose prevalence has accelerated since a base period (allowing the user control over several parameters). In this case, it nominated 72 “hot topics” in Big Data research; for example, “Big Data analytics” and “extreme learning machine” as well as applications (e.g., “health care”) and issues (e.g., “proliferation”).
The macro goes on to identify which organizations (or authors or countries) have been addressing those emerging topics most strongly; that is its real focus. Consequently, a company seeking to tap such research frontiers might seek out researchers at these top organizations: Tsinghua University, MIT, Harvard, Stanford, University of Washington, Microsoft, UCLA, UCSD, or USC.
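The two steps described above, flagging fast-growing terms and then ranking the organizations that use them, can be illustrated with a minimal Python sketch. It is not the VantagePoint macro, whose exact scoring is not given here. It assumes each record carries a year, a list of terms and a list of organizations; the thresholds `min_recent` and `min_ratio`, the function names and the sample data are hypothetical stand-ins for the user-controlled parameters mentioned in the text.

```python
from collections import Counter


def emerging_terms(records, base_years, recent_years, min_recent=2, min_ratio=2.0):
    """Return terms whose share of records grew by at least min_ratio since the base period."""
    base, recent = Counter(), Counter()
    n_base = n_recent = 0
    for rec in records:
        if rec["year"] in base_years:
            n_base += 1
            base.update(set(rec["terms"]))
        elif rec["year"] in recent_years:
            n_recent += 1
            recent.update(set(rec["terms"]))

    hot = set()
    for term, count in recent.items():
        if count < min_recent:
            continue  # too rare in the recent period to call a trend
        base_share = base[term] / n_base if n_base else 0.0
        recent_share = count / n_recent
        if recent_share >= min_ratio * base_share:
            hot.add(term)
    return hot


def rank_organizations(records, hot, recent_years):
    """Score each organization by the emerging terms in its recent records."""
    scores = Counter()
    for rec in records:
        if rec["year"] not in recent_years:
            continue
        hits = len(hot & set(rec["terms"]))
        if hits:
            for org in set(rec["orgs"]):
                scores[org] += hits
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical records for illustration only.
sample = [
    {"year": 2011, "terms": ["data mining", "hadoop"], "orgs": ["Org A"]},
    {"year": 2011, "terms": ["data mining"], "orgs": ["Org B"]},
    {"year": 2014, "terms": ["data mining", "big data analytics"], "orgs": ["Org A"]},
    {"year": 2014, "terms": ["big data analytics", "extreme learning machine"], "orgs": ["Org C"]},
    {"year": 2014, "terms": ["big data analytics", "extreme learning machine"], "orgs": ["Org C", "Org B"]},
    {"year": 2014, "terms": ["hadoop"], "orgs": ["Org B"]},
]
hot = emerging_terms(sample, base_years={2011}, recent_years={2014})
ranking = rank_organizations(sample, hot, recent_years={2014})
print(sorted(hot))
print(ranking)
```

Comparing shares of records rather than raw counts keeps the overall growth of the field from making every term look emergent; a production indicator would likely add further safeguards, such as requiring that a term be used by more than one organization.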
Stage 4—We have prepared a variety of articles and other reports (6, 8, 9, 10) and have enriched our matrix of 10 policy-oriented factors by 18 (or so) applications. Two issues, privacy and security, are pervasive in analyzing Big Data applications. We are familiar with government efforts to protect our information in “Electronic Health Records,” but we still need to figure out how to secure companies’ intellectual property when information is shared openly along key supply chains to ensure interoperability for cloud manufacturing.
Unintended, indirect and delayed data uses present additional challenges. Imagine the potential good in tracing the movements of a terrorist suspect via visual recognition from public camera images compiled at street intersections, hotel lobbies, etc. when combined with other stored data on plane and train ticketing, credit card purchases, and the like. The combination of such discrete chunks of data becomes a potent source of intelligence—albeit a dramatic affront to our privacy!
1. Porter, A.L. and Cunningham, S.W. Tech Mining: Exploiting New Technologies for Competitive Advantage, Wiley, New York, 2005; Chinese edition, Tsinghua University Press, 2012.
2. Porter, A.L. and Newman, N.C. “Tech Mining Success Stories,” CIMS Technology Management Report, NC State Center for Innovation Management Studies, Spring 2011, pp. 17-19.
3. Hogarth, R.M. and Soyer, E. “Using simulated experience to make sense of big data,” MIT Sloan Management Review 56(2), 2015, pp. 49ff.
4. Robinson, D.K.R., Huang, L., Guo, Y., and Porter, A.L. “Forecasting Innovation Pathways for New and Emerging Science & Technologies,” Technological Forecasting & Social Change 80(2), 2013, pp. 267-285.
5. Porter, A.L., Cunningham, S.W., and Sanz, A. “Advancing the forecasting innovation pathways approach: hybrid and electric vehicles case,” Int. J. Technology Management 69(3-4), 2015, pp. 275-300.
6. Porter, A.L., Huang, Y., Schuehle, J., and Youtie, J. “MetaData: BigData research evolving across disciplines, players, and topics,” IEEE BigData Congress, 2015.
7. Manyika, J. et al. “Big data: The next frontier for innovation, competition, and productivity,” McKinsey Global Institute, 2011.
8. Huang, Y., Schuehle, J., Porter, A.L., and Youtie, J. “A systematic method to create search strategies for emerging technologies based on the Web of Science: illustrated for ‘Big Data’,” Scientometrics 105(3), 2015, pp. 2005-2022.
9. Huang, Y., Zhang, Y., Youtie, J., Porter, A.L., and Wang, X. “How does national scientific funding support emerging interdisciplinary research: A comparison study of Big Data research in the US and China,” PLOS ONE, working paper.
10. Huang, Y., Youtie, J., Porter, A.L., and Robinson, D.K.R. “Big Data and business: Tech mining to capture business interests and activities around big data,” working paper.
Alan Porter is Professor Emeritus, Industrial & Systems Engineering and Public Policy, and Co-Director, Technology Policy & Assessment Center, Georgia Tech; and Director of R&D at Search Technology, Inc.
Ying Huang is a PhD student at Beijing Institute of Technology, China.