Posing the right questions is central to the pursuit of successful R&D. So too must organizations hoping to capitalize on the promise of Big Data Analytics recognize that it’s NOT about the technology; it’s about the question. That’s evident in a pilot program Air Products and Chemicals has conducted with North Carolina State University and CIMS to determine the value of using Big Data and Unstructured Text Analytics (UTA) for targeted market research applications. Here’s what the project team reports they learned:
Our study focused on demonstrating the feasibility of utilizing the CIMS Big Data Analytics Platform (BDAP) developed with IBM to collect terabytes of data from any public web source, apply Natural Language Processing (NLP) models as data filters, and present relevant information to the research staff and marketing managers at Air Products and Chemicals Inc.
The specific goal was to implement a proof-of-concept of Big Data and Unstructured Text Analytics to:
• Identify companies, laboratories, universities, and other prospective commercial customers for Air Products’ gases.
• Display relevant information about three core markets: metals processing, healthcare and food processing, as well as develop models that could extract information about possible new customer leads.
• Identify potential Air Products markets from the text.
Big Data Process
The first step was to generate a business question specific to Air Products’ core strategies: “How do we effectively and efficiently identify potential customers when our products and services are highly transferable among many markets?”
Big Data and UTA can address our question by providing information that identifies new customers or potentially under-served markets more accurately. Better information allows Air Products to allocate resources efficiently, reduce uncertainty and construct a more complete market map. The information also allows the marketing managers to assess market demand more easily.
Our second step was to identify the relevant information sources. For example, we conducted a large-scale web crawl starting from more than 300 URLs to create a multi-million-document corpus of text representing the metals processing market. Each website was crawled to a depth of 16 layers of URL links, so the database grew exponentially with each additional layer.
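To make the idea of crawl depth concrete, here is a minimal sketch of a depth-limited, breadth-first crawl. It is not the IBM or CIMS implementation: the `crawl` function, the toy `LINKS` graph and the example URLs are all hypothetical, and an in-memory dictionary stands in for fetching and parsing real pages.

```python
from collections import deque

def crawl(seeds, links, max_depth):
    """Breadth-first crawl: collect every page reachable from the seeds
    within max_depth link hops. `links` maps a URL to the URLs it links
    to (an assumed stand-in for downloading and parsing a real page)."""
    seen = {url: 0 for url in seeds}   # URL -> depth at which it was found
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth == max_depth:
            continue                   # do not follow links past the limit
        for nxt in links.get(url, []):
            if nxt not in seen:        # skip pages already collected
                seen[nxt] = depth + 1
                queue.append(nxt)
    return seen

# Hypothetical toy link graph, for illustration only.
LINKS = {
    "a.example": ["a.example/products", "a.example/news"],
    "a.example/products": ["a.example/products/welding"],
    "a.example/products/welding": ["b.example"],
}

pages = crawl(["a.example"], LINKS, max_depth=2)
```

With a depth limit of 2, the sketch collects four pages and stops before following the link to `b.example`. Raising the limit lets each newly found page contribute its own links, which is why the corpus grows so quickly with depth.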
Once the database was formed using IBM Content Analytics software, CIMS data scientist Tim Michaelis worked one on one with each functional manager and senior technologist Mark Listemann to develop semantic word lists that frame context around each market segment.
For example, Lisa Mercando, Air Products metals processing market manager for North America, was able to create keyword lists such as “Metals Processing Gas, Metals Processing Equipment, Metals Fabrication, and Metals Processing Techniques.” To identify potential customers, each word list was then combined in multiple ways with different text algorithms to uncover those companies associated with only the terms she was interested in.
The text algorithms are built with Natural Language Processing (NLP), a field spanning computer science, artificial intelligence and linguistics in which computers derive meaning from human language. In the following example, the bracketed items represent semantic word lists and text algorithms designed to find specific trends in text:
[Metal Processing]x[Company Finder Algorithm]x[Metals Process Gases]x[North America]
This example would return only the text that displayed a metals processing term, a metals processing gas, a company name, and a geographic reference to North America. With this algorithm, we can search multiple millions of websites and return only the documents and text that mention companies using the gases Air Products sells in the metals processing market.
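The bracketed combination above can be read as a logical AND: a passage is returned only if every word list and the company finder all match. The sketch below illustrates that reading under stated assumptions. The word-list contents, the sample passages and the regular expression standing in for the Company Finder Algorithm are hypothetical; the actual platform's models are not described in enough detail here to reproduce.

```python
import re

# Hypothetical word lists; the real lists were built by the market managers.
WORD_LISTS = {
    "metals_processing": {"annealing", "brazing", "heat treating"},
    "metals_process_gases": {"argon", "nitrogen", "hydrogen"},
    "north_america": {"ohio", "ontario", "mexico"},
}

# Assumed stand-in for the "Company Finder Algorithm": capitalized words
# followed by a common corporate suffix.
COMPANY_RE = re.compile(r"\b(?:[A-Z][A-Za-z&]*\s)+(?:Inc|Corp|LLC|Ltd)\b")

def matches_list(text, terms):
    """True if any term from the word list appears as a whole word or phrase."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms)

def find_leads(passages):
    """Keep only passages that name a company AND match every word list."""
    hits = []
    for passage in passages:
        company = COMPANY_RE.search(passage)
        if company and all(matches_list(passage, terms)
                           for terms in WORD_LISTS.values()):
            hits.append((company.group(0), passage))
    return hits

passages = [
    "Acme Forge Inc uses nitrogen for annealing at its Ohio plant.",
    "Nitrogen is widely used for annealing steel.",
    "Borealis Foods Ltd opened a bakery in Ontario.",
]
leads = find_leads(passages)
```

Only the first passage survives: the second names no company or region, and the third names a company and a region but no metals processing term or gas.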
There are many ways to update word lists and combine them with text algorithms; however, one must be careful that these combinations always refer back to the initial business question, or else the collected data will not be relevant. After all, the purpose of adding Big Data to a business portfolio is to use those data to make business decisions; if the data do not help answer the question, then what is the point of asking?
CIMS Key Learnings
Institutionalizing Big Data capabilities is no simple task. Combining the right resources takes time and cooperation among many individuals. Furthermore, adopting a disruptive innovation is difficult. A single frustrated manager who cannot easily identify or extract value from Big Data can derail the adoption process. (In-person action-learning workshops can help make data analytics less abstract.)
First and foremost, Big Data Analytics is not about the technology; it’s about the question. Thus, Air Products was able to identify potential new customers and underserved markets for industrial gases because it could frame a Big Data question about what drives its business growth.
Second, UTA is not like a Google search. For example, Google tags only the key words written in the search bar and their spatial relationships in text: how far each word is from another in a web page. Unlike Google, the CIMS Big Data Platform with UTA allows combining multiple semantic keyword lists in order to return only those context-specific results that can be located within a document, paragraph or a single sentence. As Mark Listemann stated, “UTA is unlike traditional keyword searches which usually return 80% garbage; now all of the hits are on topic.”
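The difference between matching within a document, a paragraph or a single sentence can be illustrated with a short sketch. This is an assumption-laden illustration rather than the platform's method: the function names, the naive sentence splitter and the sample text are hypothetical.

```python
import re

def split_units(document, scope):
    """Split a document into the units within which terms must co-occur."""
    if scope == "document":
        return [document]
    if scope == "paragraph":
        return [p for p in re.split(r"\n\s*\n", document) if p.strip()]
    if scope == "sentence":
        # Naive splitter: break after ., ! or ? followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    raise ValueError(f"unknown scope: {scope}")

def has_term(unit, terms):
    """True if any term from the word list appears as a whole word or phrase."""
    lowered = unit.lower()
    return any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms)

def cooccurs(document, word_lists, scope):
    """True if some single unit contains a term from every word list."""
    return any(all(has_term(unit, terms) for terms in word_lists)
               for unit in split_units(document, scope))

# Hypothetical sample text and word lists.
DOC = ("The plant expanded its brazing line last year. "
       "Separately, it signed a supply contract for argon.\n\n"
       "The company also sponsors a local soccer team.")
LISTS = [{"brazing", "annealing"}, {"argon", "nitrogen"}]

in_document = cooccurs(DOC, LISTS, "document")
in_paragraph = cooccurs(DOC, LISTS, "paragraph")
in_sentence = cooccurs(DOC, LISTS, "sentence")
```

Here the two lists co-occur at document and paragraph scope but not at sentence scope, because the process term and the gas appear in different sentences. Tightening the scope in this way is one route to fewer off-topic hits.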
Another critical part of introducing Big Data and UTA into a business unit is to have subject matter experts work directly with the technology. According to the McKinsey Global Institute, “1.5 million more data-savvy managers will be needed in order to take full advantage of Big Data in the U.S.” (1) These experts must be a part of the process of developing semantic word lists and using data to influence decisions.
Logically, the more experienced managers will be able to better articulate the current trends associated with a market and can more accurately define what is and is not good information. Introducing Big Data and UTA into an organization will not be successful without data-savvy managers: individuals who can both identify the core business question and use Big Data technologies to drive their decisions with data.
Air Products Key Learnings
Our ultimate goal is to remain current on numerous markets. Although the idea of Big Data sounded appealing, we really didn’t know what that would mean in practice for the day-to-day work of our market managers. We started by assembling lists of our commonly used websites and other data sources. This proved a useful exercise because it got the managers thinking about everything they read on a regular basis, and most of us don’t step back very often to take inventory of our sources.
Another benefit from this exercise was a heightened awareness of “orthogonal” sources: not those directly related to your industry segment but new sources or other indirect indicators of interest. This is something we will explore much more as we update our crawls.
The first challenge we faced was what it means to build a semantic word list. This is about more than just keywords; it really forces you to think about how you describe your own thought process when prospecting for new information. Managers need to think of synonyms for their favorite topics, as well as antonyms that effectively exclude related but irrelevant topics.
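A word list with both synonyms and excluding terms can be sketched as an include set and an exclude set. The terms and sample sentences below are hypothetical, chosen only to show how an excluding term screens out a related but irrelevant topic.

```python
import re

def contains_any(text, terms):
    """True if any term appears in the text as a whole word or phrase."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms)

def is_relevant(text, include, exclude):
    """Relevant if an included term appears and no excluding term does."""
    return contains_any(text, include) and not contains_any(text, exclude)

# Hypothetical lists: synonyms for one topic, plus terms that signal
# a similar-sounding but unrelated topic.
INCLUDE = {"heat treating", "heat treatment", "annealing"}
EXCLUDE = {"hair", "salon"}

on_topic = is_relevant(
    "The shop offers annealing and heat treatment of steel parts.",
    INCLUDE, EXCLUDE)
off_topic = is_relevant(
    "This salon offers a heat treatment for damaged hair.",
    INCLUDE, EXCLUDE)
```

Both sentences mention "heat treatment," but only the first is kept. In practice the exclude list would be refined as off-topic results turn up in early queries.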
Once we started breaking the dictionaries down by topic (for example, different types of industrial processes, words and phrases relating to mergers, and a dozen more), everyone became much more comfortable with thinking about their own market research styles in a sufficiently “granular” fashion to construct the word lists. The lists are then combined to run the actual queries, and the preliminary search results are reviewed to identify deficiencies in the original word lists.
The managers caught on to this aspect quickly. The process is iterative, similar to hypothesis testing, and permits trying new variations in order to more effectively target topics of business interest.
We have demonstrated our ability to extract value for Air Products from Big Data. Nevertheless, we definitely need more experience with this, and consequently we are continuing to develop our capabilities in Big Data and Unstructured Text Analytics. We intend to expand the pilot to more detailed company site-specific searches, and we plan to explore text mining and other approaches to capture detail from the individual records in a more searchable format.
1. McKinsey Global Institute. Big Data: The next frontier for innovation, competition, and productivity. New York: McKinsey & Company, 2011.
Mark Listemann (LISTEMML@airproducts.com), Lisa Mercando, Theresa Camilli, and Chris Johnson from Air Products and Chemicals, Inc., Allentown, Pennsylvania.
Tim Michaelis (firstname.lastname@example.org), Michael Kowolenko, Fred Renk, Stephen Markham, and Vincent Freeh from North Carolina State University and the Center for Innovation Management Studies (CIMS), in Raleigh, North Carolina.