For the past several years CIMS has been using big data analytics to address the strategic challenges confronting companies. By analyzing publicly available data on the World Wide Web, we have been able to answer such strategic questions as:
• What major trends are impacting our industry?
• What market opportunities do these trends present to our company?
• With whom might we partner to deliver solutions to these customers?
CIMS Executive Director Paul Mugge and Chief Evangelist Dick Kouri demonstrated this last scenario at the CIMS Fall 2015 Meeting, which focused on “Open Innovation Revisited.” Locating eligible and qualified business partners is one of the principal tasks confronting companies attempting to build open innovation business models.
In their article below, Professors Mugge and Kouri explain how CIMS research was able to answer a leading pharmaceutical firm’s question, “Who are the key opinion leaders in personalized medicine? Where are they located?”
This was a real case presented to us by a pharmaceutical leader headquartered in the United States that wanted to start a new line of business in the treatment of breast, lung and prostate cancers. The company recognizes that personalized medicine, a new approach to health care based on each person’s unique genetic makeup, represents a major breakthrough in the treatment of these difficult cancers. However, it does not believe it has the in-house skills to understand and respond to this advance, nor does it have the time to train its own R&D employees.
Consequently, the company wants to form partnerships quickly with the top thought leaders in personalized medicine for oncology (especially breast, lung and prostate cancers).
As Henry Chesbrough of the Haas School of Business, widely regarded as the father of Open Innovation, emphasizes, “in today’s information-rich environment, companies can no longer afford to rely entirely on their own ideas to advance their business, nor can they restrict their innovations to a single path to market.”
Nowhere is this truer than among Big Pharma companies caught up in a desperate search for people, and ideas, around which they can build new sustainable business models. The field of personalized medicine holds such promise; the challenge is how to identify and locate the “best of the best” of these partners anywhere in the world.
To tackle the problem, we used IBM Watson Explorer, with its ability to “read” and decipher massive amounts of unstructured data. To enable Watson to identify these people, we used a search technique called “follow the money” (see Following the Money).
We know that the National Cancer Institute (NCI) is a leader in cancer research and makes substantial annual grants to deserving faculty across the world, including those at the top U.S. medical schools. However, before searching NCI.gov for this information, Watson has to be trained to recognize the specific words and phrases that describe terms, like “medical schools,” “issued grants,” “personalized medicine,” and “oncology.”
These words and phrases are captured in dictionaries that often contain hundreds of entries. For example, to build the Oncology dictionary we downloaded its definition from the NCI Dictionary of Terms and all of the descriptors (synonyms) for lung, breast and prostate cancers from the NCI Thesaurus.
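A dictionary of this kind can be pictured as a mapping from a concept to the words and phrases that express it. The following is a minimal Python sketch of that idea, not Watson Explorer’s actual dictionary format. The handful of synonyms shown are illustrative stand-ins rather than the NCI Thesaurus contents, and find_concepts is a hypothetical helper name.

```python
import re

# Illustrative entries only; a dictionary built from the NCI Thesaurus
# would hold many more synonyms per concept.
ONCOLOGY_DICTIONARY = {
    "breast cancer": ["breast cancer", "breast carcinoma", "mammary carcinoma"],
    "lung cancer": ["lung cancer", "lung carcinoma", "non-small cell lung cancer"],
    "prostate cancer": ["prostate cancer", "prostate carcinoma", "prostatic adenocarcinoma"],
}


def find_concepts(text, dictionary):
    """Return the set of dictionary concepts whose synonyms appear in text."""
    found = set()
    for concept, synonyms in dictionary.items():
        for synonym in synonyms:
            # Whole-phrase, case-insensitive match.
            pattern = r"\b" + re.escape(synonym) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(concept)
                break  # one synonym is enough to tag the concept
    return found


# Invented sample sentence, for illustration only.
sample = "This grant studies targeted therapies for Mammary Carcinoma and lung carcinoma."
print(sorted(find_concepts(sample, ONCOLOGY_DICTIONARY)))
```

Because every synonym maps back to a single concept, a document that mentions “mammary carcinoma” is tagged the same way as one that mentions “breast cancer.”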
In order to follow the money, we created a special annotator, called “$money_finder,” which we set to identify those medical centers receiving NCI grants greater than $1.5 million. We were looking for large grants that NCI would likely issue only to prominent researchers at the top medical schools. The answer to this first sub-question was that the NCI had issued S.P.O.R.E. grants (Specialized Programs of Research Excellence) to 52 medical centers in 21 states (see Following the Money).
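The internals of the annotator are not described here, but the threshold test it applies can be sketched in a few lines of Python. This sketch assumes that grant amounts appear in text as figures such as “$1,750,000” or “$2.3 million.” The function names, the pattern and the sample sentences are hypothetical and are not taken from Watson Explorer.

```python
import re

THRESHOLD_USD = 1_500_000  # the $1.5 million cutoff described in the article

# Matches amounts such as "$1,750,000" or "$2.3 million".
AMOUNT_PATTERN = re.compile(
    r"\$\s?(\d[\d,]*(?:\.\d+)?)(?:\s*(million|billion))?",
    flags=re.IGNORECASE,
)
MULTIPLIERS = {"million": 1_000_000, "billion": 1_000_000_000}


def extract_amounts(text):
    """Return every dollar amount mentioned in text, converted to a float."""
    amounts = []
    for number, unit in AMOUNT_PATTERN.findall(text):
        value = float(number.replace(",", ""))
        if unit:
            value *= MULTIPLIERS[unit.lower()]
        amounts.append(value)
    return amounts


def mentions_large_grant(text, threshold=THRESHOLD_USD):
    """True if any dollar amount in text is strictly above the threshold."""
    return any(amount > threshold for amount in extract_amounts(text))


# Invented sample sentences, for illustration only.
print(mentions_large_grant("The center received a $2.3 million award."))
print(mentions_large_grant("A pilot grant of $250,000 was issued."))
```

The comparison is strict, so a grant of exactly $1.5 million does not pass, which matches the “greater than” wording of the sub-question.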
From there we moved to the second sub-question, “Which of these centers’ grants were focused on breast, lung or prostate cancer?” By using the oncology dictionary to search the S.P.O.R.E grants, we identified 10 medical centers researching the use of personalized medicine methods to treat patients with these specific cancers.
The third sub-question asks for the key investigators on these 10 grants. To answer it, we built another special annotator, called “name_finder,” to extract the investigators’ names. Watson found 143 investigators working on these grants.
These 143 investigators, all U.S. residents, are some of the most knowledgeable people in the world on the biology of breast, lung and prostate cancers. It stands to reason that they possess the skills that a company like the one in our case is looking for to help jump-start its new line of business in personalized medicine for oncology.
Going Even Further!
The case up to this point could have been solved eventually—albeit with a tremendous amount of hunting and pecking—by using Google and the search engine intrinsic to NCI’s website. Here is what could not be done using these tools:
Using Watson, CIMS crawled the Web and created a custom dataset containing the entire websites of the top 398 venture capital (VC) firms in North America that focus on developments in the life sciences. By mining this huge corpus of information—estimated to exceed 170 million web pages—we were able to answer the fourth sub-question, “Which of these key investigators are linked with startup activity?”
From this analysis, we get a good sense of how the VC community values these people and their technologies.
Only 13 of these people are engaged in startup activities and/or sit on the science advisory boards of the VC firms. For our pharma company, which is looking to partner with the “best of the best” of these people, this last test, or screen, yields this critical information.
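In outline, this screen amounts to using the list of investigators as a new dictionary and searching the VC corpus for each name. The Python sketch below illustrates that cross-match under simplifying assumptions: the names and page text are invented placeholders, investigators_linked_to_vc is a hypothetical helper, and plain substring matching stands in for whatever name matching Watson performs.

```python
def investigators_linked_to_vc(investigators, vc_pages):
    """Map each investigator found in the VC corpus to the firms that mention them.

    investigators: iterable of full names (strings)
    vc_pages: dict mapping a firm name to the text crawled from its website
    """
    links = {}
    for name in investigators:
        needle = name.lower()
        firms = sorted(
            firm for firm, text in vc_pages.items() if needle in text.lower()
        )
        if firms:
            links[name] = firms
    return links


# Placeholder names and page text, invented for illustration only.
investigators = ["Jane Doe", "John Roe", "Alex Poe"]
vc_pages = {
    "Example Ventures": "Our scientific advisory board includes Jane Doe.",
    "Sample Capital": "Portfolio company co-founded by jane doe and others.",
    "Demo Partners": "We invest in medical devices.",
}
links = investigators_linked_to_vc(investigators, vc_pages)
print(links)
```

A real corpus would call for more careful matching, since the same person may appear with a middle initial, a title, or an abbreviated first name.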
The last sub-question can be equally valuable to the company. By searching the investment portfolios of the 398 VC firms with the dictionaries—personalized medicine + oncology—we are able to identify the six firms making investments in new ventures in this field. Possibly the company in the case would like to co-invest along with the VCs in these new ventures and their emerging treatments for breast, lung and prostate cancers. This represents another way of creating a new line of business while using VCs to help lessen the risk.
Publicly available data, such as that illustrated in this case, contains a tremendous amount of raw intelligence. The trick is to extract this information using logical arguments the rest of the organization understands. In this case we relied on the editors of U.S. News & World Report, the U.S. National Cancer Institute, and almost 400 VC portfolio managers to do the due diligence of locating special people with great expertise in a specific critical area.
For business people faced with answering complex strategic questions, we believe “following the money” represents such an approach. Readers can learn more about this technique and how it was developed in our September/October 2014 IMR article: “Why Big Data Is Not All Hype: The Power of Unstructured Text Analytics.”
If your organization is looking for business partners, or you are just interested in learning more about advanced data analytics, please feel free to contact us.
Paul Mugge is Executive Director, Center for Innovation Management Studies (CIMS), and Innovation Professor of the Poole College of Management, NC State University; email@example.com
Richard Kouri is Chief Evangelist, CIMS, and Adjunct Professor, Jenkins Graduate School of the Poole College of Management and the College of Engineering (Dept. of Biomedical Engineering), NC State University; firstname.lastname@example.org
Following the Money
Who are the key opinion leaders in personalized medicine?
Where are they located?
1. Sub Question: To which medical centers has the National Cancer Institute issued grants, in excess of 1.5 million dollars, for developing personalized medicine for oncology?
• Dictionaries were created for the key terms in the question; for example, the medical centers dictionary contained a list of the top 50 (U.S. News & World Report)
• Likewise, dictionaries were created for personalized medicine, issued grants, and oncology, and we developed a special annotator, called $money_finder
• Searched NCI.gov using a series of query rules, e.g. (medical centers + issued grants + personalized medicine)
Answer = NCI issued S.P.O.R.E. grants to 52 medical centers in 21 states.
2. Sub Question: Which of these centers’ grants were focused on breast, lung, or prostate cancer?
• Used the oncology dictionary to search the S.P.O.R.E. grants identified through the first search
Answer = 10 medical centers have treated these cancers as the foci of their grants
3. Sub Question: Who were the key investigators on the S.P.O.R.E. grants?
• Created a dictionary for investigators and a special annotator, called name_finder
Answer = Identified 143 investigators
4. Sub Question: Which of these key investigators are linked with startup activity?
• Used collection of “Top 398 life science-focused VC firms in North America”
• Searched the VC data set using the list of 143 key investigators (new dictionary)
Answer = Identified 13 key people who are engaged in startups
5. Sub Question: Which VC firms are funding personalized medicine in oncology?
• Used personalized medicine and oncology dictionaries to search VC data set
Answer = 6 VC firms are investing in the field
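One way to picture a query rule such as (medical centers + issued grants + personalized medicine) is as a conjunction: a document qualifies only if it contains at least one term from every dictionary named in the rule. Reading “+” as a logical AND is an assumption made for this sketch. The dictionary entries and sample documents are invented, and matches_rule is a hypothetical helper rather than a Watson Explorer function.

```python
def matches_rule(text, rule, dictionaries):
    """True if text contains at least one term from every dictionary named in rule."""
    lowered = text.lower()
    return all(
        any(term.lower() in lowered for term in dictionaries[name])
        for name in rule
    )


# Illustrative entries only; the dictionaries described in the article
# often contain hundreds of entries.
DICTIONARIES = {
    "medical centers": ["medical center", "cancer center", "school of medicine"],
    "issued grants": ["grant", "award", "funded"],
    "personalized medicine": ["personalized medicine", "precision medicine", "targeted therapy"],
}
RULE = ("medical centers", "issued grants", "personalized medicine")

# Invented sample documents, for illustration only.
documents = [
    "The cancer center was funded to study precision medicine in lung tumors.",
    "The medical center opened a new cafeteria.",
]
hits = [doc for doc in documents if matches_rule(doc, RULE, DICTIONARIES)]
print(hits)
```

Only the first document satisfies all three dictionaries; the second mentions a medical center but nothing about grants or personalized medicine, so the rule screens it out.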