IT Skills + Business Acumen = Savvy Data Scientist
Editor’s note: This is part of an occasional series profiling people who work for or on behalf of CIMS and its mission. This profile focuses on Rakesh Ravi, a recent Master of Science graduate of the N.C. State Department of Computer Science, who has, for the past two years, served as a CIMS data scientist.
When Rakesh Ravi began studying for his Master’s degree in computer networking in 2014, he was new to N.C. State, new to Raleigh, new to the country, and certainly hadn’t heard of CIMS. Two years later, Rakesh had secured an internship with CIMS and begun putting his enviable combination of technical skills and business acumen to work as the lead data scientist for CIMS. And early this year, he published an article on unstructured text analytics that was even excerpted here in the blog.
Even though his internship is officially over now that Rakesh has his degree, he continues in his role helping CIMS faculty and researchers analyze big data for clients who want to gain a competitive edge via business intelligence. Several ongoing projects will likely keep him busy for the next year.
As a data scientist, Rakesh works primarily with IBM’s Watson program (courtesy of an arrangement between CIMS and N.C. State’s Department of Computer Science) to make sense of astronomical amounts of unstructured data, which includes text found in press releases, magazines, websites, blogs, social media platforms and the like. Unstructured data has become a goldmine of information for companies and organizations looking to surmount challenges or exploit opportunities. A data scientist not only writes the code that retrieves the data but also knows how to explain the results to CEOs and middle managers alike.
Take Rakesh’s first big project with CIMS. He was tasked with helping the American Coatings Association (ACA), a CIMS corporate sponsor, provide critical information to its members about possible health hazards associated with the chemicals they use so that they could be assured of producing safe products. Working with Watson, Rakesh wrote code and devised “dictionaries” that ultimately led to the successful scanning of more than a million URLs and the downloading of more than 700,000 documents. The goal of all this crawling? To find the answers to critical questions including: What Centers for Disease Control and Prevention (CDC)-listed “materials of concern” are used in coatings? Which of these CDC-listed materials are referenced in public forums as potentially harmful to human health? And which of the CDC-listed materials are being researched or considered for substitution?
While the project has moved on to the next phase, Rakesh was motivated by the reaction of ACA leaders to the initial results of the project. “They were really appreciative,” he says.
After his work on current CIMS Big Data projects wraps up, he’d like to stay in data science—a field that didn’t really exist when Rakesh was getting his undergraduate degree in engineering back home in Bangalore, India, and certainly wasn’t an occupation when Rakesh first discovered his love of science as a child.
“I really liked to work on circuits,” he recalls with a laugh. “I enjoyed connecting things.”
So it was no wonder that Rakesh focused on telecommunications while at the BMS College of Engineering. When he graduated in 2010, he took a job as a software engineer with Tecnotree, a Finland-based provider of IT solutions for communications service providers. Although he was based in Bangalore, Rakesh spent time in Lima, Peru, working on one of the company’s projects.
When he decided he wanted to expand his skill set, he learned about N.C. State from a friend in Bangalore who was a Wolfpack alumnus. Initially, he’d hoped to complete his Master’s in three semesters, but that timeline proved unrealistic. The extra semester he ended up taking afforded him more time to devote to his internship. “CIMS was kind of a blessing for me,” he says. “Everyone’s been very, very helpful to me every step of the way.”
To learn more about Rakesh’s work on the ACA Big Data project, read his IMR article.