With all of the excitement about using “big data analytics” in the cloud, there’s been little guidance about what it will cost. CIMS has been exploring this issue, which can be boiled down to how often, how much, how fast. In this article, CIMS Industry Fellow Michael Kowolenko presents a snapshot of our thinking to date.
The Big Data Analytics Platform (BDAP™), described by Paul Mugge on page 2, lends itself to the cloud computing environment. The elastic nature of the cloud is an ideal environment for data capture and analysis. However, the volume of data, the amount of processing, bandwidth, and storage will vary from business to business, all of which will affect cost. In order to determine the cost-effectiveness of such a solution, it is necessary to understand the components that are required within this environment.
Speed, Volume, Users = Cost
Pricing within the cloud is based on the number of processors required, how quickly you need your results, the volume of data stored within the cloud, how often you’ll access that data, how secure you want that data, and finally whether or not you’ll need support services for your particular application. As the owner of the application, you can control all of these variables. Ultimately it is speed, volume and the number of users that will determine your cost.
The extensive volume of data and its analysis require servers with multiple processors and ample local memory. Determining the number of servers and their concomitant processors is analogous to setting the f-stop on a camera. The wider the aperture (the processor), the greater the amount of light (data) that passes onto the film (storage). Each one of these factors is priced accordingly.
One must also consider the applications that are required to run on the hardware. Software vendors often price applications based on the number of processors rather than the number of servers. A server can contain multiple processor cores (e.g., dual core or quad core), and the software vendor may treat each core as one machine. Operating system software is often priced as part of the hardware costs and is minimal when compared to the application.
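To make the per-core licensing point concrete, here is a minimal sketch of the arithmetic. The function names and the price used in the example are hypothetical, chosen only for illustration; actual vendor terms vary and are often negotiated.

```python
def licensed_units(servers: int, cores_per_server: int, per_core: bool = True) -> int:
    """Count the units a vendor would bill for.

    per_core=True  : each core is treated as one machine (the case described above).
    per_core=False : each server counts once, regardless of its cores.
    """
    if servers < 0 or cores_per_server < 1:
        raise ValueError("servers must be >= 0 and cores_per_server >= 1")
    return servers * cores_per_server if per_core else servers


def license_cost(servers: int, cores_per_server: int,
                 price_per_unit: float, per_core: bool = True) -> float:
    """Total license cost; price_per_unit is a hypothetical figure."""
    return licensed_units(servers, cores_per_server, per_core) * price_per_unit


if __name__ == "__main__":
    # Four quad-core servers at an assumed 1,000 per licensed unit.
    print(license_cost(4, 4, 1000.0, per_core=True))   # billed on 16 cores
    print(license_cost(4, 4, 1000.0, per_core=False))  # billed on 4 servers
```

Under these assumed numbers, the same four servers cost four times as much to license when the vendor counts cores instead of machines.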
Pricing on applications often depends upon the relationships an organization has with a given vendor. Pricing can be quite flexible for purchased software; however, as CIMS began to explore cloud-based applications for software normally used exclusively on in-house purchased systems, we found that the concept of “renting” in a cloud environment has not been extensively commercialized by many software vendors. Pricing in this environment will likely require custom solutions negotiated between the user and the vendor of the software.
Running the BDAP will require several powerful servers with multiple cores and extended local memory. The user must weigh how quickly an answer is needed against the cost of high-performance computing. Slower processors and less local memory are cheaper than fast processors and more memory, and they may achieve the same goals if time to results is not critical.
Bandwidth will be an additional charge in the cloud. Pricing reflects the volume of use. As with storage, the more you use, the lower the price per unit. In estimating cost, the two variables that must be considered are the size of the database being queried and the number of users accessing that data. Large databases with frequent activity by users will drive costs higher.
Storage requirements are determined by several factors. Will all the data, including backups, be kept in a cloud environment? Or will the user keep backups locally, in a separate location? All of these can affect cloud computing costs. As with bandwidth, the more storage you use, the lower the price per unit. Unlike computer processing time, storage will require monthly payments to the cloud provider for as long as you keep the data.
Security is an ongoing concern of those who use the cloud. Depending upon the firm’s IT policies, dedicated encrypted services for data transmission may be required. This will result in additional fees for the use of security software and hardware solutions. Operating in a cloud environment may require the services of IT technical support if the firm does not have the appropriate support personnel for dealing with application software, servers, data storage, and communications. The level of support that the client requires can drive costs. More support equals more money.
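The speed-versus-cost tradeoff can be sketched as a small comparison. Everything here is an assumption made for illustration: the option names, the throughput figures, the hourly rates, and the idea that a job can be measured in uniform "work units."

```python
def job_time_and_cost(work_units: float, units_per_hour: float, hourly_rate: float):
    """Return (hours, cost) for one job on one hypothetical server type."""
    hours = work_units / units_per_hour
    return hours, hours * hourly_rate


def cheapest_within_deadline(options: dict, work_units: float, deadline_hours: float):
    """Pick the lowest-cost option that still finishes by the deadline.

    options maps a name to (units_per_hour, hourly_rate).
    Returns None when no option is fast enough.
    """
    feasible = []
    for name, (units_per_hour, hourly_rate) in options.items():
        hours, cost = job_time_and_cost(work_units, units_per_hour, hourly_rate)
        if hours <= deadline_hours:
            feasible.append((cost, hours, name))
    return min(feasible)[2] if feasible else None


if __name__ == "__main__":
    # Hypothetical server types: (work units per hour, price per hour).
    options = {"slow": (100.0, 1.0), "fast": (400.0, 6.0)}
    print(cheapest_within_deadline(options, 1200.0, deadline_hours=24))
    print(cheapest_within_deadline(options, 1200.0, deadline_hours=4))
```

With these assumed figures the slower server finishes the job for less money, so it wins whenever the deadline allows; the faster server is only worth its premium when the answer is needed sooner.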
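Volume discounts and recurring monthly charges can be illustrated with a graduated price schedule. The tier boundaries and prices below are invented for the example and do not reflect any provider's actual rates; the same shape of calculation could be applied to bandwidth.

```python
# Hypothetical graduated tiers: (upper limit in GB, price per GB per month).
TIERS = [(1_000, 0.10), (10_000, 0.08), (float("inf"), 0.05)]


def tiered_monthly_cost(gb: float, tiers=TIERS) -> float:
    """Monthly charge when each successive tier is billed at a lower unit price."""
    if gb < 0:
        raise ValueError("gb must be non-negative")
    cost = 0.0
    lower = 0.0
    for upper, price in tiers:
        if gb <= lower:
            break
        # Only the portion of usage that falls inside this tier is billed at its price.
        cost += (min(gb, upper) - lower) * price
        lower = upper
    return cost


def storage_cost_over_time(gb: float, months: int, tiers=TIERS) -> float:
    """Storage is billed every month for as long as the data is kept."""
    return tiered_monthly_cost(gb, tiers) * months


if __name__ == "__main__":
    for gb in (500, 2_000, 20_000):
        monthly = tiered_monthly_cost(gb)
        print(gb, round(monthly, 2), round(monthly / gb, 4))  # average price per GB falls
    print(round(storage_cost_over_time(2_000, 12), 2))
```

The total bill still rises with volume; it is the average price per gigabyte that falls, and the monthly charge repeats for every month the data stays in the cloud.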
Buy or Rent?
As you begin to explore the use of the big data analytics platform, you must consider where the “breakpoint” is with regard to renting or leasing services versus bringing such activities in-house. If a firm does not make BDAP part of its everyday business process but instead relies on infrequent project usage, then operating in a cloud environment may provide the greatest return on investment. Conversely, a company that plans on using this platform for both internal and external data retrieval and analysis may find that bringing the applications in-house is the most efficient use of their dollar.
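One simple way to locate the breakpoint is to compare a fixed annual in-house cost with a pay-per-hour cloud rate. This is a deliberately simplified sketch: the cost figures are hypothetical, and a real comparison would also account for items such as bandwidth, storage, licensing, and support.

```python
def breakeven_hours(in_house_annual_cost: float, cloud_hourly_rate: float) -> float:
    """Hours of use per year at which renting costs the same as owning."""
    if cloud_hourly_rate <= 0:
        raise ValueError("cloud_hourly_rate must be positive")
    return in_house_annual_cost / cloud_hourly_rate


def cheaper_option(hours_per_year: float, in_house_annual_cost: float,
                   cloud_hourly_rate: float) -> str:
    """Return 'cloud' or 'in-house' for a given level of annual usage."""
    cloud_cost = hours_per_year * cloud_hourly_rate
    return "cloud" if cloud_cost < in_house_annual_cost else "in-house"


if __name__ == "__main__":
    # Assumed figures: 50,000 per year to run in-house, 10 per hour in the cloud.
    print(breakeven_hours(50_000.0, 10.0))
    print(cheaper_option(500, 50_000.0, 10.0))    # infrequent project use
    print(cheaper_option(8_000, 50_000.0, 10.0))  # near-continuous use
```

Under these assumptions, a firm using the platform for a few hundred hours a year sits well below the breakpoint, while one running it almost continuously sits above it.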
Most companies will quite likely find a hybrid model most cost-effective. In this case, external cloud services for activities such as web crawls can provide the brute force necessary for data acquisition. The data can then be transferred in-house, where the appropriate analytics are performed.
This solution may be the most practical for many organizations. The model minimizes security concerns, external bandwidth consumption, and storage fees. Because the most time-consuming activity is the crawl, a company can “rent” as many processors as necessary to meet the project deadline.
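How many processors to rent for a crawl can be estimated roughly as follows. The sketch assumes the crawl divides evenly across processors with no coordination overhead, which real crawls only approximate, and the workload and hourly rate are hypothetical.

```python
import math


def processors_needed(total_cpu_hours: float, deadline_hours: float) -> int:
    """Smallest processor count that finishes the crawl by the deadline,
    assuming the work splits perfectly across processors."""
    if total_cpu_hours < 0 or deadline_hours <= 0:
        raise ValueError("total_cpu_hours must be >= 0 and deadline_hours > 0")
    return math.ceil(total_cpu_hours / deadline_hours)


def rental_cost(processors: int, hours: float, rate_per_processor_hour: float) -> float:
    """Cost of renting a number of processors for a number of hours."""
    return processors * hours * rate_per_processor_hour


if __name__ == "__main__":
    # Assumed workload: a crawl needing 1,000 processor-hours in total.
    for deadline in (48, 10):
        n = processors_needed(1_000, deadline)
        print(deadline, n, rental_cost(n, deadline, 0.50))
```

Under hourly billing and the perfect-split assumption, a tighter deadline mainly changes how many processors are rented at once and has little effect on the total bill, which is what makes renting attractive for deadline-driven crawls.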
To summarize, the questions that must be considered by a firm wishing to use the big data analytics platform are: 1) how often; 2) how much data; and 3) how fast do I need the information? The answers to these questions will provide significant insight into what the expected cost will be to operate in a cloud environment.
CIMS is in the process of posting a cost calculator on our Web site. If you have additional questions or concerns, please email me.
CIMS Industry Fellow
NC State University
Virtual Cloud Lab