Vol. 3, Issue 38, October 2017
SHARED FILE AWARE DATA SCHEDULING IN HYBRID CLOUD COMPUTING ENVIRONMENT

Cloud computing has become an important part of almost every sector of the IT industry, and organizations are starting to set up private clouds that serve the resource demands of their different offices. When resource demand grows beyond what the private cloud can handle, these organizations shift part of their processing to a public cloud. This kind of setup is called hybrid cloud computing, as shown in Figure 1.

Figure 1. Hybrid Cloud Setup

The decision to shift a job from the private cloud to the public cloud can depend on several factors. A deadline-sensitive job moves to the public cloud if the scheduler predicts that its deadline cannot be met on the private cloud, and cost can also determine which jobs are sent. Using the public cloud increases the cost of running applications because usage is charged based on running time. Organizations with a limited budget prefer not to send jobs to the public cloud unless they are very urgent.

One of the issues we investigated in this scenario is how to bundle applications with their data when sending them to the public cloud for processing. Jobs sent to the public cloud do not start the instant they are submitted, because it takes time to start the virtual machine and to transfer the required data. For big data applications, the time needed to send the computation logic to the virtual machine is negligible compared with the data transfer time. As most organizations are moving toward in-house data mining and analysis, data transfer cost and time cannot be ignored.
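The start-up delay described above can be sketched roughly in Python. This is only an illustration, assuming that the virtual machine boot and the data transfer happen one after the other and that the link bandwidth is constant. The function name, its parameters, and all the numbers are hypothetical and do not come from the study.

```python
def start_delay_seconds(data_mb, bandwidth_mb_per_s, vm_boot_s, logic_mb=0.0):
    """Estimate how long a job waits before it can start on the public cloud.

    Assumption: the VM boots first, then data and computation logic are
    transferred over a link with constant bandwidth.
    """
    transfer_s = (data_mb + logic_mb) / bandwidth_mb_per_s
    return vm_boot_s + transfer_s


# Hypothetical figures: 300 MB of data, a 10 MB/s link, a 60 s VM boot.
data_only = start_delay_seconds(300, 10, 60)
with_logic = start_delay_seconds(300, 10, 60, logic_mb=1)

# The extra delay caused by 1 MB of computation logic is tiny
# compared with the delay caused by the data itself.
print(data_only, with_logic - data_only)
```

With figures like these, the data transfer accounts for a large share of the delay, while the computation logic adds only a fraction of a second.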

Public clouds offer different machine types, ranging from 1 vCPU to 64 vCPUs and more. Many data mining and analysis applications use the same dataset to mine different relations from it. It is therefore more effective to bundle jobs that require the same data and send them to a machine with an appropriate number of vCPUs, which reduces the data transfer time considerably. For example, suppose the scheduler must send 8 jobs that share 300 MB of data to the public cloud. If it provisions one 1-vCPU machine for each job, it sends 300 × 8 = 2400 MB of data. If the scheduler is shared-file aware and sends all 8 jobs to one machine with 8 vCPUs, the total data transfer is only 300 MB.
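The comparison in this example can be reproduced with a minimal Python sketch. It is not the scheduler used in the study. It assumes each job runs on one vCPU and reads exactly one shared dataset, and that a dataset is transferred once per machine. The function names, the dataset name sales.csv, and the default limit of 64 vCPUs per machine are illustrative assumptions.

```python
from collections import defaultdict


def transfer_per_job(jobs, dataset_mb):
    """Naive plan: one 1-vCPU machine per job, so every job gets its own copy."""
    return sum(dataset_mb[dataset] for _, dataset in jobs)


def transfer_shared_aware(jobs, dataset_mb, max_vcpus=64):
    """Shared-file-aware plan: jobs that read the same dataset share a machine.

    Returns the total data sent (MB) and a list of
    (dataset, vcpus_needed, job_ids) tuples, one per machine.
    """
    by_dataset = defaultdict(list)
    for job_id, dataset in jobs:
        by_dataset[dataset].append(job_id)

    total_mb = 0
    plan = []
    for dataset, job_ids in by_dataset.items():
        # Split into several machines only if the jobs exceed max_vcpus.
        for start in range(0, len(job_ids), max_vcpus):
            batch = job_ids[start:start + max_vcpus]
            plan.append((dataset, len(batch), batch))
            total_mb += dataset_mb[dataset]  # one copy per machine
    return total_mb, plan


# The example from the text: 8 jobs sharing 300 MB of data.
jobs = [(f"job{i}", "sales.csv") for i in range(8)]
dataset_mb = {"sales.csv": 300}

print(transfer_per_job(jobs, dataset_mb))          # 2400
print(transfer_shared_aware(jobs, dataset_mb)[0])  # 300
```

If the largest available machine had only 4 vCPUs, the same 8 jobs would be split across two machines and 600 MB would be sent, which is still a quarter of the naive plan.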

Making the scheduler aware of shared files and of the number of jobs that require each shared file can decrease the transfer time many fold, making the user experience more seamless and effective.

By - Dr. Rajinder Sandhu, CURIN, Chitkara University, Punjab

About Technology Connect

The aim of this weekly newsletter is to share with students and faculty the latest developments, technologies, and updates in the fields of Electronics and Computer Science, thereby promoting knowledge sharing. All our readers are welcome to contribute content to Technology Connect. Just drop an email to the editor. The first Volume of Technology Connect featured 21 Issues published between June 2015 and December 2015. The second Volume of Technology Connect featured 46 Issues published between January 2016 and December 2016. This is Volume 3.

Editorial Team

Chief Editor: Sagar Juneja
Members: Gitesh Khurani, Arun Goyal

Disclaimer: The content of this newsletter is contributed by Chitkara University faculty and taken from resources that are believed to be reliable. The content is verified by the editorial team to the best of its ability, but the editorial team accepts no responsibility for validating the sources or for the accuracy of the content. The objective of the newsletter is limited to spreading awareness of technology among faculty and students, not to impose on or influence the decisions of individuals.