Parallel computing is a type of process in which numerous sub processes are carried out concurrently on multiple processor cores in multi-core processors. The concept is based on the theory that huge problems can be divided into smaller ones, and solved simultaneously. There is a tremendous impact of parallel computing on a number of diverse areas that ranges from computational simulations for technical and engineering problems to applications in transaction processing and data mining etc. There are various methods and models to add parallelism in an algorithm or a code.
Loop unrolling also known as loop unwinding is one of the techniques to enhance benefits of parallelization. It is a loop alteration technique that tries to optimize a program's execution speed at the cost of its code size. The conversion can be undertaken manually by the programmer. The objective of loop unwinding is to boost program's speed by dipping instructions that control the loop, for example 'end of loop' test on every iteration, dropping branch penalties, and hiding latencies, particularly the waiting time used to read data from memory. To eliminate this overhead, one can use the mechanism in which loops can be re-written as a repetitive series of similar independent statements.
In this article, static loop unrolling is discussed in which programmer analyzes the loop and convert the iterations into a series of directions which will diminish the loop overhead.
Following is a simple example of static loop unrolling:
A function in a computer program adds 100 items from an array. This is usually done using simple for-loop which calls the function add (item_number). If the size of the loop is 100 then the method "add" is called 100 times. There is loop iteration overhead each time. In order to reduce this loop overhead, loop unrolling can be used.
Normal loop |
After loop unrolling |
Int i;
for (i = 0; i< 100; i++)
{
Add(i);
}
|
Int i;
for (i = 0; i< 100; i+=5){
Add(i);
Add(i+1);
Add(i+2);
Add(i+3);
Add(i+4);
}
|
NAccording to the revision, the new program will make just 20 iterations rather than 100. As a result, merely 20% jumps and conditional branches are required. To generate the maximum benefit, there should be no variable specification in the unrolled code that necessitates pointer arithmetic. This normally requires "base plus offset" addressing mechanism, instead of indexed referencing.
Conversely, it should be observed that the manual loop unwinding expands the size of the source code from 3 lines to 7, that needs to be checked, debugged and compiler will have to assign extra registers to accumulate variables in the extended loop, moreover the control variables and number of operations within the body of the loop have to be elected cautiously so that the result should be same as in the original code. Following figure depicts the process of loop unrolling.
Pictorial representation of normal loop (i = i+1) and unwinding loop (i = i+5)
By Dr. Disha Handa, CSE Department, Chitkara University, H.P.
About Technology Connect
Aim of this weekly newsletter is to share with students & faculty the latest developments, technologies, updates in the field Electronics & Computer Science and there by promoting knowledge sharing. All our readers are welcome to contribute content to Technology Connect. Just drop an email to the editor. The first Volume of Technology Connect featured 21 Issues published between June 2015 and December 2015. This is Volume 2.
Happy Reading!
Disclaimer:The content of this newsletter is contributed by Chitkara University faculty & taken from resources that are believed to be reliable.The content is verified by editorial team to best of its accuracy but editorial team denies any ownership pertaining to validation of the source & accuracy of the content. The objective of the newsletter is only limited to spread awareness among faculty & students about technology and not to impose or influence decision of individuals.