In this tute we'll look at bank conflicts. Bank conflicts slow shared memory down; they occur when threads in a single warp request multiple different values from the same shared memory bank.
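To make the idea concrete, here is a rough host-side sketch in plain C++ (not the kernel from the video) that counts how many threads of one warp land on the same bank for a given access stride. It assumes 32 threads per warp and 32 banks that are each one 4-byte word wide; the tute doesn't state these numbers and they differ on some hardware. The name conflictDegree is made up for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Assumed hardware parameters (not stated in the tutorial): 32 threads per
// warp and 32 shared memory banks, each one 4-byte word wide.
constexpr int kWarpSize = 32;
constexpr int kNumBanks = 32;

// Hypothetical helper: thread t of a warp reads the 4-byte word at index
// t * stride. Returns the largest number of different words requested from
// any single bank. 1 means conflict-free, n means an n-way bank conflict.
int conflictDegree(int stride) {
    assert(stride >= 0);
    // Stride 0: every thread reads the same word, which is served as a
    // broadcast rather than a conflict.
    if (stride == 0) return 1;

    std::array<int, kNumBanks> hits{};  // requests per bank, zero-initialised
    for (int t = 0; t < kWarpSize; ++t) {
        int word = t * stride;          // distinct for each t when stride > 0
        ++hits[word % kNumBanks];       // consecutive words map to consecutive banks
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under these assumptions, conflictDegree(1) gives 1 (no conflict), conflictDegree(2) gives 2, and conflictDegree(32) gives 32 because every thread hits bank 0. An odd stride such as 33 brings it back to 1, which is why padding a shared array by one element is a common way to dodge conflicts.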
I've also introduced the cudaDeviceReset function, which is used to write out all the performance counters from the GPU for use with a profiler. Call this function at the end of any program you wish to profile.
The clock() function is used to record how long your code takes. It's extremely accurate and fine