Wow, this has been a tricky tute. I originally tried to cover much more and added some coding at the end, but it was too long to be interesting. Then I chopped the coding out into a separate tute and concentrated on the theory side, but it was still way too long.
Shared memory is a very intricate topic; it's at the very core of what programming CUDA is all about. I eventually decided that it's no good brushing over this stuff; shared memory deserves more attention. This tutorial is a little intro, it