I recently presented my work at the 2011 International Conference on
Engineering of Reconfigurable Systems and Algorithms (ERSA 2011), held in Las Vegas, NV.
The three keynotes were:
1. How Engineering Mathematics can Improve Software by Prof. David Lorge Parnas
2. The Nature of Cyber Security by Prof. Eugene H. Spafford
3. Changing Lives around the World: the Power of Technology by Dr. Sandeep Chatterjee
All the keynotes were impressive and quite diverse, ranging from applying engineering mathematics to the software process, to cyber security as a science and method, and finally to mobile computing applications in developing countries. Though there are no slides to link for Dr. Chatterjee's talk, it included example scenarios where mobile computing was used for banking, construction, and more.
The ERSA sessions started with a bang: the tutorial session on evolvable computing by Prof. Jim Torresen was very interesting. The Evolvable and Bio-Inspired Hardware session by Dr. Eric Stahlberg was also interesting, converging on the question of which biological processes can inspire computing in general.
Finally, here's a link to my presentation on "Accelerating Real-time Processing of the ATST Adaptive Optics System using Coarse-grained Parallel Hardware Architectures". My talk was on Wednesday, July 20, 2011 in the Gold Room.
I also chaired Session 13-PDPTA: Systems Software + OS + Programming Models + Architecture Issues +
Fault-Tolerant Systems & Tools on Thursday, July 21, 2011, from 8:00 am to 12:20 pm (location: Ballroom 4) as part of the 2011 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'11).
Next year's ERSA is expected to be bigger and better, and promises a developers' meet/session. I am eagerly looking forward to it.
Sunday, July 31, 2011
Thursday, October 7, 2010
Installing Tesla C2050 and C1060 on CentOS
I had been waiting to get an NVIDIA Fermi GPU to prototype my algorithms, but first wanted an optimized implementation on the Tesla C1060 that I could then scale up to the Fermi. I got a C2050 and set about installing it on my machine. (Note: my host machine is a Lian-Li PC with an integrated NVIDIA nForce 980a/780a GPU with 8 CUDA cores.)
Installing the C2050 was easy: it has two power connectors (6-pin and 8-pin). In most cases, connecting the 8-pin connector is more than enough.
After installing the C2050, I had to reboot the machine and change a BIOS setting to disable display output from external GPUs. The C2050 has a display output, but I don't intend to use it for now.
My deviceQuery output:
[vivekv@atstgpu release]$ more deviceQuery.txt
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

There are 3 devices supporting CUDA

Device 0: "Tesla C2050"
  CUDA Driver Version:                           3.10
  CUDA Runtime Version:                          3.10
  CUDA Capability Major revision number:         2
  CUDA Capability Minor revision number:         0
  Total amount of global memory:                 3220897792 bytes
  Number of multiprocessors:                     14
  Number of cores:                               448
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Clock rate:                                    1.15 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                No

Device 1: "Tesla C1060"
  CUDA Driver Version:                           3.10
  CUDA Runtime Version:                          3.10
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         3
  Total amount of global memory:                 4294770688 bytes
  Number of multiprocessors:                     30
  Number of cores:                               240
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.30 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   No
  Device has ECC support enabled:                No

Device 2: "nForce 980a/780a SLI"
  CUDA Driver Version:                           3.10
  CUDA Runtime Version:                          3.10
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         1
  Total amount of global memory:                 131399680 bytes
  Number of multiprocessors:                     1
  Number of cores:                               8
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.20 GHz
  Concurrent copy and execution:                 No
  Run time limit on kernels:                     Yes
  Integrated:                                    Yes
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution:                   No
  Device has ECC support enabled:                No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.10, CUDA Runtime Version = 3.10, NumDevs = 3, Device = Tesla C2050, Device = Tesla C1060

PASSED

Press <Enter> to Quit...

I hope to post the performance comparison of my kernels using the Tesla C2050 sometime soon.
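With three CUDA devices in one box, kernels land on device 0 by default, which may not be the card you want. As a rough sketch (not from the original post), here is how the same runtime API that deviceQuery uses can enumerate the devices and select the one with the highest compute capability; the device numbering and names in the comments assume the machine described above:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);  // 3 on this machine: C2050, C1060, nForce

    int best = 0, bestMajor = 0, bestMinor = 0;
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s (compute %d.%d, %d multiprocessors)\n",
               d, prop.name, prop.major, prop.minor,
               prop.multiProcessorCount);
        // Prefer the highest compute capability (the Fermi C2050 is 2.0,
        // the C1060 is 1.3, the integrated nForce is 1.1)
        if (prop.major > bestMajor ||
            (prop.major == bestMajor && prop.minor > bestMinor)) {
            best = d;
            bestMajor = prop.major;
            bestMinor = prop.minor;
        }
    }
    cudaSetDevice(best);  // subsequent kernels/allocations use this device
    printf("Selected device %d\n", best);
    return 0;
}
```

On a box like this one, that should pick the C2050; an integrated part like the nForce could also be skipped explicitly by checking `prop.integrated`.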