Saturday, October 24, 2009

CUDA and Computational Finance

Saw this on my Google Reader list and wanted to share it with everyone. This URL contains some videos/presentations on using CUDA for Computational Finance. (thanks to Argyn)

Lots more CUDA tutorials coming at SC09 !!!

Tuesday, October 20, 2009

Installing Boost C++ libraries

I know that installing Boost C++ libraries has nothing to do with GPGPU and multi-cores. Nevertheless, I have posted my procedure for installing Boost C++ libraries on a Mac.

I use MacPorts on my Mac and find it the easiest way to install open-source software on Mac OS X.

1. Search for boost packages
bash-3.2$ sudo port search boost
boost @1.40.0 (devel)
Collection of portable C++ source libraries
boost-build @2.0-m12 (devel)
Build system for large project software construction
boost-gil-numeric @1.0 (devel)
An algorithm extension to boost-gil.
boost-jam @3.1.17 (devel)
Boost.Jam (BJam) is a build tool based on FTJam
py26-pyplusplus @1.0.0 (python, devel)
Py++ is an framework for creating a code generator for Boost.Python library and ctypes package
Found 5 ports.

2. Install
bash-3.2$ sudo port install boost-jam  
---> Fetching boost-jam
---> Verifying checksum for boost-jam
---> Extracting boost-jam
---> Configuring boost-jam
---> Building boost-jam with target all
---> Staging boost-jam into destroot
---> Installing boost-jam

bash-3.2$ sudo port install boost
---> Fetching boost
---> Verifying checksum for boost
---> Extracting boost
---> Configuring boost
---> Building boost with target all
---> Staging boost into destroot
---> Installing boost

bash-3.2$ sudo port install boost-build  
---> Fetching boost-build
---> Verifying checksum for boost-build
---> Extracting boost-build
---> Configuring boost-build
---> Building boost-build with target all
---> Staging boost-build into destroot
---> Installing boost-build

3. Here's my example program:
#include <iostream>
#include <boost/any.hpp>

int main()
{
    boost::any a(5);      // holds an int
    a = 7.67;             // now holds a double
    std::cout << boost::any_cast<double>(a) << std::endl;
    return 0;
}

Now when I tried compiling my example program, I got a lot of errors such as:
example3.cpp:11:25: error: boost/any.hpp: No such file or directory
example3.cpp: In function ‘int main()’:
example3.cpp:18: error: ‘boost’ has not been declared
example3.cpp:18: error: ‘any’ was not declared in this scope
example3.cpp:18: error: expected `;' before ‘a’
example3.cpp:19: error: ‘a’ was not declared in this scope
example3.cpp:20: error: ‘boost’ has not been declared
example3.cpp:20: error: ‘any_cast’ was not declared in this scope
example3.cpp:20: error: expected primary-expression before ‘double’
example3.cpp:20: error: expected `;' before ‘double’

So I googled around, came to this link, and tried compiling using the full path:
$ g++ -I /opt/local/var/macports/software/boost/1.40.0_1/opt/local/include/ example3.cpp -o example3_new

and that worked...!!!!

I also did this as a shortcut for compiling my programs that need the Boost libraries:
$ export BOOST=/opt/local/var/macports/software/boost/1.40.0_1/opt/local/include/
$ echo $BOOST
/opt/local/var/macports/software/boost/1.40.0_1/opt/local/include/
$ g++ -I $BOOST example3.cpp -o example3_new
$ ./example3_new
7.67

Hope this helps :)

Monday, September 28, 2009

OpenCL drivers available from Nvidia

OpenCL drivers are available for download from Nvidia.

You can download the SDK and the Best Practices Guide from here: Nvidia OpenCL download

Waiting to try some sample code examples using OpenCL and evaluate how it differs from CUDA :)



Wednesday, August 19, 2009

SAAHPC presentation

I recently presented my work at the SAAHPC conference, held at NCSA in Urbana, IL. The keynote talk was by Pradeep Dubey on massive data computing using Intel Larrabee. It was an excellent overview of why data transfer and management is the challenge in today's computing. I liked the Connected Computing factor: Content, Connect and Compute. I have learned from real-world problems that having the fastest compute platform is not enough; it's even more important to sustain the streaming bandwidth of data into the platform. Obviously, once the data is inside memory, computation is fast. The real bottleneck is importing and offloading the data and trying to streamline and synchronize it (some of my PhD dissertation grumble).

I liked the talk by Michael Garland on GPU computing using CUDA. It was informative, particularly his insight into the Thrust template library. Thrust is open source and is hosted on Google Code. I have to start using it for my next CUDA project.

Finally, here's a link to my presentation on "Accelerating Particle Image Velocimetry using Hybrid Architectures".

Friday, July 31, 2009

Installing CUDA 2.3

Installing CUDA 2.3 is pretty easy and straightforward. However, on my Mac, the CUDA SDK examples are now in
/Developer/GPU Computing

In addition to the drivers for Leopard, there's a separate version for Snow Leopard :) New documentation includes the CUDA Best Practices Guide. This is the correct link, thanks to the NVIDIA forums post. The CUDA Resource page does not point you to the correct link.

Have a good time accelerating your apps using CUDA :)

Monday, June 29, 2009

cuda-gdb error on CentOS

CUDA 2.2 comes with its native debugger support through cuda-gdb. However, I had some problems configuring cuda-gdb on my Linux distro (CentOS).
When I execute cuda-gdb fresh after my installation, here's what I see:
$ cuda-gdb
cuda-gdb: error while loading shared libraries: libtermcap.so.2: cannot open shared object file: No such file or directory

I tried searching for the missing library libtermcap.so.2 using
locate libtermcap

A Google search led me to http://forums.nvidia.com/index.php?showtopic=96987, so I installed ncurses and symlinked it in place of the missing library:
sudo yum install ncurses.x86_64
sudo ln -s /usr/lib64/libncurses.so /usr/local/cuda/lib/libtermcap.so.2

That seemed to do the trick !!!
$ cuda-gdb
NVIDIA (R) CUDA Debugger
BETA release
Portions Copyright (C) 2008,2009 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
(cuda-gdb)
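Once cuda-gdb starts, it's worth confirming no other shared libraries are still unresolved. ldd lists a binary's dynamic dependencies and prints "not found" for missing ones. A generic sketch (the `check_libs` name is mine, and /bin/ls stands in for whatever binary you're diagnosing):

```shell
# Print any unresolved shared-library dependencies of a binary,
# or confirm that everything resolves.
check_libs() {
  if ldd "$1" | grep -q "not found"; then
    ldd "$1" | grep "not found"
  else
    echo "all libraries resolved"
  fi
}
check_libs /bin/ls
```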

For more info on cuda-gdb refer to: http://developer.download.nvidia.com/compute/cuda/2_2/toolkit/docs/CUDA_GDB_User_Manual_2.2beta.pdf

Tuesday, May 12, 2009

Installing CUDA 2.2 on MacPro running CentOS 64-bit

My test machine is an early 2008 Mac Pro with an NVIDIA Tesla C1060. Upgrading to CUDA 2.2 on this Mac was relatively easy. Please read my previous post on installing CUDA 2.1 before attempting to install CUDA 2.2 on a CentOS 64-bit distro.

Step 1: Download packages:
wget http://developer.download.nvidia.com/compute/cuda/2_2/drivers/cudadriver_2.2_linux_64_185.18.08-beta.run
wget http://developer.download.nvidia.com/compute/cuda/2_2/toolkit/cudatoolkit_2.2_linux_64_rhel5.3.run
wget http://developer.download.nvidia.com/compute/cuda/2_2/sdk/cudasdk_2.2_linux.run
wget http://developer.download.nvidia.com/compute/cuda/2_2/toolkit/cudagdb_2.2_linux_64_rhel5.3.run
Change permissions
chmod +x cudadriver_2.2_linux_64_185.18.08-beta.run
chmod +x cudatoolkit_2.2_linux_64_rhel5.3.run
chmod +x cudagdb_2.2_linux_64_rhel5.3.run
chmod +x cudasdk_2.2_linux.run
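The four chmod lines above can be collapsed into a single loop, since all four installers share the "cuda" prefix (a sketch, assuming the .run files sit in the current directory):

```shell
# Mark every downloaded CUDA installer as executable in one pass.
for f in cuda*.run; do
  chmod +x "$f"
done
```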
Step 2: Find your kernel source and install the corresponding kernel-devel and kernel-headers:
uname -r
2.6.27.21-170.2.56.fc10.x86_64

sudo yum install kernel-devel
sudo yum install kernel-headers
Step 3: Install the CUDA driver first, followed by the toolkit, gdb and finally the SDK.
sudo ./cudadriver_2.2_linux_64_185.18.08-beta.run --kernel-source-path /usr/src/kernels/2.6.27.21-170.2.56.fc10.x86_64
sudo ./cudatoolkit_2.2_linux_64_rhel5.3.run
sudo ./cudagdb_2.2_linux_64_rhel5.3.run
./cudasdk_2.2_linux.run
Make sure that your path is configured correctly. (Refer to previous post)

Step 4: Enable the cuda script mentioned in the previous post.
sudo service cuda start
Check the NVIDIA driver status; you should see this:
ls /dev/nv*
/dev/nvidia0 /dev/nvidiactl /dev/nvram
Step 5: Make sure you do a fresh install of the SDK if you are upgrading.
cd NVIDIA_CUDA_SDK/
make
This should make all your projects in the following directory:
$HOME/NVIDIA_CUDA_SDK/bin/linux/release
Executing the deviceQuery executable shows me this:
./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA

Device 0: "Tesla C1060"
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 3
Total amount of global memory: 4294705152 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.30 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED

Press ENTER to exit...
Congrats !!! CUDA 2.2 is configured on your 64-bit CentOS Mac Pro and recognizes the NVIDIA Tesla C1060. Looking forward to posting some examples with the zero-copy feature :)


Friday, May 8, 2009

Installing CUDA 2.2 on Mac OSX

Installing CUDA 2.2 on Mac OSX is pretty trivial. The only warning/suggestion: manually remove the CUDA 2.1 SDK directory from /Developer/CUDA and do a fresh install (always recommended).

I tried upgrading without removing CUDA 2.1 SDK and saw my image denoising project failing to compile. This was the error I saw: "ld: duplicate symbol _cutFree in ../../lib/libcutil.a(cutil.cpp.o)"

My deviceQuery output:
hogwarts:emurelease vivekv$ ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There is no device supporting CUDA.

Device 0: "Device Emulation (CPU)"
CUDA Capability Major revision number: 9999
CUDA Capability Minor revision number: 9999
Total amount of global memory: 4294967295 bytes
Number of multiprocessors: 16
Number of cores: 128
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 1
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.35 GHz
Concurrent copy and execution: No
Run time limit on kernels: No
Integrated: Yes
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED

Press ENTER to exit...



I shall also post my experience installing CUDA 2.2 on a MacPro running CentOS 64-bit.

Monday, March 9, 2009

Learning CUDA

I was planning to share some of my programming exercises here, but then I thought of sharing some of the CUDA resources I have collected in the past two months. This may not be an exhaustive list of all available resources, but they have benefited me immensely.
1. CUDA Programming Guide
2. CUDA Reference Manual
3. cuFFT Library Guide
For the above documents and more, you can always visit the CUDAZone and download the documents for your OS and release version.
4. Learning CUDA from NVIDIA: This page has some podcasts, slides from conferences and a link to Dr. Dobb's article series on CUDA. Another excellent link is the ECE498AL class taught at UIUC. I also liked the SC08 CUDA tutorial/Workshop, which I attended.
5. NVIDIA forums: When in doubt, post your query on the NVIDIA forums. Someone may have seen the same problem and is likely to post an answer.

Send me an email for the SC08 slides if you haven't found them on the Internet already.

Saturday, February 21, 2009

Installing Nvidia Tesla C1060 on a Mac Pro with CentOS

After my last post on getting CUDA 2.0 to run on my Mac, I will go through the steps for installing an NVIDIA Tesla C1060 in an early 2008 Mac Pro running CentOS or an equivalent RHEL 5.xx Linux distro. The Tesla C1060 has 240 streaming processor cores (1.3 GHz) and 4 GB of onboard memory, requires PCIe x16, occupies two slots, and has a 6-pin and an 8-pin power connector.

Instructions:

1. Physical install: Please download the C1060 User Manual and go through it before installing the C1060 physically. The C1060 has two power connectors, 6-pin and 8-pin; for a Mac Pro, please order the booster cable (that's the correct Apple jargon). The booster cable is mentioned in the Mac Pro documentation for connecting the NVIDIA Quadro FX 5600. After inserting the C1060, refer to page 16 of the manual to check the LED status. Red indicates something wrong with the booster cable connections. (As an afterthought, if you have a PC, you can buy a standard 6-pin/8-pin connector from Newegg for about $6.99.)

2. Software install:
Step 1: Get the latest CUDA drivers, toolkit and SDK from here. I selected the CUDA 2.1 drivers for Linux 64-bit for RHEL 5.xx. (You can paste the commands directly into your terminal.)

wget http://developer.download.nvidia.com/compute/cuda/2_1/drivers/NVIDIA-Linux-x86_64-180.22-pkg2.run
wget http://developer.download.nvidia.com/compute/cuda/2_1/toolkit/cudatoolkit_2.1_linux64_rhel5.2.run
wget http://developer.download.nvidia.com/compute/cuda/2_1/SDK/cuda-sdk-linux-2.10.1215.2015-3233425.run


Change permissions on the downloaded files
chmod +x NVIDIA-Linux-x86_64-180.22-pkg2.run
chmod +x cuda-sdk-linux-2.10.1215.2015-3233425.run
chmod +x cudatoolkit_2.1_linux64_rhel5.2.run


Step 2: Install the following packages or see if you have them installed on your system already.

sudo yum install freeglut-devel
sudo yum install libXi-devel


Step 3: Install the CUDA driver first, then the toolkit, and finally the SDK.
sudo ./NVIDIA-Linux-x86_64-180.22-pkg2.run
sudo ./cudatoolkit_2.1_linux64_rhel5.2.run
./cuda-sdk-linux-2.10.1215.2015-3233425.run


The NVIDIA driver builds the kernel module for your Linux distro if it cannot find a precompiled one. Click yes and go on. The toolkit installs CUDA to /usr/local/cuda by default. The SDK installs into the NVIDIA_CUDA_SDK directory in your home folder by default.

Paste the following in your terminal (thanks to the Ubuntu instructions here)
echo "# CUDA stuff
PATH=\$PATH:/usr/local/cuda/bin
LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/usr/local/cuda/lib
export PATH
export LD_LIBRARY_PATH" >> ~/.bashrc
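Since ~/.bashrc is sourced by every new shell, appending blindly can pile up duplicate PATH entries over time. A small guard avoids that (a sketch; the `path_has` name is my own):

```shell
# Return success if the given directory is already on PATH.
path_has() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}
# Append the CUDA bin directory only if it is not there yet.
path_has /usr/local/cuda/bin || PATH="$PATH:/usr/local/cuda/bin"
export PATH
```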
Step 4: On a RHEL Linux distro, a startup script is required to turn on and off the Nvidia device. This startup script is posted on the Nvidia forums.
#!/bin/bash
#
# Startup/shutdown script for nVidia CUDA
#
# chkconfig: 345 80 20
# description: Startup/shutdown script for nVidia CUDA

# Source function library.
. /etc/init.d/functions

DRIVER=nvidia
RETVAL=0

# Create /dev nodes for nvidia devices
function createnodes() {
# Count the number of NVIDIA controllers found.
N3D=`/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
NVGA=`/sbin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`

N=`expr $N3D + $NVGA - 1`
for i in `seq 0 $N`; do
mknod -m 666 /dev/nvidia$i c 195 $i
RETVAL=$?
[ "$RETVAL" = 0 ] || exit $RETVAL
done

mknod -m 666 /dev/nvidiactl c 195 255
RETVAL=$?
[ "$RETVAL" = 0 ] || exit $RETVAL
}

# Remove /dev nodes for nvidia devices
function removenodes() {
rm -f /dev/nvidia*
}

# Start daemon
function start() {
echo -n $"Loading $DRIVER kernel module: "
modprobe $DRIVER && success || failure
RETVAL=$?
echo
[ "$RETVAL" = 0 ] || exit $RETVAL

echo -n $"Initializing CUDA /dev entries: "
createnodes && success || failure
RETVAL=$?
echo
[ "$RETVAL" = 0 ] || exit $RETVAL
}

# Stop daemon
function stop() {
echo -n $"Unloading $DRIVER kernel module: "
rmmod -f $DRIVER && success || failure
RETVAL=$?
echo
[ "$RETVAL" = 0 ] || exit $RETVAL

echo -n $"Removing CUDA /dev entries: "
removenodes && success || failure
RETVAL=$?
echo
[ "$RETVAL" = 0 ] || exit $RETVAL
}

# See how we were called
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
stop
start
;;
*)
echo $"Usage: $0 {start|stop|restart}"
RETVAL=1
esac
exit $RETVAL


Copy this startup script into a file, "cuda_startup_script".
cp $HOME/cuda_drivers_2.1/cuda_startup_script cuda
cp cuda /etc/init.d/cuda
chmod 755 /etc/init.d/cuda
chkconfig --add cuda
chkconfig cuda on
service cuda start


After this, check if the device is recognized by executing
ls /dev/nv*


You should see the following in your terminal
/dev/nvidia0  /dev/nvidiactl  /dev/nvram


The presence of /dev/nvidiactl indicates that the NVIDIA card is recognized by your system. This is necessary when you need to compile your projects in the SDK/projects folder.
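Here's a quick scripted version of that check (a sketch; the messages are my own wording):

```shell
# CUDA programs talk to the driver through /dev/nvidiactl; make sure it exists.
if [ -e /dev/nvidiactl ]; then
  echo "NVIDIA control node present"
else
  echo "driver not loaded: try 'sudo service cuda start'"
fi
```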

Also, on executing lspci, you should see the following:
sudo /sbin/lspci -d "10de:*" -v -xxx
02:00.0 3D controller: nVidia Corporation Unknown device 05e7 (rev a1)
Subsystem: nVidia Corporation Unknown device 066a
Flags: bus master, fast devsel, latency 0, IRQ 193
Memory at 96000000 (32-bit, non-prefetchable) [size=16M]
Memory at 90000000 (64-bit, prefetchable) [size=64M]
Memory at 94000000 (64-bit, non-prefetchable) [size=32M]
I/O ports at 3000 [size=128]
[virtual] Expansion ROM at 97000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Capabilities: [78] Express Endpoint IRQ 0
00: de 10 e7 05 07 00 10 00 a1 00 02 03 00 00 00 00
10: 00 00 00 96 0c 00 00 90 00 00 00 00 04 00 00 94
20: 00 00 00 00 01 30 00 00 00 00 00 00 de 10 6a 06
30: 00 00 00 00 60 00 00 00 00 00 00 00 0b 01 00 00
40: de 10 6a 06 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 00 00 00 01 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 01 00 e0 84 00 00
80: 10 29 00 00 01 3d 00 08 08 00 01 01 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


Step 5:
cd NVIDIA_CUDA_SDK/
make

This should make all your projects in the following directory:
$HOME/NVIDIA_CUDA_SDK/bin/linux/release

Executing the deviceQuery executable should show this:
./deviceQuery
There is 1 device supporting CUDA

Device 0: "Tesla C1060"
Major revision number: 1
Minor revision number: 3
Total amount of global memory: 4294705152 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.30 GHz
Concurrent copy and execution: Yes

Test PASSED

Press ENTER to exit...


Congratulations !!! Your Nvidia Tesla C1060 is configured on your system running CentOS or a similar RHEL distro. I shall post some exercises soon. For further questions, refer to the Nvidia forums.

Friday, February 13, 2009

Installing CUDA 2.0 on Mac OS

I have set up CUDA 2.0 on my MacBook and these are the steps I followed:
1. Download the CUDA driver, toolkit and SDK samples from here: Get CUDA
2. Install the packages. The default location for the SDK is
/Developer/CUDA

3. Update your .profile or .bash_profile with the following lines:
export PATH=/usr/local/cuda/bin:$PATH
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH

4. The SDK comes with sample projects. If your Mac does not have a CUDA-supported GPU, you can build the sample projects in emulation mode.
cd /Developer/CUDA
make emu=1

All the binaries will be located in the folder
/Developer/CUDA/bin/darwin/emurelease

Have fun with the sample projects. I shall soon post my first CUDA program.