Saturday, February 21, 2009

Installing Nvidia Tesla C1060 on a Mac Pro with CentOS

After my last post on getting CUDA 2.0 to run on my Mac, I will go through the steps for installing an Nvidia Tesla C1060 in an early 2008 Mac Pro running CentOS or an equivalent RHEL 5.x Linux distro. The Tesla C1060 has 240 streaming processor cores (1.3 GHz) and 4 GB of onboard memory, requires a PCIe x16 slot, occupies two slots, and has 6-pin and 8-pin power connectors.

Instructions:

1. Physical install: Please download the C1060 User Manual and go through it before physically installing the C1060. The C1060 has two power connectors, 6-pin and 8-pin; for a Mac Pro, please order the Booster cable (that's the correct Apple jargon). The booster cable is mentioned in the Mac Pro documentation for connecting the Nvidia Quadro FX 5600. After inserting the C1060, refer to page 16 of the manual to check the LED status. Red indicates a problem with the booster cable connections. (As an afterthought, if you have a PC, you can buy the standard 6-pin/8-pin connector from Newegg for about $6.99.)

2. Software install:
Step 1: Get the latest CUDA driver, toolkit and SDK from here. I selected the CUDA 2.1 drivers for Linux 64-bit for RHEL 5.x. (You can paste the commands below directly into your terminal.)

wget http://developer.download.nvidia.com/compute/cuda/2_1/drivers/NVIDIA-Linux-x86_64-180.22-pkg2.run
wget http://developer.download.nvidia.com/compute/cuda/2_1/toolkit/cudatoolkit_2.1_linux64_rhel5.2.run
wget http://developer.download.nvidia.com/compute/cuda/2_1/SDK/cuda-sdk-linux-2.10.1215.2015-3233425.run


Make the downloaded files executable:
chmod +x NVIDIA-Linux-x86_64-180.22-pkg2.run
chmod +x cuda-sdk-linux-2.10.1215.2015-3233425.run
chmod +x cudatoolkit_2.1_linux64_rhel5.2.run


Step 2: Install the following packages or see if you have them installed on your system already.

sudo yum install freeglut-devel
sudo yum install libXi-devel


Step 3: Install the CUDA driver first, then the toolkit, and finally the SDK.
sudo ./NVIDIA-Linux-x86_64-180.22-pkg2.run
sudo ./cudatoolkit_2.1_linux64_rhel5.2.run
./cuda-sdk-linux-2.10.1215.2015-3233425.run


The Nvidia installer builds the kernel module for your Linux distro if it cannot find a precompiled one. Click yes and continue. The toolkit installs CUDA to /usr/local/cuda by default. The SDK installs into the NVIDIA_CUDA_SDK directory in your home folder by default.

Paste the following into your terminal (thanks to the Ubuntu instructions here):
echo "# CUDA stuff
PATH=\$PATH:/usr/local/cuda/bin
LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/usr/local/cuda/lib
export PATH
export LD_LIBRARY_PATH" >> ~/.bashrc
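To confirm the new paths are picked up, open a fresh shell (or source ~/.bashrc) and check. A minimal sketch of the check, repeating the same exports inline:

```shell
# Append the CUDA paths (same values as the .bashrc lines above)
export PATH="$PATH:/usr/local/cuda/bin"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/cuda/lib"

# Each check prints "ok" if the path was appended
echo "$PATH" | grep -q '/usr/local/cuda/bin' && echo "PATH ok"
echo "$LD_LIBRARY_PATH" | grep -q '/usr/local/cuda/lib' && echo "LD_LIBRARY_PATH ok"
```

Once the toolkit is installed and the PATH is set, `which nvcc` should print /usr/local/cuda/bin/nvcc.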
Step 4: On a RHEL-based Linux distro, a startup script is required to load the Nvidia driver and create the device nodes at boot (and remove them at shutdown). This startup script is posted on the Nvidia forums.
#!/bin/bash
#
# Startup/shutdown script for nVidia CUDA
#
# chkconfig: 345 80 20
# description: Startup/shutdown script for nVidia CUDA

# Source function library.
. /etc/init.d/functions

DRIVER=nvidia
RETVAL=0

# Create /dev nodes for nvidia devices
function createnodes() {
    # Count the number of NVIDIA controllers found.
    N3D=`/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
    NVGA=`/sbin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`

    N=`expr $N3D + $NVGA - 1`
    for i in `seq 0 $N`; do
        mknod -m 666 /dev/nvidia$i c 195 $i
        RETVAL=$?
        [ "$RETVAL" = 0 ] || exit $RETVAL
    done

    mknod -m 666 /dev/nvidiactl c 195 255
    RETVAL=$?
    [ "$RETVAL" = 0 ] || exit $RETVAL
}

# Remove /dev nodes for nvidia devices
function removenodes() {
    rm -f /dev/nvidia*
}

# Start daemon
function start() {
    echo -n $"Loading $DRIVER kernel module: "
    modprobe $DRIVER && success || failure
    RETVAL=$?
    echo
    [ "$RETVAL" = 0 ] || exit $RETVAL

    echo -n $"Initializing CUDA /dev entries: "
    createnodes && success || failure
    RETVAL=$?
    echo
    [ "$RETVAL" = 0 ] || exit $RETVAL
}

# Stop daemon
function stop() {
    echo -n $"Unloading $DRIVER kernel module: "
    rmmod -f $DRIVER && success || failure
    RETVAL=$?
    echo
    [ "$RETVAL" = 0 ] || exit $RETVAL

    echo -n $"Removing CUDA /dev entries: "
    removenodes && success || failure
    RETVAL=$?
    echo
    [ "$RETVAL" = 0 ] || exit $RETVAL
}

# See how we were called
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart}"
        RETVAL=1
esac
exit $RETVAL


Save the startup script into a file named "cuda", then install and enable it as a service (run these as root):
sudo cp cuda /etc/init.d/cuda
sudo chmod 755 /etc/init.d/cuda
sudo /sbin/chkconfig --add cuda
sudo /sbin/chkconfig cuda on
sudo service cuda start
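To see what the createnodes() logic in the startup script actually does, here is a sketch driven by canned lspci output instead of real hardware (the two sample lines are hypothetical; the script itself uses expr, the sketch uses shell arithmetic):

```shell
# Canned lspci output standing in for real hardware
# (these two sample lines are hypothetical)
SAMPLE='02:00.0 3D controller: nVidia Corporation Tesla C1060
05:00.0 VGA compatible controller: ATI Technologies Inc RV630'

# Same counting as the init script: Nvidia 3D controllers plus
# Nvidia VGA controllers, minus one for zero-based numbering
N3D=$(echo "$SAMPLE" | grep -i NVIDIA | grep "3D controller" | wc -l)
NVGA=$(echo "$SAMPLE" | grep -i NVIDIA | grep "VGA compatible controller" | wc -l)
N=$((N3D + NVGA - 1))

# The script would create /dev/nvidia0 .. /dev/nvidiaN; with one
# Tesla and an ATI card, that is just /dev/nvidia0
for i in $(seq 0 $N); do
    echo "would run: mknod -m 666 /dev/nvidia$i c 195 $i"
done
```

With four Tesla cards, N3D would be 4 and the loop would create /dev/nvidia0 through /dev/nvidia3.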


After this, check if the device is recognized by executing
ls /dev/nv*


You should see the following in your terminal
/dev/nvidia0  /dev/nvidiactl  /dev/nvram


The presence of "/dev/nvidiactl" indicates that the Nvidia card is recognized by your system. These device nodes are needed when you run the programs built from the SDK/projects folder.
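If you want to script this check, a hedged sketch (the message strings are my own wording, not Nvidia's):

```shell
# Check for the Nvidia control node created by the init script;
# -c tests that the path exists and is a character device
if [ -c /dev/nvidiactl ]; then
    echo "Nvidia control device present"
else
    echo "no /dev/nvidiactl: try 'service cuda start' and check again" >&2
fi
```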

Also, running lspci should show the following:
sudo /sbin/lspci -d "10de:*" -v -xxx
02:00.0 3D controller: nVidia Corporation Unknown device 05e7 (rev a1)
Subsystem: nVidia Corporation Unknown device 066a
Flags: bus master, fast devsel, latency 0, IRQ 193
Memory at 96000000 (32-bit, non-prefetchable) [size=16M]
Memory at 90000000 (64-bit, prefetchable) [size=64M]
Memory at 94000000 (64-bit, non-prefetchable) [size=32M]
I/O ports at 3000 [size=128]
[virtual] Expansion ROM at 97000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Capabilities: [78] Express Endpoint IRQ 0
00: de 10 e7 05 07 00 10 00 a1 00 02 03 00 00 00 00
10: 00 00 00 96 0c 00 00 90 00 00 00 00 04 00 00 94
20: 00 00 00 00 01 30 00 00 00 00 00 00 de 10 6a 06
30: 00 00 00 00 60 00 00 00 00 00 00 00 0b 01 00 00
40: de 10 6a 06 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 00 00 00 01 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 03 00 08 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 01 00 e0 84 00 00
80: 10 29 00 00 01 3d 00 08 08 00 01 01 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


Step 5: Build the SDK example projects.
cd NVIDIA_CUDA_SDK/
make

This should build all the example projects into the following directory:
$HOME/NVIDIA_CUDA_SDK/bin/linux/release

Running the deviceQuery executable should show this:
./deviceQuery
There is 1 device supporting CUDA

Device 0: "Tesla C1060"
Major revision number: 1
Minor revision number: 3
Total amount of global memory: 4294705152 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.30 GHz
Concurrent copy and execution: Yes

Test PASSED

Press ENTER to exit...


Congratulations! Your Nvidia Tesla C1060 is now configured on your system running CentOS or a similar RHEL distro. I shall post some exercises soon. For further questions, refer to the Nvidia forums.

17 comments:

  1. Why under CentOS? Can it be installed under MacOSX?

    ReplyDelete
  2. It should be relatively easy to install in Mac OSX. Haven't tried though.

    ReplyDelete
  3. Hey,

    Thanks for the article. I'm trying to do the same thing with my Mac Pro. The graphics card already installed is an ATI Radeon however. Would you happen to know if it would work alongside Tesla? Do you have the NVidia GeForce GT 120 installed in your system?

    ReplyDelete
  4. nstamato: I have an ATI graphics card installed on the Mac Pro as well. That's how the machine was configured from Apple.
    My lspci shows the ATI GPU as follows:
    05:00.0 VGA compatible controller: ATI Technologies Inc RV630 [Radeon HD 2600XT]

    Let me know if you have more questions.

    ReplyDelete
5. Thanks for the answer. Actually I have the exact same card. After looking around, it seems that on Linux (I have Linux installed on my Mac as well) you can have the Tesla card alongside any other video card. NVidia says it's recommended to use another NVidia card, but not required. On Windows, however, Tesla needs to coexist with another NVidia card. Sucks to be them...;)

    ReplyDelete
  6. Oh, and one more thing: Do you power the Tesla card using two 6-pin power cables?

    ReplyDelete
  7. The NVIDIA manual suggests that you can use the GPU with only one 6-pin power cable. I have used two. The 6-pin power cables are Apple proprietary booster cables.

    ReplyDelete
8. I haven't seen the drivers for OS X. And doesn't the card need EFI support for that?

    ReplyDelete
9. @schafdog: Installing CUDA on OS X is the easiest :) Let me know what you are trying to do. The Mac Pro dual boots into CentOS and OS X, and CUDA works fine on both.

    ReplyDelete
  10. So the C1060 can be installed and be used with CUDA under OS X? I have no problem developing with CUDA under OS X, but I want to make sure the C1060 is supported when running under OS X.

    ReplyDelete
  11. @imagingbook.com Have you gone through http://www.nvidia.com/docs/IO/56484/NV_C1060_UserManual_Guide_FINAL.pdf

    I have not used the C1060 GPU on OSX. The Mac Pro is usually booted into CentOS before I run my experiments.

    However I use my Macbook for development with CUDA in emulation mode. Have you tried configuring the C1060 on Mac OSX??

    ReplyDelete
  12. Fantastic! I have been searching for the past two days on ways of getting my CUDA devices (4 Tesla C1060's) recognized by my Colfax headless machine, running CentOS 5.4. Your comments about CUDA startup script were unique to your site and worked perfectly. Now most of the CUDA test codes (those that don't try to open X windows on my headless node...) work perfectly. Thanks!

    ReplyDelete
  13. Nice post... but I'm having a hell of a time finding a vendor of the cable!

    "The 6-pin power cables are Apple proprietary booster cables."

    Any links? I couldn't find anything on the Apple website...

    ReplyDelete
14. @pdscjr I had a tough time getting those cables. Thankfully we have 2 Mac OS X clusters, so the sysadmin could order them for me. I suggest that you go to an Apple store and enquire, or call Apple tech support. I can send you what my sysadmin gave me, and you can print it and take it to the Apple store.

    ReplyDelete
15. Very interesting Vivek! Everybody else (at least on the Nvidia forum) says that these cards are not supported because of EFI?
    Any success on a Mac Pro with other tesla cards like the C2050?
    Thanx

    ReplyDelete
16. @Florian: I did read the corresponding post on the NVIDIA forums http://forums.nvidia.com/index.php?showtopic=178702
    The Tesla C1060 works fine with CentOS on the Mac Pro, and I haven't upgraded to a C2050 yet.

    ReplyDelete