Ultron: New GPU Nodes In Beta Testing

Two new nodes have been added to our HPC cluster, Ultron.

These nodes each contain two Nvidia Titan V GPUs, which are built on Nvidia's newest V100 (Volta) chip.  Each GPU has 5120 CUDA cores and 12GB of HBM2 memory.  HBM2 is significant because of its very fast transfer speed to the compute cores: 1.7Gbps per pin!

This all equates to up to 110 teraflops of peak tensor (deep learning) performance per GPU, for a total added compute of 440 teraflops across the four new GPUs (not including the new CPU cores)!  While not all applications support GPU processing, those that do can benefit greatly.

The servers, node11 and node12, have dual 10-core Intel 2.4GHz E5-2640 v4 CPUs.  Each has 128GB RAM and a local SSD for scratch storage (in /local) if needed.

If you wish to test the new nodes, you can submit a job.  I added the “feature” parameter to the queuing system to differentiate between the old GPUs (K80) and the new GPUs (V100).  Note that the nodes are still being performance tuned, so there may be unexpected interruptions during the beta phase.

For example, if you want to use 2 V100 GPUs on 1 node:

qsub -l nodes=1,ppn=20,gpus=2,feature=V100
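
The same request can also be placed inside a job script.  The following is only a minimal sketch, assuming Ultron's qsub accepts PBS/Torque-style "#PBS" directives and sets PBS_O_WORKDIR; the job name "v100-beta-test" and the program "./my_gpu_program" are hypothetical placeholders, and the resource string is copied unchanged from the example above.

```shell
#!/bin/bash
# Sketch of a batch job script (assumes a PBS/Torque-style scheduler).
# The job name below is a hypothetical placeholder.
#PBS -N v100-beta-test
#PBS -l nodes=1,ppn=20,gpus=2,feature=V100

# PBS_O_WORKDIR is normally set by the scheduler to the submission directory;
# fall back to the current directory when the script is run by hand.
cd "${PBS_O_WORKDIR:-$PWD}"

# Use the node-local SSD (/local) for scratch if it is present and writable,
# otherwise fall back to the system default temporary directory.
SCRATCH_BASE=/local
if [ -d "$SCRATCH_BASE" ] && [ -w "$SCRATCH_BASE" ]; then
    SCRATCH=$(mktemp -d "$SCRATCH_BASE/job.XXXXXX")
else
    SCRATCH=$(mktemp -d)
fi
echo "Scratch directory: $SCRATCH"

# Report which GPUs the job can see, if the Nvidia tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv
else
    echo "nvidia-smi not found on $(uname -n)"
fi

# Run the real workload here (hypothetical program name):
# ./my_gpu_program --workdir "$SCRATCH"

# Clean up the scratch directory so the local SSD does not fill up.
rm -rf "$SCRATCH"
```

Saved under a name of your choosing (for example gpu_test.sh), it would be submitted with "qsub gpu_test.sh".  Cleaning up the scratch directory at the end matters because the local SSD is shared by every job that lands on the node.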

If you do submit a job, please email me with any issues and performance feedback.  If you would like me to monitor your job on the backend, send me an email before submitting it.

Here is the full spec sheet for Ultron in its current state:

Ultron Specifications

Head node with 24TB ultra-fast SSD storage for user homes
12 total compute nodes
10 Compute nodes each have dual 14-core Intel 2.4GHz E5-2680 v4 CPUs
2 Compute nodes each have dual 10-core Intel 2.4GHz E5-2640 v4 CPUs
10 Compute nodes each have 256GB RAM
2 Compute nodes each have 128GB RAM
Compute nodes each have a 512GB SSD for local scratch space (/local)
Two Nvidia Tesla K80s (two GPUs each) are available for GP-GPU / CUDA calculations (node10)
Four Nvidia Titan V (V100) GPUs are available for GP-GPU / CUDA calculations (node11 and node12)
Cluster communication is via 56Gb/s FDR Infiniband

In total, there are 320 Xeon cores and 2.82TB RAM (CPU cores and RAM)
In total, there are 30,464 CUDA cores and 96GB of GPU memory: 48GB GDDR5 on the K80s and 48GB HBM2 on the Titan Vs (GPU cores and RAM)
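
The totals above can be reproduced from the per-node figures.  The short sketch below does the arithmetic; note that the per-card K80 figures (4992 CUDA cores and 24GB GDDR5 per card) come from Nvidia's published specifications rather than from this post, so treat them as an assumption.

```shell
#!/bin/bash
# CPU side: 10 nodes with dual 14-core CPUs and 256GB RAM,
# plus 2 nodes with dual 10-core CPUs and 128GB RAM.
big_nodes=10;  big_cores=$(( 2 * 14 ));  big_ram_gb=256
small_nodes=2; small_cores=$(( 2 * 10 )); small_ram_gb=128

cpu_cores=$(( big_nodes * big_cores + small_nodes * small_cores ))
ram_gb=$(( big_nodes * big_ram_gb + small_nodes * small_ram_gb ))

# GPU side: K80 per-card values are an assumption taken from Nvidia's specs;
# Titan V values are the ones quoted in this post.
k80_cards=2;   k80_cores=4992;   k80_mem_gb=24
titan_cards=4; titan_cores=5120; titan_mem_gb=12

cuda_cores=$(( k80_cards * k80_cores + titan_cards * titan_cores ))
gpu_mem_gb=$(( k80_cards * k80_mem_gb + titan_cards * titan_mem_gb ))

echo "Xeon cores: $cpu_cores"      # 320
echo "RAM (GB):   $ram_gb"         # 2816, i.e. about 2.82TB
echo "CUDA cores: $cuda_cores"     # 30464
echo "GPU memory (GB): $gpu_mem_gb" # 96
```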