| CPUs | GPUs |
| program using std::thread, et al., for example | program using CUDA or OpenCL, for example |
cycle1 | 12 processors, each with 6 NUMA cores; ~124 GB shared RAM | Quadro 6000: 14 multiprocessors (32 cores each) ⇒ 448 cores, ~6.5 GB global memory; Tesla C2075: 14 multiprocessors (32 cores each) ⇒ 448 cores, ~5.6 GB global memory |
cycle2 | 24 processors, each with 6 NUMA cores; ~124 GB shared RAM | Quadro 6000: 14 multiprocessors (32 cores each) ⇒ 448 cores, ~6.5 GB global memory; Tesla C2075: 14 multiprocessors (32 cores each) ⇒ 448 cores, ~5.6 GB global memory |
cycle3 | 12 processors, each with 6 NUMA cores; ~124 GB shared RAM | Quadro 6000: 14 multiprocessors (32 cores each) ⇒ 448 cores, ~6.5 GB global memory; Tesla C2075: 14 multiprocessors (32 cores each) ⇒ 448 cores, ~5.6 GB global memory |
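
As an illustration of the CPU column, here is a minimal sketch of a std::thread program that splits a sum across several threads. The helper names `parallel_sum` and `default_thread_count` are hypothetical and chosen only for this example; they are not part of any material listed above. The thread count is taken from `std::thread::hardware_concurrency()`, which is only a hint and may return 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` by giving each thread one contiguous chunk.
// Each thread writes only to its own slot in `partial`, so no locking is needed.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    // Round up so that every element falls into some chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(begin),
                data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
        });
    }
    // Wait for all workers before combining their partial results.
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}

// hardware_concurrency() may return 0 when the value is not computable;
// fall back to a single thread in that case.
unsigned default_thread_count() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}
```

A call such as `parallel_sum(values, default_thread_count())` would use one thread per hardware thread reported by the system. With GCC or Clang on Linux, such a program typically needs to be compiled with `-pthread`.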