Here are some test results comparing the performance of CPU+GPU vs CPU-only runs in ANSYS Fluent.
Problem setup: simulation of water flow through a circular pipe.
BC: inlet - V=1 [m/s], T=300[K]; outlet - static pressure 0 Pa; sidewall - wall heat flux 1e5 [W/m2].
Pressure-velocity coupling: SIMPLE algorithm (unless noted otherwise).
Mesh size: 2.1, 3.7 or 12.4 mln (million) cells.
Mesh type: unstructured mesh generated with the Sweep method, with Inflation layers near the sidewall.
Initial conditions: V=(0,0,1) [m/s], P=0 [Pa], T=300 [K].
Number of iterations: 10.
Performance results (Wall clock time for 10 iterations):
(3.7 mln cells) task results (at this task size the GTX 1660 SUPER has enough vRAM in SINGLE precision, but not in DOUBLE precision):
DOUBLE precision solver:
* ANSYS Fluent 2023 R1 Native GPU Solver: AMD Ryzen 5900x 12 cores SMT off (4x32 GB DDR4-3000 MT/s ECC Unbuffered, dual channel) + NVIDIA Geforce 1660 Super 6 GB vRAM (vRAM bandwidth computed by ANSYS Fluent: 320 GB/s): ~7500 sec, vRAM usage 5.9 GB.
* ANSYS Fluent 2023 R1 CPU Solver: AMD Ryzen 5900x 12 cores: 53.43 sec; Peak RAM usage - 10.24 GB.
* ANSYS Fluent 17.0 CPU Solver: 2 servers of dual AMD EPYC 7532, 128 cores total, SMT off; per processor: 8x64 GB DDR4-2933 MT/s ECC Reg, 2R, total 32 memory channels in 4 processors; custom liquid cooling of CPUs: 6.67 sec; Peak Resident RAM usage - 16.6 GB; Peak Virtual RAM usage - 181.2 GB.
* ANSYS Fluent 17.0 CPU Solver: 4 servers of dual AMD Opteron 6380, 128 cores total (64 FPU, 128 IPU); per processor: 4x16 GB DDR3-1600 MT/s ECC Reg, total 32 memory channels in 8 processors: 17.01 sec; Peak Resident RAM usage - 17.8 GB; Peak Virtual RAM usage - 180.8 GB.
* (Coupled/Pseudo transient method) ANSYS Fluent 17.0 CPU Solver: 2 servers of dual AMD EPYC 7532, 128 cores total, SMT off; per processor: 8x64 GB DDR4-2933 MT/s ECC Reg, 2R, total 32 memory channels in 4 processors; custom liquid cooling of CPUs: 13.3 sec.
* (Coupled/Pseudo transient method) ANSYS Fluent 17.0 CPU Solver: 4 servers of dual AMD Opteron 6380, 128 cores total (64 FPU, 128 IPU); per processor: 4x16 GB DDR3-1600 MT/s ECC Reg, total 32 memory channels in 8 processors: 38.2 sec.
SINGLE precision solver:
* Native GPU Solver: AMD Ryzen 5900x 12 cores + NVIDIA Geforce 1660 Super 6 GB vRAM: 17.0 sec.
* Native GPU Solver: AMD Ryzen 5900x 2 cores + NVIDIA Geforce 1660 Super 6 GB vRAM: 7.9 sec, vRAM usage - 4.2 GB, peak RAM usage - 8.2 GB.
* CPU Solver: AMD Ryzen 5900x 12 cores: 77.88 sec; Peak RAM usage - 7.53 GB.
* ANSYS Fluent 17.0 CPU Solver: 2 servers of dual AMD EPYC 7532, 128 cores total, SMT off; per processor: 8x64 GB DDR4-2933 MT/s ECC Reg, 2R, total 32 memory channels in 4 processors; custom liquid cooling of CPUs: 6.33 sec; Peak Resident RAM usage - 13.4 GB; Peak Virtual RAM usage - 177.6 GB.
* ANSYS Fluent 17.0 CPU Solver: 4 servers of dual AMD Opteron 6380, 128 cores total (64 FPU, 128 IPU); per processor: 4x16 GB DDR3-1600 MT/s ECC Reg, total 32 memory channels in 8 processors: 15.12 sec; Peak Resident RAM usage - 14.6 GB; Peak Virtual RAM usage - 177.1 GB.
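To make the SINGLE precision numbers for the 3.7 mln cell task easier to compare, here is a minimal Python sketch that turns the wall-clock times listed above into speedups relative to the 12-core Ryzen CPU run. The dictionary labels and the `speedups` helper are just illustrative names, not anything from Fluent.

```python
# Wall-clock times in seconds for 10 iterations, copied from the
# 3.7 mln cell SINGLE precision list above. Labels are shortened by hand.
single_precision_times = {
    "GPU solver, 12 CPU cores + GTX 1660 SUPER": 17.0,
    "GPU solver, 2 CPU cores + GTX 1660 SUPER": 7.9,
    "CPU solver, Ryzen 5900x 12 cores": 77.88,
    "CPU solver, 2 servers of dual EPYC 7532": 6.33,
    "CPU solver, 4 servers of dual Opteron 6380": 15.12,
}


def speedups(times, baseline):
    """Return {label: baseline_time / time}: how many times faster than the baseline."""
    reference = times[baseline]
    return {label: reference / seconds for label, seconds in times.items()}


result = speedups(single_precision_times, "CPU solver, Ryzen 5900x 12 cores")
# Print the fastest configuration first.
for label, factor in sorted(result.items(), key=lambda item: -item[1]):
    print(f"{label}: {factor:.2f}x")
```

With these inputs the GPU run with 2 host cores comes out a bit under 10x faster than the 12-core CPU run, and the EPYC cluster a bit over 12x.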
(2.1 mln cells) task results, ANSYS Fluent 2023 R1 (at this task size the GTX 1660 SUPER has enough vRAM in both SINGLE and DOUBLE precision):
* Native GPU Solver, SINGLE precision, AMD Ryzen 5900x 2 cores (SMT off) + NVIDIA Geforce 1660 Super 6 GB vRAM: 5.08 sec; Peak vRAM usage by Solver - 2.47 GB; Peak RAM usage by Solver - 5.40 GB.
* Native GPU Solver, DOUBLE precision, AMD Ryzen 5900x 2 cores (SMT off) + NVIDIA Geforce 1660 Super 6 GB vRAM: 8.17 sec; Peak vRAM usage by Solver - 3.54 GB; Peak RAM usage by Solver - 6.01 GB.
* Native GPU Solver, SINGLE precision, AMD Ryzen 5900x 6 cores (SMT off) + NVIDIA Geforce 1660 Super 6 GB vRAM: 7.37 sec; Peak vRAM usage by Solver - 3.66 GB; Peak RAM usage by Solver - 7.29 GB.
* Native GPU Solver, DOUBLE precision, AMD Ryzen 5900x 6 cores (SMT off) + NVIDIA Geforce 1660 Super 6 GB vRAM: 9.40 sec; Peak vRAM usage by Solver - 4.47 GB; Peak RAM usage by Solver - 8.55 GB.
* CPU Solver, SINGLE precision, AMD Ryzen 5900x 6 cores (SMT off): 44.15 sec; Peak RAM usage by Solver - 4.12 GB.
* CPU Solver, DOUBLE precision, AMD Ryzen 5900x 6 cores (SMT off): 30.96 sec; Peak RAM usage by Solver - 5.58 GB.
* CPU Solver, SINGLE precision, AMD Ryzen 5900x 12 cores (SMT off): 35.67 sec; Peak RAM usage by Solver - 4.81 GB.
* CPU Solver, DOUBLE precision, AMD Ryzen 5900x 12 cores (SMT off): 27.39 sec; Peak RAM usage by Solver - 6.22 GB.
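The 2.1 mln cell runs above also show how the time changes with the number of CPU cores. A small Python sketch of the usual speedup and parallel-efficiency arithmetic, using only the timings from the list (the function name and labels are hypothetical):

```python
def scaling_efficiency(t_few, n_few, t_many, n_many):
    """Return (speedup, efficiency) when going from n_few to n_many cores.

    speedup    = t_few / t_many (below 1.0 means the run got slower)
    efficiency = speedup / (n_many / n_few), where 1.0 is ideal scaling
    """
    speedup = t_few / t_many
    ideal = n_many / n_few
    return speedup, speedup / ideal


# (time with fewer cores, fewer cores, time with more cores, more cores),
# all times in seconds for 10 iterations on the 2.1 mln cell mesh.
runs = {
    "CPU solver, SINGLE, 6 -> 12 cores": (44.15, 6, 35.67, 12),
    "CPU solver, DOUBLE, 6 -> 12 cores": (30.96, 6, 27.39, 12),
    "GPU solver, SINGLE, 2 -> 6 host cores": (5.08, 2, 7.37, 6),
    "GPU solver, DOUBLE, 2 -> 6 host cores": (8.17, 2, 9.40, 6),
}

for label, args in runs.items():
    speedup, efficiency = scaling_efficiency(*args)
    print(f"{label}: speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```

For the GPU solver the speedup is below 1, i.e. in these measurements adding host cores made the run slower, not faster.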
(12.4 mln cells) task results, ANSYS Fluent 2023 R1 (at this task size the GTX 1660 SUPER does not have enough vRAM in either SINGLE or DOUBLE precision):
* Native GPU Solver, SINGLE precision solver: AMD Ryzen 5900x 2 cores + NVIDIA Geforce 1660 Super 6 GB vRAM: 23760 sec, vRAM usage - 5.9 GB.
* CPU Solver, SINGLE precision solver: AMD Ryzen 5900x 12 cores: 301 sec, peak RAM usage - 20.1 GB.
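One way to see where the GPU solver falls off a cliff is to normalize the SINGLE precision timings to seconds per iteration per million cells. The sketch below does that for the GPU runs with 2 host cores and the 12-core CPU runs from the three mesh sizes above. The normalization itself is my own choice of metric, not something Fluent reports.

```python
ITERATIONS = 10  # every timing in the post is for 10 iterations


def sec_per_iter_per_mln(seconds, mln_cells, iterations=ITERATIONS):
    """Wall-clock seconds per iteration per million cells."""
    return seconds / (iterations * mln_cells)


# (mesh size in mln cells, wall-clock seconds), SINGLE precision only.
gpu_2_cores = [(2.1, 5.08), (3.7, 7.9), (12.4, 23760.0)]
cpu_12_cores = [(2.1, 35.67), (3.7, 77.88), (12.4, 301.0)]

for name, series in (("GPU solver, 2 host cores", gpu_2_cores),
                     ("CPU solver, 12 cores", cpu_12_cores)):
    for mln_cells, seconds in series:
        cost = sec_per_iter_per_mln(seconds, mln_cells)
        print(f"{name}, {mln_cells} mln cells: {cost:.3f} s/iter/mln cells")
```

The CPU cost per cell stays within the same order of magnitude across all three meshes, while the GPU cost jumps by several hundred times once the mesh no longer fits into vRAM.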
When vRAM is sufficient for the task, ANSYS Fluent with the GPU solver on a low-to-mid-range graphics card in a gaming PC performs roughly on par with a two-server cluster based on Zen 2 EPYC (7.9 sec vs 6.33 sec for the 3.7 mln cell task in single precision), while that cluster costs about 45 times more!
When the task requires more vRAM than the GPU in the system has, 30-40% PCI-E bus load was observed during computation, with extremely low overall simulation performance. Data transfer over x16 PCI-E 4.0 almost stalls the simulation.
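For a rough feel of why spilling over the bus hurts so much, here is a back-of-the-envelope Python sketch comparing the vRAM bandwidth reported by Fluent (320 GB/s) with the theoretical one-direction bandwidth of an x16 PCI-E 4.0 link. The PCI-E figures (16 GT/s per lane, 128b/130b encoding) are the standard theoretical values, and the sketch assumes the link really runs at 4.0 x16 as stated above; real throughput is lower.

```python
def pcie_bandwidth_gb_s(gt_per_s, lanes, encoding=128 / 130):
    """Theoretical one-direction PCI-E bandwidth in GB/s.

    gt_per_s: transfer rate per lane in GT/s (16 for PCI-E 4.0)
    encoding: payload fraction of the line code (128b/130b for 3.0 and later)
    """
    return gt_per_s * lanes * encoding / 8.0


VRAM_BANDWIDTH_GB_S = 320.0  # value computed by ANSYS Fluent, from the post

pcie4_x16 = pcie_bandwidth_gb_s(16.0, 16)
ratio = VRAM_BANDWIDTH_GB_S / pcie4_x16

print(f"PCI-E 4.0 x16 theoretical peak: {pcie4_x16:.1f} GB/s")
print(f"vRAM is about {ratio:.0f}x faster than the bus at its peak")
```

This only bounds the bandwidth gap; it does not by itself explain the much larger slowdown that was measured.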
So the new Native GPU solver is aimed, first of all, at modern GPU clusters with multiple NVIDIA H100 GPUs (80 GB vRAM each) connected via NVLink/NVSwitch [900 GB/s]. In that case the performance should be huge.
With Geforce/Quadro GPUs it is possible to solve small tasks with high performance.
According to information from NVIDIA.COM, the compute performance of the NVIDIA Geforce GTX 1660 SUPER 6 GB is 5.0 TFLOPS in FP32 and 0.15 TFLOPS in FP64. But in ANSYS Fluent the measured difference between single and double precision was only about 1.6x or less (8.17 sec vs 5.08 sec on the 2.1 mln cell task with 2 CPU cores).
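The gap between the spec sheet and the measurements can be put into numbers with a few lines of Python. The TFLOPS values and the timings are the ones quoted in this post; the variable names are only for illustration.

```python
# Peak throughput of the GTX 1660 SUPER as quoted above.
FP32_TFLOPS = 5.0
FP64_TFLOPS = 0.15

# Ratio one would expect if the solver were purely limited by FP throughput.
theoretical_ratio = FP32_TFLOPS / FP64_TFLOPS

# Measured DOUBLE / SINGLE wall-clock ratios, GPU solver, 2.1 mln cells.
observed = {
    "2 host cores": 8.17 / 5.08,
    "6 host cores": 9.40 / 7.37,
}

print(f"theoretical FP32/FP64 ratio: {theoretical_ratio:.1f}x")
for label, ratio in observed.items():
    print(f"observed double/single slowdown, {label}: {ratio:.2f}x")
```

One possible reading, which the timings alone do not prove, is that the solver is limited more by memory bandwidth than by raw floating-point throughput on this card.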