Parallel Forall blog posts
- https://devblogs.nvidia.com/faster-parallel-reductions-kepler/
- https://devblogs.nvidia.com/using-shared-memory-cuda-cc/
- https://devblogs.nvidia.com/cuda-8-features-revealed/
- https://devblogs.nvidia.com/inside-pascal/
- https://devblogs.nvidia.com/cuda-9-features-revealed/
- https://devblogs.nvidia.com/cooperative-groups/
- https://devblogs.nvidia.com/using-cuda-warp-level-primitives/
- https://devblogs.nvidia.com/introduction-cuda-aware-mpi/
- https://devblogs.nvidia.com/benchmarking-cuda-aware-mpi/
- https://devblogs.nvidia.com/benchmarking-gpudirect-rdma-on-modern-server-platforms/
- https://devblogs.nvidia.com/fast-multi-gpu-collectives-nccl/
- https://devblogs.nvidia.com/cuda-pro-tip-profiling-mpi-applications/
- https://devblogs.nvidia.com/unified-memory-cuda-beginners/
- https://devblogs.nvidia.com/beyond-gpu-memory-limits-unified-memory-pascal/
- https://devblogs.nvidia.com/maximizing-unified-memory-performance-cuda/
- https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/
- https://developer.nvidia.com/blog/accelerating-standard-c-with-gpus-using-stdpar/
- https://developer.nvidia.com/blog/accelerating-python-on-gpus-with-nvc-and-cython/
- https://developer.nvidia.com/blog/multi-gpu-programming-with-standard-parallel-c-part-1/
- https://developer.nvidia.com/blog/multi-gpu-programming-with-standard-parallel-c-part-2
- https://developer.nvidia.com/blog/developing-accelerated-code-with-standard-language-parallelism/
- https://developer.nvidia.com/blog/cutlass-linear-algebra-cuda/
CUDA pro tips
- https://devblogs.nvidia.com/cuda-pro-tip-increase-performance-with-vectorized-memory-access/ – optimization via vectorized memory access using built-in vector types
- https://devblogs.nvidia.com/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/
- https://devblogs.nvidia.com/cuda-pro-tip-nvprof-your-handy-universal-gpu-profiler/
- https://devblogs.nvidia.com/cuda-pro-tip-generate-custom-application-profile-timelines-nvtx/ (see also http://www.vi-hps.org/projects/score-p and https://www.vampir.eu/)
- https://devblogs.nvidia.com/cuda-pro-tip-minimize-the-tail-effect/
- https://devblogs.nvidia.com/cuda-pro-tip-always-set-current-device-avoid-multithreading-bugs/
- https://developer.nvidia.com/blog/cuda-pro-tip-understand-fat-binaries-jit-caching/
- https://developer.nvidia.com/blog/cuda-pro-tip-the-fast-way-to-query-device-properties/
- https://developer.nvidia.com/blog/maximizing-performance-with-massively-parallel-hash-maps-on-gpus/
- https://developer.nvidia.com/blog/controlling-data-movement-to-boost-performance-on-ampere-architecture/
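Two of the pro tips above (grid-stride loops and vectorized memory access) combine naturally. A minimal sketch, assuming the input size is a multiple of 4 and the pointers are 16-byte aligned; the kernel name and launch parameters are illustrative:

```cuda
// Grid-stride copy kernel using vectorized float4 accesses.
// A production kernel would also handle the scalar remainder
// when n is not a multiple of 4.
__global__ void copy_vec4(const float4* __restrict__ in,
                          float4* __restrict__ out, size_t n4)
{
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < n4;
         i += (size_t)gridDim.x * blockDim.x)
        out[i] = in[i];  // one 16-byte load + one 16-byte store per iteration
}

// The grid size is independent of the problem size thanks to the stride loop:
// copy_vec4<<<num_sms * 4, 256>>>(in4, out4, n / 4);
```

The stride loop lets one launch configuration cover any problem size, while the float4 accesses quarter the number of memory instructions compared to scalar loads.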
Compilers
- GPUCC: An Open-Source GPGPU Compiler
- Compiling CUDA with clang
- CppCon 2016: "Bringing Clang and C++ to GPUs: An Open-Source, CUDA-Compatible GPU C++ Compiler"
Tools
nvvp does not support new GPUs (starting with Volta/Turing, sm_7*).
nvvp (GUI) and nvprof (CLI) are marked as "deprecated" in the documentation; they are being replaced by Nsight Compute (for profiling individual CUDA kernels) and Nsight Systems (for system-wide profiling, including CPU activity and the overall timeline). The new tools are finally Java-free, but they still have some quirks (e.g. remote connections ignore .ssh/config). I also could not find a way to export a report to PDF. Both tools require at least sm_7*. They can be launched with the ncu-ui and nsys-ui commands.
- https://developer.nvidia.com/blog/migrating-nvidia-nsight-tools-nvvp-nvprof/
- https://developer.nvidia.com/blog/transitioning-nsight-systems-nvidia-visual-profiler-nvprof/
- https://developer.nvidia.com/blog/using-nsight-compute-to-inspect-your-kernels/
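Typical command-line invocations of the new tools (the flags below exist in the documented CLIs; the application and report names are placeholders):

```
# System-wide timeline (replaces the nvvp/nvprof timeline view)
nsys profile -o timeline ./my_app

# Per-kernel metrics (replaces nvprof's --metrics)
ncu -o kernels --set full ./my_app
```

The resulting .nsys-rep and .ncu-rep files can then be opened in nsys-ui and ncu-ui, including over a remote connection.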
Miscellaneous
- CUDA syntax cheatsheet
- __ldg() instruction (obsoleted by the "unified cache" in the Pascal architecture)
- Demystifying GPU Microarchitecture through Microbenchmarking
- Useful nvidia-smi queries
- From http://stackoverflow.com/questions/43953872/need-help-to-deal-with-overlap-of-gpu-and-cpu-execution: "unless you put markers in the host code, there is no direct way from the profiler to tell what exactly is overlapping with the kernel call". On markers, see http://docs.nvidia.com/cuda/profiler-users-guide/index.html#marking-regions-of-cpu-activity
- What is the canonical way to check for errors using the CUDA runtime API?
- CUDA toolkit documentation: Incomplete-LU and Cholesky Preconditioned Iterative Methods Using cuSPARSE and cuBLAS
- CUDA Driver API vs. CUDA runtime
- Reduce and Scan
- julia:
- cuobjdump -res-usage <binary-file-name> dumps the resource usage of all kernels in the binary file – see CUDA Binary Utilities
- Dan Alcantara – hashing on GPUs (implemented in CUDPP)
- A Simple GPU Hash Table (code)
- Virtual functions in CUDA:
- ZLUDA is CUDA for AMD GPUs
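On the canonical error-checking question linked above: the usual pattern is a macro that wraps every runtime-API call. A minimal sketch (the macro name is illustrative, not from any official header):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Report file/line and abort on any runtime-API failure.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",           \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Usage:
//   CUDA_CHECK(cudaMalloc(&d_ptr, bytes));
//   kernel<<<grid, block>>>(...);
//   CUDA_CHECK(cudaGetLastError());        // catches launch errors
//   CUDA_CHECK(cudaDeviceSynchronize());   // catches asynchronous execution errors
```

Note that kernel launches return no error code themselves, hence the separate cudaGetLastError() and cudaDeviceSynchronize() checks after the launch.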
Popular articles
- More Oil, Less Toil: How GPUs Can Make the Most of Fossil Fuel Resources
- Comparing NVLink vs PCI-E with NVIDIA Tesla P100 GPUs on OpenPOWER Servers
- https://diit.cz/clanek/nvidia-volta-gv100
- https://diit.cz/clanek/nvidia-titan-v-uveden
- Nvidia has banned the use of GeForce and Titan GPUs in data centres
- Nvidia's Titan V GPUs spit out 'wrong answers' in scientific simulations
- 25 Years Later: A Brief Analysis of GPU Processing Efficiency
Benchmarks
- Rodinia Benchmark Suite
- Achievable triad bandwidth on GPUs, with ATS/system-allocator test
- A GPU benchmark tool for evaluating GPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP)
- Multi-GPU Computing Benchmark Suite (CUDA)
- Micro-benchmarking CUDA-capable GPGPUs to understand the architecture of each Streaming Multiprocessor (SM) in handling outstanding memory requests
- A benchmark / comparison of Mergesort and FFT in CUDA, MPI, and OpenMP