
vecAdd
==34092== NVPROF is profiling process 34092, command.

Top half of the profile is runtime measured from the GPU perspective.
Memcpy XtoX is a memory copy to or from the GPU.
(PGI) OpenACC kernels will be named after the subroutine name and line number.
Bottom half of the profile is runtime measured from the CPU perspective.

-o: creates an output file which can be imported into nvvp.
--analysis-metrics: collects all metrics for import into nvvp.
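A minimal sketch of how the nvprof options described here might be used on the command line. The executable name ./vecAdd and the output file names are assumptions for illustration only; the guard skips the profiling runs on machines where nvprof or the executable is not available.

```shell
#!/bin/sh
# Hypothetical executable name; substitute your own binary.
APP=./vecAdd

if command -v nvprof >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Print the summary profile (GPU activities and API calls) to the terminal.
    nvprof "$APP"

    # Write a profile to a file that can be imported into nvvp.
    # The output file name is an assumption.
    nvprof -o vecAdd.nvprof "$APP"

    # Collect all metrics so that nvvp can run its analysis on the import.
    nvprof --analysis-metrics -o vecAdd-analysis.nvprof "$APP"
else
    echo "nvprof or $APP not found; skipping profiling runs"
fi
```

The resulting files can then be opened from within nvvp. Collecting all metrics usually takes noticeably longer than a plain run, because the kernels are replayed several times.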

Profiling Tools
Many options!
From NVIDIA

This talk
We will focus on nvprof and nvvp
nvprof => NVIDIA profiler
    Command line
Integrated into Nsight Eclipse Edition (nsight)

Example: vector addition from yesterday's talk

    attributes(global) subroutine vecAdd_GPU(c, a, b, n)
      INTEGER, value :: n
      REAL, device, intent(in) :: a(n), b(n)
      REAL, device, intent(out) :: c(n)
      INTEGER :: i
      i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
      if (i <= n) then
        c(i) = a(i) + b(i)
      end if
    end subroutine vecAdd_GPU

Name
    cudaMalloc
    cudaMemcpy
    cuDeviceGetAttribute
    cudaFree
    cuDeviceTotalMem
    cuDeviceGetName
    cudaLaunch
    cudaSetupArgument
    cuDeviceGet
    cuDeviceGetCount
    cudaConfigureCall

Advanced GPU Topics #1 Jeremy Appleyard, September 2015
