[OG9, amdgcn, committed] Detect the actual number of hardware CUs
This patch improves out-of-the-box benchmark results by querying the
number of hardware compute units (CUs) instead of assuming 64. That way we
don't launch 64 gangs on a device that only has 60 CUs, such as consumer
Vega 20.
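The default gang count this change aims for can be sketched as a small selection function. This is a minimal sketch, not the plugin's code. `choose_default_gangs` and its parameters are hypothetical: `teams_override` plays the role of `gcn_teams` in the diff, and `detected_cus` stands for the runtime-reported CU count. Falling back to 64 (the old Fiji default) when detection fails is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the default chosen in the diff.
   teams_override: user-requested team count (like gcn_teams); <= 0 means unset.
   detected_cus:   CU count reported by the runtime; 0 means the query failed.  */
static uint32_t
choose_default_gangs (int teams_override, uint32_t detected_cus)
{
  uint32_t gangs;
  if (teams_override > 0)
    gangs = (uint32_t) teams_override;  /* An explicit request always wins.  */
  else if (detected_cus > 0)
    gangs = detected_cus;               /* One gang per hardware CU.  */
  else
    gangs = 64;                         /* Assumed fallback: old Fiji default.  */
  assert (gangs > 0);                   /* Never launch an empty grid.  */
  return gangs;
}
```

On a 60-CU Vega 20, `choose_default_gangs (0, 60)` returns 60 rather than the previously hard-coded 64, while an explicit team count still takes precedence.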
It's not yet suitable for upstream mainline because hsa.h needs
definitions from the Radeon Open Compute Runtime (ROCr), and copying them
from the ROCr source raises license issues. We could extract them from the
documentation instead, but that is still on my TODO list.
+/* Additional definitions not in HSA 1.1.
+ FIXME: this needs to be updated in hsa.h for upstream, but the only source
+ right now is the ROCr source, which may raise license issues. */
+#define HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT 0xA002
/* These probably won't be in elf.h for a while. */
#define R_AMDGPU_NONE 0
#define R_AMDGPU_ABS32_LO 1 /* (S + A) & 0xFFFFFFFF */
@@ -845,6 +850,14 @@ dump_hsa_agent_info (hsa_agent_t agent, void *data __attribute__((unused)))
HSA_DEBUG ("HSA_AGENT_INFO_DEVICE: FAILED\n");
def->ndim = 3;
- /* Fiji has 64 CUs. */
- def->gdims[0] = (gcn_teams > 0) ? gcn_teams : 64;
+ /* Fiji has 64 CUs, but Vega20 has 60. */
+ def->gdims[0] = (gcn_teams > 0) ? gcn_teams : get_cu_count (agent);
/* Each thread is 64 work items wide. */
def->gdims[1] = 64;
/* A work group can have 16 wavefronts. */
@@ -3308,7 +3333,7 @@ gcn_exec (struct kernel_info *kernel, size_t mapnum, void **hostaddrs,
problem size, so let's do a reasonable number of single-worker gangs.
64 gangs matches a typical Fiji device. */