macOS OpenCL - get the nonrendering GPU in Mac Pro?

ocltypes17 · Jul 10, 2017

Hello,

I am trying to get started developing OpenCL code on a Mac Pro to take advantage of its dual GPUs for calculations (not rendering, just purely mathematical computation). I am following the OpenCL programming guide published by Apple, but I ran into a problem: the compiler reports a device error with the FirePro D500 GPU.

The error returned from the compiler is:
[CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: build program driver returned (-2)

The thread linked below seemed to suggest that selecting the secondary GPU (I guess the one that is not rendering?) may resolve the problem:
https://github.com/bulletphysics/bullet3/issues/208

However, following the code tidbit in Apple's technical note (https://developer.apple.com/library/content/technotes/tn2335/_index.html) Xcode 8.3.3 still fails to compile, claiming that I have conflicting types for "CGLQueryRenderInfo" with the code below:

Code:

// look up available GPUs
    cl_uint num = 0;
    clGetDeviceIDs(NULL,CL_DEVICE_TYPE_GPU, 0, NULL, &num);
    cl_device_id devices[num];
    clGetDeviceIDs(NULL,CL_DEVICE_TYPE_GPU,num,devices,NULL);
    cl_context ctx = clCreateContext(NULL,num,devices,NULL,NULL,NULL);
  
    // select non-connected GPU
    CGLRendererInfoObj rend;
    GLint nrend = 0;
    GLint nonDisplayGPURendererID = 0x0;
    CGLQueryRendererInfo(0xffffffff, &rend, &nrend);
  
    for(GLint idx=0; idx<nrend; idx++) {
        GLint online = 1;
        CGLDescribeRenderer(rend, idx, kCGLRPOnline, &online);
        if(!online) {
            GLint accelerated = 0;
            CGLDescribeRenderer(rend, idx, kCGLRPAcceleratedCompute, &accelerated);
            if(accelerated) {
                CGLDescribeRenderer(rend, idx, kCGLRPRendererID,
                                    &nonDisplayGPURendererID);
                break;
            }
        }
    }
    CGLDestroyRendererInfo(rend);
  
    // Converting a renderer ID to a cl_device_id
    cl_device_id gpu = (cl_device_id)(intptr_t)(nonDisplayGPURendererID&~0xff);
  
    // Obtain a dispatch queue for GPU in system
    dispatch_queue_t queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_GPU, gpu);
  
    // Optional: check device with clGetDeviceInfo
    char name[128];
    // cl_device_id gpu = gcl_get_device_id_with_dispatch_queue(queue);
    clGetDeviceInfo(gpu, CL_DEVICE_NAME, 128, name, NULL);
    fprintf(stdout, "Created a dispatch queue using the %s\n", name);

Can anyone suggest how I should approach this problem? Is the OpenCL driver broken as some have suggested?

ocltypes17 · Jul 10, 2017

OK, I did some more digging, and it seems the code compiles without a problem, but at runtime OpenCL still fails to reach either the GPU or my CPU, which suggests a driver-level problem?

Can anyone suggest a solution to this?

teagls · Jul 11, 2017

Can you post the actual error you are getting? The driver is supplied by Apple, so I highly doubt that is the issue unless something is really messed up with your install.

ocltypes17 · Jul 11, 2017

This is the main C code from Apple's OpenCL Guide:

Code:

#include <stdio.h>
#include <stdlib.h>
#include <OpenCL/opencl.h>
#include "hello.cl.h"
#define NUM_VALUES 1024

int main (int argc, const char * argv[]) {
   int i; char name[128];
   dispatch_queue_t queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_GPU, NULL);
  
   if (queue == NULL) {
       queue = gcl_create_dispatch_queue(CL_DEVICE_TYPE_CPU, NULL);
   }
  
   cl_device_id gpu = gcl_get_device_id_with_dispatch_queue(queue);
   clGetDeviceInfo(gpu, CL_DEVICE_NAME, 128, name, NULL);
   fprintf(stdout, "Created a dispatch queue using the %s\n", name);
   float* test_in = (float*)malloc(sizeof(cl_float) * NUM_VALUES);
   for (i = 0; i < NUM_VALUES; i++) {
       test_in = (cl_float) i;
   }
  
   float* test_out = (float*)malloc(sizeof(cl_float) * NUM_VALUES);
   void* mem_in  = gcl_malloc(sizeof(cl_float) * NUM_VALUES, test_in, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR);
   void* mem_out = gcl_malloc(sizeof(cl_float) * NUM_VALUES, NULL, CL_MEM_WRITE_ONLY);
  
   dispatch_sync(queue, ^{
       size_t wgs;
       gcl_get_kernel_block_workgroup_info(square_kernel, CL_KERNEL_WORK_GROUP_SIZE, sizeof(wgs), &wgs, NULL);
       cl_ndrange range = {1, {0, 0, 0}, {NUM_VALUES, 0, 0}, {wgs, 0, 0}};
       square_kernel(&range,(cl_float*)mem_in, (cl_float*)mem_out);
       gcl_memcpy(test_out, mem_out, sizeof(cl_float) * NUM_VALUES);});
  
   gcl_free(mem_in); gcl_free(mem_out);
   free(test_in); free(test_out);
   dispatch_release(queue);
}

The kernel code is:

Code:

kernel void square(global float* input, global float* output) {
    size_t i = get_global_id(0);
    output = input * input;
}

This is the runtime result for the sample code in Apple's OpenCL Guide:

Code:

Created a dispatch queue using the AMD Radeon HD - FirePro D500 Compute Engine
[CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: build program driver returned (-2)
Break on OpenCLErrorBreak to debug.
OpenCL Warning : clBuildProgram failed: could not build program for 0xffffffff (Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz) (err:-2)
Break on OpenCLWarningBreak to debug.
[CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
CVMS_ERROR_SERVICE_FAILURE: CVMS compiler has crashed or hung managing the service.
Break on OpenCLErrorBreak to debug.
Program ended with exit code: 0

teagls · Jul 12, 2017

There are several issues with your code, especially the kernel. Are you compiling this in Xcode? What version are you using? I see you use dispatch_release on the GCD queue. That is extremely old and no longer used. I fixed the issues and it compiled and ran correctly for me.

The main issue is that your kernel is wrong. The syntax is not right, and you can't multiply two pointers together. Just curious, are you aware of how C pointers work?

Code:

//OLD
kernel void square(global float* input, global float* output) {
    size_t i = get_global_id(0);
    output = input * input;
}

//NEW
__kernel void square(__global float* input, __global float* output) {
    int i = get_global_id(0);
    output[i] = input[i] * input[i];
}
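
Once the kernel runs, one way to confirm that it really squares each element is to compare the values copied back with gcl_memcpy against a plain CPU loop. The following is only a sketch of that idea: the function names `square_reference` and `count_mismatches` are made up for illustration and are not part of Apple's guide or of OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for the square kernel: out[i] = in[i] * in[i].
 * Hypothetical helper, not an OpenCL or gcl API. */
void square_reference(const float *in, float *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * in[i];
    }
}

/* Count the elements of device_out that differ from in[i] * in[i]
 * by more than tol. Returns 0 when the device output matches. */
size_t count_mismatches(const float *in, const float *device_out,
                        size_t n, float tol) {
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        float diff = device_out[i] - in[i] * in[i];
        if (diff < 0.0f) {
            diff = -diff;   /* absolute value without needing libm */
        }
        if (diff > tol) {
            bad++;
        }
    }
    return bad;
}
```

In the host program from this thread, the call would look like `count_mismatches(test_in, test_out, NUM_VALUES, 0.0f)` after the dispatch_sync block has finished and before the buffers are freed.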

Your CPU code is not right either. To load the data you need to index into your C pointer.

Code:

//OLD
for (i = 0; i < NUM_VALUES; i++) {
    test_in = (cl_float) i;
}

//NEW
for (i = 0; i < NUM_VALUES; i++) {
    test_in[i] = (cl_float) i;
}
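
The indexing point can be tried in plain C without any OpenCL involved. This is a small sketch, with `fill_ramp` and `sum_floats` as made-up helper names: writing `buf[i]` stores into the i-th float that the pointer refers to, while the pointer variable itself never changes.

```c
#include <assert.h>
#include <stddef.h>

/* Write 0, 1, 2, ... into the n floats that buf points at.
 * Hypothetical helper for illustration only. */
void fill_ramp(float *buf, size_t n) {
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        /* buf[i] means "the i-th float after the address in buf".
         * The equivalent pointer-arithmetic form is *(buf + i). */
        buf[i] = (float)i;
    }
}

/* Add up n floats; used here only to check the result of fill_ramp. */
float sum_floats(const float *buf, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        total += buf[i];
    }
    return total;
}
```

Assigning to `buf` without an index would instead try to replace the address held in the pointer, which is why the compiler rejects or warns about storing a float there.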

Lastly, to keep it simple before you get into more complex things with work groups, use this for the cl_ndrange:

Code:

//OLD
dispatch_sync(queue, ^{
    size_t wgs;
    gcl_get_kernel_block_workgroup_info(square_kernel, CL_KERNEL_WORK_GROUP_SIZE, sizeof(wgs), &wgs, NULL);
    cl_ndrange range = {1, {0, 0, 0}, {NUM_VALUES, 0, 0}, {wgs, 0, 0}};
    square_kernel(&range,(cl_float*)mem_in, (cl_float*)mem_out);
    gcl_memcpy(test_out, mem_out, sizeof(cl_float) * NUM_VALUES);});

gcl_free(mem_in); gcl_free(mem_out);
free(test_in); free(test_out);
dispatch_release(queue);
}

//NEW
dispatch_sync(queue, ^{
    cl_ndrange range = { 1, {0}, {NUM_VALUES}, {0} };
    square_kernel(&range,(cl_float*)mem_in, (cl_float*)mem_out);
    gcl_memcpy(test_out, mem_out, sizeof(cl_float) * NUM_VALUES);
    for (int i = 0; i < NUM_VALUES; i++) {
        printf("\n%.2f",test_out[i]);
    }
    gcl_free(mem_in); gcl_free(mem_out);
});
free(test_in); free(test_out);
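
If an explicit workgroup size is used again later (the last field of cl_ndrange, set to wgs in the OLD version), OpenCL 1.x expects the global size to be an exact multiple of it. With 1024 values that usually works out, but for other sizes a common approach is to round the global size up. A sketch of that arithmetic, with `round_up_global_size` as a made-up helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of the workgroup size wgs.
 * Hypothetical helper; not part of OpenCL or Apple's gcl API. */
size_t round_up_global_size(size_t n, size_t wgs) {
    assert(wgs > 0);
    return ((n + wgs - 1) / wgs) * wgs;
}
```

When the global size is padded this way, the kernel needs a bounds check (for example, an extra count argument and `if (i < count)`) so that the extra work-items do not read or write past the end of the buffers.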

ocltypes17 · Jul 12, 2017

Yes, I am compiling this in Xcode 8.3.3. I believe that it is the latest version? Admittedly I have only gotten started with C very recently. Prior to this it was mostly Perl and Python so I am still not very familiar with C-type pointers.

I thought that the double-underscore prefix for kernels was not required in Xcode? Apple did not use it in their guide, which is also kind of old (2013?): https://developer.apple.com/library...ide/ExampleHelloWorld/Example_HelloWorld.html

There is still an issue that I could not resolve: gcl_create_dispatch_queue() was only called with CL_DEVICE_TYPE_GPU, but for some reason the binary keeps churning out runtime warnings and errors that say:

[CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: build program driver returned (-2)
Break on OpenCLErrorBreak to debug.
OpenCL Warning : clBuildProgram failed: could not build program for 0xffffffff (Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz) (err:-2)
Break on OpenCLWarningBreak to debug.
[CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
CVMS_ERROR_SERVICE_FAILURE: CVMS compiler has crashed or hung managing the service.
Break on OpenCLErrorBreak to debug.

The same error code -2 appears in both the CL_DEVICE_NOT_AVAILABLE error and the warning message, which made me wonder whether they actually refer to the same problem, where OpenCL tried to compile the code for the CPU and failed, even though I explicitly asked for a GPU queue only.
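
For reference, the number in "(-2)" and "(err:-2)" can be matched against the error constants in the standard OpenCL header (cl.h), where -2 is CL_DEVICE_NOT_AVAILABLE. A minimal lookup sketch covering a few of those codes follows; the function names `cl_error_name` and `cl_error_is` are made up, and only the numeric values come from the header.

```c
#include <string.h>

/* Map a few OpenCL status codes to their names.
 * The numeric values are those defined in the Khronos cl.h header;
 * the function itself is a hypothetical helper, not an OpenCL API. */
const char *cl_error_name(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -2:  return "CL_DEVICE_NOT_AVAILABLE";
        case -3:  return "CL_COMPILER_NOT_AVAILABLE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        default:  return "unknown OpenCL error";
    }
}

/* Return 1 if the code maps to the given name, 0 otherwise. */
int cl_error_is(int code, const char *name) {
    return strcmp(cl_error_name(code), name) == 0;
}
```

This only shows that both messages carry the same status code. It does not say why the device was reported as unavailable.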

Is there a newer guide than the one Apple supplied that I can refer to as well?

teagls · Jul 12, 2017

Yeah, unfortunately I think that guide is terribly outdated. I found an example based on that guide on GitHub that somebody had put together. It neither compiles nor works for me.
https://github.com/danieljfarrell/Xcode-4-OpenCL-Example

While this example does not use Xcode per se, it could be put into an Xcode project. It's quite easy to run: just open Terminal, cd into the directory, and type make to compile. There is a small guide as well.

https://www.eriksmistad.no/getting-started-with-opencl-and-gpu-computing/
https://github.com/smistad/OpenCL-Getting-Started

Also, if you are more familiar with Python, there are Python bindings for OpenCL with examples.
https://mathema.tician.de/software/pyopencl/
https://github.com/pyopencl/pyopencl/tree/master/examples

foobarbaz · Jul 15, 2017

teagls said:
I see you use dispatch_release on the GCD queue. That is extremely old and no longer used.

Careful, that only applies to ARC, not plain C. (And the compiler would complain if that was the case.)

rsacker · Sep 12, 2017

In the exchange between teagls and ocltypes17 about getting the non-rendering GPU on the Mac Pro, there is an underlying question that went unanswered: once you get the nonDisplayCLDeviceId, what the heck do you do with it? It just sits there. Logically it should be the input to the gcl_create_dispatch_queue call. The other, somewhat unrelated, question is how to test the non-display device. Mine were replaced by Apple after they heated up and the display went cuckoo, but that says nothing about their current state.
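
The snippet in the first post does show that step: the renderer ID is masked with ~0xff, cast to cl_device_id, and passed to gcl_create_dispatch_queue. The masking part is plain integer arithmetic and can be sketched on its own. `renderer_id_to_device_bits` is a made-up name, and the macOS calls appear only as a comment because they need Apple's OpenCL and OpenGL frameworks.

```c
#include <assert.h>
#include <stdint.h>

/* Clear the low 8 bits of a renderer ID, as in the expression
 * `nonDisplayGPURendererID & ~0xff` from the first post.
 * Hypothetical helper name; not an Apple API. */
int32_t renderer_id_to_device_bits(int32_t renderer_id) {
    int32_t bits = renderer_id & ~0xff;
    assert((bits & 0xff) == 0);   /* low byte is always cleared */
    return bits;
}

/* On macOS, following the first post, the result would then be used as:
 *   cl_device_id gpu =
 *       (cl_device_id)(intptr_t)renderer_id_to_device_bits(rendererID);
 *   dispatch_queue_t queue =
 *       gcl_create_dispatch_queue(CL_DEVICE_TYPE_GPU, gpu);
 */
```

For an illustrative (invented) renderer ID of 0x01021B05, the function returns 0x01021B00. Whether the resulting queue works on a given machine still has to be checked at runtime, for example with clGetDeviceInfo as in the first post.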

Benjinator · Sep 16, 2017

This is kind of late, but if you're familiar with Python you may want to look into pyopencl. I've been using it for a while and it works very well.
