I installed the Intel OpenCL SDK and wanted to create a project. Visual Studio 2017 showed me these two options, plus a third one, "Empty OpenCL Project". I don't know what the difference between them is. I tried reading through the template code, but since I don't know anything about OpenCL (yet), I couldn't tell how they differ.
License header:
/*****************************************************************************
* Copyright (c) 2013-2016 Intel Corporation
* All rights reserved.
*
* WARRANTY DISCLAIMER
*
* THESE MATERIALS ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS
* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
* OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING
* NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THESE
* MATERIALS, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* Intel Corporation is the author of the Materials, and requests that all
* problem reports or change requests be submitted to it directly
*****************************************************************************/
I ran diff as suggested:
625,629c625,626
< // Create new OpenCL buffer objects
< // As these buffer are used only for read by the kernel, you are recommended to create it with flag CL_MEM_READ_ONLY.
< // Always set minimal read/write flags for buffers, it may lead to better performance because it allows runtime
< // to better organize data copying.
< // You use CL_MEM_COPY_HOST_PTR here, because the buffers should be populated with bytes at inputA and inputB.
---
> cl_image_format format;
> cl_image_desc desc;
631c628,650
< ocl->srcA = clCreateBuffer(ocl->context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uint) * arrayWidth * arrayHeight, inputA, &err);
---
> // Define the image data-type and order -
> // one channel (R) with unit values
> format.image_channel_data_type = CL_UNSIGNED_INT32;
> format.image_channel_order = CL_R;
>
> // Define the image properties (descriptor)
> desc.image_type = CL_MEM_OBJECT_IMAGE2D;
> desc.image_width = arrayWidth;
> desc.image_height = arrayHeight;
> desc.image_depth = 0;
> desc.image_array_size = 1;
> desc.image_row_pitch = 0;
> desc.image_slice_pitch = 0;
> desc.num_mip_levels = 0;
> desc.num_samples = 0;
> #ifdef CL_VERSION_2_0
> desc.mem_object = NULL;
> #else
> desc.buffer = NULL;
> #endif
>
> // Create first image based on host memory inputA
> ocl->srcA = clCreateImage(ocl->context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, &format, &desc, inputA, &err);
634c653
< LogError("Error: clCreateBuffer for srcA returned %s\n", TranslateOpenCLError(err));
---
> LogError("Error: clCreateImage for srcA returned %s\n", TranslateOpenCLError(err));
638c657,658
< ocl->srcB = clCreateBuffer(ocl->context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uint) * arrayWidth * arrayHeight, inputB, &err);
---
> // Create second image based on host memory inputB
> ocl->srcB = clCreateImage(ocl->context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, &format, &desc, inputB, &err);
641c661
< LogError("Error: clCreateBuffer for srcB returned %s\n", TranslateOpenCLError(err));
---
> LogError("Error: clCreateImage for srcB returned %s\n", TranslateOpenCLError(err));
645,649c665,666
< // If the output buffer is created directly on top of output buffer using CL_MEM_USE_HOST_PTR,
< // then, depending on the OpenCL runtime implementation and hardware capabilities,
< // it may save you not necessary data copying.
< // As it is known that output buffer will be write only, you explicitly declare it using CL_MEM_WRITE_ONLY.
< ocl->dstMem = clCreateBuffer(ocl->context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uint) * arrayWidth * arrayHeight, outputC, &err);
---
> // Create third (output) image based on host memory outputC
> ocl->dstMem = clCreateImage(ocl->context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, &format, &desc, outputC, &err);
652c669
< LogError("Error: clCreateBuffer for dstMem returned %s\n", TranslateOpenCLError(err));
---
> LogError("Error: clCreateImage for dstMem returned %s\n", TranslateOpenCLError(err));
734c751,755
< cl_int *resultPtr = (cl_int *)clEnqueueMapBuffer(ocl->commandQueue, ocl->dstMem, true, CL_MAP_READ, 0, sizeof(cl_uint) * width * height, 0, NULL, NULL, &err);
---
> size_t origin[] = {0, 0, 0};
> size_t region[] = {width, height, 1};
> size_t image_row_pitch;
> size_t image_slice_pitch;
> cl_int *resultPtr = (cl_int *)clEnqueueMapImage(ocl->commandQueue, ocl->dstMem, true, CL_MAP_READ, origin, region, &image_row_pitch, &image_slice_pitch, 0, NULL, NULL, &err);
783c804
< cl_device_type deviceType = CL_DEVICE_TYPE_CPU;
---
> cl_device_type deviceType = CL_DEVICE_TYPE_GPU;
I could also paste the two full source files, but they are long (900 lines).