Name Strings

   cl_altera_compiler_mode

Contributors

   David Neto, Altera Corporation

Contact

   Michael Kinsner, mkinsner 'at' altera 'dot' com

IP Status

   No known IP claims.

Version

   Version 1, 2014-02-06

Number

   TBD

Status

   Complete. 
   Shipping with the Intel(R) FPGA SDK for OpenCL(TM), version 14.0

   An earlier version shipped with the Intel(R) FPGA SDK for OpenCL(TM), version 13.0

Extension Type

   OpenCL platform extension

Dependencies

   OpenCL 1.0 is required.
   This document is written against revision 48 of the OpenCL 1.0 specification.

Overview

   This extension specifies alternative device code compilation flows
   for OpenCL.

   In standard OpenCL, there are two methods for preparing code for
   execution on an OpenCL device.  The first is to compile OpenCL C code
   from source text using an online compiler.  The second is to load a
   precompiled device binary.

   This extension specifies mechanisms to enable alternative workflows for
   compiling and loading device code.  They enable simplified or optimized
   support for the following development and deployment scenarios:

    1. The target OpenCL device is not available in the development
       environment.

    2. The platform does not provide an online compiler for the target
       OpenCL device, or full online compilation is not desirable during
       host application development.

    3. The application only requires one device program, and does not
       need read access to the device binary.

   The development and deployment of a particular application may involve
   more than one of these scenarios.

   The mechanisms specified by this extension are:

    1. An "offline device" feature:  This is the ability to specify that
       the OpenCL runtime should partially emulate the presence a device.
       From the perspective of the host program, emulation should be
       complete, except that kernel code may have no effect.  For example,
       kernels are enqueued, buffers are copied or mapped as required,
       and event profiling information is updated to reflect command
       progression.  However side effects produced by running the code
       inside kernels may not be visible.

    2. A "create program executable library" compiler mode:  
       The clBuildProgram API method does not fully compile device code.
       Instead, it performs an online stub compilation, and saves enough
       information in a data store to perform a deferred full (offline)
       compilation.

       2a. The stub compilation produces a binary which includes enough
          kernel interface information to satisfy further host activities
          such as creating cl_kernel objects, setting kernel arguments,
          and enqueueing kernels for execution.  But the stub binary
          may not have executable code, so the execution of kernels from
          such a binary may not produce valid data.

       2b. The data store is used to perform offline compilation of
          kernel programs.  Each logically distinct call of clBuildProgram
          generates an entry in the data store.  Each entry is a tuple
          consisting of:

         - The target device 

         - The kernel program source 

         - The options supplied to clBuildProgram 

         - A set of instructions (such as a script) for offline
           compilation of the kernel program.  The result of executing
           these instructions is to associate a device binary (including
           executable device code) with this tuple in the data store.

    3. A "use program executable library" mode:  We assume that an
       offline compilation has been performed for all entries in the
       data store generated by use of the "create program executable
       library" mode.  In the "use program exectuable library" mode,
       the clBuildProgram API method performs a lookup in the data store
       based on the following attributes:

         - The target device 

         - The kernel program source 

         - The options supplied to clBuildProgram

       If such an entry exists in the data store, then the associated
       binary is loaded into the cl_program object.  If no such entry
       exists, or if the entry is not associated with a device binary,
       then an error is returned.

    4. A "preloaded binary only" compiler mode:  In this mode, the
       runtime does not compile any programs, does not load the device
       with new code at runtime, and does not necessarily produce a valid
       device binary when requested by the CL_PROGRAM_BINARIES query to
       the clGetProgramInfo API method.  Instead:

         - The device is assumed to already have a program loaded 
           before the cl_context is created.

         - All cl_program objects behave as if they refer to the preloaded
           binary

         - The clBuildProgram API method is a no-op except for
           setting a successful build status.

   The following paragraphs describe the benefits provided by the
   mechanisms specified in this extension.

   Scenario 1: Developing the host part of an OpenCL application without
   a device being present:

       The offline device feature enables the execution of a host program
       even while a device is not present in the system.  This is
       useful for developing an application before a device has been
       manufactured, is otherwise scarce, or unavailable.  Although the
       data coming back from the emulated device is invalid, host code
       interaction with the runtime may be developed and tested.

   Scenario 2: Porting an existing application to a platform without an
   online compiler:

       Many OpenCL platforms provide an online OpenCL C compiler
       for each device.  Therefore host applications often use
       clCreateProgramWithSource and clBuildProgram to create
       executable device code.  The standard method of porting such
       a program to an environment without an online compiler is to
       replace the compile-from-source sequence of API calls with a
       find-and-load-from-binary code sequence.  Transforming the host
       application in this way introduces platform dependencies (such
       as APIs to access a filesystem) and adds bookkeeping complexity
       to map the original OpenCL C source and compile options with
       an associated device binary.  The porting and verification effort
       can be quite onerous if many portions of the host code must be
       modified, or if the device source code is parameterized at runtime.

       We can reduce porting effort in this scenario by using a three-phase
       approach.
       
       In the first phase, we run the host program with the runtime
       configured to use offline device mode, and also the "create program
       executable libary" compiler mode.  This generates a data store
       (the program executable library) containing enough information
       to compile the required device programs in an offline manner,
       i.e. outside the control of the host program.

       In the second phase, we perform an offline compilation of all
       entries in the data store.  This is done by enumerating the entries
       in the data store and following the compilation instructions for
       each entry.

       In the third phase, the application is fully functional.  The host
       is configured to use "use program executable library" compiler mode.
       Any API request to compile a device program from source is translated
       into a lookup of the (fully functional) device binary in the data store.

       This approach works when:

          - The device has a stable name. That is, the device name does not
            change from one run to another.

          - The host program calls clBuildProgram with only a finite
            set of combinations of device, kernel source, and build
            options.

          - Those combinations are consistent from one run to the next.
            That is, the kernel source and build options do not depend
            on the data results from executing a previous kernel.

          - Other context upon which the compilation depends is stable.
            For example, the location and contents of files included
            from the OpenCL C source remains the same.

   Scenario 3: An application needs only one cl_program, runs in a
   constrained environment, and may need "instant on" behaviour:

      The "preloaded binary only" compiler mode is desiged to be used
      by embedded applications.  Such applications typically have
      the following constraints:

          - The application is simple enough that only one cl_program
            is required.

          - The target system has tight memory and time constraints.
            It may be infeasible or undesirable to process a full device
            binary through the standard OpenCL runtime APIs.  For example,
            the device binary might be too large to map into host memory,
            or the application cannot tolerate the delay to load the
            device binary from external storage.

          - The application has no requirement to recover the device
            binary via the CL_PROGRAM_BINARIES query to clGetProgramInfo.
            Even moreso, it may be undesirable for intellectual property
            reasons or otherwise to allow the host program to recover
            the device binary.

          - It is desirable to develop the application in the normal way,
            but to minimize host program code changes before deployment.
            That is, application development uses the standard
            clCreateProgramWithSource, or clCreateProgramWithBinary APIs.
            But the transition to a test or production environment
            is limited to a configuration change, as opposed to a code
            change.

Header File

   Interface constants are defined in cl_ext.h

New Tokens

   New context properties to be used in the <properties> array
   supplied to to clCreateContext or clCreateContextFromType:

      CL_CONTEXT_COMPILER_MODE_INTELFPGA                             0x40F0
      CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_INTELFPGA                  0x40F1
      CL_CONTEXT_OFFLINE_DEVICE_INTELFPGA                            0x40F2

   Values to be supplied for context property
   CL_CONTEXT_COMPILER_MODE_INTELFPGA:

      CL_CONTEXT_COMPILER_MODE_OFFLINE_INTELFPGA                     0
      CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_INTELFPGA  1
      CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_INTELFPGA     2
      CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_INTELFPGA       3



Additions to Chapter 4 of the OpenCL 1.0 (v48) Specification

   Add the following to Table 4.4, List of supported properties by
   clCreateContext:


      cl_context_properties enum:
         CL_CONTEXT_OFFLINE_DEVICE_INTELFPGA
      Property value:
         const char*
      Description:
         Specifies that the runtime should partly emulate the presence
         of the named device.  The device should behave normally except
         that kernels executed on such an "offline" device might
         not produce any side effects.

         The specified name value should be the first word in the full
         name returned by the CL_DEVICE_NAME, in C-style form.

         An implementation may restrict the use of this property as
         follows:

            - A platform may require external initialization when using
            this context property.  For example, Altera's
            platform implementations require that environment variable
            CL_CONTEXT_OFFLINE_DEVICE_INTELFPGA be set to the same device
            name string as supplied to this property.

            - When this property is specified, the specified device
            may be the only available device in the platform.

         This property may be useful for developing or porting
         applications when no online compiler is available, and
         when the CL_CONTEXT_COMPILER_MODE_INTELFPGA property is set to
         CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_INTELFPGA.


      cl_context_properties enum:
         CL_CONTEXT_COMPILER_MODE_INTELFPGA
      Property value:
         cl_ulong
      Description:
         For devices without an online compiler, this property
         specifies alternative behaviour for building, loading,
         and querying device programs.
         (Platforms implementing the embedded profile are the
         only ones that may omit providing an online compiler.)

         The value should be one of the following:
            CL_CONTEXT_COMPILER_MODE_OFFLINE_INTELFPGA,
            CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_INTELFPGA,
            CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_INTELFPGA,
            CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_INTELFPGA.

         CL_CONTEXT_COMPILER_MODE_OFFLINE_INTELFPGA - The clBuildProgram
            and clCreateProgramWithBinary, and clGetProgramInfo
            API methods behave in the standard way.  
            In particular, clBuildProgram may fail with error
            CL_COMPILER_NOT_AVAILABLE.
            This value is the default, and is the only value
            which specifies conformant behaviour.

         CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_INTELFPGA - The device
            is assumed to already have a program loaded before
            the first OpenCL APIs are invoked.
            All cl_program objects behave as if they refer to the
            preloaded device binary.
            The clBuildProgram API method always succeeds when compiling
            for this device, but does not produce an executable
            device binary.
            The clCreateProgramWithBinary does not validate
            the provided binary.  It always succeeds provided its
            arguments are well-formed.
            The data returned by the CL_PROGRAM_BINARIES query 
            for clGetProgramInfo is unspecified.  It may not be
            a valid device binary.
            
         CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_INTELFPGA -
            When compiling a program created with
            clCreateProgramWithSource, the clBuildProgram method
            performs a "stub" compilation, and populates a data store
            with enough information to perform a program compilation
            offline (i.e. outside the control of the host program).
            
            The online stub compilation produces a binary which includes
            enough information to satsify further runtime API activities
            such as creating cl_kernel objects, setting kernel arguments,
            and enqueing kernels for execution.  The stub binary may not
            have executable code, so the execution of kernels from such
            a binary may not produce valid data.

            This value for CL_CONTEXT_COMPILER_MODE_INTELFPGA is designed
            to be used with the CL_CONTEXT_OFFLINE_DEVICE_INTELFPGA context
            property.

            Additionally, each logically distinct call of clBuildProgram
            generates an entry in an external data store which persists
            beyond the lifetime of the host program.  Each entry is a
            tuple consisting of:

            - The target device 

            - The kernel program source 

            - The options supplied to clBuildProgram 

            - A set of instructions (such as a script) for offline
              compilation of the kernel program.  The result of executing
              these instructions is to associate a device binary (including
              executable device code) with this tuple in the data store.

            In Altera's platform implementation, the data store is a tree of
            directories and files.  Each data store entry is a leaf
            directory in this tree containing:
               - a file named "kernels.cl" containing the kernel program
               source
               - a file named "build.cmd" containing the list of operating
               system commands to be used to compile the program to a
               device binary, and to store that device binary in a file
               named "kernels.aocx" in the same directory.

         CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_INTELFPGA -
            When compiling a program created with clCreateProgramWithSoruce,
            the clBuildProgram API method performs a lookup in the data store
            based on the following attributes:

            - The target device 

            - The kernel program source 

            - The options supplied to clBuildProgram

             If such an entry exists in the data store, then the associated
             binary is loaded into the cl_program.  If no such entry exists,
             or if the entry is not associated with a device binary, then an
             error is returned.


      cl_context_properties enum:
         CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_INTELFPGA
      Property value:
         const char*
      Description:
         Specifies the filesystem root directory
         for the data store used when either value
         CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_INTELFPGA or value
         CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_INTELFPGA is specified
         for context property CL_CONTEXT_COMPILER_MODE_INTELFPGA.

         The specified value may be a relative directory name, but
         will be resolved to an absolute path at context creation time.

         If this property is left unspecified, then data store root
         is implementation-defined.

         In Altera's platform implementation, the default is the
         "aocl_program_library" subdirectory in the current directory
         in effect at context creation time.



Additions to Chapter 5 of the OpenCL 1.0 (v48) Specification
   
   Additions to Section 5.4.1 Creating Program Objects

      When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA is
      specified with value CL_CONTEXT_COMPILER_PRELOADED_BINARY_ONLY_ALTERA,
      the clCreateProgramWithBinary API method behaves in a non-standard
      way.  See the description of this property value in Table 4.4 for
      more information.

   Additions to Section 5.4.2 Building Program Exectuables

      When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA is
      specified, the clBuildProgram API method behaves in a non-standard
      way when compiling for devices without an online compiler.
      See the description of this property value in Table 4.4 for more
      information.

   Additions to Section 5.4.5 Program Object Queries

      The following is added to the description of the 
      CL_PROGRAM_BINARIES query in Table 5.11:

         When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA
         is specified with either value
         CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_INTELFPGA
         or value CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_INTELFPGA,
         then the binary returned by this query may not be a valid device
         binary.
         See the description of the CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA
         property value in Table 4.4 for more information.

   Additions to Section 5.6 Executing Kernels

      When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA
      is specified with value 
      CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_INTELFPGA,
      or if context property CL_CONTEXT_OFFLINE_DEVICE_INTELFPGA is specified,
      then a kernel execution may behave as if the kernel code
      produces no side effects, i.e. as if the body of the kernel
      contains no statements.
      (All other runtime activites must occur.  For example,
      the kernels are still enqueued, buffers are copied or mapped
      as required to satisfy kernel arguments, and event profiling
      information is updated to reflect command progression.)
      See the description of the CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA
      property value in Table 4.4 for more information.
   

Implementation Notes

   For each context property defined in this extension specification,
   Altera's platform implementation allows the property to be specified
   by setting an environment variable of the same name.

   For CL_CONTEXT_OFFLINE_DEVICE_INTELFPGA, the environment variable
   of *must* be set before any platform APIs are called.

   For CL_CONTEXT_COMPILER_MODE_INTELFPGA, value of the environment variable
   is the numerical value for the corresponding value enum as defined
   in the CL/cl_ext.h header file.

   Precedence:  For context properties CL_CONTEXT_COMPILER_MODE_INTELFPGA and
   CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_INTELFPGA, any value explicitly
   provided to the clCreateContext or clCreateContextFromType
   API methods will override an environment variable setting.

   If property CL_CONTEXT_OFFLINE_DEVICE_INTELFPGA is explicitly
   provided to the clCreateCOTNEXT or clCreateContextFromType
   methods, then it must match the value set for environment variable
   CL_CONTEXT_OFFLINE_DEVICE_INTELFPGA.


Issues

   1. The use of context property CL_CONTEXT_OFFLINE_DEVICE_INTELFPGA is
      rather restricted:

         - It must be set at the "beginning of time" for a host application,
           e.g. via an environment variable setting.

         - If an "offline device" is used in a context, then online
           devices cannot be used in any context for the platform.
   
      In this light, it might be more natural to call it a "platform"
      property.  But in OpenCL platforms are stateless, so this would
      not be appropriate.  Instead the most basic runtime controls are
      applied to contexts, not platforms.

      These restrictions reflect limitations of Altera's original
      implementation.  However, the feature in its current form is still
      quite useful for application development and porting.


Sample Code

   Example for using an "offline device", with creation and initialization of
   a data store for offline compilation of program binaries.

      #include <CL/opencl.h>
      #include <stdlib.h>
      #include <assert.h>
      #define CHECK(X) assert(CL_SUCCESS == (X))

      int main(...) {
         cl_platform platform = 0;
         cl_device_id device = 0;
         cl_context context = 0;
         cl_int status = 0;

         // Specify an offline device via environment variable here, or externally
         // before program startup.
         // This must occur befor the first OpenCL API method call.
         setenv("CL_CONTEXT_OFFLINE_DEVICE_INTELFPGA","mydevice");

         CHECK( clGetPlatformIDs(1,&platform,0) );
         CHECK( clGetDeviceIDs(platform,CL_DEVICE_TYPE_ACCELERATOR,1,&device,0) );

         cl_context_properties props[] = {

            // Specify creation and initialization of a program library data store.
            CL_CONTEXT_COMPILER_MODE_INTELFPGA, 
               (cl_context_properties)CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_INTELFPGA

            // Where is the data store rooted?
            CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_INTELFPGA
               (cl_context_properties)"/data/myproject/proglib"

            // Terminate the properties list.
            0
         };
         context = clCreateContext( props, 1, &device, 0, 0, &status );
         CHECK( status );

         cl_command_queue cq = clCreateCommandQueue( context, device, 0, &status );
         CHECK( status );

         const char* source = "kernel void foo( global int* A ) { *A = 42; }";
         cl_program program = clCreateProgramWithSource( context, 1, &source, NULL, &status );
         CHECK( status );

         // Perform a stub compilation, and create an entry for this (device,source,options)
         // combination in the program library data store.
         CHECK( clBuildProgram( program, 1, &device, "-cl-opt-disable", 0, 0 ) );

         cl_mem mem = clCreateBuffer(context,CL_MEM_READ_WRITE,sizeof(cl_int),0,&status);
         CHECK( status );

         // Only a stub compilation has been performed.
         // But this still enables full setup and enqueue of a kernel for execution.
         cl_kernel kernel = clCreateKernel( program, "foo", &status );
         CHECK( status );
         CHECK( clSetKernelArg( kernel, 0, sizeof(cl_mem), &mem ) );

         // Expect that kernels scheduled by clEnqueueNDRangeKernel and
         // clEnqueueTask may not produce expected side effects.  
         // It may appear that each kernel executes no statements.
         
         CHECK( clEnqueueTask( cq, kernel, 0, 0, 0 ) );
         cl_int the_answer = 0;
         CHECK( clEnqueueReadBuffer( cq, mem, 1 /*block*/, 0, sizeof(the_answer),&the_answer,0,0,0) );

         // Because we're using an "offline device", we can't rely on the_answer being 42.

         // ...



   Example for using an already-populated data store of program binaries compiled
   in an offline manner.  The code is the same as the previous example, except
   use these context properties instead:

         cl_context_properties props[] = {

            // Specify the use of a program data store library of pre-compiled device binaries.
            CL_CONTEXT_COMPILER_MODE_INTELFPGA, 
               (cl_context_properties)CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_INTELFPGA

            // Where is the data store rooted?
            CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_INTELFPGA
               (cl_context_properties)"/data/myproject/proglib"

            // Terminate the properties list.
            0
         };



   Example for using a "preloaded binary":

         cl_context_properties props[] = {

            // Specify the use of a preloaded binary.
            CL_CONTEXT_COMPILER_MODE_INTELFPGA, 
               (cl_context_properties)CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_INTELFPGA,

            // Terminate the properties list.
            0
         };
         context = clCreateContext( props, 1, &device, 0, 0, &status );

         ...

         // All cl_program objects refer to the preloaded binary.
         // clCreateProgramWithBinary accepts invalid device binaries.
         // clBuildProgram succeeds but does not actually compile the program from source.


Conformance Tests

   None.

Revision History

   Version 0, 2013-05-06 - Initial revision.
      Documents the behaviours in the Intel(R) FPGA SDK for OpenCL(TM), version 13.0.

   Version 1, 2014-02-06 - Assigned registered values to tokens.
      
