restrict(amp) restrictions part 3 of N – function declarators and calls

This post assumes and requires that you have read the introductory post to this series, which also includes a table of contents. With that out of the way, let’s look at the restrictions on function declarators and calls.

Function declarators with restrict(amp)

For a function declarator with restrict(amp) (or restrict(amp, cpu)), besides the obvious requirement that its return type and parameter types must be supported for amp, the following extra rules apply:

· It must not have a trailing ellipsis (…) in its parameter list;

· It must not have an exception specification (including the empty throw() and __declspec(nothrow));

· It must not have extern "C" linkage when it has multiple restriction specifiers;

· It must not be virtual.
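A minimal sketch of how these rules look in practice follows. The rejected forms appear only as comments; the accepted alternatives below them compile. The names `sum3`, `scale`, `Square`, `Circle`, and `area_of` are hypothetical, and the `AMP_CPU` macro is an assumption of this sketch: it expands to `restrict(amp, cpu)` only when you build with Visual C++ and C++ AMP and define `USE_CPP_AMP` yourself, and to nothing otherwise, so the file also compiles as standard C++. Replacing a virtual member with a template that resolves the call at compile time is one common workaround, not something prescribed by the rules above.

```cpp
// Hypothetical opt-in macro: define USE_CPP_AMP when compiling with Visual C++
// and C++ AMP to get the real restriction; otherwise it expands to nothing.
#ifdef USE_CPP_AMP
#define AMP_CPU restrict(amp, cpu)
#else
#define AMP_CPU
#endif

// Rejected for amp (shown as comments only):
//   float sum(int count, ...) restrict(amp);                      // trailing ellipsis
//   float scale(float x) restrict(amp) throw();                   // exception specification
//   struct Shape { virtual float area() const restrict(amp); };   // virtual

// Accepted: a fixed parameter list instead of a C-style ellipsis.
inline float sum3(float a, float b, float c) AMP_CPU {
    return a + b + c;
}

// Accepted: the same function, simply without an exception specification.
inline float scale(float x) AMP_CPU {
    return 2.0f * x;
}

// Accepted: non-virtual member functions. The concrete type is known at
// compile time, so the template below selects the right area() statically.
struct Square {
    float side;
    float area() const AMP_CPU { return side * side; }
};

struct Circle {
    float radius;
    float area() const AMP_CPU { return 3.14159265f * radius * radius; }
};

template <typename ShapeT>
float area_of(const ShapeT& s) AMP_CPU {
    return s.area();
}
```

Because every call target in `area_of` is fixed when the template is instantiated, the compiler never needs a run-time dispatch, which is exactly what the virtual rule is guarding against.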

Variadic functions require direct support from the C runtime, which is not amp-compatible in C++ AMP v1. In addition, C++ AMP does not support exception handling; therefore, an exception cannot be thrown inside an amp-restricted function, and the function cannot have an exception specification either. The empty exception specification would be harmless, but we disallow it for consistency. The limitation on extern "C" linkage exists because the current C++ AMP implementation generates multiple symbols for a function with multiple restriction specifiers. That cannot be done for extern "C" functions: they do not have C++ decorated names, so those symbols could not be told apart. Finally, the non-virtual requirement is due to the lack of hardware function call support: a virtual call selects its target at run time, which would require a genuine indirect call on the accelerator.
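Because the extern "C" rule only applies when a function has multiple restriction specifiers, one possible workaround is to keep the restrict(amp, cpu) function in C++ linkage and export a separate, cpu-only extern "C" wrapper for C callers. This is a sketch under stated assumptions: `lerp_amp` and `lerp_c` are hypothetical names, and `AMP_CPU` is an assumed opt-in macro that expands to `restrict(amp, cpu)` only when `USE_CPP_AMP` is defined (with Visual C++ and C++ AMP) and to nothing otherwise.

```cpp
// Hypothetical opt-in macro: real restriction only when USE_CPP_AMP is defined.
#ifdef USE_CPP_AMP
#define AMP_CPU restrict(amp, cpu)
#else
#define AMP_CPU
#endif

// C++ linkage: the compiler is free to emit a separate decorated symbol
// for each restriction (amp and cpu).
inline float lerp_amp(float a, float b, float t) AMP_CPU {
    return a + t * (b - a);
}

// Rejected: C linkage combined with multiple restriction specifiers.
//   extern "C" float lerp_c(float a, float b, float t) restrict(amp, cpu);

// Accepted: no restrict clause (so implicitly cpu-only) with C linkage.
// It runs on the host and simply forwards to the C++-linkage function.
extern "C" float lerp_c(float a, float b, float t) {
    return lerp_amp(a, b, t);
}
```

Kernels call `lerp_amp` directly, while C code links against `lerp_c`; only the host-side wrapper needs an undecorated name.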

Function calls

Within an amp-restricted function, the target of every function-like invocation (e.g., functions, member functions, object constructors and destructors, and operators) must be amp-restricted too. Given the amp type restrictions, the target also cannot be a virtual function, a function pointer, or a pointer to member function. In addition, due to the lack of hardware stack and function call support, a function may not invoke itself recursively, either directly or indirectly through other functions.
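The sketch below shows two rewrites that stay within these call rules: a recursive algorithm turned into a loop, and a function object passed as a template parameter in place of a function pointer. The names `gcd_iter`, `Squarer`, and `apply` are hypothetical, and `AMP_CPU` is an assumed opt-in macro that expands to `restrict(amp, cpu)` only when `USE_CPP_AMP` is defined (with Visual C++ and C++ AMP), so the code also builds as plain C++.

```cpp
// Hypothetical opt-in macro: real restriction only when USE_CPP_AMP is defined.
#ifdef USE_CPP_AMP
#define AMP_CPU restrict(amp, cpu)
#else
#define AMP_CPU
#endif

// Rejected: direct recursion is not allowed in amp code.
//   int gcd(int a, int b) restrict(amp) { return b == 0 ? a : gcd(b, a % b); }

// Accepted: the same Euclidean algorithm written as a loop.
inline int gcd_iter(int a, int b) AMP_CPU {
    while (b != 0) {
        int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Rejected: calling through a function pointer.
//   float apply(float (*f)(float), float x) restrict(amp) { return f(x); }

// Accepted: a function object whose call operator is itself restricted,
// passed as a template parameter so the callee is known at compile time.
struct Squarer {
    float operator()(float x) const AMP_CPU { return x * x; }
};

template <typename F>
float apply(const F& f, float x) AMP_CPU {
    return f(x);
}
```

In a real kernel the function object is often a lambda declared with restrict(amp). A std::function wrapper is not expected to work in this position, since it dispatches to its target through pointers internally.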

Comments

  • Anonymous
    May 16, 2013
    I know the functionality is implied, but just for clarity could you discuss considerations for functions declared with "inline"? How will the VC++11 compiler respond to this keyword and will the rules for inlining functions be different for amp restricted code?

  • Anonymous
    May 17, 2013
    Hi Arman, when a restrict(amp) or restrict(amp, cpu) function is called within the call graph rooted at the parallel_for_each, it will always be inlined. Please take a look at: blogs.msdn.com/.../c-amp-full-inlining-requirement.aspx. When a restrict(amp, cpu) function is called on the host, its inlining behavior is unchanged.

  • Anonymous
    May 28, 2014
    Is it possible to call variadic template amp-restricted functions in parallel_for_each with restrict(amp), like this?

        template <typename... Functions>
        int FillArray(std::vector<double>& vArray, Functions... functs)
        {
            double dParam = 1.0;
            std::vector<std::function<bool(double)>> vFunctions = { functs... };
            for (auto funct : vFunctions)
                parallel_for_each(vArray.begin(), vArray.begin(), [funct, dParam](double& d)
                {
                    d += funct(dParam);
                });
        }

  • Anonymous
    May 29, 2014
    The comment has been removed

  • Anonymous
    May 30, 2014
    Thanks a lot, Lukasz. I've known about the pointer restriction, and it is good to know that lambda closures in parallel_for_each(...) restrict(amp) are inlined statically at compile time. My problem is more complicated because I need to use it for multiple GPUs running on vArray.section, like this:

        // std::vector<double> vParams = { 0, 1, 2, ..., N};
        array_view<double> avP(vParams);
        std::vector<std::function<double(double)>> vFunctions = { functs... };
        for (auto funct: vFunctions)
            parallel_for_each(vGPUs.begin(), vGPUs.end(), [&](pair<accelerator, int> accel)
            {
                accelerator_view device = accel.first.get_default_view();
                accel.first.set_default_cpu_access_type(access_type_auto);
                device.wait();
                int nGPU = accel.second;
                auto vArray_index = concurrency::index<1>(nGPU * vArray.extent / nGPUCount);
                auto vArray_extent = concurrency::extent<1>(vArray.extent / nGPUCount);
                auto P_index = concurrency::index<1>(nGPU * vArray.extent / nGPUCount);
                auto vArraySection = vArray.section(vArray_index, vArray_extent);
                auto avP_section = avP.section(P_index);
                vArraySection.discard_data();
                parallel_for_each(device, vArraySection.extent, [&, funct, avP_section, vArraySection](index<1> idx) restrict(amp)
                {
                    vArraySection(idx).val = funct(avP_section(idx));
                });
                vArraySection.synchronize();
            });

    The problem is that vArray.section is determined at run time.

  • Anonymous
    June 01, 2014
    The comment has been removed

  • Anonymous
    June 03, 2014
    Thank you, Łukasz! It works, even though I need to use std::tuple instead of std::pair because of the additional parameters for extent<2> and extent<3> array_views ;-)