printf, errorf, abort in C++ AMP

There is full Visual Studio debugging support for C++ AMP, and we will cover that in future blog posts. In this post, I am going to introduce three debug diagnostic functions that can be used in restrict(amp) code, including a variant of the well-known printf function.

All three functions are executed like any other device-side function: per-thread, and in the context of the calling thread.

void direct3d_abort() restrict(amp)

This function aborts the execution of a kernel. When the abort is detected by the runtime, it raises a runtime_exception on the host with the error message, “Reference Rasterizer: Shader abort instruction hit”.

void direct3d_printf(const char *_Format_string, ...) restrict(amp)

Parameters: _Format_string is the format string; ... is an optional list of parameters of variable count.

This function accepts a format string and an optional list of parameters of variable count. It prints formatted output from a kernel to the Visual Studio output window.

void direct3d_errorf(const char *_Format_string, ...) restrict(amp)

This function has identical characteristics and usage to the direct3d_printf function, in that a message is printed to the output window. Additionally, the C++ AMP runtime will raise a runtime_exception on the host with the same error message passed to the direct3d_errorf call.

Notes on usage

These functions are usable only if all of the following conditions are met and will otherwise behave as no-ops.

1) The Debug configuration in Visual Studio is selected, i.e. the code is compiled with the _DEBUG preprocessor definition.

2) The accelerator_view on which the kernel is invoked must be on an accelerator which supports the printf, errorf, and abort intrinsics. At the time of writing, only the direct3d_ref accelerator supports these intrinsics.

Also, because these debug functions are based on HLSL intrinsic functions, there are two restrictions to bear in mind.

1) A call can take at most seven parameters, counting the format string. The following call (in an amp-restricted function) breaks that rule by passing eight, the format string plus seven values:

int i = 0;

direct3d_printf("printf: the value is: %d, %d, %d, %d, %d, %d, %d\n", i, i, i, i, i, i, i);

You will get compiler error C3562: intrinsic function 'direct3d_printf' is limited to have no more than 7 parameters.

2) There is no automatic widening or narrowing type conversion. For example (in an amp-restricted function):

float x = 1.0;

direct3d_printf("%lf", x);

If you ran similar code on the CPU, x would be converted to double before being printed. However, the code does not work correctly on the GPU because there is no automatic widening support for direct3d_printf and direct3d_errorf. So in this example, the printed value will not be correct.
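For comparison, here is a minimal host-only sketch of the CPU behavior mentioned above, in standard C++ rather than restrict(amp) code. On the CPU, a float passed through a C-style variable argument list goes through the default argument promotions and arrives as a double, which is why "%lf" prints it correctly there. The helper names read_promoted and format_on_cpu are hypothetical and exist only for this illustration; they are not part of C++ AMP.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: reads the first variadic argument as a double,
// which is what printf does for a %f or %lf conversion on the CPU.
double read_promoted(int count, ...)
{
    va_list args;
    va_start(args, count);
    double value = va_arg(args, double); // a float argument arrives here as double
    va_end(args);
    return value;
}

// Hypothetical helper: formats a float with "%lf" on the CPU.
// The float is promoted to double at the call site, so the output is correct.
std::string format_on_cpu(float x)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%lf", x);
    return buffer;
}
```

Calling read_promoted(1, 1.5f) returns 1.5 because of that promotion. Inside a kernel there is no such step, so the type of each argument passed to direct3d_printf or direct3d_errorf has to agree with its format specifier as written.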

Finally, for all the functions above, note that because kernel execution is asynchronous, the output of direct3d_printf may appear at any time between the dispatch of the kernel and the completion of the kernel’s execution. For the same reason, errors from direct3d_errorf and direct3d_abort may be detected after the parallel_for_each call and before another call that results in a GPU command being queued.

Exception to the rule

One of the restrictions of restrict(amp) code is that a trailing ellipsis (...) is not allowed. However, these debug diagnostic functions are implemented as compiler intrinsic functions, so they are the exception to the rule and their parameter lists can end with an ellipsis. They are essentially the HLSL functions abort, errorf, and printf. That same restriction is also the reason we could not wrap these intrinsics with friendlier functions.

Sample Code

Here is some sample code for you to try. Remember to run under the Debug configuration.

 #include <algorithm>
 #include <vector>
 #include <iostream>
 #include <amp.h>
  
 using std::vector;
 using namespace concurrency;
  
 int main()
 {
     const int N = 2;
     const int M = 2;
     const int size = N * M;
  
     vector<int> A(size);
     int i = 0;
     std::generate(A.begin(), A.end(), [&i](){return i++;});
     extent<2> e(N, M);
     array_view<int, 2> av(e, A);
     //At the time of writing, only the REF accelerator supports these intrinsics.
     accelerator_view acl_v = accelerator(accelerator::direct3d_ref).default_view; 
  
     parallel_for_each(acl_v, av.extent, [=](index<2> idx) restrict(amp) {
         av[idx]++;
         direct3d_printf("printf: the value is: %d\n", av[idx]);
     });
     av.synchronize();
  
     try
     {
         parallel_for_each(acl_v, av.extent, [=](index<2> idx) restrict(amp) {
             av[idx]++;
             direct3d_errorf("errorf: The value is: %d\n", av[idx]);
         });
         av.synchronize();
     } catch (runtime_exception &e)
     {
         std::cout << "catch runtime exception: " << e.what() << std::endl;
     }
  
     try
     {
         parallel_for_each(acl_v, av.extent, [=](index<2> idx) restrict(amp) {
             av[idx]++;
             direct3d_abort();
         });
         av.synchronize();
     } catch (runtime_exception &e)
     {
         std::cout << "catch runtime exception: " << e.what() << std::endl;
     }
  
     return 0;
 }

Output in Visual Studio

To view the output of these functions in Visual Studio 11, start debugging and then enable the program output in the Output window.

image

Then go to menu->debug->output. You can view the output from these functions regardless of whether you have selected GPU debugging or the default CPU debugging.

With the default of CPU debugging (“Auto” or “Native Only”) you will see output like the following:

image

With the “GPU only” debugging selected, in the “GPU - Software Emulator”, you will see output like the following (abort causes the debugger to break, instead of outputting a message):

image

That is all for these three functions. I hope you find them useful when logging information in your code. Your feedback, as always, is welcome below or in our MSDN forum.

Comments

  • Anonymous
    February 23, 2012
    A small note regarding the sample code -- if it's meant to be C++, you need to include the appropriate header (aptly named "vector") in order to use std::vector (relying on implicit inclusion order is non-portable) and use "int main()" instead of "void main()" (which also isn't standard or portable). HTH!

  • Anonymous
    February 23, 2012
    MattPD, good points. By design, we are taking shortcuts in our blog post code for space and simplicity reasons. Good to see that you can easily map what we wrote to the portable way of writing things...

  • Anonymous
    February 24, 2012
    Daniel, sure, I get the constraints imposed by the blog format. At the same time, I think it's important to be extra careful when disseminating knowledge (which, if I understand correctly, is the focus of this post), since later on we might get others to rely on non-portable behavior and citing us as an authoritative reference (even if it wasn't the main topic of our writing), and I'm sure we wouldn't want that :-) Just think of it as a service to the future generations of devs who may have to maintain the code written that refers to what we disseminate as the guidelines to follow--IMHO we have an extra responsibility here to measure up to a higher standard! :-)

  • Anonymous
    February 24, 2012
    Hi MattPD, thank you for your continued feedback. I just wanted to make sure you knew that our pieces of code floating in blog posts are not aimed to be the guidance. For example there is no exception handling, localization of strings, optimizations for performance, or anything else that would make them "real code". They are meant as educational only around the very specific topic they cover. Your comment makes me think that maybe we should dumb down the code even further to make that even more obvious... In any case, thank you again for sharing your opinion that we should consider different goals for our blog code, we'll take it into consideration - stay tuned.

  • Anonymous
    February 24, 2012
    Daniel, thanks for the reply! Just a quick note: to clear up a possible miscommunication (perhaps I didn't express clearly what I had in mind) let me just quickly clarify: it's not that I think that blog posts are the guideline, it's more the way blog posts (and various completely random stuff floating around on unrelated websites) are actually treated in the real world, as in: blogs.msdn.com/.../4315707.aspx Perhaps I'm just overly sensitive due to history, the "void main" in particular has led to countless flame wars and the most popular "argument" was that it was "seen somewhere by someone", usually that someone not being able to tell the difference between a book and the ISO standard: c-faq.com/.../voidmainbooks.html -- that's precisely why extra care is needed in the docs "meant as educational only". This is more of a problem than any of these: "there is no exception handling, localization of strings, optimizations for performance, or anything else that would make them "real code"" -- none of this is an error, "void main" most definitely is. Sure, it would be nice if we were all living in a perfect world where everyone is rational, acquainted with the standards and able to tell the authoritative documents from the blog posts and books (we probably wouldn't even need debuggers or exceptions in that world--and I'm pretty sure our perfect, rational programmers would simply read the C++ AMP specs cover-to-cover, infer everything they would ever need using pure logic, and have no need for any puny blog posts), but that's certainly not the one we're living in and certainly not the readership of any blog (after all, those guys already know all the possible implications of the specs and never have any questions)! :-) Hope I've explained this more clearly this time! :)