Display Device Performance Profile Analysis

The following topic examines the data obtained from initial profiling run in How to Profile and Optimize a Display Driver.

Monte Carlo Profiling Data

After performing the steps in Obtaining a Performance Profile for a Display Driver, the data in your Debug window should resemble the following output.

Note   For clarity, the tick count, process IDs, and thread IDs have been removed from the following output. Also, the specific timer values vary from run to run because of the manual process of controlling the profiling process.

Kernel Profiler: Gathering MonteCarlo data in buffered mode
ProfileStart() : Allocated 13960 kB for Profiler Buffer (0x48000000)
Starting profile timer at 200 uS rate
ProfApp:  Took 34297 ms to perform blts. 
Kernel Profiler: Looking up symbols for 503297 hits.
.
.
(Additional lines omitted for clarity.)
.
.
Total samples recorded = 503297
Module        Hits        Percent
------------  ----------  -------
nk.exe            332367     66.0
ddi_flat.dll      170117     33.8
gwes.exe             662      0.1
coredll.dll          108      0.0
ProfApp.exe           25      0.0
relfsd.dll             3      0.0
filesys.exe            2      0.0
shell.exe              2      0.0
explorer.exe           2      0.0
pm.dll                 1      0.0
UNKNOWN                8      0.0

Hits       Percent Address  Module       Routine
---------- ------- -------- ------------:---------------------
    328544    65.2 802351c2 nk.exe      :_IDLE_STATE
    101277    20.1 03db634d ddi_flat.dll:?EmulatedBlt_Internal
     64291    12.7 03db7c7f ddi_flat.dll:?MaskedSrcToMaskedDst
      3864     0.7 03db2068 ddi_flat.dll:?CursorOn
      1351     0.2 8025a52f nk.exe      :_NE2000_WRITE_PORT_UCHAR
      1181     0.2 8025a53b nk.exe      :_NE2000_READ_PORT_UCHAR
       345     0.0 03db2268 ddi_flat.dll:?CursorOff
       162     0.0 80229a7f nk.exe      :_PerfCountSinceTick
       153     0.0 8025a54b nk.exe      :_NE2000_READ_PORT_USHORT
       150     0.0 0003d906 gwes.exe    :?dwRealizeColor
        98     0.0 8025ad06 nk.exe      :_HandleReceive

(Additional lines omitted for clarity.)

         1     0.0 0001f663 explorer.exe:?ItemCount
         1     0.0 00029696 explorer.exe:_GetSysColorBrush
        33     0.0                      :<UNACCOUNTED FOR>

DispPerf.exe Data

The following tables show the single set of data from running ProfApp.exe in Obtaining a Performance Profile for a Display Driver displayed in several smaller tables.

The following table shows the overall summary of the number of times each raster operation was called. For information about converting the ROP codes reported by DispPerf to the ROP codes listed in Ternary Raster Operations, see Display Driver Performance Profiling.

RopCode cTotal
0x0000CCCC 1063
0x0000F0F0 6222
0x00008888 59
0x00006666 52
0x0000AAF0 4537
0x0000EEEE 7
0xFEFEFFF1 49
0x0000E2E2 1
0x00005555 378

The following table shows the profiling results from all the ROPs performed by GPE functions.

RopCode cGPE dwGPETime Avg.GPETime
0x0000CCCC 1045 40923908 39161
0x0000F0F0 668 21737 32
0x00008888 59 41647 705
0x00006666 52 28160 541
0x0000AAF0 3957 80123 20
0x0000EEEE 0 0 0
0xFEFEFFF1 49 133721 2729
0x0000E2E2 1 355 355
0x00005555 0 0 0

The following table shows the profiling results from all the ROPs performed emulation functions. For more information, see BitBlT Emulation Library Functions.

RopCode cEmul dwEmulTime Avg.EmulTime
0x0000CCCC 18 94149 5230
0x0000F0F0 5554 486315 87
0x00008888 0 0 0
0x00006666 0 0 0
0x0000AAF0 580 63332 109
0x0000EEEE 7 13033 1861
0xFEFEFFF1 0 0 0
0x0000E2E2 0 0 0
0x00005555 378 42907 113

The DispPerf results do not show any profiling results for hardware calls because the settings in How to Profile and Optimize a Display Driver are based on the FLAT driver, which is a general purpose driver that does not make use of advanced hardware features.

Analysis of the Profiling Results

The Monte Carlo profiling results reveal the functions in which the most time was spent during the profiling period. Specifically, these results show that the system spent the most time in the idle state, most likely waiting for the manual interaction required to start and stop the profiler. Idle time is both expected and unavoidable with this method, so this result is neither surprising nor interesting for optimizing the display driver.

The second and third lines are more significant for optimization. The second line shows that a very large percentage of time — well over 50 percent of the total when the idle time is factored out — is spent in the driver function EmulatedBlt_Internal. The next most time-consuming function is MaskedSrcToMaskedDst, taking up nearly 40 percent of the execution time once the idle time is factored out.

The DispPerf.exe results show that the driver spent two orders of magnitude more time performing the 0x0000CCCC (SRCCOPY) raster operation in GPE than during any other operation. This highlights it as a prime candidate for optimization.

A search through the source code shows that the two functions are defined in files located in %_WINCEROOT%\Public\Common\OAK\Drivers\Display\GPE. EmulatedBlt_Internal is defined in Swblt.cpp and MaskedSrcToMaskedDst is defined in Swconvert.cpp. Examining this code to see how the specific ROP is processed reveals that both of these functions are involved in converting color from the RGB555 format used in the application to the RGB565 format used by the display hardware. EmulatedBlt_Internal is a GPE function called for the blit operation. It then calls MaskedSrcToMaskedDst for the format conversion. The function EmulatedBlt_Internal is of particular concern because it is a standard GPE function. This indicates that not only is there no hardware support for this conversion in the graphics device, there also is no emulation library function for the conversion either.

See Also

How to Profile and Optimize a Display Driver

 Last updated on Tuesday, May 18, 2004

© 1992-2003 Microsoft Corporation. All rights reserved.