Optimized GPE Emulation Function Analysis

After you implement the changes described in Optimizing a GPE Emulation Function, as part of How to Profile and Optimize a Display Driver, and obtain a new set of performance profiling data, examine that data to determine whether the optimization actually improved performance.

Monte Carlo Profiling Data

With the optimized code in place, the profiling results should resemble the following output.

Note   For clarity, the tick count, process IDs, and thread IDs have been removed from the following output. Also, the specific timer values vary from run to run because profiling is started and stopped manually.

Kernel Profiler: Gathering MonteCarlo data in buffered mode
ProfileStart() : Allocated 13946 kB for Profiler Buffer (0x48000000)
Starting profile timer at 200 uS rate
ProfApp:  Took 3666 ms to perform blts.
Kernel Profiler: Looking up symbols for 42478 hits.
.
.
(Additional lines omitted for clarity.)
.
.
.
Total samples recorded = 42478
Module        Hits        Percent
------------  ----------  -------
nk.exe             23679     55.7
ddi_flat.dll       18037     42.4
gwes.exe             645      1.5
coredll.dll           87      0.2
ProfApp.exe           16      0.0
fsdmgr.dll             4      0.0
relfsd.dll             2      0.0
kbdmouse.dll           1      0.0
UNKNOWN                7      0.0

Hits       Percent Address  Module       Routine
---------- ------- -------- ------------:---------------------
     22683    53.3 802351c2 nk.exe      :_IDLE_STATE
     14065    33.1 03dbac20 ddi_flat.dll:?EmulatedBltSrcCopy1616Convert
      3340     7.8 03db2068 ddi_flat.dll:?CursorOn
       300     0.7 03db2268 ddi_flat.dll:?CursorOff
       182     0.4 0003d906 gwes.exe    :?dwRealizeColor
       176     0.4 8025a53b nk.exe      :_NE2000_READ_PORT_UCHAR
       171     0.4 8025a52f nk.exe      :_NE2000_WRITE_PORT_UCHAR
       164     0.3 80229a7f nk.exe      :_PerfCountSinceTick
        89     0.2 03dbb680 ddi_flat.dll:?EmulatedBltFill16
        56     0.1 8024855c nk.exe      :_ObjectCall
        55     0.1 80243470 nk.exe      :_ZeroPage

(Additional lines omitted for clarity.)

         1     0.0 03db8e90 ddi_flat.dll:?ScanLine
         1     0.0 03c4265f kbdmouse.dll:_KeybdDriverVKeyToUnicode
        23     0.0                      :<UNACCOUNTED FOR>

For more information, see Monte Carlo Profiling.

DispPerf.exe Data

The following tables show the single set of DispPerf data from the second run of ProfApp.exe in How to Profile and Optimize a Display Driver. For readability, the data is broken into several smaller tables.

The following table shows the overall summary of the number of times each raster operation (ROP) was called. For information about converting the ROP codes reported by DispPerf to the ROP codes listed in Ternary Raster Operations, see Display Driver Performance Profiling.

RopCode     cTotal
----------  ------
0x0000CCCC    1063
0x0000F0F0    6337
0x00008888      59
0x00006666      52
0x0000AAF0    4628
0x0000EEEE       7
0xFEFEFFF1      49
0x0000E2E2       1
0x00005555     438

The following table shows the profiling results from all the ROPs performed by GPE functions.

RopCode     cGPE  dwGPETime  Avg.GPETime
----------  ----  ---------  -----------
0x0000CCCC    47      67082         1427
0x0000F0F0   754      23428           31
0x00008888    59      51740          876
0x00006666    52      28191          542
0x0000AAF0  4056      40499            9
0x0000EEEE     0          0            0
0xFEFEFFF1    49     145383         2967
0x0000E2E2     1        353          353
0x00005555     0          0            0

The following table shows the profiling results from all the ROPs performed by emulation functions. For more information, see BitBlT Emulation Library Functions.

RopCode     cEmul  dwEmulTime  Avg.EmulTime
----------  -----  ----------  ------------
0x0000CCCC   1016     4412339          4342
0x0000F0F0   5583      435198            77
0x00008888      0           0             0
0x00006666      0           0             0
0x0000AAF0    572       36957            64
0x0000EEEE      7       23902          3414
0xFEFEFFF1      0           0             0
0x0000E2E2      0           0             0
0x00005555    438       48659           111

The DispPerf results do not include any data for hardware calls because the settings in How to Profile and Optimize a Display Driver are based on the FLAT driver, which is a general-purpose driver that does not use advanced hardware features.
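Because the three tables are views of one data set, they can be checked against each other. The following Python sketch is an illustration only, not part of DispPerf. It copies the rows from the tables above and assumes that each average is the total time divided by the call count with the fraction discarded, which is what the listed values suggest. The names ROWS, average, and summarize are hypothetical.

```python
# Rows copied from the DispPerf tables above:
# (RopCode, cTotal, cGPE, dwGPETime, cEmul, dwEmulTime)
ROWS = [
    (0x0000CCCC, 1063,   47,  67082, 1016, 4412339),
    (0x0000F0F0, 6337,  754,  23428, 5583,  435198),
    (0x00008888,   59,   59,  51740,    0,       0),
    (0x00006666,   52,   52,  28191,    0,       0),
    (0x0000AAF0, 4628, 4056,  40499,  572,   36957),
    (0x0000EEEE,    7,    0,      0,    7,   23902),
    (0xFEFEFFF1,   49,   49, 145383,    0,       0),
    (0x0000E2E2,    1,    1,    353,    0,       0),
    (0x00005555,  438,    0,      0,  438,   48659),
]


def average(total_time, count):
    """Truncated average; 0 when the code path was never taken (assumption)."""
    return total_time // count if count else 0


def summarize(rows):
    """Return (RopCode, average GPE time, average emulation time) per row."""
    result = []
    for rop, c_total, c_gpe, t_gpe, c_emul, t_emul in rows:
        # With no hardware calls, every blit is either a GPE or an emulated call.
        assert c_total == c_gpe + c_emul, hex(rop)
        result.append((rop, average(t_gpe, c_gpe), average(t_emul, c_emul)))
    return result


for rop, avg_gpe, avg_emul in summarize(ROWS):
    print(f"0x{rop:08X}  Avg.GPETime={avg_gpe:5d}  Avg.EmulTime={avg_emul:5d}")
```

In this data set, cTotal equals cGPE plus cEmul for every ROP, which is consistent with the absence of hardware calls.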

Analysis of the Profiling Results After Optimization

For the profiling data for the original, non-optimized driver, see Obtaining a Performance Profile for a Display Driver.

The Monte Carlo results for the optimized driver show that the function MaskedSrcToMaskedDst is no longer a factor in the driver's performance because it is never called. The work that this function used to do is now handled by the new emulated GPE function EmulatedBltSrcCopy1616Convert.

When reviewing Monte Carlo profiling results, remember that the results show only the proportion of samples, not the absolute amount of time, spent in each function. In the second set of profiling results, ddi_flat.dll accounts for 18,037 of the 19,795 non-idle samples, or about 91 percent of the non-idle profiling time, whereas the original run spent about 98 percent of its non-idle time in the driver. You should not interpret the difference between these percentages as the size of the performance gain because there is no way to know the absolute times behind these percentages when analyzing Monte Carlo results.
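The driver's share of non-idle samples can be recomputed from the module and routine tables in the profiler output. The following Python sketch is illustrative only. It assumes that idle time is represented solely by the _IDLE_STATE routine and that the driver is represented solely by ddi_flat.dll. The names MODULE_HITS, IDLE_HITS, and non_idle_share are hypothetical.

```python
# Hit counts copied from the Monte Carlo module table in the profiler output.
MODULE_HITS = {
    "nk.exe": 23679,
    "ddi_flat.dll": 18037,
    "gwes.exe": 645,
    "coredll.dll": 87,
    "ProfApp.exe": 16,
    "fsdmgr.dll": 4,
    "relfsd.dll": 2,
    "kbdmouse.dll": 1,
    "UNKNOWN": 7,
}

# Hits for nk.exe:_IDLE_STATE, copied from the routine table.
IDLE_HITS = 22683


def non_idle_share(module, module_hits=MODULE_HITS, idle_hits=IDLE_HITS):
    """Percentage of non-idle samples that landed in the given module."""
    total = sum(module_hits.values())
    return 100.0 * module_hits[module] / (total - idle_hits)


total_samples = sum(MODULE_HITS.values())
print(f"total samples   : {total_samples}")
print(f"non-idle samples: {total_samples - IDLE_HITS}")
print(f"ddi_flat.dll    : {non_idle_share('ddi_flat.dll'):.1f}% of non-idle samples")
```

The result is a ratio of sample counts. It stays the same whether the run takes 4 seconds or 40 seconds, which is why it cannot be read as a speedup.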

The results from DispPerf.exe do show the absolute time spent in the various ROPs. Whereas the original run spent a combined total of nearly 42 seconds in all of the ROPs, the optimized run spent a combined total of just over 5 seconds. The most dramatic difference is in ROP 0x0000CCCC (SRCCOPY), which was the focus of the optimization effort. Although the time spent for this ROP in the emulation library rose significantly, as expected, from an original total of 0.09 seconds to 4.4 seconds, this was more than offset by the reduction in time spent in GPE calls, which decreased from 40.9 seconds to 0.07 seconds.
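The totals for the optimized run can be reproduced by summing the time columns of the DispPerf tables. The following Python sketch is illustrative only. It assumes that the dwGPETime and dwEmulTime values are in microseconds, which is consistent with the totals quoted in this section. The names TIMES and to_seconds are hypothetical.

```python
# (RopCode, dwGPETime, dwEmulTime) copied from the DispPerf tables above.
TIMES = [
    (0x0000CCCC,  67082, 4412339),
    (0x0000F0F0,  23428,  435198),
    (0x00008888,  51740,       0),
    (0x00006666,  28191,       0),
    (0x0000AAF0,  40499,   36957),
    (0x0000EEEE,      0,   23902),
    (0xFEFEFFF1, 145383,       0),
    (0x0000E2E2,    353,       0),
    (0x00005555,      0,   48659),
]

US_PER_SECOND = 1_000_000  # assumption: DispPerf times are in microseconds


def to_seconds(microseconds):
    """Convert a DispPerf time value to seconds."""
    return microseconds / US_PER_SECOND


total_gpe = sum(gpe for _, gpe, _ in TIMES)
total_emul = sum(emul for _, _, emul in TIMES)
_, srccopy_gpe, srccopy_emul = next(row for row in TIMES if row[0] == 0x0000CCCC)

print(f"All ROPs, GPE       : {to_seconds(total_gpe):.3f} s")
print(f"All ROPs, emulation : {to_seconds(total_emul):.3f} s")
print(f"All ROPs, combined  : {to_seconds(total_gpe + total_emul):.3f} s")
print(f"SRCCOPY, GPE        : {to_seconds(srccopy_gpe):.3f} s")
print(f"SRCCOPY, emulation  : {to_seconds(srccopy_emul):.3f} s")
```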

ProfApp.exe reports its execution time in the debug output, which provides an independent check against the results from the profiling tools. The first run of ProfApp.exe spent 34,297 milliseconds (ms) blitting to the screen. After the color format conversion in the driver was optimized, the second run of ProfApp.exe spent 3,666 ms blitting to the screen. This is a reduction of about 89 percent in blit time, or a speedup of roughly nine times.
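The gain can be expressed either as a reduction in elapsed time or as a speedup factor. The following Python sketch computes both from the two ProfApp.exe timings. The function names are hypothetical and are used only for illustration.

```python
BEFORE_MS = 34297  # first run of ProfApp.exe, original driver
AFTER_MS = 3666    # second run of ProfApp.exe, optimized driver


def reduction_percent(before, after):
    """Percentage of the original elapsed time that was eliminated."""
    return 100.0 * (before - after) / before


def speedup(before, after):
    """How many times faster the second run is than the first."""
    return before / after


print(f"Time reduction: {reduction_percent(BEFORE_MS, AFTER_MS):.1f}%")
print(f"Speedup       : {speedup(BEFORE_MS, AFTER_MS):.2f}x")
```

The two figures describe the same measurement. A large percentage reduction in time corresponds to a much larger speedup factor, so it helps to state which one is being quoted.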

See Also

How to Profile and Optimize a Display Driver | Display Driver Performance

 Last updated on Tuesday, May 18, 2004

© 1992-2003 Microsoft Corporation. All rights reserved.