Optimizing a GPE Emulation Function

This topic describes changes to the implementation of the function EmulatedBlt_Internal that improve the performance of the FLAT display driver used in How to Profile and Optimize a Display Driver. These changes address the scenario presented in that example, namely RGB555 to RGB565 color conversion. You could apply this optimization to all the drivers and platforms that you work with. The tradeoff, however, is a larger and more complex display driver in exchange for optimizing a scenario that is relatively rare in general applications.

As indicated in Display Driver Performance, the default emulation functions are good candidates for optimization because they can be replaced with more efficient implementations that use specific hardware features, specialized knowledge of the usage scenario, or both. The following steps describe how to add code to your platform to improve the performance of ProfApp.exe based on the results from Obtaining a Performance Profile for a Display Driver.

To optimize EmulatedBlt_Internal for RGB555 to RGB565 color conversion

  1. Replace line 218 of public\common\oak\drivers\display\emul\ebbltsel.cpp with the following code so that this if statement is nested within the existing if statement:

    if (!(pParms->pConvert &&
          pParms->rop4 == 0xCCCC &&
          pParms->pSrc->Format() == gpe16Bpp))
    {
      return S_OK;   // This was originally line 218.
    }
    

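    This code creates a special case for 16-bit-per-pixel (bpp) format conversions within the function EmulatedBltSelect16. Because the condition is negated, the original return S_OK still runs for every blit except a source copy (ROP4 0xCCCC) from a 16-bpp source that requires color conversion.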

  2. Insert the following code into public\common\oak\drivers\display\emul\ebbltsel.cpp at what was originally line 268 to add another else if clause to the existing if statement. Depending on how you formatted your code changes in the previous step, the correct spot for this code should be approximately line 272.

    else if ((pParms->pSrc->Format() == gpe16Bpp) &&
             (NULL == pParms->pLookup) && (pParms->pConvert))
    {
      pParms->pBlt = FUNCNAME(BltSrcCopy1616Convert);
    }
    

    This creates a special code path to a new function to handle 16-bpp format conversions.

  3. Insert the following code into public\common\oak\inc\emul.h at line 133:

    SCODE EmulatedBltSrcCopy1616Convert( GPEBltParms * );
    

    This code prototypes a new function to perform 16-bpp color format conversions.

  4. Add the following code to public\common\oak\drivers\display\emul\ebcopy16.cpp:

    SCODE Emulator::EmulatedBltSrcCopy1616Convert(GPEBltParms* pBltParms)
    {
      // Source-related information.
      PRECTL   prcSrc         = pBltParms->prclSrc;
      UINT32   iScanStrideSrc = pBltParms->pSrc->Stride()/sizeof(WORD);
      WORD    *pwScanLineSrc  = (WORD *)pBltParms->pSrc->Buffer() +
                                prcSrc->top * iScanStrideSrc      +
                                prcSrc->left;
    
      // Destination-related information.
      PRECTL   prcDst         = pBltParms->prclDst;
      UINT32   iScanStrideDst = pBltParms->pDst->Stride()/sizeof(WORD);
      WORD    *pwScanLineDst  = (WORD *)pBltParms->pDst->Buffer() +
                                prcDst->top * iScanStrideDst      +
                                prcDst->left;
    
      int cRows = prcDst->bottom - prcDst->top;
      int cCols = prcDst->right  - prcDst->left;
    
      // Copy source before overwriting.
      if (!pBltParms->yPositive)
      {
        // Scan from end of memory, and negate stride.
        pwScanLineSrc += iScanStrideSrc * (cRows - 1);
        pwScanLineDst += iScanStrideDst * (cRows - 1);
    
        iScanStrideSrc = (UINT32)-(INT32)iScanStrideSrc;
        iScanStrideDst = (UINT32)-(INT32)iScanStrideDst;
      }
    
      if (!pBltParms->xPositive)
      {
        // Copy from right to left.
        for (int row = 0; row < cRows; row++)
        {
          WORD *pwPixelDst = pwScanLineDst + cCols - 1;
          WORD *pwPixelSrc = pwScanLineSrc + cCols - 1;
    
          while (pwPixelDst >= pwScanLineDst)
          {
            // Keep blue (bits 0-4); shift green and red (bits 5-14) up one bit.
            *pwPixelDst = (*pwPixelSrc & 0x1f) | ((*pwPixelSrc & 0x7fe0) << 1);
            pwPixelDst--;
            pwPixelSrc--;
          }
          pwScanLineSrc += iScanStrideSrc;
          pwScanLineDst += iScanStrideDst;
        }  
      }
      else
      {
        // Copy from left to right.
        for (int row = 0; row < cRows; row++)
        {
          WORD *pwPixelDst = pwScanLineDst;
          WORD *pwPixelSrc = pwScanLineSrc;
          WORD *pwLim      = pwPixelDst + cCols;
    
          BOOL bPreWord;
          BOOL bPostWord;
    
          DWORD * pdwPixelDst;
          DWORD * pdwLim;
    
          // TRUE if the destination does not start on a DWORD boundary.
          bPreWord = ((DWORD)pwPixelDst & 2) ? TRUE : FALSE;
    
          if (bPreWord)
          {
            // Convert the single leading pixel to reach DWORD alignment.
            *pwPixelDst = (*pwPixelSrc & 0x1f) | ((*pwPixelSrc & 0x7fe0) << 1);
            pwPixelDst++;
            pwPixelSrc++;
          }
    
          pdwPixelDst = (DWORD *)pwPixelDst;
          pdwLim      = (DWORD *)((DWORD)pwLim & (~3));
          bPostWord   = ((DWORD)pwLim & 2) ? TRUE : FALSE;
    
          while (pdwPixelDst < pdwLim)
          {
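            // Pack two source pixels into one DWORD. On little-endian
            // targets the first pixel must occupy the low word so that
            // the pixels are stored in their original order.
            DWORD dwSrc = *pwPixelSrc;
    
            pwPixelSrc++;
            dwSrc |= ((DWORD)*pwPixelSrc << 16);
            pwPixelSrc++;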
    
            dwSrc = (dwSrc & 0x1F001F) | ((dwSrc & 0x7FE07FE0) << 1);
    
            *pdwPixelDst = dwSrc;
            pdwPixelDst++;
          }
    
          pwPixelDst = (WORD *)pdwPixelDst;
    
          if (bPostWord)
          {
            // Convert the single trailing pixel past the last full DWORD.
            *pwPixelDst = (*pwPixelSrc & 0x1f) | ((*pwPixelSrc & 0x7fe0) << 1);
            pwPixelDst++;
            pwPixelSrc++;
          }
    
          pwScanLineSrc += iScanStrideSrc;
          pwScanLineDst += iScanStrideDst;
        }
      }
      return S_OK;
    }
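
The pixel arithmetic used by EmulatedBltSrcCopy1616Convert can be examined in isolation. The following is a minimal, portable sketch and not part of the GPE sources: the names Rgb555ToRgb565, Rgb555ToRgb565Pair, ConvertRow, and CheckKnownColors are hypothetical, and the sketch assumes that bit 15 of each RGB555 source pixel is unused. It shows the single-pixel conversion, the same conversion applied to two packed pixels at once, and the head/body/tail split of a row around a 4-byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Convert one RGB555 pixel (0RRRRRGG GGGBBBBB) to RGB565 (RRRRRGGG GGGBBBBB).
// Blue (bits 0-4) stays in place. Green and red (bits 5-14) move up one bit,
// which leaves the new least significant green bit set to zero.
inline std::uint16_t Rgb555ToRgb565(std::uint16_t p)
{
    return static_cast<std::uint16_t>((p & 0x001F) | ((p & 0x7FE0) << 1));
}

// Convert two pixels packed into one 32-bit value. The masks are the
// 16-bit masks repeated in both halves; the shift cannot carry from the
// low half into the high half because bit 15 is masked off first.
inline std::uint32_t Rgb555ToRgb565Pair(std::uint32_t pair)
{
    return (pair & 0x001F001Fu) | ((pair & 0x7FE07FE0u) << 1);
}

// Hypothetical row converter that mirrors the left-to-right path:
// one leading pixel if dst is not 4-byte aligned, then pixel pairs,
// then one trailing pixel if any is left over.
inline void ConvertRow(const std::uint16_t* src, std::uint16_t* dst,
                       std::size_t count)
{
    std::size_t i = 0;

    // Head: a single pixel brings dst up to a 4-byte boundary.
    if (count > 0 && (reinterpret_cast<std::uintptr_t>(dst) & 2))
    {
        dst[0] = Rgb555ToRgb565(src[0]);
        i = 1;
    }

    // Body: two pixels per iteration, first pixel in the low half.
    for (; i + 1 < count; i += 2)
    {
        std::uint32_t pair =
            src[i] | (static_cast<std::uint32_t>(src[i + 1]) << 16);
        pair = Rgb555ToRgb565Pair(pair);
        dst[i]     = static_cast<std::uint16_t>(pair & 0xFFFF);
        dst[i + 1] = static_cast<std::uint16_t>(pair >> 16);
    }

    // Tail: one remaining pixel, if the count was odd after the head.
    if (i < count)
    {
        dst[i] = Rgb555ToRgb565(src[i]);
    }
}

// Spot checks on the primary colors.
inline void CheckKnownColors()
{
    assert(Rgb555ToRgb565(0x7C00) == 0xF800);  // red
    assert(Rgb555ToRgb565(0x03E0) == 0x07C0);  // green
    assert(Rgb555ToRgb565(0x001F) == 0x001F);  // blue
}
```

The sketch writes each half of the pair back as a separate 16-bit store so that it behaves the same regardless of byte order. The driver code instead stores a whole DWORD, which is where the speedup comes from, and that store relies on the byte order of the target.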
    

See Also

How to Profile and Optimize a Display Driver

 Last updated on Tuesday, May 18, 2004

© 1992-2003 Microsoft Corporation. All rights reserved.