Buffering in HTTP.SYS

My name is Chun Ye. I am a Software Design Engineer in the Microsoft Windows Networking Transports & Connectivity group. I'm here to describe the scenarios under which an application using HTTPAPI.DLL should set the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag. This flag applies to both the HttpSendHttpResponse and HttpSendResponseEntityBody APIs.

This is the section in http.h that describes this flag:

//
// HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA - Specifies that a caller wants the
// response to complete as soon as possible at the cost by buffering partial
// or the entire response.
//

#define HTTP_SEND_RESPONSE_FLAG_DISCONNECT 0x00000001
#define HTTP_SEND_RESPONSE_FLAG_MORE_DATA 0x00000002
#define HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA 0x00000004

First of all, some background on why the buffer-data flag was introduced. Before Windows Server 2003, IIS 5.0 was implemented on top of Winsock, which enables buffering by default. When IIS 6.0 on Windows Server 2003 moved to HTTP.SYS, users encountered performance issues when sending responses or entity bodies. There are two underlying causes for this performance issue.

The first cause is related to TCP Delayed ACK. The TCP stack in Microsoft Windows turns on the Delayed ACK algorithm by default. What this means is that the TCP receiver sends an ACK after every second segment, or when a delayed-ACK timer expires (the Windows TCP stack sets the timer to 200 milliseconds). When sending a response or entity bodies results in an odd number of TCP segments, the delayed ACK kicks in, and the application's IO is blocked until that ACK is received.

The second cause is an underlying network with long latency (a high Round Trip Time, or RTT). In this case, the ACKs take a long time to come back, so the IO stays blocked until all of them are received.

An application that issues only a single send at a time can be hit by both issues. The result is network under-utilization: the application does not keep the network pipe full, and every send is serialized by one RTT. For example, with a 100-millisecond RTT and 64 KB sends, such an application tops out at roughly 5 Mbit/s no matter how fast the link is.

To solve these problems, starting with Windows Server 2003 SP1, HTTP.SYS introduced the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag for the send APIs. When the flag is set, HTTP.SYS applies Winsock-like buffering: it copies the user data into its own internal buffer and hands it to the transport layer. The original user-mode IO is completed as soon as the data reaches the transport layer, without waiting for all the ACKs. By buffering the data in the kernel, we are trading CPU and memory for network utilization. This allows a "one send at a time" application to effectively queue sends in parallel.
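As a concrete illustration, here is a minimal sketch of a synchronous, single-shot send that opts into kernel buffering. Error handling and the HttpReceiveHttpRequest plumbing are omitted; the request-queue handle and request ID are assumed to come from earlier request processing, and the program links against httpapi.lib:

#include <windows.h>
#include <http.h>
#include <string.h>

ULONG SendBufferedResponse(HANDLE hReqQueue, HTTP_REQUEST_ID requestId)
{
    HTTP_RESPONSE   response;
    HTTP_DATA_CHUNK chunk;
    ULONG           bytesSent;
    PCSTR           body = "Hello, world!";

    ZeroMemory(&response, sizeof(response));
    response.StatusCode   = 200;
    response.pReason      = "OK";
    response.ReasonLength = (USHORT)strlen(response.pReason);

    // The entity body is described by a single from-memory data chunk.
    chunk.DataChunkType           = HttpDataChunkFromMemory;
    chunk.FromMemory.pBuffer      = (PVOID)body;
    chunk.FromMemory.BufferLength = (ULONG)strlen(body);

    response.EntityChunkCount = 1;
    response.pEntityChunks    = &chunk;

    // With the buffer-data flag, HTTP.SYS copies the response into its own
    // internal buffer and completes this call as soon as the data reaches
    // the transport layer, without waiting for the client's ACKs.
    return HttpSendHttpResponse(hReqQueue,
                                requestId,
                                HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA,
                                &response,
                                NULL,        // no cache policy
                                &bytesSent,
                                NULL, 0,     // reserved
                                NULL,        // synchronous: no OVERLAPPED
                                NULL);       // no log data
}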

Here are the recommendations for three types of applications, in terms of their IO models:

  1. An application that does synchronous IO. It is recommended that such applications always set the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag on their send IOs.
  2. An application that does asynchronous IO but keeps at most one outstanding send IO on each connection. Here it is recommended that the application set the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag on all intermediate send IOs, i.e. those that also set the HTTP_SEND_RESPONSE_FLAG_MORE_DATA flag (see the sketch after this list).
  3. An application that keeps multiple outstanding send IOs on each connection, for instance one that sends the next entity body without waiting for the previous send to complete. In this case there is no need to set the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag, since such an application is never blocked waiting for ACKs. New, truly asynchronous applications should use this IO model.
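The flag combination for the intermediate sends is the same in categories 1 and 2. The sketch below is a minimal illustration, not production code: it uses synchronous calls for brevity, assumes the response headers were already sent via HttpSendHttpResponse with the HTTP_SEND_RESPONSE_FLAG_MORE_DATA flag, and relies on a hypothetical GetNextBodyChunk helper to produce the body:

#include <windows.h>
#include <http.h>

// Hypothetical helper: fills 'buf', reports how many bytes were written,
// and returns TRUE while more body data remains.
BOOL GetNextBodyChunk(PVOID buf, ULONG cb, PULONG pcbFilled);

ULONG SendBodyOneSendAtATime(HANDLE hReqQueue, HTTP_REQUEST_ID requestId)
{
    BYTE            buf[64 * 1024];
    HTTP_DATA_CHUNK chunk;
    ULONG           cbFilled, bytesSent, flags;
    ULONG           result = NO_ERROR;
    BOOL            moreData;

    do {
        moreData = GetNextBodyChunk(buf, sizeof(buf), &cbFilled);

        chunk.DataChunkType           = HttpDataChunkFromMemory;
        chunk.FromMemory.pBuffer      = buf;
        chunk.FromMemory.BufferLength = cbFilled;

        // Intermediate sends: MORE_DATA says the response is not finished,
        // and BUFFER_DATA lets the call complete once HTTP.SYS has copied
        // the data, so the next chunk can be produced without waiting for
        // the client's ACKs.  The final send needs neither flag.
        flags = moreData ? (HTTP_SEND_RESPONSE_FLAG_MORE_DATA |
                            HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA)
                         : 0;

        result = HttpSendResponseEntityBody(hReqQueue, requestId, flags,
                                            1, &chunk, &bytesSent,
                                            NULL, 0,   // reserved
                                            NULL,      // synchronous
                                            NULL);     // no log data
    } while (moreData && result == NO_ERROR);

    return result;
}

Because HTTP.SYS copies the data before completing a buffered send, the same buffer can be reused for the next chunk right away.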

While setting the buffering flag solves the performance issue for most applications that fall under categories 1 and 2 above, there are certain downsides the user should be aware of:

  1. Setting the flag increases CPU and memory usage. Buffering requires extra memory to hold the user's response or entity bodies, which in turn costs extra CPU cycles. Compared to a truly asynchronous application (category 3 above), an application that sets the buffering flag can expect to service somewhat fewer clients. A trade-off has to be made between application simplicity and system scalability.
  2. When the buffering flag is passed, the send API completes as soon as the data has been copied and passed to the transport layer. The completion status of the send API is therefore not indicative of the completion status of the actual send.

- Chun Ye

Comments

  • Anonymous
    September 07, 2006
    Here is my explanation of why disabling Nagle's Algorithm doesn't help performance.  The reason is that we can only disable it on the server side.  The client side still has Nagle's Algorithm enabled by default, so the delayed-ACK issue is still there.  Typically for a GET request the server is the sender, but the delayed-ACK behavior is controlled by the client.  When the client is the sender (for PUT/POST requests), disabling Nagle's Algorithm on the server side should make a difference.

    BTW, Nagling or Delayed ACK is only part of what causes the performance problem.  The other part is the Round Trip Time, which can't be solved by disabling Nagle's Algorithm.  That is why the buffering is necessary.

  • Anonymous
    May 15, 2012
    Could you please provide instructions on how to set the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag to enable TCP buffering on Windows Server 2003 Service Pack 1? Regards, Suman

  • Anonymous
    May 15, 2012
    You use it in the Flags parameter of the APIs; see msdn.microsoft.com/.../aa364499(v=vs.85).aspx for the HttpSendHttpResponse case.