ASP.NET Performance Lab: Hello, World!

I thought I could shed some light on ASP.NET performance testing by presenting our most basic scenario: Hello, World!

Step 1: State the Objective

We want to ensure that the basic ASP.NET pipeline does not regress from one release to the next.  We use server throughput (requests per second) as our performance metric.  We want throughput to be on par with or better than the previous release (aka, baseline).

Step 2: Create the Scenario

We have a basic IIS website with the following ASPX page:

<html>
    <%= "Hello, World!" %>
</html>

Step 3: Run the Scenario

Most of our scenarios use the wcat load tool to test the web server.  You can download wcat here.

In order to detect regressions we need to have reproducible results.  Here are some of the ways that we reduce variability:

  • Eliminate the variables.  Since I am measuring ASP.NET performance, .NET should be my one variable.  Other factors such as hardware, OS, IIS, wcat version and the scenario itself should remain constant.
  • Use a private network.  Eliminate unnecessary network traffic to the web server which could affect results.
  • Minimize server processes.  Again, we want to avoid competition for server resources.  We usually disable Windows Firewall and run the wcat controller and clients from separate machines.
  • Measure warm.  Since we are testing throughput and not startup, we can reduce the variability by adding a warm-up period.  This moves JIT compilation and cache population, which could cause more variance, out of the measured interval.
  • Max the CPU.  Server load and system resources factor into the equation as well.  We try to keep this near-constant by applying a full load during testing.  Our goal is to achieve at least 90% CPU usage during our runs.

We quantify our test variance (aka, noise) by running multiple iterations.  We ignore the first iteration, which we’ve found to have greater variance, and calculate the average and standard deviation of the remaining three.  Standard deviation is our noise indicator.
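The calculation described above can be sketched in a few lines of Python.  This is only an illustration: the throughput readings are made up, the helper name summarize is hypothetical, and both the use of the sample standard deviation and its expression as a percentage of the average are assumptions, since the exact formula used in the lab is not stated here.

```python
from statistics import mean, stdev


def summarize(iterations, discard=1):
    """Return (average, stddev as % of average) after dropping the first iterations."""
    kept = iterations[discard:]
    if len(kept) < 2:
        raise ValueError("need at least two iterations after discarding")
    avg = mean(kept)
    # Assumption: noise is the sample standard deviation relative to the average.
    noise_pct = stdev(kept) / avg * 100.0
    return avg, noise_pct


# Hypothetical requests-per-second readings for four iterations.
rps = [39000.0, 41000.0, 41500.0, 42000.0]
avg, noise_pct = summarize(rps)
print(f"average={avg:.2f} rps, stddev={noise_pct:.2f}%")
```

Dropping the first reading keeps the slower, more variable iteration from inflating the noise figure.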

Step 4: Analyze the Results

After doing our baseline and test runs, we should have results like the following:

Scenario    Baseline  Result    Diff    Baseline StdDev  Result StdDev  Pass/Fail
HelloWorld  41549.67  40432.67  -2.69%  0.64%            0.35%          PASS

These are the columns:

  • Baseline, Result – Average requests per second over the last 3 iterations of the run
  • Diff – Percentage change of the Result relative to the Baseline (negative means slower)
  • Baseline StdDev, Result StdDev – Standard deviation across the run iterations, expressed as a percentage
  • Pass / Fail – Whether the regression shown in Diff stays within our threshold.

Based on our experience we’ve set our threshold at 5%.  We define failures as runs that are more than 5% regressed.  Runs that show >5% improvement should also be investigated in order to understand and validate the cause.
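As a rough sketch of the Diff and Pass/Fail logic, the following Python reproduces the HelloWorld row using the Baseline and Result values from the table.  The function names and the "investigate improvement" label are hypothetical; only the 5% threshold and the two averages come from this post.

```python
def percent_diff(baseline, result):
    """Signed change of result relative to baseline, in percent."""
    return (result - baseline) / baseline * 100.0


def classify(diff_pct, threshold_pct=5.0):
    """FAIL on a regression beyond the threshold; flag large improvements too."""
    if diff_pct < -threshold_pct:
        return "FAIL"
    if diff_pct > threshold_pct:
        # Hypothetical label: improvements over 5% still deserve a look.
        return "PASS (investigate improvement)"
    return "PASS"


diff = percent_diff(41549.67, 40432.67)
print(f"{diff:.2f}% {classify(diff)}")  # prints "-2.69% PASS"
```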

We consider a run noisy if the standard deviation exceeds a threshold, which we also set at 5%.  If a failed run is noisy, we throw out one more iteration to see if the Diff and StdDev improve.  If they do, we may ignore the failure and wait for the next run.  If the runs continue to be noisy, then the scenario should be investigated in order to further reduce the variance.
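The noisy-run handling might look something like the sketch below.  It rests on several assumptions: the post does not say which iteration gets thrown out, so this version drops the earliest remaining one; the standard deviation is taken as a percentage of the average; and the sample numbers and function names are invented for illustration.

```python
from statistics import mean, stdev

DIFF_THRESHOLD_PCT = 5.0
NOISE_THRESHOLD_PCT = 5.0


def evaluate(baseline_avg, samples):
    """Return (diff %, noise %, failed, noisy) for one set of iterations."""
    avg = mean(samples)
    noise = stdev(samples) / avg * 100.0
    diff = (avg - baseline_avg) / baseline_avg * 100.0
    return diff, noise, diff < -DIFF_THRESHOLD_PCT, noise > NOISE_THRESHOLD_PCT


def evaluate_with_retry(baseline_avg, samples):
    """Re-evaluate once with one fewer iteration when a failed run is noisy."""
    diff, noise, failed, noisy = evaluate(baseline_avg, samples)
    if failed and noisy and len(samples) > 2:
        # Assumption: discard the earliest remaining iteration.
        return evaluate(baseline_avg, samples[1:])
    return diff, noise, failed, noisy


# Hypothetical run: the first kept iteration is an outlier.
diff, noise, failed, noisy = evaluate_with_retry(40000.0, [33000.0, 39500.0, 40100.0])
print(f"diff={diff:.2f}% noise={noise:.2f}% failed={failed} noisy={noisy}")
```

In this made-up run the initial evaluation both fails and is noisy, while the retry with the outlier removed passes, which is the case where the failure may be ignored until the next run.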

Step 5: Investigate Regressions

As the saying goes, “Measure Early, Measure Often”.  The fewer changes there are between your runs, the easier it will be to track down the cause of your regressions.  This is why the ASP.NET performance lab runs daily with builds from multiple branches.  Often I’m able to quickly identify a regression simply by looking at the source control history.

Another way to quickly diagnose regressions is to enable performance counters or other non-invasive tracing which could help identify the cause.  We always save the wcat logs with our performance counters and other useful information such as throughput, working set, percent CPU usage, and HTTP responses.  The HTTP responses can rule out test failures, while the other diagnostics (combined with source control) can help narrow down the cause.

Finally, if a quick diagnosis is not possible, find a good profiler.  Some of the tools we use are the Visual Studio (F1) profiler, CLR Profiler, and XPerf.  I hope to demonstrate some of these in future posts.

Comments

  • Anonymous
    February 09, 2012
    When testing from one release to the next, is the test run on identical hardware to the previous?

  • Anonymous
    February 09, 2012
    Would it matter if you used MVC? Would it matter if you used different view-engines e.g. Razor @("Hello World")  ?

  • Anonymous
    February 09, 2012
    Phillip - Yes, I don't want the hardware to be a variable.  If I do update my hardware then I make sure to re-run my baseline.

  • Anonymous
    February 09, 2012
    Eric - I'm assuming you're looking for a HelloWorld comparison across WebForms, MVC2 (WebForms View Engine) and MVC3 (Razor View Engine)?  I haven't done that comparison - but maybe I can look into it for a future post.

  • Anonymous
    February 09, 2012
    There's no network latency in the measurements.  This is requests per second on the server, not response times on the client.  That is another approach and which approach you take depends on your goals. Load is added from remote clients so that wcat and other unrelated processes are not competing for system resources with IIS and ASP.NET.  I have confidence that my web server performance -- good or bad -- is actually due to the request processing. There is a variety of hardware in our lab, just as I'm sure our customers have.  Unfortunately we don't have infinite time and resources - so picking good scenarios and deciding how often to run them is important.

  • Anonymous
    February 10, 2012
    Very useful, thanks for sharing!