Same Query on Different SQL Server Versions

Introduction

The same query with different results?!? How can it be? There are too many variables to answer a general question like, "Why is the same query executed differently in two different databases?" (for example, it's much slower in one database, or it costs many more resources in one database). Basically, the main reason for a different execution is a different execution plan, and there are several scenarios where we might get a different execution plan, like for example:

  • Different server hardware (Number of CPU, Memory size, slower or a busier disk, etc.).
  • Different DDL and/or DML (data, indexes, statistics, fragmentation, etc.). 
  • Different server settings (regional settings, Power Plans, etc.).
  • Different SQL Server Instance configurations (MAX DOP, etc.).
  • Different Connection properties.
  • Different SQL Server versions (service packs, CUs).

In this short article we will focus on differences that are a result of different versions.

In every new version of an application, there are three basic improvements: (1) Revolutionary new features, (2) Evolution of something that's been in previous versions, and (3) Bug fixes. Usually these changes can be found in articles published on the Internet. But these articles usually cover just the tip of the iceberg, of the visible or published changes in features. Behind the scenes there may be a lot of improvements that are not published, and which drastically affect the application, such as improving certain hidden algorithm  in the code, in order to improve the work of the application. The "Query Optimizer" is one of those features, which improved over time.

Our case study

Consider the case of inserting data to a new table, while sorting the source data, in different versions of SQL Server. For this case study, we will first create a new database, then we will create a new table and insert some sample data. In the next step, we will select the data while sorting it, and insert the result set to a new table. We will repeat this actions on several Server versions, and we will examine the results.

/*************************************************************** DDL + DML */
-- Creating New Database
CREATE DATABASE  SameQueryDifferentServerVersion
GO
-- Creating New Table
use SameQueryDifferentServerVersion
GO
-- Creating New table which will use as our source data
CREATE TABLE  SourceTbl(
    ID INT,
    FName NVARCHAR(100),
    MName NVARCHAR(100),
    LName NVARCHAR(100)
)
GO
-- Insert sample data (DML)
insert SourceTbl (ID,FName,MName,LName)
select 1,'a','e','r' UNION  ALL
select 2,'f','w','t' UNION  ALL
select 3,'b','q','y' UNION  ALL
select 4,'h',',','u' UNION  ALL
select 5,'g','m','i' UNION  ALL
select 6,'c','n','o' UNION  ALL
select 7,'e','b','p' UNION  ALL
select 8,'i','v','l' UNION  ALL
select 9,'j','c','k' UNION  ALL
select 10,'d','x','j'
GO
-- Creating New destination Table using data from our source table, while sorting the source data.
select ID, FName, MName, LName
into destinationTbl
from SourceTbl
order by  FName
GO

Let's try to present our data from both tables, the source and the destination tables, without sorting:

select ID, FName, MName, LName
from SourceTbl;
select ID, FName, MName, LName
from destinationTbl;
GO

Our data in both tables is stored as a heap (Tables without Clustered Indexes) without specifying an order. Usually data is initially stored in the order it is was inserted into the table, but the Database Engine can move data around in the heap to store the rows more efficiently. While reading the data without sorting, again usually, the records are read in the order in which they are stored. therefore, we cannot predicted the order of the records with 100% accuracy, but we can assume that the data will be read in the same order that it was inserted.

The data in the source table inserted ordered by the ID column, while the data in the destination table were inserted, while sorting them by 'FName' column, from the source data.

Are the results of the two queries returned as expected?

* The left image shows the results from the SourceTbl, while the right image shows the results from the destinationTbl. Both images are from older versions of SQL Server.

** On new versions of SQL Server both queries return the same results as the left Image shown here.

On older versions of server SQL Server like 2005, 2008, and 2008r2, the results returned in a different order. In the second query the results returned apparently sorted by the "FName" column, while the first query returned results apparently ordered as they were inserted to the table, sorted by the ID column. However when we used the same process on newer versions of servers like SQL 2012/2014, then the results probably repeated in the same order of the original data in both queries, sorted by the ID column. What is the reason for the differences in the order of data?

* Note: This is a common question in forums, and it was the trigger for the current article.

* Note: In order to get the results we discussed in the article, it is recommended to perform the test on a machine that is not busy!

* Note: The order in which rows are returned in a select query results is not guaranteed, unless ORDER BY is expressly stated. Therefore we wrote in the paragraph above that probably these will be the results, but some situations may well be different (this is discussed briefly below).

Investigating our case study, and explanation of the results

Let's explore the following query on all server's versions:

select ID, FName, MName, LName
from SourceTbl
order by  FName

We have the same execution plan on all versions of SQL Server. The server scans the table, sorts the data according to the column "FName", and produces the result. Obviously this query must sort the data in order to display them in the right order. Indeed all servers show the sorting operation in the execution plan. It is a very expensive operation (in resources), which cost about 78% of the query execution resources.

Let's go back to the original query of the discussion, and examine the execution plan in various versions of SQL Server.

-- Creating New Table using our table, while sorting the source data
select ID, FName, MName, LName
into destinationTbl
from SourceTbl
order by  FName
GO

Using SQL Server versions 2012 and above:

Using SQL Server versions 2005 to 2008r2:

We can clearly see that different versions of the server execute the query in a very different way!

In older versions (like SQL Server 2005, 2008, and 2008 R2) the data is inserted after sorting, but in SQL Serer 2012+ versions, we do not see any sorting in the execution plan.

According to Microsoft's official definition, we can not rely on the order the data is inserted into the tables. The order in which rows are returned in a select query results is not guaranteed, unless ORDER BY is explicitly used! The data will return in the most convenient and fastest way for the server to physically read the data. If the data is on the same PAGE on disk, and the pages are next to each other, and there is no parallelism work, then probably the fastest way to read it will be in the order the physical data is on the disk. In simple cases as presented here, typically the data return in the order in which they were inserted, since we do not have a clustered index. But as stated we must not rely on that!

If the data is not selected in a certain order, why do we need to insert the data in a particular order?!? The Query Optimizer in SQL Server 2012 is apparently improved so that sorting is not necessary. This makes great sense, and led to considerable improvements in resources. We can see that the data sorting is a heavy operation, which takes about 46% of the resources of the query (see the execution plan image on older versions). On SQL Server 2012, we can see that the server simply ignored our request for sorting, and it  built a better execution plan. The fastest way to read the data is "as it is", hence we get a substantial difference in results.

Conclusion

There are many variables that can lead to different results when we execute the same query. One of those variables is the server version.

Additional Resources

See Also