Hierarchies WITH Common Table Expressions

Storing hierarchical data in a database is a very common task, and SQL Server 2005 brings us Common Table Expressions (CTEs) – a very useful tool for retrieving such data both elegantly and efficiently. I haven’t actually used these before, and they came to mind during a session with a customer, so I did a bit of research to see if they would meet the customer’s needs, and thought I’d publish the path I took.

For the purposes of this post, I’ll be using the simple database schema shown below;

 

Database Schema

The key point to note is that an employee has a manager, and this is represented as a foreign key field (named Manager) on the Employee table that points to another record in the same Employee table, yielding our hierarchy.

Adapting the SQL from the Common Table Expression documentation in SQL Server Books Online results in the query below;

 WITH OrganisationChart (Id, [Name], JobTitle, Manager) AS(      SELECT            Id, [Name], JobTitle, Manager      FROM            dbo.Employee      WHERE             Manager IS NULL      UNION ALL       SELECT            emp.Id, emp.[Name], emp.JobTitle, emp.Manager      FROM            dbo.Employee emp      INNER JOIN OrganisationChart ON             emp.Manager = OrganisationChart.Id)SELECT * FROM OrganisationChart

Breaking down the WITH statement, our CTE, we see two distinct parts. The first SELECT fetches the top level employees in our organisation, those which have no manager. The second SELECT is the really clever bit – it selects all employees that have a manager in the results of a recursive call to the OrganisationChart CTE. This recursion builds a tree of employees and their manager when it is invoked by SELECT-ing from the CTE – the final line in the script. The documentation contains more explanation about how these clauses work together.

Running this query gives some results that might look like those below;

 

Results 1

 

Now when I first ran this query, I have to be honest, my first reaction was “so what?”! As an application developer, what does this give me that a “SELECT * FROM Employee” statement wouldn’t? Surely with either statement I’ve got to load all the data into memory and build my object hierarchy before I can filter it for a specific management branch of the organisation that I’m after?

Well, yes.

The documentation adds something that makes the query slightly more useful though – namely a “level” field, which indicates what level of the hierarchy this employee is in. Our new query looks very similar to the first, then, but with one more field;

WITH OrganisationChart (Id, [Name], JobTitle, [Level], Manager) AS

(

      SELECT

             Id, [Name], JobTitle, 0, Manager      FROM            dbo.Employee      WHERE             Manager IS NULL      UNION ALL       SELECT            emp.Id, emp.[Name], emp.JobTitle, [Level] + 1, emp.Manager      FROM            dbo.Employee emp      INNER JOIN OrganisationChart ON             emp.Manager = OrganisationChart.Id)SELECT * FROM OrganisationChartORDER BY [Level]

On the top level clause, the “Level” is fixed at zero for obvious reasons. And relying on recursion, Level is then defined in the second clause as the previous level plus one. This yields results as follows;

Results 2

Now the results seem much more useful – the Level field not only gives us an order that helps us build our in-memory object hierarchy by organising the data logically, but it also may meet your full requirements – if you only want a list of people that are 3 steps removed from a Director, a simple WHERE clause on the script’s final SELECT statement would suffice.

But let’s be honest – this isn’t what I usually want. What I want to be able to do is say “show me everyone that reports to X, directly or indirectly”. That is, to retrieve one whole branch of an organisation with a simple WHERE clause. This enables me to load only the data I want into memory, minimising memory consumption, processing, network bandwidth, and so on. But at the moment I can’t do this – if I filter on the Manager column I will only get direct reports.

Well, it turns out the answer is surprisingly simple. Consider the following modified query;

 WITH OrganisationChart (Id, [Name], JobTitle, [Level], Manager, [Root]) AS(      SELECT            Id, [Name], JobTitle, 0, Manager, Id      FROM            dbo.Employee      WHERE             Manager IS NULL      UNION ALL       SELECT            emp.Id, emp.[Name], emp.JobTitle, [Level] + 1, emp.Manager, [Root]      FROM            dbo.Employee emp      INNER JOIN OrganisationChart ON             emp.Manager = OrganisationChart.Id)SELECT * FROM OrganisationChartWHERE [Root] = 11

I have named the new field “Root”, and it holds the top-most manager for each employee. The key to making this work is that in the first SELECT clause of the WITH statement, Root is hard-coded to the Identifier of the current record. In the second SELECT clause, and therefore for all other levels, Root is selected directly out of the recursive call to the OrganisationChart CTE – so it is never modified throughout the hierarchy.

This immediately enables us to add a WHERE clause to limit the results to employees that report to a single manager, directly or indirectly, such as follows;

Results 3

In this example, I have retrieved all employees that report to the Finance Director – and no-one else.

Suppose you don’t want the whole organisation branch – perhaps you want to retrieve the hierarchy that report to a manager in charge of a group of teams, further down the organisation? No problem, just change the WHERE clause in the WITH statement’s first SELECT statement; for example, instead of WHERE Manager IS NULL you could try filtering on another known field, such as WHERE [Name] = ‘A Ince’. You could of course create a Stored Procedure, and parameterise this WHERE clause to enable you to retrieve a hierarchy below any individual you specify.

So to conclude, Common Table Expressions are indeed a powerful tool to ensure you can retrieve all the data you need, and only the data you need, in a flexible and straightforward way. This post has shown just one simple way that you can optimise your data retrieval with them. Enjoy!

Comments

  • Anonymous
    October 17, 2007
    PingBack from http://www.artofbam.com/wordpress/?p=9580

  • Anonymous
    October 30, 2007
    How about the other way around? Given Peon A, give me his chain of management all the way up through the CEO in one simple recursive query.  How would this be done?

  • Anonymous
    October 30, 2007
    There was an interesting question left on my last post entitled Hierarchies WITH Common Table Expressions

  • Anonymous
    October 30, 2007
    Hi - good question. As might be able to see from the automatic ping-back above, I've just put up a short post that should answer your question; http://blogs.msdn.com/simonince/archive/2007/10/31/common-table-expressions-reversed.aspx Hope that helps! Simon

  • Anonymous
    October 30, 2007
    There was an interesting question left on my last post entitled Hierarchies WITH Common Table Expressions

  • Anonymous
    January 08, 2008
    helo i have a hierarchie for an auditory the hierarchie is ok but now i need to show in a report the customer who was modify or create for the name and in a deteminate date, i need show the herarchie whit all information this is the query, i apreciate your help thank you CREATE PROCEDURE [dbo].[ReporteAuditoriaDatosDetallePropiedad] ( @Entidad       Varchar(200) ,@Propiedad    Varchar(100) ,@FechaInicial dateTime ,@FechaFinal   dateTime ) as DECLARE @Bitacora table ( IdOperacion      int        ,OperacionPadre  int ,NombreOperacion      varchar(200)        ,Entidad              Varchar(200)        ,Fecha                dateTime        ,Usuario              Varchar(200)        ,Ip                   Varchar(100)        ,IdEntidad            int   ) ;WITH CTEBitacora as ( select IdOperacion , OperacionPadre,NombreOperacion,right(NombreOperacion, charindex(' ', reverse(NombreOperacion)) - 1) as Entidad       ,Fecha,Usuario,Ip,IdEntidad        from  Bitacora b where right(NombreOperacion, charindex(' ', reverse(NombreOperacion)) - 1)  = @Entidad UNION ALL SELECT  b.IdOperacion , b.OperacionPadre,b.NombreOperacion,right(b.NombreOperacion, charindex(' ', reverse(b.NombreOperacion)) - 1) as Entidad       ,b.Fecha,b.Usuario,b.Ip,b.IdEntidad from CTEBitacora cte inner join bitacora b on cte.IdOperacion = b.OperacionPadre ) insert into @Bitacora    select IdOperacion, OperacionPadre,NombreOperacion,Entidad,Fecha,Usuario,Ip,IdEntidad    from CTEBitacora set dateformat dmy SELECT     B.IdOperacion, B.OperacionPadre , B.Entidad, B.NombreOperacion, B.Fecha, B.Usuario, B.IP,B.IdEntidad           ,  DatosBitacora.Propiedad,DatosBitacora.DescripcionPropiedad           , DatosBitacora.ValorAnterior, DatosBitacora.ValorNuevo FROM             @Bitacora as B  left JOIN                      DatosBitacora ON B.IdOperacion = DatosBitacora.IdOperacion WHERE    B.Fecha BETWEEN  convert(datetime, @FechaInicial, 103)       and  convert(datetime, dateadd(day,1,@FechaFinal)  ,103)       and  DatosBitacora.Propiedad=@Propiedad ORDER BY B.IDOPERACION

  • Anonymous
    January 12, 2008
    Ingrid; I have struggled a little to understand what you are trying to achieve as I don't know your database or system. I believe you are saying that you are using a CTE to retrieve the correct data, but now you also need to show more data in a report? If this is the case, the simplest solution is to join the results of the CTE back to your customer table, to fetch the other data. In other words, work out which data you need using the CTE, and then use this result set to fetch the data you wish to see in the report. It looks like you might already be heading down this path? My apologies if I have failed to understand your problem. Simon

  • Anonymous
    July 06, 2008
    The keyword WITH is giving me an error

  • Anonymous
    July 08, 2008
    Andile; in what context? I assume you are running SQL Server 2005 or later? If so, the most common situation I've seen cause errors is when incorrectly declaring the CTE in a stored procedure (e.g. trying to reference a WITH block defined in a different stored proc, or similar, which isn't possible). Simon

  • Anonymous
    October 17, 2008
    If you’ve been following my blog you should remember a couple of posts about hierarchical data in SQL

  • Anonymous
    November 11, 2008
    This post by Simon Ince explores the topic of Hierarchies with HierarchyID in SQL 2008. If you’ve been

  • Anonymous
    January 13, 2011
    Awesome -- your final example with the ROOT ID is just what I was looking for. Thanks!!

  • Anonymous
    April 03, 2014
    Just what I was looking for! This solution has answered a lot of questions for me. And really simple. Great article!