T-SQL: Gaps and Islands Problem

This article will consider a simple classical Gaps & Islands problem asked recently in Transact-SQL Forum at MSDN with non-original title "Query Help".

Problem Definition

The thread originator was kind enough to provide DDL of the table and some data to describe the task:

Create table  T1
(Id int  identity primary  key,
VoucherNo varchar(4),
TransNo varchar(10)
)
 
Insert into  T1 values  ('V100','Trns1'),('V101','Trns1'),('V102','Trns1'),('V103','Trns1'),('V104','Trns1'),('V106','Trns1')

And he also provided the desired output:

TransNo  FirstVoucher  LastVoucher  Quantity 
 Trns1 V100 V104 5
 Trns1 V106 V106 1

The problem is to find consecutive vouchers (100-104, then 106).

Solution

As mentioned, this is a common problem in Transact-SQL and it was described by Itzik Ben-Gan here or by Plamen Ratchev in this easy to understand blog post Refactoring Ranges. Knowing the main idea of the solution it is easy to provide it assuming that all voucher numbers come in the following format (letter V following by the 3 digit number):

;WITH cte
AS (
    SELECT *
        ,CAST(SUBSTRING(VoucherNo, 2, 3) AS  INT) - ROW_NUMBER() OVER (
            ORDER BY  VoucherNo
            ) AS  Grp
    FROM T1
    )
SELECT TransNo
    ,min(VoucherNo) AS  FirstVoucherNo
    ,max(VoucherNo) AS  LastVoucherNo
    ,count(*) AS  Quantity
FROM cte
GROUP BY  TransNo
    ,Grp

So, the idea of this solution is to group consecutive ranges first using ROW_NUMBER() function and then apply aggregate functions based on that group identifier.

Note, it is easy to modify this query to work with different formats of the voucher number (say, some combinations of letters following by any length number). This article is concentrating on the problem posted by the thread originator and is solving it for the particular voucher number format. You may want to see some modifications of my solution suggested by Ronen Ariely in the original thread. 

Another Example

I am here adding another example of Gaps and Islands problem that was asked in the T-SQL Forum "find the missing date from output data".
Here is the DDL and sample data.

CREATE TABLE  Person
    (PersonID INT  IDENTITY(1,1) PRIMARY KEY,
     PersonName varchar(3),
     SigninDate date
     );
  
INSERT INTO  Person VALUES
('AAA','20130301'),
('AAA','20130302'),
('AAA','20130305'),
('AAA','20130306'),
('AAA','20130307'),
('BBB','20130301'),
('BBB','20130302'),
('BBB','20130303'),
('BBB','20130305'),
('BBB','20130307');

The task was to find the missing dates in the given data. So here is the expected result.

PersonName Date
AAA  2013-03-03
AAA 2013-03-04 
BBB  2013-03-04 
BBB  2013-03-06

So the first task was to group the data on the 'PersonName', assign a row number to identify each element and then find the gaps in each these groups.

/*
Solution for a *Pre* SQL 2012 version
The first  CTE groups the PersonName and  assign a group  number and
a row identifier for  each group
 */
WITH CTE_GROUPS AS
(SELECT *,
        DENSE_RANK() OVER (ORDER BY  PersonName) AS  GroupID,
        ROW_NUMBER() OVER(PARTITION BY  PersonName ORDER  BY SigninDate) RowNum
    FROM Person
),
/*
The second  CTE will find the border values for  the GAPs
*/
CTE_GAPS AS
(SELECT C2.PersonName,
       C2.SigninDate GapStart,
       C1.SigninDate GapEnd
FROM CTE_GROUPS C1 INNER JOIN CTE_GROUPS C2 ON
    C1.GroupID=C2.GroupID and C1.RowNum=C2.RowNum+1
    AND ISNULL(DATEDIFF(DAY,C2.Signindate,C1.SigninDate),0)>1
)
SELECT * FROM CTE_GAPS

Here is how you could do the same in SQL 2012 using LAG function

WITH CTE_GAPS AS
(SELECT Personname,
        LAG(P.SigninDate) Over(Partition by  P.PersonName Order  By P.PersonName,p.Signindate) GapStart ,  
        SigninDate GapEnd,
        (DATEDIFF(DAY,LAG(P.SigninDate) Over(Partition by  P.PersonName Order  By P.PersonName,p.Signindate),P.SigninDate)-1) GapDays 
FROM Person P)
 
SELECT  P.PersonName, 
        C.CalendarDate 
FROM MASTER..Calendar C CROSS JOIN CTE_GAPS P
WHERE
GapDays>0 AND
    C.CalendarDate 
        BETWEEN DATEADD(DAY,1,P.GapStart ) AND DATEADD(DAY,-1,P.GapEnd)

See Also

This entry participated in the Technology Guru TechNet WiKi for July contest and won the gold prize.