ORDER BY and OFFSET/FETCH Clause (U-SQL)

Summary

U-SQL provides the optional ORDER BY clause to order a rowset generated by a query expression such as a SELECT expression, set expression or function invocation, and as an optional clause on the OUTPUT statement.

Since all rowsets flow unordered through the query processor to provide better optimization until the rowset is returned or written into a file, having an ORDER BY clause without a FETCH clause is meaningless on the current batch query statements. Thus, the ORDER BY clause in U-SQL has to contain a FETCH clause.

To output with a specific order, use ORDER BY with the OUTPUT statement which has no such restrictions and makes the OFFSET/FETCH clause optional. For more information on the OUTPUT statement's specific ordering and examples please refer to Output Statement (U-SQL). Note that the ORDER BY and OFFSET/FETCH semantics also apply to the OUTPUT statement’s ORDER BY clause.

Syntax

Order_By_Fetch_Clause :=                                                                                 
     Order_By_Clause Offset_Fetch.
Order_By_Clause := 'ORDER' 'BY' Sort_Item_Expression_List.

Remarks

  • Sort_Item_Expression_List
    The ORDER BY sort item expression list can refer to any of the columns in the rowset and can include any C# expression.

Syntax

  Sort_Item_Expression_List :=                                                                        
       Sort_Item_Expression { ',' Sort_Item_Expression }.
Sort_Item_Expression := expression [Sort_Direction].

The order gets applied from left to right in the expression list and may either be sorted in ascending or descending order depending on the optionally specified sort direction. The default sort direction is ascending.

U-SQL always orders null values last, regardless of the sort direction or data types.

  • Offset_Fetch
    The full syntax of the OFFSET/FETCH clause is:

Syntax

  Offset_Fetch :=                                                                                     
       ['OFFSET' integer_or_long_literal ('ROW' | 'ROWS')] [Fetch].
Fetch := 'FETCH' ['FIRST' | 'NEXT'] integer_or_long_literal ['ROW' | 'ROWS'] ['ONLY'].

The OFFSET/FETCH clause is the ANSI SQL-conformant way to specify getting the first number of rows. U-SQL makes many of the keywords optional to minimize the amount of typing required. If the OFFSET x ROWS clause is not specified, it defaults to OFFSET 0 ROWS. If the FETCH clause is not specified, then all the rows starting after the skipped rows are being returned.

Note

There is no guarantee that different invocations with the same or different offsets operate from a single snapshot of the ordered rowset. The set of returned rows may be non-deterministically impacted if the order specification is not deterministic (e.g., the order by clause is under-specified, so that multiple rows can be ordered in the same local position) or the data changes between different invocations.

The OFFSET and FETCH clauses allow specifying integer or long literal values. The value for the FETCH clause has to be in the range [1, 10000], otherwise an error is raised.

Note

Other SQL dialects may offer different syntactic options such as TOP() or a LIMIT clause, while U-SQL currently offers the FETCH clause.

Examples

  • The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
  • The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
  • The examples below are based on the dataset defined below. Ensure your execution includes the rowset variable.

Dataset

@employees = 
    SELECT * FROM 
        ( VALUES
        (1, "Noah",   100, (int?)10000, new DateTime(2012,05,31), "cell:030-0074321,office:030-0076545"),
        (2, "Sophia", 100, (int?)15000, new DateTime(2012,03,19), "cell:(5) 555-4729,office:(5) 555-3745"),
        (3, "Liam",   100, (int?)30000, new DateTime(2014,09,14), "cell:(5) 555-3932"),
        (4, "Amy",    100, (int?)35000, new DateTime(1999,02,27), "cell:(171) 555-7788,office:(171) 555-6750, home:(425) 555-6238"),
        (5, "Justin", 100, (int?)15000, new DateTime(2015,01,12), "cell:0921-12 34 65,office:0921-12 34 67"),
        (6, "Emma",   200, (int?)8000,  new DateTime(2014,03,08), (string)null),
        (7, "Jacob",  200, (int?)8000,  new DateTime(2014,09,02), ""),
        (8, "Olivia", 200, (int?)8000,  new DateTime(2013,12,11), "cell:88.60.15.31,office:88.60.15.32"),
        (9, "Mason",  300, (int?)50000, new DateTime(2016,01,01), "cell:(91) 555 22 82,office:(91) 555 91 99, home:(425) 555-2819"),
        (10, "Ava",   400, (int?)15000, new DateTime(2014,09,14), "cell:91.24.45.40,office:91.24.45.41"),
        (11, "Ethan", 400, (int?)9000,  new DateTime(2015,08,22), "cell:(604) 555-4729,office:(604) 555-3745"),
        (12, "David", 800, (int?)100,   new DateTime(2016,11,01), "cell:(171) 555-1212"),
        (13, "Andrew", 100, (int?)null, new DateTime(1995,07,16), "cell:(1) 135-5555,office:(1) 135-4892"),
        (14, "Jennie", 100, (int?)34000, new DateTime(2000,02,12), "cell:(5) 555-3392,office:(5) 555-7293")
        ) AS T(EmpID, EmpName, DeptID, Salary, StartDate, PhoneNumbers);

SELECT & ORDER BY using FETCH only
Returns first five records.

@result =
    SELECT *
    FROM @employees
    ORDER BY EmpID ASC
    FETCH 5 ROWS;    

OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleA.txt"
USING Outputters.Csv();

SELECT & ORDER BY using OFFSET and FETCH
Skip first two records, then return next five records.

@result =
    SELECT *
    FROM @employees
    ORDER BY EmpID ASC
    OFFSET 2 ROWS 
    FETCH 5 ROWS;

OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleB.txt"
USING Outputters.Csv();

SELECT & ORDER BY using OFFSET only
Skip first two records, then return remainder.

@result =
    SELECT *
    FROM @employees
    ORDER BY EmpID ASC
    OFFSET 2 ROWS; 

OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleC.txt"
USING Outputters.Csv();

OUTPUT & ORDER BY

OUTPUT @employees
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleD.txt"
ORDER BY EmpID ASC
OFFSET 2 ROW    // Can be used without FETCH
FETCH 5 ROWS    // Can be used without OFFSET
USING Outputters.Csv();

SELECT and OUTPUT using ORDER BY

// Order by EmpID ASC and select all but first two records
@result =
    SELECT *
    FROM @employees
    ORDER BY EmpID ASC
    OFFSET 2 ROWS;

// Then output remaining records in reverse EmpID order
OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleE.txt"
ORDER BY EmpID DESC
USING Outputters.Csv();

See Also