ORDER BY and OFFSET/FETCH Clause (U-SQL)
Summary
U-SQL provides the optional ORDER BY
clause to order a rowset generated by a query expression such as a SELECT
expression, set expression or function invocation, and as an optional clause on the OUTPUT
statement.
Since all rowsets flow unordered through the query processor to provide better optimization until the rowset is returned or written into a file, having an ORDER BY
clause without a FETCH
clause is meaningless on the current batch query statements. Thus, the ORDER BY
clause in U-SQL has to contain a FETCH
clause.
To output with a specific order, use ORDER BY
with the OUTPUT
statement which has no such restrictions and makes the OFFSET/FETCH
clause optional. For more information on the OUTPUT
statement's specific ordering and examples please refer to Output Statement (U-SQL). Note that the ORDER BY
and OFFSET/FETCH
semantics also apply to the OUTPUT
statement’s ORDER BY
clause.
Syntax
Order_By_Fetch_Clause := Order_By_Clause Offset_Fetch.
Order_By_Clause := 'ORDER' 'BY' Sort_Item_Expression_List.
Remarks
Sort_Item_Expression_List
The ORDER BY sort item expression list can refer to any of the columns in the rowset and can include any C# expression.
Syntax
Sort_Item_Expression_List := Sort_Item_Expression { ',' Sort_Item_Expression }.
Sort_Item_Expression := expression [Sort_Direction].
The order gets applied from left to right in the expression list and may either be sorted in ascending or descending order depending on the optionally specified sort direction. The default sort direction is ascending.
U-SQL always orders null values last, regardless of the sort direction or data types.
Syntax
Offset_Fetch := ['OFFSET' integer_or_long_literal ('ROW' | 'ROWS')] [Fetch].
Fetch := 'FETCH' ['FIRST' | 'NEXT'] integer_or_long_literal ['ROW' | 'ROWS'] ['ONLY'].
The OFFSET/FETCH
clause is the ANSI SQL-conformant way to specify getting the first number of rows. U-SQL makes many of the keywords optional to minimize the amount of typing required. If the OFFSET x ROWS
clause is not specified, it defaults to OFFSET 0 ROWS
. If the FETCH
clause is not specified, then all the rows starting after the skipped rows are being returned.
Note
There is no guarantee that different invocations with the same or different offsets operate from a single snapshot of the ordered rowset. The set of returned rows may be non-deterministically impacted if the order specification is not deterministic (e.g., the order by clause is under-specified, so that multiple rows can be ordered in the same local position) or the data changes between different invocations.
The OFFSET
and FETCH
clauses allow specifying integer or long literal values. The value for the FETCH
clause has to be in the range [1, 10000], otherwise an error is raised.
Note
Other SQL dialects may offer different syntactic options such as TOP()
or a LIMIT
clause, while U-SQL currently offers the FETCH
clause.
Examples
- The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
- The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
- The examples below are based on the dataset defined below. Ensure your execution includes the rowset variable.
Dataset
@employees =
SELECT * FROM
( VALUES
(1, "Noah", 100, (int?)10000, new DateTime(2012,05,31), "cell:030-0074321,office:030-0076545"),
(2, "Sophia", 100, (int?)15000, new DateTime(2012,03,19), "cell:(5) 555-4729,office:(5) 555-3745"),
(3, "Liam", 100, (int?)30000, new DateTime(2014,09,14), "cell:(5) 555-3932"),
(4, "Amy", 100, (int?)35000, new DateTime(1999,02,27), "cell:(171) 555-7788,office:(171) 555-6750, home:(425) 555-6238"),
(5, "Justin", 100, (int?)15000, new DateTime(2015,01,12), "cell:0921-12 34 65,office:0921-12 34 67"),
(6, "Emma", 200, (int?)8000, new DateTime(2014,03,08), (string)null),
(7, "Jacob", 200, (int?)8000, new DateTime(2014,09,02), ""),
(8, "Olivia", 200, (int?)8000, new DateTime(2013,12,11), "cell:88.60.15.31,office:88.60.15.32"),
(9, "Mason", 300, (int?)50000, new DateTime(2016,01,01), "cell:(91) 555 22 82,office:(91) 555 91 99, home:(425) 555-2819"),
(10, "Ava", 400, (int?)15000, new DateTime(2014,09,14), "cell:91.24.45.40,office:91.24.45.41"),
(11, "Ethan", 400, (int?)9000, new DateTime(2015,08,22), "cell:(604) 555-4729,office:(604) 555-3745"),
(12, "David", 800, (int?)100, new DateTime(2016,11,01), "cell:(171) 555-1212"),
(13, "Andrew", 100, (int?)null, new DateTime(1995,07,16), "cell:(1) 135-5555,office:(1) 135-4892"),
(14, "Jennie", 100, (int?)34000, new DateTime(2000,02,12), "cell:(5) 555-3392,office:(5) 555-7293")
) AS T(EmpID, EmpName, DeptID, Salary, StartDate, PhoneNumbers);
SELECT & ORDER BY using FETCH only
Returns first five records.
@result =
SELECT *
FROM @employees
ORDER BY EmpID ASC
FETCH 5 ROWS;
OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleA.txt"
USING Outputters.Csv();
SELECT & ORDER BY using OFFSET and FETCH
Skip first two records, then return next five records.
@result =
SELECT *
FROM @employees
ORDER BY EmpID ASC
OFFSET 2 ROWS
FETCH 5 ROWS;
OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleB.txt"
USING Outputters.Csv();
SELECT & ORDER BY using OFFSET only
Skip first two records, then return remainder.
@result =
SELECT *
FROM @employees
ORDER BY EmpID ASC
OFFSET 2 ROWS;
OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleC.txt"
USING Outputters.Csv();
OUTPUT & ORDER BY
OUTPUT @employees
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleD.txt"
ORDER BY EmpID ASC
OFFSET 2 ROW // Can be used without FETCH
FETCH 5 ROWS // Can be used without OFFSET
USING Outputters.Csv();
SELECT and OUTPUT using ORDER BY
// Order by EmpID ASC and select all but first two records
@result =
SELECT *
FROM @employees
ORDER BY EmpID ASC
OFFSET 2 ROWS;
// Then output remaining records in reverse EmpID order
OUTPUT @result
TO "/Output/ReferenceGuide/PrimaryRowsetExpressions/Select/OrderBy/exampleE.txt"
ORDER BY EmpID DESC
USING Outputters.Csv();