WHERE Clause (U-SQL)
Summary
The optional WHERE
clause in a SELECT expression will filter the rowset that the FROM clause calculated.
Syntax
Where_Clause := 'WHERE' Boolean_Expression.
Remarks
Syntax
Boolean_Expression := bool_expression | ('NOT' | '!') Boolean_Expression | Boolean_Expression ('AND' | '&&') Boolean_Expression | Boolean_Expression ('OR' | '||') Boolean_Expression.
The predicate can either be any C# expression that evaluates to a bool, or a negation of a Boolean expression, a conjunction or a disjunction. U-SQL’s Boolean logic is based on C# and thus is like in C# 2-valued logic where null == null evaluates to true
and null == 1 will evaluate to false
.
The expressions can of course refer to any of the columns in the rowset, can invoke any C# expression and function and method call as long as the functions and methods are included in the scope either implicitly or explicitly.
The AND and OR operators do not guarantee execution order of their operands to allow the query processor to reorder them for better performance. If the order is important, for example to guard a subsequent expression from a runtime error like a null exception, one should use C#’s && and || which will preserve the expression’s execution order from left to right and will shortcut the expression if the left side of the logical expression determines the outcome.
Examples
The following query finds all the search session in the @searchlog rowset that are in the en-gb
region.
@rs1 =
SELECT Start, Region, Duration
FROM @searchlog
WHERE Region == "en-gb";
Note the use of ==
in the example above instead of =
. This is because expressions in the SELECT
statement are true C# expressions where ==
is the comparison operator for equality.
The following example shows a more complex combination of AND and OR. It finds all the search sessions from the @searchlog rowset that lasted between 2 and 5 minutes or are in the en-gb region.
@rs2 =
SELECT Start, Region, Duration
FROM @searchlog
WHERE (Duration >= 2*60 AND Duration <= 5*60) OR (Region == "en-gb");
While U-SQL supports the BETWEEN comparison operation, the following example shows how to use AND and the DateTime.Parse() method to filter between two dates:
@rs3 =
SELECT Start, Region, Duration
FROM @searchlog
WHERE Start >= DateTime.Parse("2012/02/16") AND Start <= DateTime.Parse("2012/02/17");
Assuming the Region can contain null values and one wants to check if the first three characters correspond to the regions that start with "de-", one should use the C# && operator to guarantee that the null check occurs before applying the string operations and to short-circuit the expression if the check fails:
@rs4 =
SELECT Start, Region, Duration
FROM @searchlog
WHERE Start >= DateTime.Parse("2012/02/16") AND Start <= DateTime.Parse("2012/02/17")
AND (Region != null && Region.StartsWith("de-"));