STDEVP (U-SQL)

Summary

The STDEVP aggregator returns the statistical standard deviation for the population for all nonnull values in the group or returns null if all the input values are null. The values have to be numeric or an error is raised.

The identity value is null.

Syntax

STDEVP_Expression :=                                                                                     
     'STDEVP' '(' ['DISTINCT'] expression ')'.

Remarks

  • DISTINCT
    Optionally allows to de-duplicate the values returned by the expression inside the group before aggregation.

  • expression
    The C# expression (including column references) that gets aggregated. The type of the expression has to be a numeric type or an error is raised.

Return Type

The return type is double?.

Usage in Windowing Expression

This aggregator can be used in a windowing expression without any additional restrictions.

Examples

  • The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.

  • The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.

  • The examples below are based on the dataset defined below. Ensure your execution includes the rowset variable.

      @employees = 
          SELECT * FROM 
              ( VALUES
              (1, "Noah",   "Engineering", 100, 10000),
              (2, "Sophia", "Engineering", 100, 20000),
              (3, "Liam",   "Engineering", 100, 30000),
              (4, "Emma",   "HR",          200, 8000),
              (5, "Jacob",  "HR",          200, 8000),
              (6, "Olivia", "HR",          200, 8000),
              (7, "Mason",  "Executive",   300, 50000),
              (8, "Ava",    "Marketing",   400, 15000),
              (9, "Ethan",  "Marketing",   400, 9000) 
              ) AS T(EmpID, EmpName, DeptName, DeptID, Salary);
    

A. Using STDEVP
The following query returns the standard deviation for the population of all: 1) salary values, and 2) distinct salary values.

@result =
    SELECT STDEVP(Salary) AS STDEVP_AllSalaries,
           STDEVP(DISTINCT Salary) AS STDEVP_DistinctSalaries
    FROM @employees;

OUTPUT @result
TO "/Output/ReferenceGuide/stdevp/exampleA.csv"
USING Outputters.Csv();

B. STDEVP per group
The following query determines the standard deviation for the salary population for each department with the GROUP BY clause.

@result =
    SELECT DeptName,
           STDEVP(Salary) AS STDEVPSalaryByDept
    FROM @employees
    GROUP BY DeptName;

OUTPUT @result
TO "/Output/ReferenceGuide/stdevp/exampleB.csv"
USING Outputters.Csv();

C. STDEVP with OVER()
The OVER clause in the following query is empty which defines the "window" to include all rows. The query determines the standard deviation for the salary population over the window - all employees.

@result =
    SELECT EmpName,
           STDEVP(Salary) OVER() AS STDEV_AllSalaries
    FROM @employees;

OUTPUT @result
TO "/Output/ReferenceGuide/stdevp/exampleC.csv"
USING Outputters.Csv();

D. STDEVP over a defined window using OVER()
The OVER clause in the following query is DeptName. The query returns EmpName, DeptName, Salary, and the standard deviation for the salary population over the window - DeptName.

@result =
    SELECT EmpName,
           DeptName,
           Salary,
           STDEVP(Salary) OVER(PARTITION BY DeptName) AS STDEVPSalaryByDept
    FROM @employees;

OUTPUT @result
TO "/Output/ReferenceGuide/stdevp/exampleD.csv"
USING Outputters.Csv();

See Also