CREATE TABLE (U-SQL): Creating a Table from a Query
Summary
Often a script converts unstructured data in a file into a table, by first extracting the data using an EXTRACT
expression and then inserting it into a table after some optional transformations. In order to simplify the process, U-SQL provides the ability to create a table from a U-SQL query expression. The CREATE TABLE AS
statement will infer the schema from the query expression and will create a clustered table, thus the clustered index needs to be provided as part of the CREATE TABLE AS
statement.
Syntax
Create_Managed_Table_As_Query_Statement := 'CREATE' 'TABLE' ['IF' 'NOT' 'EXISTS'] Identifier Table_As_Query.
Table_As_Query := ( '(' Table_Index Horizontal_Partition_Specification ')' | '(' Table_Index ')' Horizontal_Partition_Specification ) 'AS' Query_Expression
Horizontal_Partition_Specification := 'PARTITIONED' 'BY' Partition_Type ['INTO' positive_integer_literal].
Remarks
Identifier
Specifies the name of the schema. If the Identifier is a three-part identifier, the table will be created in the specified database and schema. If it is a two-part identifier, then the table will be created in the specified schema of the current database context. If the identifier is a simple identifier, then the table will be created in the current database and schema context.If a table or other object of the given name already exists in the specified database and schema context or the user has no permissions to create a table, an error is raised.
Table_Index
Specifies the clustered index for the table. For more details on defining a table index, see CREATE TABLE (U-SQL): Creating a Table with Schema.Horizontal_Partition_Specification
Only horizontal partitioning is currently supported with the CREATE TABLE AS statement. For more detail on the partitioning schemes and options see CREATE TABLE (U-SQL): Creating a Table with Schema.Query_Expression
Provides the query expression that defines the schema of the data and provides the initial data values. Any U-SQL query expression can be used to create a table, including SELECT, EXTRACT,PRODUCE
, invocation of a TVF etc..
Examples
- The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
- The examples below use the sample data,
/Samples/Data/SearchLog.tsv
, provided with your Data Lake Analytics account. See Prepare source data for additional information.
Creating a table from a query - an extract
The following example creates a table called searchLogTable
from SearchLog.tsv
.
DROP TABLE IF EXISTS dbo.searchLogTable;
CREATE TABLE dbo.searchLogTable (
INDEX clx_UserId CLUSTERED(UserId ASC)
DISTRIBUTED BY HASH (UserId)
) AS EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int?,
Urls string,
ClickedUrls string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
Creating a table from a query - a rowset variable
The following example creates a table called searchLogTable
from SearchLog.tsv
.
CREATE DATABASE IF NOT EXISTS TestReferenceDB;
USE DATABASE TestReferenceDB;
// Extract data from file
@data =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int?,
Urls string,
ClickedUrls string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
// Create table based on extraction and populate table
DROP TABLE IF EXISTS dbo.searchLogTable;
CREATE TABLE dbo.searchLogTable (
INDEX clx_UserId CLUSTERED(UserId ASC)
DISTRIBUTED BY HASH (UserId)
) AS SELECT * FROM @data;
Creating a table from a query - another table
The following example creates a table called searchLogTableCopy
from searchLogTable
, created from prior example.
DROP TABLE IF EXISTS dbo.searchLogTableCopy;
CREATE TABLE dbo.searchLogTableCopy (
INDEX clx_UserId CLUSTERED(UserId ASC)
DISTRIBUTED BY HASH (UserId)
) AS SELECT * FROM dbo.searchLogTable; // Can also use a View
Creating a table from a query - invocation of a TVF
The following example creates a table called searchLogFromFunction
by invoking the fuction tvf_SearchLog
which was created from Basic Syntax - tvf_SearchLog.
DROP TABLE IF EXISTS dbo.searchLogFromFunction;
CREATE TABLE dbo.searchLogFromFunction (
INDEX clx_UserId CLUSTERED(UserId ASC)
DISTRIBUTED BY HASH (UserId)
) AS dbo.tvf_SearchLog();