How to create a Delta table in Azure Synapse Analytics with an auto-increment (identity) Id column?

Vedant Desai 651 Reputation points
2024-06-04T06:11:14.4766667+00:00

I have created Delta Lake tables in ADLS using a Synapse notebook. In one of these tables I want to add an identity column (auto-increment 1,1), but I am not able to create it. Below are my CREATE TABLE script and the error I am facing.

Table Script:

%%sql
CREATE TABLE Staging.DimContract_NAC_Test
(
    DimContractKey BIGINT GENERATED ALWAYS AS IDENTITY,
    IdentCode INT,
    ContractId INT,
    SourceUpdatedDate TIMESTAMP,
    EffectiveStartDate DATE,
    EffectiveEndDate DATE,
    ETLId INT,
    SourceId INT,
    IsIdentCodeDuplicate boolean
)
USING DELTA;

Error:

Syntax error at or near 'GENERATED' (line 3, pos 26)

== SQL ==
CREATE TABLE Staging.DimContract_NAC_Test
(
    DimContractKey BIGINT GENERATED ALWAYS AS IDENTITY,
--------------------------^^^
    IdentCode INT,
    ContractId INT,
    SourceUpdatedDate TIMESTAMP,
    EffectiveStartDate DATE,
    EffectiveEndDate DATE,
    ETLId INT,
    SourceId INT,
    IsIdentCodeDuplicate boolean
)
USING DELTA

org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306)
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:143)
org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:52)
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:89)
io.delta.sql.parser.DeltaSqlParser.$anonfun$parsePlan$1(DeltaSqlParser.scala:78)
io.delta.sql.parser.DeltaSqlParser.parse(DeltaSqlParser.scala:111)
io.delta.sql.parser.DeltaSqlParser.parsePlan(DeltaSqlParser.scala:75)
com.microsoft.azure.synapse.ml.predict.SynapsePredictParser.parsePlan(SynapsePredictParser.scala:19)
org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620)
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:120)
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620)
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
org.apache.livy.repl.SQLInterpreter.execute(SQLInterpreter.scala:150)
org.apache.livy.repl.Session.$anonfun$executeCode$1(Session.scala:814)
scala.Option.map(Option.scala:230)
org.apache.livy.repl.Session.executeCode(Session.scala:811)
org.apache.livy.repl.Session.$anonfun$execute$9(Session.scala:536)
org.apache.livy.repl.Session.withRealtimeOutputSupport(Session.scala:1038)
org.apache.livy.repl.Session.$anonfun$execute$2(Session.scala:536)
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
scala.util.Success.$anonfun$map$1(Try.scala:255)
scala.util.Success.map(Try.scala:213)
scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:750)


Tags: Azure Data Lake Storage, Azure Synapse Analytics, Azure Databricks

Accepted answer
  1. Amira Bedhiafi 18,501 Reputation points
    2024-06-04T08:22:47.0266667+00:00

    When I checked the documentation:

    You can define a table as having the IDENTITY property when you first create it, using syntax similar to the following statement. Note that this is T-SQL for Synapse dedicated SQL pools; it does not apply to Spark/Delta tables:

    CREATE TABLE dbo.T1
    (    C1 INT IDENTITY(1,1) NOT NULL
    ,    C2 INT NULL
    )
    WITH
    (   DISTRIBUTION = HASH(C2)
    ,   CLUSTERED COLUMNSTORE INDEX
    )
    ;
    

    Update:

    After verifying, I think the GENERATED ALWAYS AS IDENTITY syntax is not supported by the Delta Lake version that ships with Synapse Spark pools; Delta identity columns are currently an Azure Databricks feature.
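
    For reference only, on Azure Databricks (Runtime 10.4 and later), where Delta identity columns are available, the same table could be declared as below. This is a comparison sketch; it is not expected to run on a Synapse Spark pool.

    # Reference sketch for Azure Databricks, where Delta identity columns are supported.
    # START WITH 1 INCREMENT BY 1 mirrors the (1,1) behaviour the question asks for.
    spark.sql("""
        CREATE TABLE Staging.DimContract_NAC_Test
        (
            DimContractKey BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
            IdentCode INT,
            ContractId INT,
            SourceUpdatedDate TIMESTAMP,
            EffectiveStartDate DATE,
            EffectiveEndDate DATE,
            ETLId INT,
            SourceId INT,
            IsIdentCodeDuplicate BOOLEAN
        )
        USING DELTA
    """)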

    Instead, you can use a workaround to simulate an auto-incrementing column.

    • Create the table without the identity column.
    %%sql
    CREATE TABLE Staging.DimContract_NAC_Test
    (
        IdentCode INT,
        ContractId INT,
        SourceUpdatedDate TIMESTAMP,
        EffectiveStartDate DATE,
        EffectiveEndDate DATE,
        ETLId INT,
        SourceId INT,
        IsIdentCodeDuplicate BOOLEAN
    )
    USING DELTA;
    
    • Insert data into the table while generating an auto-incrementing column using Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import monotonically_increasing_id, to_timestamp, to_date

    # Initialize the Spark session (in a Synapse notebook, `spark` already exists
    # and getOrCreate simply returns it)
    spark = SparkSession.builder.appName("AutoIncrementColumn").getOrCreate()

    # Load your data into a DataFrame
    data = [
        (123, 1, '2023-06-01 12:00:00', '2023-06-01', '2024-06-01', 1001, 2001, True),
        (124, 2, '2023-06-02 13:00:00', '2023-06-02', '2024-06-02', 1002, 2002, False),
        # Add more rows as needed
    ]

    columns = ["IdentCode", "ContractId", "SourceUpdatedDate", "EffectiveStartDate", "EffectiveEndDate", "ETLId", "SourceId", "IsIdentCodeDuplicate"]
    df = spark.createDataFrame(data, columns)

    # Cast the string literals to the TIMESTAMP/DATE types declared on the table;
    # without this the append below fails with a schema mismatch
    df = (df.withColumn("SourceUpdatedDate", to_timestamp("SourceUpdatedDate"))
            .withColumn("EffectiveStartDate", to_date("EffectiveStartDate"))
            .withColumn("EffectiveEndDate", to_date("EffectiveEndDate")))

    # Add an auto-incrementing column (values are unique and increasing,
    # but not consecutive: the ids jump between partitions)
    df_with_id = df.withColumn("DimContractKey", monotonically_increasing_id())

    # Reorder columns if necessary
    df_with_id = df_with_id.select("DimContractKey", "IdentCode", "ContractId", "SourceUpdatedDate", "EffectiveStartDate", "EffectiveEndDate", "ETLId", "SourceId", "IsIdentCodeDuplicate")

    # Write the DataFrame to the Delta table
    df_with_id.write.format("delta").mode("append").saveAsTable("Staging.DimContract_NAC_Test")
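
    A quick read-back confirms the generated keys (a usage example, assuming the write above succeeded):

    # Verify the generated keys
    spark.sql(
        "SELECT DimContractKey, IdentCode FROM Staging.DimContract_NAC_Test ORDER BY DimContractKey"
    ).show()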
    
    
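    One caveat with monotonically_increasing_id(): it guarantees unique, increasing values but not consecutive ones. If you need a gap-free 1, 2, 3, ... sequence like a true IDENTITY(1,1), a window-function variant is a common workaround. A minimal sketch, reusing the df built above and assuming the table already exists:

    from pyspark.sql.functions import row_number, lit
    from pyspark.sql.window import Window

    # Continue the sequence from the current maximum key (0 for an empty table)
    existing_max = spark.sql(
        "SELECT COALESCE(MAX(DimContractKey), 0) AS mx FROM Staging.DimContract_NAC_Test"
    ).collect()[0]["mx"]

    # row_number() yields gap-free 1, 2, 3, ... values. The window has no
    # partitionBy, so Spark moves every row to a single partition: fine for
    # moderate batches, a bottleneck for very large loads.
    w = Window.orderBy("IdentCode")
    df_with_id = df.withColumn("DimContractKey", row_number().over(w) + lit(existing_max))

    # Delta matches columns by name on append, so column order does not matter here
    df_with_id.write.format("delta").mode("append").saveAsTable("Staging.DimContract_NAC_Test")
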
    1 person found this answer helpful.

0 additional answers
