How to specify a custom catalog name for Azure Databricks Delta Lake Dataset in ADF

Tom Young
2024-01-04T06:25:53.11+00:00

Hello,

I am creating an Azure Databricks Delta Lake dataset in ADF, but I can only choose a database name, and the databases listed are the ones in the Databricks hive_metastore. How can I specify a custom catalog that I created in Databricks instead of hive_metastore?

Thank you.

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

3 answers

  1. Edward Loughran
    2024-06-07T12:27:04.5566667+00:00

    You can specify the catalog within the dataset by setting the database value to catalog`.`schema. When the dataset references the table, it always wraps the database name and the table name in backticks (`), so with this value the table is referenced as `catalog`.`schema`.`table`.

    Hope this helps

    {
        "name": "AzureDatabricksDeltaLakeDataset1",
        "properties": {
            "linkedServiceName": {
                "referenceName": "databricks",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "p_catalog_name": {
                    "type": "string"
                },
                "p_db_name": {
                    "type": "string"
                },
                "p_table_name": {
                    "type": "string"
                }
            },
            "annotations": [],
            "type": "AzureDatabricksDeltaLakeDataset",
            "typeProperties": {
                "database": {
                    "value": "@concat(dataset().p_catalog_name, '`.`', dataset().p_db_name)",
                    "type": "Expression"
                },
                "table": {
                    "value": "@dataset().p_table_name",
                    "type": "Expression"
                }
            },
            "schema": []
        }
    }
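To make the trick concrete, here is a minimal Python sketch of the string composition. It assumes ADF builds the table reference as `database`.`table`, which is the wrapping described in this answer and is not independently confirmed here. The function names and the sample values my_catalog, sales and orders are hypothetical.

```python
# Minimal sketch, assuming ADF quotes the dataset's database and table
# values as `database`.`table` (the behavior described in the answer).
# Function names and sample values are hypothetical, for illustration only.


def adf_database_value(catalog: str, schema: str) -> str:
    """Mimic @concat(dataset().p_catalog_name, '`.`', dataset().p_db_name)."""
    return catalog + "`.`" + schema


def adf_quoted_reference(database: str, table: str) -> str:
    """Assumed ADF wrapping: database and table each enclosed in backticks."""
    return f"`{database}`.`{table}`"


database = adf_database_value("my_catalog", "sales")
print(adf_quoted_reference(database, "orders"))
# prints `my_catalog`.`sales`.`orders`
```

Because nothing in the concat expression escapes backticks, the trick relies on the catalog and schema names not containing backticks themselves.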
    

  2. AnnuKumari-MSFT (Microsoft Employee)
    2024-01-08T06:06:42.6766667+00:00

    Hi Tom Young,

    Welcome to Microsoft Q&A platform and thanks for posting your query here.

    As I understand it, you want to know whether the catalog name or schema details can be specified in the Databricks Delta Lake dataset. Let me know if that is not what you are asking.

    By default, a Databricks Delta Lake dataset in an ADF pipeline needs only three mandatory properties: type, database, and table.

    If you want to specify additional details related to the schema, you can edit the dataset JSON and add the schema property.

    Here is the JSON format:

    {
        "name": "AzureDatabricksDeltaLakeDataset",
        "properties": {
            "type": "AzureDatabricksDeltaLakeDataset",
            "typeProperties": {
                "database": "<database name>",
                "table": "<delta table name>"
            },
            "schema": [ < physical schema, optional, retrievable during authoring > ],
            "linkedServiceName": {
                "referenceName": "<name of linked service>",
                "type": "LinkedServiceReference"
            }
        }
    }
    
    
    

    For more details, see the following resources:

    Dataset properties for Azure Databricks Delta Lake

    Parameterize Linked Services using Advanced section in Azure Data Factory

    Hope it helps. Kindly accept the answer by clicking the Accept answer button. Thank you.


  3. Harun Raseed Basheer (MVP)
    2024-02-06T21:38:58.3033333+00:00

    Hi Tom Young,

    In AzureDatabricksDeltaLakeDataset there is currently no option to select the catalog. Create the dataset as usual, then in the Source settings choose the Query option instead of Table and write a query like the one I used. This lets you connect to your Unity Catalog tables.

    [Screenshot: the query used in the Source settings]
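With the Query option, the table is addressed by its Unity Catalog three-level name, catalog.schema.table. The following is a minimal Python sketch that composes such a query, not the author's exact query. It assumes each name part is backtick-quoted as in Databricks SQL, with any embedded backtick doubled; the names my_catalog, sales and orders are hypothetical.

```python
# Minimal sketch: build a SELECT for the Query option that names a Unity
# Catalog table by its three-level name catalog.schema.table.
# The catalog, schema and table names used below are hypothetical.


def quote_identifier(name: str) -> str:
    """Backtick-quote a Databricks SQL identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def select_all_query(catalog: str, schema: str, table: str) -> str:
    """Return SELECT * over the fully qualified, quoted table name."""
    qualified = ".".join(quote_identifier(part) for part in (catalog, schema, table))
    return f"SELECT * FROM {qualified}"


print(select_all_query("my_catalog", "sales", "orders"))
# prints SELECT * FROM `my_catalog`.`sales`.`orders`
```

Quoting each part separately keeps names with special characters intact, which the concat trick in the first answer does not handle.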

    Hope it helps. Kindly accept the answer by clicking the Accept answer button. Thank you.