Create schemas and tables for Unity Catalog enabled in Azure Databricks workspace

Ashwini Gaikwad 130 Reputation points
2024-03-26T15:42:47.8+00:00

Hello Team,

I have enabled Unity Catalog for our Azure Databricks workspace, now the challenge is to create schemas name and tables name under the catalog for multiple folders and files existing in our adls gen2 account.

for e.g. my container name is test-d-01 and inside that I have created multiple folders like acquisiton, operation and so on.

That means for reading one parquet/orc file in my adls gen2 account I have to navigate a path as mentioned in below examples:

test-d-01/acquisition/tactical/pint/*.parquet

test-d-01/acquisition/rap/cts/*.parquet

test-d-01/operation/model/alm/*.orc

Can you recommend any reliable way how should we create schemas name and table name in catalog for the above mentioned paths. It will be really very helpful for us, also I didn't find any relevant articles for the above scenario.

Regards,

Ashwini Gaikwad

Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,466 questions
Azure Storage Accounts
Azure Storage Accounts
Globally unique resources that provide access to data management services and serve as the parent namespace for the services.
3,149 questions
Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,175 questions
0 comments No comments
{count} votes

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 89,466 Reputation points Microsoft Employee
    2024-03-27T03:47:44.7866667+00:00

    @Ashwini Gaikwad - Thanks for the question and using MS Q&A platform.

    To create schemas and tables for Unity Catalog enabled in Azure Databricks workspace, you can follow the below steps:

    Step1: You need to create a database in Unity Catalog. You can create a database using the following command:

    %sql
    CREATE DATABASE IF NOT EXISTS <database_name> COMMENT 'database_comment' LOCATION 'database_location'
    

    Replace database_name with the name of your database, database_comment with a comment for your database, and database_location with the location where you want to store your database.

    Step2: Once you have created the database, you can create tables for your data. You can create tables using the following command:

    %sql
    CREATE TABLE IF NOT EXISTS <table_name> USING file_format OPTIONS (   'path' 'path_to_your_data',
      'recurse' 'true'
    )
    

    Replace table_name with the name of your table, file_format with the file format of your data (e.g. parquet, orc), and path_to_your_data with the path to your data in ADLS Gen2.

    You can also specify additional options for your table, such as partitioning and schema inference. For more information on creating tables in Unity Catalog, you can refer to the official documentation.

    Step3: To create tables for multiple folders and files, you can use a loop to iterate over the folders and files in your ADLS Gen2 account. Here's an example code snippet that creates tables for all parquet files in the test-d-01/acquisition/tactical/pint folder:

    %python
    import os
    database_name = 'my_database'
    file_format = 'parquet'
    base_path = 'test-d-01/acquisition/tactical/pint'
    for root, dirs, files in dbutils.fs.ls(base_path):
        for file in files:
            if file.endswith(f'.{file_format}'):
                table_name = os.path.splitext(file)[0]
                path = os.path.join(root, file)
                table_path = f'{base_path}/{table_name}'
                dbutils.fs.mkdirs(table_path)
                dbutils.fs.cp(path, table_path)
                spark.sql(f'CREATE TABLE IF NOT EXISTS {database_name}.{table_name} USING {file_format} OPTIONS (\'path\' \'{table_path}\')')
    

    This will create tables for all parquet files in the test-d-01/acquisition/tactical/pint folder, with the tables named after the file names (without the file extension) and stored in the my_database database in Unity Catalog.

    Hope this helps. Do let us know if you any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.