Data size of Databricks Delta tables

NIKHIL KUMAR 101 Reputation points
2024-05-02T09:39:01.4133333+00:00

We have observed that the reported size of a Delta table is much smaller than the total size of its underlying Delta files in the storage account.

For example, a Databricks Delta table raw.deltaTableA has a size of 2 MB, but the underlying Delta files in the data lake add up to 50 MB.

How is the table size calculated?

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. Vinodh247 12,741 Reputation points
    2024-05-02T13:46:37.8+00:00

    Hi NIKHIL KUMAR,

    Thanks for reaching out to Microsoft Q&A.

    A Delta table is a logical view over files in storage. The table size that Databricks reports (for example, sizeInBytes from DESCRIBE DETAIL) counts only the Parquet data files referenced by the current version of the table. The storage folder, such as the one in Azure Data Lake Storage, holds more than that: the _delta_log transaction log (JSON commits and checkpoints) plus data files from earlier versions that updates, deletes, overwrites, or OPTIMIZE have replaced. Those older files are kept for time travel until VACUUM removes them, so the folder size can be much larger than the table size.
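
To make the gap between the two numbers concrete, here is a minimal Python sketch that compares them for a Delta table folder reachable as a local path. It replays the JSON commit files under _delta_log, keeps the files that an "add" action introduced and no later "remove" action dropped, and sums their sizes; it then walks the folder to total everything physically stored. The function names are hypothetical, and the sketch assumes the log has no Parquet checkpoints (they are skipped), so treat it as an illustration rather than how Databricks itself computes the figure.

```python
import json
import os


def live_size_from_log(table_dir):
    """Sum the sizes of data files the latest table version still references."""
    log_dir = os.path.join(table_dir, "_delta_log")
    live = {}  # relative data file path -> size in bytes
    # Commit file names are zero-padded, so a lexical sort replays them in order.
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue  # assumption: Parquet checkpoints are ignored in this sketch
        with open(os.path.join(log_dir, name)) as fh:
            for line in fh:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    live[action["add"]["path"]] = action["add"]["size"]
                elif "remove" in action:
                    # The file stays on disk, but the table no longer uses it.
                    live.pop(action["remove"]["path"], None)
    return sum(live.values())


def size_on_disk(table_dir):
    """Total bytes of every file under the table folder, including the log."""
    total = 0
    for root, _dirs, files in os.walk(table_dir):
        for f in files:
            total += os.path.getsize(os.path.join(root, f))
    return total
```

On Databricks itself, running DESCRIBE DETAIL on the table returns a sizeInBytes column for the current snapshot, and VACUUM deletes unreferenced data files older than the retention period (7 days by default), which is what brings the storage figure back toward the table figure.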

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.

    1 person found this answer helpful.
