How to import and export data using pg_azure_storage extension in Azure Database for PostgreSQL - Flexible Server

APPLIES TO: Azure Database for PostgreSQL - Flexible Server

The pg_azure_storage extension allows you to import or export data in multiple file formats, directly between Azure Storage accounts and an instance of Azure Database for PostgreSQL Flexible Server.

Examples of data export and import using this extension can be found in the Examples section of this article.

To use the pg_azure_storage extension on your Azure Database for PostgreSQL flexible server instance, you need to add the extension to the shared_preload_libraries, and also add it to the azure.extensions server parameter, as described in how to use PostgreSQL extensions.

Because shared_preload_library is a static server parameter, it requires a restart of the server for the change to take effect.

Once the server restarts, connect to your instance of PostgreSQL using the client of your preference (for example psql, pgAdmin, etc.). Confirm that SHOW azure.extensions;, and SHOW shared_preload_libraries;, both include the value azure_storage in the list of the comma-separated values returned by each of the SHOW statements.

Only then you can install the extension, by connecting to your target database, and running the CREATE EXTENSION statement. You need to repeat the command separately for each database in which you want the extension to be available.

CREATE EXTENSION azure_storage;

Permissions

Your Azure blob storage access keys are similar to a root password for your storage account. Always be careful to protect your access keys. Use Azure Key Vault to manage and rotate your keys securely. The account key is stored in a table that is accessible only by the superuser.

Users granted the azure_storage_admin role can interact with this table using the following functions:

  • account_add
  • account_list
  • account_remove
  • account_user_add
  • account_user_remove

The azure_storage_admin role is, by default, granted to the azure_pg_admin role.

azure_storage.account_add

Function that allows adding a storage account and its associated access key, to the list of storage accounts that the pg_azure_storage extension can access.

If a previous invocation of this function already added the reference to this storage account, it doesn't add a new entry but instead updates the access key of the existing entry.

Note

This function doesn't validate if the referred account name exists or if it's accessible with the access key provided. However, it validates that the name of the storage account is valid, according to the naming validation rules imposed on Azure storage accounts.

azure_storage.account_add(account_name_p text, account_key_p text);

Permissions

Must be a member of azure_storage_admin.

Arguments

account_name_p

text the name of the Azure blob storage account that contains all of your objects: blobs, files, queues, and tables. The storage account provides a unique namespace that is accessible from anywhere in the world over HTTPS.

account_key_p

text the value of one of the access keys for the storage account. Your Azure blob storage access keys are similar to a root password for your storage account. Always be careful to protect your access keys. Use Azure Key Vault to manage and rotate your keys securely. The account key is stored in a table that is accessible only by the superuser. Users granted the azure_storage_admin role can interact with this table via functions. To see which storage accounts are added, use the function account_list.

azure_storage.account_remove

Function that allows removing a storage account and its associated access key from the list of storage accounts that the pg_azure_storage extension can access.

azure_storage.account_remove(account_name_p text);

Permissions

Must be a member of azure_storage_admin.

Arguments

account_name_p

text the name of the Azure blob storage account that contains all of your objects: blobs, files, queues, and tables. The storage account provides a unique namespace that is accessible from anywhere in the world over HTTPS.

azure_storage.account_user_add

Function that allows granting a PostgreSQL user or role access to a storage account through the functions provided by the pg_azure_storage extension.

Note

The execution of this function only succeeds if the storage account, whose name is being passed as the first argument, was already created using account_add, and if the user or role, whose name is passed as the second argument, already exists.

azure_storage.account_add(account_name_p text, user_p regrole);

Permissions

Must be a member of azure_storage_admin.

Arguments

account_name_p

text the name of the Azure blob storage account that contains all of your objects: blobs, files, queues, and tables. The storage account provides a unique namespace that is accessible from anywhere in the world over HTTPS.

user_p

regrole the name of a PostgreSQL user or role available on the server.

azure_storage.account_user_remove

Function that allows revoking a PostgreSQL user or role access to a storage account through the functions provided by the pg_azure_storage extension.

Note

The execution of this function only succeeds if the storage account whose name is being passed as the first argument has already been created using account_add, and if the user or role whose name is passed as the second argument still exists. When a user or role is dropped from the server, by executing DROP USER | ROLE, the permissions that were granted on any reference to Azure storage accounts are also automatically eliminated.

azure_storage.account_remove(account_name_p text, user_p regrole);

Permissions

Must be a member of azure_storage_admin.

Arguments

account_name_p

text the name of the Azure blob storage account that contains all of your objects: blobs, files, queues, and tables. The storage account provides a unique namespace that is accessible from anywhere in the world over HTTPS.

user_p

regrole the name of a PostgreSQL user or role available on the server.

azure_storage.account_list

Function that lists the names of the storage accounts that were configured via the account_add function, together with the PostgreSQL users or roles that are granted permissions to interact with that storage account through the functions provided by the pg_azure_storage extension.

azure_storage.account_list();

Permissions

Must be a member of azure_storage_admin.

Arguments

This function doesn't take any arguments.

Return type

TABLE(account_name text, allowed_users regrole[]) a two-column table with the list of Azure blob storage accounts added, and the list of PostgreSQL users or roles that are granted access to it.

azure_storage.blob_list

Function that lists the names and other properties (size, lastModified, eTag, contentType, contentEncoding, and contentHash) of blobs stored in the given container of the referred storage account.

azure_storage.blob_list(account_name text, container_name text, prefix text DEFAULT ''::text);

Permissions

User or role invoking this function must be added to the allowed list for the account_name referred, by executing azure_storage.account_user_add. Members of azure_storage_admin are automatically allowed to reference all Azure Storage accounts whose references were added using azure_storage.account_add.

Arguments

account_name

text the name of the Azure blob storage account that contains all of your objects: blobs, files, queues, and tables. The storage account provides a unique namespace that is accessible from anywhere in the world over HTTPS.

container_name

text the name of a container. A container organizes a set of blobs, similar to a directory in a file system. A storage account can include an unlimited number of containers, and a container can store an unlimited number of blobs. A container name must be a valid Domain Name System (DNS) name, as it forms part of the unique URI used to address the container or its blobs. When naming a container, make sure to follow these rules.

The URI for a container is similar to: https://myaccount.blob.core.windows.net/mycontainer

prefix

text when specified, the function returns the blobs whose names begin with the value provided in this parameter. Defaults to an empty string.

Return type

TABLE(path text, bytes bigint, last_modified timestamp with time zone, etag text, content_type text, content_encoding text, content_hash text) a table with one record per blob returned, including the full name of the blob, and some other properties.

path

text the full name of the blob.

bytes

bigint the size of blob in bytes.

last_modified

timestamp with time zonethe date and time the blob was last modified. Any operation that modifies the blob, including an update of the blob's metadata or properties, changes the last-modified time of the blob.

etag

text the ETag property is used for optimistic concurrency during updates. It isn't a timestamp as there's another property called Timestamp that stores the last time a record was updated. For example, if you load an entity and want to update it, the ETag must match what is currently stored. Setting the appropriate ETag is important because if you have multiple users editing the same item, you don't want them overwriting each other's changes.

content_type

text the content type specified for the blob. The default content type is application/octet-stream.

content_encoding

text the Content-Encoding property of a blob that Azure Storage allows you to define. For compressed content, you could set the property to be Gzip. When the browser accesses the content, it automatically decompresses the content.

content_hash

text the hash used to verify the integrity of the blob during transport. When this header is specified, the storage service checks the provided hash with one computed from content. If the two hashes don't match, the operation fails with error code 400 (Bad Request).

azure_storage.blob_get

Function that allows importing data. It downloads one or more files from a blob container in an Azure Storage account. Then it translates the contents into rows, which can be consumed and processed with SQL language constructs. This function adds support to filter and manipulate the data fetched from the blob container before importing it.

Note

Before trying to access the container for the referred storage account, this function checks if the names of the storage account and container passed as arguments are valid according to the naming validation rules imposed on Azure storage accounts. If either of them is invalid, an error is raised.

azure_storage.blob_get(account_name text, container_name text, path text, decoder text DEFAULT 'auto'::text, compression text DEFAULT 'auto'::text, options jsonb DEFAULT NULL::jsonb);

There's an overloaded version of this function, which accepts a rec parameter that allows you to conveniently define the output format record.

azure_storage.blob_get(account_name text, container_name text, path text, rec anyelement, decoder text DEFAULT 'auto'::text, compression text DEFAULT 'auto'::text, options jsonb DEFAULT NULL::jsonb);

Permissions

User or role invoking this function must be added to the allowed list for the account_name referred, by executing azure_storage.account_user_add. Members of azure_storage_admin are automatically allowed to reference all Azure Storage accounts whose references were added using azure_storage.account_add.

Arguments

account_name

text the name of the Azure blob storage account that contains all of your objects: blobs, files, queues, and tables. The storage account provides a unique namespace that is accessible from anywhere in the world over HTTPS.

container_name

text the name of a container. A container organizes a set of blobs, similar to a directory in a file system. A storage account can include an unlimited number of containers, and a container can store an unlimited number of blobs. A container name must be a valid Domain Name System (DNS) name, as it forms part of the unique URI used to address the container or its blobs. When naming a container, make sure to follow these rules.

The URI for a container is similar to: https://myaccount.blob.core.windows.net/mycontainer

path

text the full name of the blob.

rec

anyelement the definition of the record output structure.

decoder

text the specification of the blob format. Can be set to any of the following values:

Format Default Description
auto true Infers the value based on the last series of characters assigned to the name of the blob. If the blob name ends with .csv or .csv.gz, it assumes csv. If ends with .tsv or .tsv.gz, it assumes it's tsv. If ends with .json, .json.gz, .xml, .xml.gz, .txt, or .txt.gz, it assumes it's text.
csv Comma-separated values format used by PostgreSQL COPY.
tsv Tab-separated values, the default PostgreSQL COPY format.
binary Binary PostgreSQL COPY format.
text | xml | json A file containing a single text value.

compression

text the specification of compression type. Can be set to any of the following values:

Format Default Description
auto true Infers the value based on the last series of characters assigned to the name of the blob. If the blob name ends with .gz, it assumes it's gzip. Otherwise, it assumes it's none.
gzip Forces using gzip decoder to decompress the blob.
none Forces to treat the blob as one which doesn't require decompression.

The extension doesn't support any other compression types.

options

jsonb the settings that define handling of custom headers, custom separators, escape characters, etc. options affects the behavior of this function in a way similar to how the options you can pass to the COPY command in PostgreSQL affect its behavior.

Return type

SETOF record SETOF anyelement

azure_storage.blob_put

Function that allows exporting data, by uploading files to a blob container in an Azure Storage account. The content of the files is produced from rows in PostgreSQL.

Note

Before trying to access the container for the referred storage account, this function checks if the names of the storage account and container passed as arguments are valid according to the naming validation rules imposed on Azure storage accounts. If either of them is invalid, an error is raised.

azure_storage.blob_put(account_name text, container_name text, path text, tuple record)
RETURNS VOID;

There's an overloaded version of function, containing encoder parameter that allows you to specify the encoder to use when it can't be inferred from the extension of the path parameter, or when you want to override the one inferred.

azure_storage.blob_put(account_name text, container_name text, path text, tuple record, encoder text)
RETURNS VOID;

There's an overloaded version of function that also contains a compression parameter that allows you to specify the compression to use when it can't be inferred from the extension of the path parameter, or when you want to override the one inferred.

azure_storage.blob_put(account_name text, container_name text, path text, tuple record, encoder text, compression text)
RETURNS VOID;

There's an overloaded version of function that also contains an options parameter for handling custom headers, custom separators, escape characters etc. options works in similar fashion to the options that can be passed to the COPY command in PostgreSQL.

azure_storage.blob_put(account_name text, container_name text, path text, tuple record, encoder text, compression text, options jsonb)
RETURNS VOID;

Permissions

User or role invoking this function must be added to the allowed list for the account_name referred, by executing azure_storage.account_user_add. Members of azure_storage_admin are automatically allowed to reference all Azure Storage accounts whose references were added using azure_storage.account_add.

Arguments

account_name

text the name of the Azure blob storage account that contains all of your objects: blobs, files, queues, and tables. The storage account provides a unique namespace that is accessible from anywhere in the world over HTTPS.

container_name

text the name of a container. A container organizes a set of blobs, similar to a directory in a file system. A storage account can include an unlimited number of containers, and a container can store an unlimited number of blobs. A container name must be a valid Domain Name System (DNS) name, as it forms part of the unique URI used to address the container or its blobs. When naming a container, make sure to follow these rules.

The URI for a container is similar to: https://myaccount.blob.core.windows.net/mycontainer

path

text the full name of the blob.

tuple

record the definition of the record output structure.

encoder

text the specification of the blob format. Can be set to any of the following values:

Format Default Description
auto true Infers the value based on the last series of characters assigned to the name of the blob. If the blob name ends with .csv or .csv.gz, it assumes it's csv. If ends with .tsv or .tsv.gz, it assumes it's tsv. If ends with .json, .json.gz, .xml, .xml.gz, .txt, or .txt.gz, it assumes it's text.
csv Comma-separated values format used by PostgreSQL COPY.
tsv Tab-separated values, the default PostgreSQL COPY format.
binary Binary PostgreSQL COPY format.
text | xml | json A file containing a single text value.

compression

text the specification of compression type. Can be set to any of the following values:

Format Default Description
auto true Infers the value based on the last series of characters assigned to the name of the blob. If the blob name ends with .gz, it assumes it's gzip. Otherwise, it assumes it's none.
gzip Forces using gzip decoder to decompress the blob.
none Forces to treat the blob as one which doesn't require decompression.

The extension doesn't support any other compression types.

options

jsonb the settings that define handling of custom headers, custom separators, escape characters, etc. options affects the behavior of this function in a way similar to how the options you can pass to the COPY command in PostgreSQL affect its behavior.

Return type

SETOF record SETOF anyelement

azure_storage.options_csv_get

Function that acts as a utility function, which can be called as a parameter within blob_get, and is useful for decoding the content of a csv file.

azure_storage.options_csv_get(delimiter text DEFAULT NULL::text, null_string text DEFAULT NULL::text, header boolean DEFAULT NULL::boolean, quote text DEFAULT NULL::text, escape text DEFAULT NULL::text, force_not_null text[] DEFAULT NULL::text[], force_null text[] DEFAULT NULL::text[], content_encoding text DEFAULT NULL::text);

Arguments

delimiter

text the character that separates columns within each row (line) of the file. It must be a single 1-byte character. Although this function supports delimiters of any number of characters, if you try to use more than a single 1-byte character, PostgreSQL reports back a COPY delimiter must be a single one-byte character error.

null_string

text the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings.

boolean flag that indicates if the file contains a header line with the names of each column in the file. On output, the initial line contains the column names from the table.

quote

text the quoting character to be used when a data value is quoted. The default is double-quote. It must be a single 1-byte character. Although this function supports delimiters of any number of characters, if you try to use more than a single 1-byte character, PostgreSQL reports back a COPY quote must be a single one-byte character error.

escape

text the character that should appear before a data character that matches the QUOTE value. The default is the same as the QUOTE value (so that the quoting character is doubled if it appears in the data). It must be a single 1-byte character. Although this function supports delimiters of any number of characters, if you try to use more than a single 1-byte character, PostgreSQL reports back a COPY escape must be a single one-byte character error.

force_not_null

text[] don't match the specified columns' values against the null string. In the default case where the null string is empty, it means that empty values are read as zero-length strings rather than nulls, even when they aren't quoted.

force_null

text[] match the specified columns' values against the null string, even if quoted, and if a match is found, set the value to NULL. In the default case where the null string is empty, it converts a quoted empty string into NULL.

content_encoding

text name of the encoding with which the file is encoded. If the option is omitted, the current client encoding is used.

Return type

jsonb

azure_storage.options_copy

Function that acts as a utility function, which can be called as a parameter within blob_get. It acts as a helper function for options_csv_get, options_tsv, and options_binary.

azure_storage.options_copy(delimiter text DEFAULT NULL::text, null_string text DEFAULT NULL::text, header boolean DEFAULT NULL::boolean, quote text DEFAULT NULL::text, escape text DEFAULT NULL::text, force_quote text[] DEFAULT NULL::text[], force_not_null text[] DEFAULT NULL::text[], force_null text[] DEFAULT NULL::text[], content_encoding text DEFAULT NULL::text);

Arguments

delimiter

text the character that separates columns within each row (line) of the file. It must be a single 1-byte character. Although this function supports delimiters of any number of characters, if you try to use more than a single 1-byte character, PostgreSQL reports back a COPY delimiter must be a single one-byte character error.

null_string

text the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings.

header

boolean flag that indicates if the file contains a header line with the names of each column in the file. On output, the initial line contains the column names from the table.

quote

text the quoting character to be used when a data value is quoted. The default is double-quote. It must be a single 1-byte character. Although this function supports delimiters of any number of characters, if you try to use more than a single 1-byte character, PostgreSQL reports back a COPY quote must be a single one-byte character error.

escape

text the character that should appear before a data character that matches the QUOTE value. The default is the same as the QUOTE value (so that the quoting character is doubled if it appears in the data). It must be a single 1-byte character. Although this function supports delimiters of any number of characters, if you try to use more than a single 1-byte character, PostgreSQL reports back a COPY escape must be a single one-byte character error.

force_quote

text[] forces quoting to be used for all non-NULL values in each specified column. NULL output is never quoted. If * is specified, non-NULL values are quoted in all columns.

force_not_null

text[] don't match the specified columns' values against the null string. In the default case where the null string is empty, it means that empty values are read as zero-length strings rather than nulls, even when they aren't quoted.

force_null

text[] match the specified columns' values against the null string, even if quoted, and if a match is found, set the value to NULL. In the default case where the null string is empty, it converts a quoted empty string into NULL.

content_encoding

text name of the encoding with which the file is encoded. If the option is omitted, the current client encoding is used.

Return type

jsonb

azure_storage.options_tsv

Function that acts as a utility function, which can be called as a parameter within blob_get, and is useful for decoding the content of a tsv file.

azure_storage.options_tsv(delimiter text DEFAULT NULL::text, null_string text DEFAULT NULL::text, content_encoding text DEFAULT NULL::text);

Arguments

delimiter

text the character that separates columns within each row (line) of the file. It must be a single 1-byte character. Although this function supports delimiters of any number of characters, if you try to use more than a single 1-byte character, PostgreSQL reports back a COPY delimiter must be a single one-byte character error.

null_string

text the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings.

content_encoding

text name of the encoding with which the file is encoded. If the option is omitted, the current client encoding is used.

Return type

jsonb

azure_storage.options_binary

Function that acts as a utility function, which can be called as a parameter within blob_get, and is useful for decoding the content of a binary file.

azure_storage.options_binary(content_encoding text DEFAULT NULL::text);

Arguments

content_encoding

text name of the encoding with which the file is encoded. If the option is omitted, the current client encoding is used.

Return type

jsonb

Examples

You must meet the following prerequisites before you can run the following examples:

  1. Create an Azure Storage account. To create an Azure Storage account, if you don't have one already, customize the values of <resource_group>, <location>, <storage_account>, and <blob_container>, and run the following Azure CLI command:
    resource_group=<resource_group>
    location=<location>
    storage_account=<storage_account>
    blob_container=<blob_container>
    az group create --name $resource_group --location $location
    az storage account create --resource-group $resource_group --name $storage_account --location $location --sku Standard_LRS --kind BlobStorage --public-network-access enabled --access-tier hot
    
  2. Create a blob container. To create the blob container, run the following Azure CLI:
    az storage container create --account-name $storage_account --name $blob_container -o tsv
    
  3. Fetch one of the two access keys assigned to the storage account. Make sure you copy the value of your access_key as you need to pass it as an argument to account_add in a subsequent step. To fetch the first of the two access keys, run the following Azure CLI command:
    access_key=$(az storage account keys list --resource-group $resource_group --account-name $storage_account --query [0].value)
    echo "Following is the value of your access key:"
    echo $access_key
    
  4. Download the file with the data set that is used during the examples, and upload it to your blob container. To download the file with the data set, run the following Azure CLI command:
    mkdir --parents azure_storage_examples
    cd azure_storage_examples
    curl -O https://examples.citusdata.com/tutorial/events.csv
    gzip -k events.csv
    cp events.csv events_blob_without_extension
    cp events.csv events_pipe.csv
    cp events.csv.gz events_compressed
    sed -i 's/,/|/g' events_pipe.csv
    az storage blob upload-batch --account-name $storage_account --destination $blob_container --source . --pattern "events*" --account-key $access_key --overwrite --output none --only-show-errors
    

Note

You can list containers or the blobs stored in them for a specific storage account, but only if your PostgreSQL user or role is granted permission on on the reference to that storage account by using account_user_add. Members of the azure_storage_admin role are granted this privilege over all Azure Storage accounts that have been added using azure_storage.account_add. By default, only members of azure_pg_admin are granted the azure_storage_admin role.

Create table in which data is loaded

Let's create the table into which we import the contents of the CSV file that we uploaded to the storage account. To do so, connect to your instance of Azure Database for PostgreSQL flexible server using PgAdmin, psql, or the client of your preference, and execute the following statement:

CREATE TABLE IF NOT EXISTS events
        (
         event_id bigint
        ,event_type text
        ,event_public boolean
        ,repo_id bigint
        ,payload jsonb
        ,repo jsonb
        ,user_id bigint
        ,org jsonb
        ,created_at timestamp without time zone
        );

Add access key of storage account

This example illustrates how to add a reference to a storage account, together with the access key of that storage account which is required to access its content via the functionality provided by the pg_azure_storage extension in your instance of Azure Database for PostgreSQL flexible server.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

Similarly, <access_key> must be set to the value you fetched from your storage account.

SELECT azure_storage.account_add('<storage_account>', '<access_key>');

Tip

If you want to retrieve the storage account name and one of its access keys from the Azure portal, search for your storage account, in the resource menu select Access keys, copy the Storage account name and copy the Key from key1 section (you have to select Show next to the key first).

Remove reference to storage account

This example illustrates how to remove any reference to a storage account, so that no user in the current database can use the pg_azure_storage extension functionality to access that storage account.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

SELECT azure_storage.account_remove('<storage_account>');

Grant access to a user or role on the Azure Blob storage reference

This example illustrates how to grant access to a user or role named <regular_user>, so that such PostgreSQL user can use the pg_azure_storage extension to access the blobs stored in containers hosted by the referred Azure storage account.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

<regular_user> must be set to the name of an existing user or role.

SELECT * FROM azure_storage.account_user_add('<storage_account>', '<regular_user>');

List all the references to Azure storage accounts

This example illustrates how to find out which Azure storage accounts the pg_azure_storage extension can reference in this database, together with the type of authentication that is used to access each storage account, and which users or roles are granted permission, via the account_user_add function, to access that Azure storage account through the functionality provided by the extension.

SELECT * FROM azure_storage.account_list();

Revoke access from a user or role on the Azure Blob storage reference

This example illustrates how to revoke access from a user or role named <regular_user>, so that such PostgreSQL user can't use the pg_azure_storage extension to access the blobs stored in containers hosted by the referred Azure storage account.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

<regular_user> must be set to the name of an existing user or role.

SELECT * FROM azure_storage.account_user_remove('<storage_account>', '<regular_user>');

List all blobs in a container

This example illustrates how to list all existing blobs inside container <container_name> of storage account <storage_account>.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

<blob_container> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.

SELECT * FROM azure_storage.blob_list('<storage_account>','<blob_container>');

List the objects with specific blob name prefix

This example illustrates how to list all existing blobs inside container <blob_container> of storage account <storage_account>, whose blob name begins with <blob_name_prefix>.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

<blob_container> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.

<blob_name_prefix> should be set to the whatever prefix you want the blobs enumerated to include in their names. If you want to return all blobs, you can set this parameter to an empty string or don't even specify a value for this parameter, in which case the value defaults to an empty string.

SELECT * FROM azure_storage.blob_list('<storage_account>','<blob_container>','<blob_name_prefix>');

Alternatively, you can use the following syntax:

SELECT * FROM azure_storage.blob_list('<storage_account>','<blob_container>') WHERE path LIKE '<blob_name_prefix>%';

Read content from a blob in a container

The blob_get function retrieves the contents of one specific blob (events.csv in this case), in the referred container <blob_container> of the <storage_account> storage. In order for blob_get to know how to parse the data you can pass a value in the form NULL::table_name, where table_name refers to a table whose schema matches that of the blob being read. In the example, it refers to the events table we created at the very beginning.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

<blob_container> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.

<blob_name_prefix> should be set to the whatever prefix you want the blobs enumerated to include in their names. If you want to return all blobs, you can set this parameter to an empty string or don't even specify a value for this parameter, in which case the value defaults to an empty string.

SELECT * FROM azure_storage.blob_get
        ('<storage_account>'
        ,'<blob_container>'
        ,'events.csv'
        , NULL::events)
LIMIT 5;

Alternatively, you can explicitly define the schema of the result using the AS clause after the blob_get function.

SELECT * FROM azure_storage.blob_get('<storage_account>','<blob_container>','events.csv.gz')
AS res (
         event_id BIGINT
        ,event_type TEXT
        ,event_public BOOLEAN
        ,repo_id BIGINT
        ,payload JSONB
        ,repo JSONB
        ,user_id BIGINT
        ,org JSONB
        ,created_at TIMESTAMP WITHOUT TIME ZONE)
LIMIT 5;

Use the decoder option

This example illustrates the use of the decoder option. Normally format is inferred from the extension of the file, but when the file content doesn't have a matching extension you can pass the decoder argument.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

<blob_container> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.

SELECT * FROM azure_storage.blob_get
        ('<storage_account>'
        ,'<blob_container>'
        ,'events_blob_without_extension'
        , NULL::events
        , decoder := 'csv')
LIMIT 5;

Use compression with decoder option

This example shows how to enforce using the gzip compression on a gzip compressed blob whose name doesn't end with a .gz extension.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

<blob_container> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.

SELECT * FROM azure_storage.blob_get
        ('<storage_account>'
        ,'<blob_container>'
        ,'events_compressed'
        , NULL::events
        , decoder := 'csv'
        , compression := 'gzip')
LIMIT 5;

Import filtered content and modify before loading from csv format object

This example illustrates the possibility to filter and modify the content imported from the blob, before loading that into a SQL table.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

<blob_container> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.

SELECT concat('P-',event_id::text) FROM azure_storage.blob_get
        ('<storage_account>'
        ,'<blob_container>'
        ,'events.csv'
        , NULL::events)
WHERE event_type='PushEvent'
LIMIT 5;

Query content from file with headers, custom separators, escape characters

This example illustrates how you can use custom separators and escape characters, by passing the result of options_copy to the options argument.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

<blob_container> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.

SELECT * FROM azure_storage.blob_get
        ('<storage_account>'
        ,'<blob_container>'
        ,'events_pipe.csv'
        ,NULL::events
        ,options := azure_storage.options_csv_get(delimiter := '|' , header := 'true')
        );

Aggregation query over the contents of a blob

This example illustrates how you can do aggregation operations over information that is stored in a blob container, without the need to import the contents of the blob into PostgreSQL tables.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

<blob_container> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.

SELECT event_type, COUNT(*) FROM azure_storage.blob_get
        ('<storage_account>'
        ,'<blob_container>'
        ,'events.csv'
        , NULL::events)
GROUP BY event_type
ORDER BY 2 DESC
LIMIT 5;

Import data using a COPY statement

The following example shows the import of data from a blob called events.csv that resides in the blob container <blob_container> in the Azure Storage account <storage_account>, via the COPY command:

  1. Create a table that matches the schema of the source file:

    CREATE TABLE IF NOT EXISTS events
            (
             event_id bigint
            ,event_type text
            ,event_public boolean
            ,repo_id bigint
            ,payload jsonb
            ,repo jsonb
            ,user_id bigint
            ,org jsonb
            ,created_at timestamp without time zone
            );
    
  2. Use a COPY statement to copy data into the target table. Specify that the first row contains column headers.

    COPY events
    FROM 'https://<storage_account>.blob.core.windows.net/<blob_container>/events.csv'
    WITH (FORMAT 'csv', header);
    

Write content to a blob in a container

The blob_put function composes the contents of one specific blob (eventscopy.csv in this case), and uploads it to the referred container <blob_container> of the <storage_account> storage. This example uses blob_get to construct a set of five rows, which are then passed to the blob_put aggregate function which uploads them as a blob named eventscopy.csv.

<storage_account> must be set to the name of your storage account. If you used the previous scripts, this value should match whatever value you set to the storage_account environment variable in those scripts.

<blob_container> must be set to the name of your blob container. If you used the previous scripts, this value should match whatever value you set to the blob_container environment variable in those scripts.

SELECT azure_storage.blob_put
        ('<storage_account>'
        ,'<blob_container>'
        ,'eventscopy.csv'
        , top_5_events)
FROM (SELECT * FROM azure_storage.blob_get
                     ('<storage_account>'
                     ,'<blob_container>'
                     ,'events.csv'
                     , NULL::events) LIMIT 5) AS top_5_events;

Export data using a COPY statement

The following example shows the export of data from a table called events, to a blob called events_exported.csv that resides in the blob container <blob_container> in the Azure Storage account <storage_account>, via the COPY command:

  1. Create a table that matches the schema of the source file:

    CREATE TABLE IF NOT EXISTS events
            (
             event_id bigint
            ,event_type text
            ,event_public boolean
            ,repo_id bigint
            ,payload jsonb
            ,repo jsonb
            ,user_id bigint
            ,org jsonb
            ,created_at timestamp without time zone
            );
    
  2. Load data into the table. Either run INSERT statements to populate it with several synthetic rows, or use the Import data using a COPY statement example to populate it with the contents of the sample data set.

  3. Use a COPY statement to copy data into the target table. Specify that the first row contains column headers.

    COPY events
    TO 'https://<storage_account>.blob.core.windows.net/<blob_container>/events_exported.csv'
    WITH (FORMAT 'csv', header);