Blobfuse is an open source project that provides a virtual filesystem backed by Azure Blob storage.
Blobfuse uses the libfuse open source library to communicate with the Linux FUSE kernel module, and implements the filesystem operations using the Azure Storage Blob REST APIs.
Features
- Mount a Blob storage container on Linux
- Basic file system operations such as mkdir, opendir, readdir, rmdir, open, read, create, write, close, unlink, truncate, stat, rename
- Local cache to improve subsequent access times
- Parallel download and upload features for fast access to large blobs
- Allows multiple nodes to mount the same container for read-only scenarios
Installation
You can install blobfuse from the Linux Software Repository for Microsoft products. The process is explained on the blobfuse installation page. Alternatively, you can clone this repository, install the dependencies (fuse, libcurl, gcrypt and GnuTLS) and build from source. See the wiki and the GitHub repo for details.
Blobfuse and Data Science Virtual Machine
Blobfuse is already installed on the Ubuntu DSVM. To use it, create a configuration file /opt/blobfuse.cfg as described at https://docs.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux or https://github.com/Azure/azure-storage-fuse/tree/43e82df5d85a4c082dc67af8131bcf05f4d9270a
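As a rough illustration of what that configuration file contains, the sketch below writes one with owner-only permissions. The helper name write_blobfuse_cfg is hypothetical, and the key names (accountName, accountKey, containerName) are given as I understand them from the linked documentation; check them against the version of blobfuse you have installed.

```shell
# Hypothetical helper (not part of blobfuse): write a minimal config file.
# Arguments: target path, storage account name, account key, container name.
# Key names are assumed from the linked documentation.
write_blobfuse_cfg() {
    cfg_path="$1"
    (
        # The file holds the account key, so a newly created file gets mode 600.
        umask 077
        printf 'accountName %s\naccountKey %s\ncontainerName %s\n' \
            "$2" "$3" "$4" > "$cfg_path"
    )
}
```

For example, write_blobfuse_cfg /tmp/blobfuse.cfg myaccountname myaccountkey mycontainername writes a scratch copy that can then be moved to /opt/blobfuse.cfg with sudo.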
Usage
Mounting
Once you have installed blobfuse, configure your account credentials either in the template provided in the blobfuse folder (connection.cfg) or in environment variables. For brevity, this example uses environment variables:
export AZURE_STORAGE_ACCOUNT=myaccountname
export AZURE_STORAGE_ACCESS_KEY=myaccountkey
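Before mounting, it can be useful to confirm that both variables are actually set in the current shell. The following is a minimal sketch; check_blobfuse_env is a hypothetical helper and not part of blobfuse.

```shell
# Hypothetical helper: report any credential variable that is unset or empty.
# Returns 0 when both variables are set, 1 otherwise.
check_blobfuse_env() {
    missing=0
    for name in AZURE_STORAGE_ACCOUNT AZURE_STORAGE_ACCESS_KEY; do
        # Indirect lookup of the variable named in $name (POSIX sh compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_blobfuse_env && echo "credentials set" prints the confirmation only when both variables are present.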
Then mount your blob storage on the VM.
Using a high-performance disk or a ramdisk for the local cache is recommended. On Azure VMs, this is the ephemeral disk, which is mounted on /mnt in Ubuntu and on /mnt/resource in RHEL. Make sure that your user has write access to this location. If not, create the directories and chown them to your user:
sudo mkdir /images
sudo mkdir /mnt/blobfusecache
sudo chown -R <your-user-account> /images
sudo chown -R <your-user-account> /mnt/blobfusecache/
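To confirm that the ownership change took effect, a quick write-access check along these lines may help. This is only a sketch; check_writable_dir is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the path is an existing directory
# that the current user can write to.
check_writable_dir() {
    dir="$1"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir"
    else
        echo "not writable: $dir" >&2
        return 1
    fi
}
```

For the directories above, that would be check_writable_dir /images and check_writable_dir /mnt/blobfusecache, run as the user who will mount the container.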
Create your mountpoint (mkdir /path/to/mount) and mount a Blob container (which must already exist) with blobfuse:
blobfuse /images --tmp-path=/mnt/blobfusecache -o big_writes -o max_read=131072 -o max_write=131072 -o attr_timeout=240 -o fsname=blobfuse -o entry_timeout=240 -o negative_timeout=120 --config-file=/opt/blobfuse.cfg
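After the mount command returns, it can help to confirm that the mount point is active before reading or writing files. The sketch below looks up the mount point in a mounts table; is_mounted is a hypothetical helper, and defaulting to /proc/mounts is an assumption that holds on typical Linux systems.

```shell
# Hypothetical helper: succeed if the given mount point appears in the mounts table.
# Arguments: mount point, optional mounts file (defaults to /proc/mounts).
is_mounted() {
    # Field 2 of each mounts entry is the mount point.
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example (commented out so the snippet has no side effects):
# is_mounted /images && ls /images
# fusermount -u /images    # unmount the FUSE filesystem when finished
```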
NOTE: Use absolute paths for all directories in the command. Relative paths and shortcut paths (~/) do not work. Blobfuse does not support multiple writers to a single blob, so you need to guarantee that the file names generated during the extraction part are unique.
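One way to guard against the path restriction in scripts is to reject anything that does not start with a slash before passing it to blobfuse. This is a minimal sketch; require_absolute is a hypothetical helper.

```shell
# Hypothetical helper: accept only paths that begin with "/".
require_absolute() {
    case "$1" in
        /*) return 0 ;;
        *)  echo "not an absolute path: $1" >&2
            return 1 ;;
    esac
}
```

For instance, require_absolute /mnt/blobfusecache succeeds, while require_absolute '~/blobfusecache' and require_absolute blobfusecache both fail.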
For more information, see the wiki.
Interested in Data Engineering
Check out the Data Engineering learning resources at Microsoft Learn.