Quickstart: Azure Health De-identification client library for .NET
Get started with the Azure Health De-identification client library for .NET to de-identify your health data. Follow these steps to install the package and try out example code for basic tasks.
API reference documentation | Library source code | Package (NuGet) | More Samples on GitHub
Prerequisites
- An Azure account with an active subscription. Create an account for free.
- An Azure Storage Account (only for job workflow).
Setting up
Create a de-identification service (preview)
A de-identification service (preview) provides you with an endpoint URL. You can call this endpoint URL directly as a REST API or through an SDK.
Install Azure CLI
Create a de-identification service resource
REGION="<Region>"
RESOURCE_GROUP_NAME="<ResourceGroupName>"
DEID_SERVICE_NAME="<NewDeidServiceName>"
az resource create -g $RESOURCE_GROUP_NAME -n $DEID_SERVICE_NAME --resource-type microsoft.healthdataaiservices/deidservices --is-full-object -p "{\"identity\":{\"type\":\"SystemAssigned\"},\"properties\":{},\"location\":\"$REGION\"}"
Create an Azure Storage account
Install Azure CLI
Create an Azure Storage Account
STORAGE_ACCOUNT_NAME="<NewStorageAccountName>"
az storage account create --name $STORAGE_ACCOUNT_NAME --resource-group $RESOURCE_GROUP_NAME --location $REGION
Authorize de-identification service (preview) on the Azure Storage account
Give the de-identification service (preview) access to your storage account
STORAGE_ACCOUNT_ID=$(az storage account show --name $STORAGE_ACCOUNT_NAME --resource-group $RESOURCE_GROUP_NAME --query id --output tsv)
DEID_SERVICE_PRINCIPAL_ID=$(az resource show -n $DEID_SERVICE_NAME -g $RESOURCE_GROUP_NAME --resource-type microsoft.healthdataaiservices/deidservices --query identity.principalId --output tsv)
az role assignment create --assignee $DEID_SERVICE_PRINCIPAL_ID --role "Storage Blob Data Contributor" --scope $STORAGE_ACCOUNT_ID
Install the package
The client library is available through NuGet as the Azure.Health.Deidentification package.
Install package
dotnet add package Azure.Health.Deidentification
Also, install the Azure Identity package if not already installed.
dotnet add package Azure.Identity
Object model
- DeidentificationClient handles communication between the SDK and the de-identification service endpoint.
- DeidentificationContent is used for string de-identification.
- DeidentificationJob is used to create jobs to de-identify documents in an Azure Storage Account.
- PhiEntity is the span and category of a single PHI entity detected via a Tag OperationType.
Code examples
- Create a de-identification client
- De-identify a string
- Tag a string
- Create a de-identification job
- Get the status of a de-identification job
Create a de-identification client
Before you can create the client, you need to find your de-identification service (preview) endpoint URL.
You can find the endpoint URL with the Azure CLI:
az resource show -n $DEID_SERVICE_NAME -g $RESOURCE_GROUP_NAME --resource-type microsoft.healthdataaiservices/deidservices --query properties.serviceUrl --output tsv
Then you can create the client using that value.
using Azure.Identity;
using Azure.Health.Deidentification;
string serviceEndpoint = "https://example123.api.deid.azure.com";
DeidentificationClient client = new(
    new Uri(serviceEndpoint),
    new DefaultAzureCredential()
);
De-identify a string
This function allows you to de-identify any string you have in memory.
DeidentificationContent content = new("SSN: 123-04-5678");
DeidentificationResult result = await client.DeidentifyAsync(content);
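The call above returns a result object but does not show how to read it. The following standalone sketch prints the de-identified text. It assumes that DeidentificationResult exposes the text through an OutputText property, which this quickstart does not confirm, and it reuses the example endpoint from the client section as a placeholder.

```csharp
using System;
using Azure.Identity;
using Azure.Health.Deidentification;

// Placeholder endpoint from the client example; replace it with your own service URL.
DeidentificationClient client = new(
    new Uri("https://example123.api.deid.azure.com"),
    new DefaultAzureCredential()
);

DeidentificationContent content = new("SSN: 123-04-5678");
DeidentificationResult result = await client.DeidentifyAsync(content);

// Assumption: OutputText holds the de-identified version of the input string.
Console.WriteLine(result.OutputText);
```

The replacement values the service produces can differ between calls, so do not rely on a specific output string.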
Tag a string
Tagging works the same way as de-identifying, except that you change the OperationType.
DeidentificationContent content = new("SSN: 123-04-5678");
content.Operation = OperationType.Tag;
DeidentificationResult result = await client.DeidentifyAsync(content);
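A tag operation returns the detected PHI entities rather than rewritten text. The following standalone sketch lists them. It assumes the entities are available through result.TaggerResult.Entities and that each PhiEntity exposes Category and Text properties. These member names are assumptions not confirmed by this quickstart, so check them against the API reference for your package version.

```csharp
using System;
using Azure.Identity;
using Azure.Health.Deidentification;

// Placeholder endpoint from the client example; replace it with your own service URL.
DeidentificationClient client = new(
    new Uri("https://example123.api.deid.azure.com"),
    new DefaultAzureCredential()
);

DeidentificationContent content = new("SSN: 123-04-5678");
content.Operation = OperationType.Tag;
DeidentificationResult result = await client.DeidentifyAsync(content);

// Assumption: TaggerResult.Entities contains one PhiEntity per detected span.
foreach (PhiEntity entity in result.TaggerResult.Entities)
{
    // Assumption: Category and Text are the property names for the entity's
    // category and the matched text.
    Console.WriteLine($"{entity.Category}: {entity.Text}");
}
```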
Create a de-identification job
This function allows you to de-identify all files, filtered via prefix, within an Azure Blob Storage Account.
To create the job, we need the URL to the blob endpoint of the Azure Storage Account.
az resource show -n $STORAGE_ACCOUNT_NAME -g $RESOURCE_GROUP_NAME --resource-type Microsoft.Storage/storageAccounts --query properties.primaryEndpoints.blob --output tsv
Now we can create the job. This example uses folder1/ as the prefix. The job de-identifies every document that matches this prefix and writes the de-identified version under the output_files/ prefix.
using Azure;
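// Replace the placeholder with the blob endpoint URL returned by the previous command.
string storageAccountUrl = "<BlobEndpointUrl>";
DeidentificationJob job = new(
    new SourceStorageLocation(new Uri(storageAccountUrl), "folder1/"),
    new TargetStorageLocation(new Uri(storageAccountUrl), "output_files/")
);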
job = client.CreateJob(WaitUntil.Started, "my-job-1", job).Value;
Get the status of a de-identification job
Once a job is created, you can view the status and other details of the job.
DeidentificationJob job = client.GetJob("my-job-1").Value;
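Jobs run asynchronously, so a single GetJob call may return before the job has finished. The following standalone sketch polls until the job leaves its in-progress states. The status names "NotStarted" and "Running" are assumptions (only PartialFailed is named in this quickstart), and the comparison goes through ToString() so the sketch does not depend on the name of the status type. The 10-second interval and 60-attempt limit are arbitrary illustrative values.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Identity;
using Azure.Health.Deidentification;

// Placeholder endpoint from the client example; replace it with your own service URL.
DeidentificationClient client = new(
    new Uri("https://example123.api.deid.azure.com"),
    new DefaultAzureCredential()
);

const string jobName = "my-job-1";
string status = "";

// Poll a bounded number of times so the loop cannot run forever.
for (int attempt = 0; attempt < 60; attempt++)
{
    DeidentificationJob job = client.GetJob(jobName).Value;
    status = job.Status.ToString();

    // Assumption: "NotStarted" and "Running" are the in-progress status names.
    if (status is not ("NotStarted" or "Running"))
    {
        break;
    }

    await Task.Delay(TimeSpan.FromSeconds(10));
}

Console.WriteLine($"Job {jobName} status: {status}");
```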
Run the code
Once your code is updated in your project, you can run it using:
dotnet run
Clean up resources
Delete de-identification service
az resource delete -n $DEID_SERVICE_NAME -g $RESOURCE_GROUP_NAME --resource-type microsoft.healthdataaiservices/deidservices
Delete Azure Storage account
az resource delete -n $STORAGE_ACCOUNT_NAME -g $RESOURCE_GROUP_NAME --resource-type Microsoft.Storage/storageAccounts
Delete role assignment
az role assignment delete --assignee $DEID_SERVICE_PRINCIPAL_ID --role "Storage Blob Data Contributor" --scope $STORAGE_ACCOUNT_ID
Troubleshooting
Unable to access source or target storage
Make sure the managed identity for the de-identification service (preview) is set up and has been granted the Storage Blob Data Contributor role on the storage account.
See Authorize de-identification service (preview) on the Azure Storage account
Job failed with status PartialFailed
Use the GetJobDocuments function on the DeidentificationClient to view per-file error messages.
See Sample
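As a rough illustration, the following standalone sketch lists the documents of a job and prints any error attached to each one. It assumes that GetJobDocuments can be called with only the job name, that it returns an enumerable collection, and that each item exposes Status and Error properties. These details are assumptions not confirmed by this quickstart, so verify them against the API reference.

```csharp
using System;
using Azure.Identity;
using Azure.Health.Deidentification;

// Placeholder endpoint from the client example; replace it with your own service URL.
DeidentificationClient client = new(
    new Uri("https://example123.api.deid.azure.com"),
    new DefaultAzureCredential()
);

// Assumption: the job name is the only required argument.
foreach (var document in client.GetJobDocuments("my-job-1"))
{
    // Assumption: Status and Error are the property names on each document entry.
    Console.WriteLine($"Document status: {document.Status}");
    if (document.Error is not null)
    {
        Console.WriteLine($"  Error: {document.Error}");
    }
}
```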
Next steps
In this quickstart, you learned:
- How to create a de-identification service (preview) and assign a role on a storage account.
- How to create a de-identification client.
- How to de-identify strings and create jobs on documents within a storage account.