Use unstructured clinical notes enrichment (preview) in healthcare data solutions

[This article is prerelease documentation and is subject to change.]

Note

This content is currently being updated.

Unstructured clinical notes enrichment (preview) uses Azure AI Language's Text Analytics for health service to extract key Fast Healthcare Interoperability Resources (FHIR) entities from unstructured clinical notes. It creates structured data from these clinical notes. You can then analyze this structured data to derive insights, predictions, and quality measures aimed at enhancing patient health outcomes.

To learn more about the capability and understand how to deploy and configure it, see:

Unstructured clinical notes enrichment (preview) has a direct dependency on the healthcare data foundations capability. Ensure you successfully set up and execute the healthcare data foundations pipelines first.

Prerequisites

NLP ingestion service

The healthcare#_msft_ta4h_silver_ingestion notebook executes the NLPIngestionService module in the healthcare data solutions library to invoke the Text Analytics for health service. This service extracts unstructured clinical notes from the FHIR resource DocumentReference.Content to create a flattened output. To learn more, see Review the notebook configuration.

Data storage in silver layer

After the natural language processing (NLP) API analysis, the structured and flattened output is stored in the following native tables within the healthcare#_msft_silver lakehouse:

  • nlpentity: Contains the flattened entities extracted from the unstructured clinical notes. Each row is a single term extracted from the unstructured text after performing the text analysis.
  • nlprelationship: Provides the relationship between the extracted entities.
  • nlpfhir: Contains the FHIR output bundle as a JSON string.

To track the last updated timestamp, the NLPIngestionService uses the parent_meta_lastUpdated field in all three silver lakehouse tables. This tracking ensures that the source DocumentReference, which is the parent resource, is stored first to maintain referential integrity. This process helps prevent data inconsistencies and orphaned resources.

Important

Currently, Text Analytics for health returns vocabularies listed in the UMLS Metathesaurus Vocabulary Documentation. For guidance on these vocabularies, see Import data from UMLS.

For the preview release, we use the SNOMED-CT (Systematized Nomenclature of Medicine - Clinical Terms), LOINC (Logical Observation Identifiers, Names, and Codes), and RxNorm terminologies that are included with the OMOP sample dataset based on guidance from Observational Health Data Sciences and Informatics (OHDSI).

OMOP transformation

Healthcare data solutions in Microsoft Fabric also provide another capability for Observational Medical Outcomes Partnership (OMOP) transformations. When you execute this capability, the underlying transformation from the silver lakehouse to the OMOP gold lakehouse also transforms the structured and flattened output of the unstructured clinical notes analysis. The transformation reads from the nlpentity table in the silver lakehouse and maps the output to the NOTE_NLP table in the OMOP gold lakehouse.

For more information, see Overview of OMOP transformations.

Here's the schema for the structured NLP outputs, with the corresponding NOTE_NLP column mapping to the OMOP common data model:

| Flattened document reference | Description | NOTE_NLP mapping | Sample data |
|---|---|---|---|
| id | Unique identifier for the entity. Composite key of parent_id, offset, and length. | note_nlp_id | 1380 |
| parent_id | A foreign key to the flattened documentreferencecontent text the term was extracted from. | note_id | 625 |
| text | Entity text as it appears in the document. | lexical_variant | No Known Allergies |
| offset | Character offset of the extracted term in the input documentreferencecontent text. | offset | 294 |
| data_source_entity_id | ID of the entity in the given source catalog. | note_nlp_concept_id and note_nlp_source_concept_id | 37396387 |
| nlp_last_executed | The date of the documentreferencecontent text analysis processing. | nlp_date_time and nlp_date | 2023-05-17T00:00:00.0000000 |
| model | Name and version of the NLP system (the Text Analytics for health NLP system). | nlp_system | MSFT TA4H |
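
The column mapping in this schema can be sketched in a few lines of Python. This is a minimal illustration only, not the transformation that healthcare data solutions runs: the dictionary and the `to_note_nlp` helper are hypothetical names, the sample values come from the schema's sample data, and deriving `nlp_date` by truncating the timestamp to its date part is an assumption.

```python
# Mapping from nlpentity columns to NOTE_NLP columns, as listed in the schema.
# The dictionary and helper are illustrative, not part of the product library.
NLPENTITY_TO_NOTE_NLP = {
    "id": ["note_nlp_id"],
    "parent_id": ["note_id"],
    "text": ["lexical_variant"],
    "offset": ["offset"],
    "data_source_entity_id": ["note_nlp_concept_id", "note_nlp_source_concept_id"],
    "nlp_last_executed": ["nlp_date_time", "nlp_date"],
    "model": ["nlp_system"],
}


def to_note_nlp(entity: dict) -> dict:
    """Copy each nlpentity value into its mapped NOTE_NLP column(s)."""
    row = {}
    for source_column, target_columns in NLPENTITY_TO_NOTE_NLP.items():
        for target_column in target_columns:
            row[target_column] = entity[source_column]
    # Assumption: nlp_date keeps only the date part of the ISO timestamp.
    row["nlp_date"] = str(entity["nlp_last_executed"]).split("T")[0]
    return row


# Sample values taken from the schema's "Sample data" column.
sample_entity = {
    "id": 1380,
    "parent_id": 625,
    "text": "No Known Allergies",
    "offset": 294,
    "data_source_entity_id": 37396387,
    "nlp_last_executed": "2023-05-17T00:00:00.0000000",
    "model": "MSFT TA4H",
}

note_nlp_row = to_note_nlp(sample_entity)
print(note_nlp_row)
```

Note that a single source column, such as data_source_entity_id, can populate more than one NOTE_NLP column.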

Service limits for Text Analytics for health

  • Maximum number of characters per document is limited to 125,000.
  • Maximum size of documents contained in the entire request is limited to 1 MB.
  • Maximum number of documents per request is limited to:
    • 25 for the web-based API.
    • 1000 for the container.
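
If you call the service from your own code, you might group documents so that each request stays within these limits. The following Python sketch shows one possible approach. The `batch_documents` helper is hypothetical and isn't part of the healthcare data solutions library or the Azure SDK. It also assumes that 1 MB means 1,000,000 bytes and counts only the UTF-8 size of the note text, so a real request needs extra headroom for the JSON envelope.

```python
MAX_CHARS_PER_DOCUMENT = 125_000
MAX_DOCUMENTS_PER_REQUEST = 25   # web-based API; the container allows 1000
MAX_REQUEST_BYTES = 1_000_000    # assumption: 1 MB treated as 10^6 bytes


def batch_documents(documents, max_docs=MAX_DOCUMENTS_PER_REQUEST):
    """Group note texts into batches that respect the documented limits."""
    batches = []
    current = []
    current_bytes = 0
    for text in documents:
        if len(text) > MAX_CHARS_PER_DOCUMENT:
            raise ValueError(
                f"Document has {len(text)} characters; "
                f"the limit is {MAX_CHARS_PER_DOCUMENT}."
            )
        size = len(text.encode("utf-8"))
        # Start a new batch when adding this document would break a limit.
        batch_is_full = len(current) >= max_docs
        batch_too_large = current_bytes + size > MAX_REQUEST_BYTES
        if current and (batch_is_full or batch_too_large):
            batches.append(current)
            current = []
            current_bytes = 0
        current.append(text)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# 60 short notes are split by the 25-document limit.
print([len(batch) for batch in batch_documents(["short note"] * 60)])
```

The sketch rejects oversized documents instead of truncating them, so you can decide how to split a long note before it reaches the service.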

Enable logs

Follow these steps to enable request and response logging for the Text Analytics for health API:

  • Enable the diagnostic settings for your Azure Language service resource using the instructions in Enable diagnostic logging for Azure AI services. This resource is the same language service that you created during the Set up Azure Language service deployment step.

    • Enter a diagnostic setting name.
    • Set the category to Request and Response Logs.
    • For destination details, select Send to Log Analytics workspace, and select the required Log Analytics workspace. If you don't have a workspace, follow the prompts to create one.
    • Save the settings.

  • Go to the NLP Config section in the NLP ingestion service notebook. Update the value of the configuration parameter enable_text_analytics_logs to True. For more information about this notebook, see Review the notebook configuration.

View logs in Azure Log Analytics

To explore the log analytics data:

  • Navigate to the Log Analytics workspace.
  • Locate and select Logs. From this page, you can run queries against your logs.

Sample query

The following basic Kusto query can help you explore your log data. It retrieves all failed requests (HTTP 4xx and 5xx responses) logged by the Azure Cognitive Services resource provider in the past day, grouped by result signature:

AzureDiagnostics
| where TimeGenerated > ago(1d)
| where Category == "RequestResponse"
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where tostring(ResultSignature) startswith "4" or tostring(ResultSignature) startswith "5"
| summarize NumberOfFailedRequests = count() by ResultSignature
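
To make the query's filter-and-summarize logic concrete, here's a minimal Python sketch that applies the same steps to in-memory records. The `count_failed_requests` helper and the sample rows are made up for illustration and aren't real log output. The sketch also omits the one-day time filter for brevity.

```python
from collections import Counter


def count_failed_requests(rows):
    """Count 4xx and 5xx results per ResultSignature, mirroring the query."""
    failed_signatures = (
        str(row["ResultSignature"])
        for row in rows
        if row["Category"] == "RequestResponse"
        and row["ResourceProvider"] == "MICROSOFT.COGNITIVESERVICES"
        and str(row["ResultSignature"]).startswith(("4", "5"))
    )
    return dict(Counter(failed_signatures))


# Hypothetical rows shaped like AzureDiagnostics records.
sample_rows = [
    {"Category": "RequestResponse", "ResourceProvider": "MICROSOFT.COGNITIVESERVICES", "ResultSignature": 200},
    {"Category": "RequestResponse", "ResourceProvider": "MICROSOFT.COGNITIVESERVICES", "ResultSignature": 429},
    {"Category": "RequestResponse", "ResourceProvider": "MICROSOFT.COGNITIVESERVICES", "ResultSignature": 429},
    {"Category": "RequestResponse", "ResourceProvider": "MICROSOFT.COGNITIVESERVICES", "ResultSignature": 500},
    {"Category": "Audit", "ResourceProvider": "MICROSOFT.COGNITIVESERVICES", "ResultSignature": 403},
]

print(count_failed_requests(sample_rows))
```

Each key in the result corresponds to a ResultSignature value, and each count corresponds to NumberOfFailedRequests in the query output.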