@Mike Shapleski I see two possible solutions for your scenario.
- Extract text from your documents with the Computer Vision Read API, then pass the required text as input to the Azure Text Analytics for Health API.
- Create an Azure Cognitive Search service, upload the documents, and enable specific skills on the service to extract PII or other entities.
The first solution lets you keep all processing local: both services are available as Docker containers, so none of your data is uploaded to external storage. The containers are not fully offline, though. For billing purposes they need to connect to a metering endpoint on Azure, which records your usage of both services (the Computer Vision and Azure Text Analytics containers). You can also use the C# client library to call the local endpoints of these containers. Setup can take some time, because you need to configure the Docker containers and pass the PDF documents to the Computer Vision Read API to extract text. The extracted text can then be used directly, or stored, as input to the Text Analytics for Health API.
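As a rough illustration of that flow, here is a minimal Python sketch (the same steps apply from C#). It assumes both containers are already running locally. The hosts, ports, and paths in `READ_URL` and `HEALTH_URL` are assumptions that depend on how you started the containers and which API version your images expose, and the helper names and the `max_chars` default are hypothetical, so treat this as a starting point rather than a reference implementation.

```python
import json
import time
import urllib.request

# Assumed local endpoints: hosts, ports and paths depend on how each container
# was started and on which API version its image exposes.
READ_URL = "http://localhost:5000/vision/v3.2/read/analyze"
HEALTH_URL = "http://localhost:5001/text/analytics/v3.1/entities/health"


def text_from_read_result(result):
    """Join every recognized line of a Read-style result into one string."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return "\n".join(
        line["text"] for page in pages for line in page.get("lines", [])
    )


def health_payload(text, max_chars=5000):
    """Split text into fixed-size chunks and wrap them as a documents payload.

    max_chars is a hypothetical default; check the service's document limits.
    """
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    return {
        "documents": [
            {"id": str(n), "language": "en", "text": chunk}
            for n, chunk in enumerate(chunks, start=1)
        ]
    }


def read_pdf(path, poll_seconds=1.0):
    """Send a PDF to the Read container and poll until the operation ends."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        READ_URL,
        data=body,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # The Read operation is asynchronous; the result URL comes back in a header.
        operation_url = resp.headers["Operation-Location"]
    while True:
        with urllib.request.urlopen(operation_url) as resp:
            result = json.load(resp)
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(poll_seconds)


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the containers up, something like `post_json(HEALTH_URL, health_payload(text_from_read_result(read_pdf("report.pdf"))))` would chain the two calls; `report.pdf` is a placeholder file name.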
The second solution uses the search service to index all the documents and make them searchable; the data can be in the cloud or behind a firewall. Some skills can be enabled on the search service to extract entities and PII, but they may not extract the same data as Text Analytics for Health. This solution can be faster to set up because you can query your data directly after uploading the documents.
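To give an idea of what enabling those skills looks like, here is a small Python sketch that builds a skillset definition as a plain dictionary. The skill type names are the built-in PII detection and entity recognition skills as I understand them; the function name, skillset name, and `targetName` values are hypothetical, and `/document/content` assumes your indexer exposes the extracted text under that path. Please verify the exact fields against the skill reference for your API version.

```python
import json


def build_skillset(name):
    """Return a skillset definition with PII detection and entity recognition."""
    text_input = [{"name": "text", "source": "/document/content"}]
    return {
        "name": name,
        "description": "Extract PII and named entities from document text",
        "skills": [
            {
                "@odata.type": "#Microsoft.Skills.Text.PIIDetectionSkill",
                "context": "/document",
                "inputs": text_input,
                # targetName values are illustrative; pick names that match your index.
                "outputs": [{"name": "piiEntities", "targetName": "pii_entities"}],
            },
            {
                "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
                "context": "/document",
                "inputs": text_input,
                "outputs": [{"name": "namedEntities", "targetName": "named_entities"}],
            },
        ],
    }


def skillset_json(name):
    """Serialize the skillset so it can be sent as a request body."""
    return json.dumps(build_skillset(name), indent=2)
```

The JSON from `skillset_json("pii-skillset")` would be the body of the request that creates the skillset on your search service; the indexer then references the skillset by name.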
If an answer is helpful, please accept it or upvote it, which might help other community members reading this thread.