Data concern for usage of Open-Source models in Azure Marketplace

Question

Our team is currently working on a product that leverages Azure OpenAI GPT models. We are eager to perform experiments on different open-source models such as Code-Llama and Mistral available in the Azure AI Studio. However, we've come across a notice and have some concerns related to the usage of data.
Given that we are working with production data, we wanted to understand better how the data is being utilized by these models and what specific data-related concerns we should be aware of while using them.
Additionally, I would like to know whether it is safe to use the Mistral and Code-Llama models provided in the Azure Marketplace, and whether any open-source models are safe to use with production data.
User's image

Accepted Answer

Hi Mansi Vaishnav,

Thanks for reaching out to Microsoft Q&A.

In the image you shared, there is a notice stating that the Microsoft Purchase Policy applies to the use of the open-source model. It specifically says that only dummy or artificial datasets should be used in non-production systems for testing or evaluating the model, and that no live or production data is allowed unless you are enrolled and compliant as a supplier with all applicable Microsoft policies.

Based on this notice, here are the key points to consider:

Data Usage Restrictions: The models, such as Mistral and possibly others like Code-Llama, are designed for experimentation with non-production (dummy or artificial) data unless you are specifically compliant with Microsoft's supplier terms. This means that you cannot use production data with these models without proper enrollment and adherence to Microsoft's policies.

Open-Source Models and Production Data: In terms of using these models safely with production data, the notice is clear that these open-source models provided in Azure Marketplace are not safe for production data unless you have specific permissions and compliance in place. This restriction is due to potential risks related to data processing, as the models are intended for experimentation rather than direct production usage without the necessary agreements.

Recommendation: If you're experimenting with models like Mistral or Code-Llama, you should stick to testing with dummy or synthetic data. If your goal is to use them in a production environment, you will need to ensure compliance with Microsoft's supplier policies. Alternatively, for production use cases, consider leveraging fully supported Azure services like Azure OpenAI, which have clearer policies on production data handling.

For further guidance on using any open-source models in production, you might need to explore legal and compliance aspects or look for models that specifically mention support for production environments in the Azure Marketplace or AI Studio.

General Best Practices for AI Model Usage with Production Data:

Data Encryption: Ensure that all production data is encrypted both at rest and in transit, whether it’s being processed by an AI model or stored in cloud infrastructure.
Access Control: Implement strong access controls to ensure that only authorized personnel and systems can interact with sensitive data.
Data Auditing: Enable detailed logging and auditing to track how data is processed, by whom, and when.
Anonymization: Whenever possible, anonymize production data before passing it to any experimental or open-source models.
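
To illustrate the anonymization point above, here is a minimal Python sketch that masks obvious identifiers before a record is sent to any experimental model. The field names (customer_id, message), the key, and the regular expressions are assumptions made up for this example and do not come from any Azure API. Pattern-based scrubbing will miss identifiers it does not know about, so treat it as a starting point rather than a complete anonymization method.

```python
import hashlib
import hmac
import re

# Hypothetical key for illustration only; a real key would come from a secret store.
PSEUDONYM_KEY = b"replace-with-a-secret-from-your-vault"

# Simple patterns for email addresses and phone-like digit runs (assumed formats).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def pseudonymize(value: str) -> str:
    """Map a value to a stable keyed token (HMAC-SHA256, truncated to 12 hex chars)."""
    digest = hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}"


def scrub_text(text: str) -> str:
    """Replace email addresses and phone-like numbers with fixed placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def anonymize_record(record: dict) -> dict:
    """Return a new dict with the id pseudonymized and the free text scrubbed."""
    return {
        "customer_id": pseudonymize(record["customer_id"]),
        "message": scrub_text(record["message"]),
    }


if __name__ == "__main__":
    sample = {
        "customer_id": "C-1001",
        "message": "Contact jane.doe@example.com or +1 (555) 010-9999 today",
    }
    print(anonymize_record(sample))
```

The keyed hash keeps the same customer mapped to the same token across records, which preserves joins for evaluation while keeping the raw identifier out of the prompt.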

Next Steps:

Review Azure OpenAI Policies: Familiarize yourself with the policies governing Azure OpenAI to assess whether it suits your needs for production use.
Ensure Compliance: If you must use open-source models, work closely with your legal and compliance teams to ensure you meet Microsoft’s supplier and data handling policies.
Model Evaluation: If you’re still experimenting, make sure to leverage artificial or dummy data and avoid using any live customer or prod data.
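
As a rough sketch of the dummy-data approach mentioned above, the following Python snippet generates artificial records that can stand in for customer data while you evaluate a model. The record schema (customer_id, email, order_total) is hypothetical and should be replaced with the shape of your own data; nothing here is derived from real records.

```python
import random
import string


def make_dummy_records(n: int, seed: int = 0) -> list[dict]:
    """Generate n artificial records; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        # Random lowercase local part on a reserved documentation domain.
        name = "".join(rng.choices(string.ascii_lowercase, k=8))
        records.append(
            {
                "customer_id": f"test-{i:05d}",
                "email": f"{name}@example.com",
                "order_total": round(rng.uniform(5.0, 500.0), 2),
            }
        )
    return records


if __name__ == "__main__":
    for row in make_dummy_records(3):
        print(row)
```

Using a seeded generator means the same test set can be recreated for every model you compare, without any live data leaving your environment.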

Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.

Answer

@Mansi Vaishnav See this page for data privacy and security for models used from the model catalog. The model catalog in AI Studio uses an Azure ML workspace, which is created with the same name as your project, and the Azure ML documentation covers the relevant information about data, privacy, and security. As per the referenced document, the data passed to the model is governed by the terms and conditions of the model provider and is subject to Azure's data, privacy, and security commitments.

While the model is provided by the model provider, and your use of the model (and the model provider's accountability for the model and its outputs) is subject to the license terms provided with the model, Microsoft provides and manages the hosting infrastructure and API endpoint. The models hosted in Models-as-a-Service are subject to Azure's data, privacy, and security commitments. Learn more about Azure compliance offerings applicable to Azure Machine Learning here.

I think the same is outlined on the corresponding page under Azure AI Studio too.

The highlighted text in your screenshot reiterates that, per the purchase policy, using the model for demo or testing purposes with production data is not recommended. I think this acknowledgement is required only when you deploy your first model from the catalog, since subsequent deployments from the model catalog do not ask for it. See my screenshot below for the same model without the acknowledgement.

User's image

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.
