Hi @Tom Chow,
Thank you for reaching out to Microsoft Q&A forum!
When training a custom neural extraction model in Azure Document Intelligence, it is recommended to use a diverse training set that represents the range of document types and layouts the model will encounter in production. It is not usually necessary to split the document types across several models; a single neural model can perform well on a variety of document types. However, if you have a large number of documents with very different layouts or structures, training a separate model for each type may improve accuracy.
To classify the types of documents flowing into a resume-related model, you can use a composed model, or classify documents up front (for example, by job description or a manual rule) and route each one to the appropriate model. To handle fields that are often empty, consider techniques such as data augmentation or synthetic data generation. When extracting table fields, make sure the table structure is labeled consistently across your samples. Label several examples for each field (the service requires at least five labeled training documents) and evaluate the model's performance on a validation set of documents. If the model is not performing well, adjust the training data or model parameters and retrain.
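As a minimal sketch of the up-front routing idea, the snippet below uses simple keyword heuristics to decide which custom model a document should go to. The model IDs, keywords, and function names here are hypothetical placeholders for illustration, not real Document Intelligence identifiers:

```python
# Illustrative sketch: route an incoming document to a custom model by
# rough keyword heuristics before calling the analysis API.
# Model IDs below are hypothetical placeholders.
MODEL_BY_TYPE = {
    "resume": "resume-neural-v1",
    "cover_letter": "cover-letter-neural-v1",
}

def classify_document(text: str) -> str:
    """Very rough keyword-based pre-classification of a document's text."""
    lowered = text.lower()
    if "dear hiring manager" in lowered or "i am writing to apply" in lowered:
        return "cover_letter"
    return "resume"  # default bucket

def pick_model(text: str) -> str:
    """Return the model ID to use when analyzing this document."""
    return MODEL_BY_TYPE[classify_document(text)]
```

In production you would typically replace the keyword rules with a trained custom classification model, but the routing pattern (classify first, then analyze with the matching model) stays the same.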
To avoid manual data entry, you can use the Azure Document Intelligence service to extract data from your resumes automatically. For example, an Azure Function triggered by uploads to Azure Blob Storage can call the Document Intelligence REST API to analyze each new resume. This lets you optimize your resume-parsing pipeline and removes repetitive manual data entry.
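A rough sketch of that extraction step is below, using only the Python standard library against the Document Intelligence (Form Recognizer) v3.1 REST API (`api-version=2023-07-31`). The `endpoint`, `key`, and `model_id` values are placeholders you would supply from your own resource; `fields_to_dict` is a hypothetical helper that flattens the JSON result:

```python
import json
import time
import urllib.request

API_VERSION = "2023-07-31"  # Document Intelligence v3.1 GA REST API

def analyze_document(endpoint: str, key: str, model_id: str, pdf_bytes: bytes) -> dict:
    """Submit a PDF to the analyze endpoint and poll until the operation finishes."""
    url = (f"{endpoint}/formrecognizer/documentModels/{model_id}"
           f":analyze?api-version={API_VERSION}")
    req = urllib.request.Request(
        url,
        data=pdf_bytes,
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/pdf"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # The service returns 202 with a URL to poll for the result.
        op_url = resp.headers["Operation-Location"]
    while True:
        poll = urllib.request.Request(
            op_url, headers={"Ocp-Apim-Subscription-Key": key})
        with urllib.request.urlopen(poll) as resp:
            result = json.load(resp)
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(2)  # back off between polls

def fields_to_dict(analyze_result: dict) -> dict:
    """Flatten the first analyzed document's fields to {name: text} pairs."""
    docs = analyze_result.get("analyzeResult", {}).get("documents", [])
    if not docs:
        return {}
    return {name: field.get("content", "")
            for name, field in docs[0].get("fields", {}).items()}
```

Inside an Azure Function with a Blob Storage trigger, you would pass the blob's bytes to `analyze_document` and write the output of `fields_to_dict` to your downstream store; the official `azure-ai-formrecognizer` SDK wraps the same submit-and-poll flow if you prefer it over raw REST calls.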
To optimize your model, choose a custom template or neural model, make sure your training data is representative, and label it accurately. A sufficient number of representative, well-labeled samples is the biggest driver of accuracy.
I hope this information helps. Do let us know if you have any further queries.
If this answers your query, do click "Accept Answer" and "Yes" for "Was this answer helpful?".