OpenAI fine-tuning error

aa
2023-04-07T07:03:49.5033333+00:00

Hello, I want to fine-tune GPT-3 on my own data. After preprocessing the data, I try to train the model (i.e. run fine-tuning), but I get this error:

'Training data validation failed: each of the classes must start with a different token. You can view your class tokenizations at https://platform.openai.com/tokenizer?view=bpe.'

The data format is as follows:

{"prompt":"Big: 382\nMid: 1039\nSmall: 3417\nProblem: blahblah blahblah\n\n###\n\n","completion":" 5032"}

I followed this example for reference:
https://github.com/openai/openai-cookbook/blob/main/examples/azure/finetuning.ipynb

Please help me.
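Before looking at tokenization, it can help to rule out basic formatting problems. The following is a minimal Python sketch, assuming a JSONL file with one {"prompt": ..., "completion": ...} object per line as in the record above. It checks each record against the conventions that record already follows: the prompt ends with the "\n\n###\n\n" separator and the completion starts with a space. The helper names `check_record` and `check_file` are hypothetical and are not part of any OpenAI or Azure tool.

```python
from __future__ import annotations

import json

# Separator used at the end of every prompt in the example record (an assumption
# that the whole dataset uses the same one).
SEPARATOR = "\n\n###\n\n"


def check_record(line: str) -> list[str]:
    """Return the problems found in one JSONL line (an empty list means none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []
    prompt = record.get("prompt")
    completion = record.get("completion")
    if not isinstance(prompt, str):
        problems.append("missing or non-string 'prompt'")
    elif not prompt.endswith(SEPARATOR):
        problems.append("prompt does not end with the separator")
    if not isinstance(completion, str):
        problems.append("missing or non-string 'completion'")
    elif not completion.startswith(" "):
        problems.append("completion does not start with a space")
    return problems


def check_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers of a JSONL file to the problems found on them."""
    report = {}
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            problems = check_record(line)
            if problems:
                report[lineno] = problems
    return report
```

For example, check_record applied to the record from the question returns an empty list, and check_file("train.jsonl") (hypothetical file name) returns only the lines that break one of the conventions.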

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. VasimTamboli
    2023-04-08T11:48:15.69+00:00

    The error message suggests a problem with how your training data is formatted. Specifically, it refers to class tokenizations: when the completions are treated as class labels, each distinct completion must begin with a different token, and at least two of yours apparently begin with the same one. This is likely related to how you preprocessed your data. One possible solution is to make sure each class starts with a different token. You can check how your class labels are tokenized with the tool linked in the error message (https://platform.openai.com/tokenizer?view=bpe). Additionally, check that your preprocessing steps produce the format expected for GPT-3 fine-tuning, and that your training data is properly formatted and uses the correct text encoding. Finally, the OpenAI documentation and community forums may offer more specific guidance on troubleshooting this issue. Good luck!
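The check described in the answer can also be scripted. The Python sketch below groups the distinct completions (treated as class labels) by their first token id and reports every group that contains more than one label, which is the condition the error message complains about. Multi-digit numbers such as " 5032" may be split into several BPE tokens, so two different numeric labels can end up sharing a first token. The function names are hypothetical, and the tokenizer is passed in as an `encode` callable so the sketch does not assume a particular library.

```python
from __future__ import annotations

import json
from collections import defaultdict
from typing import Callable, Iterable


def load_completions(path: str) -> list[str]:
    """Read the 'completion' field of every non-blank line in a JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)["completion"] for line in f if line.strip()]


def find_first_token_collisions(
    labels: Iterable[str],
    encode: Callable[[str], list[int]],  # assumed: text -> list of token ids
) -> dict[int, list[str]]:
    """Return {first_token_id: [labels...]} for first tokens shared by 2+ labels."""
    by_first_token: dict[int, list[str]] = defaultdict(list)
    for label in sorted(set(labels)):  # each distinct label is one class
        tokens = encode(label)
        if not tokens:
            raise ValueError(f"label {label!r} encodes to no tokens")
        by_first_token[tokens[0]].append(label)
    # Keep only the groups where different classes start with the same token.
    return {tok: group for tok, group in by_first_token.items() if len(group) > 1}
```

If the `tiktoken` package is installed, `tiktoken.get_encoding("r50k_base").encode` can be passed as `encode`; that this matches the tokenizer of your Azure base model is an assumption worth confirming with the tokenizer page linked in the error. Each returned group lists classes that share a first token. One possible remedy, not confirmed in this thread, is to rename those classes so that each one starts with a distinct token.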
