Automatic Generation of SFT Data for LLMs
Developed a script for processing a custom text dataset (~400 entries) using a lightweight language model. Built a solution deployable in Google Colab for automatically generating a new CSV dataset in a "question-answer" format based on the original data for subsequent LLM training.
The system analyzes each new context, generates relevant questions and answers in real time, and handles large volumes of data efficiently.
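The flow described above (read each context, generate a question-answer pair, write the result to a new CSV) could be sketched roughly as follows. This is a minimal illustration, not the original script: the column names `context`, `question`, and `answer`, the helper names, and the `placeholder_generator` function are all assumptions, and the placeholder stands in for the actual call to the lightweight language model, which is not specified here.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, TextIO, Tuple

# Hypothetical interface: any callable mapping a context to (question, answer).
QAGenerator = Callable[[str], Tuple[str, str]]


def placeholder_generator(context: str) -> Tuple[str, str]:
    """Stand-in for the model call (assumption): returns a trivial pair."""
    first_sentence = context.split(".")[0].strip()
    return ("What does the following passage state?", first_sentence)


def build_qa_rows(
    contexts: Iterable[str], generate: QAGenerator
) -> Iterator[Dict[str, str]]:
    """Lazily yield one question-answer row per non-empty context."""
    for context in contexts:
        context = context.strip()
        if not context:
            continue  # skip blank entries instead of sending them to the model
        question, answer = generate(context)
        yield {"context": context, "question": question, "answer": answer}


def write_qa_csv(rows: Iterable[Dict[str, str]], out_file: TextIO) -> int:
    """Write rows as CSV with a header and return the number of rows written."""
    writer = csv.DictWriter(out_file, fieldnames=["context", "question", "answer"])
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def demo() -> Tuple[int, str]:
    """Run the sketch on a tiny in-memory sample and return (row count, CSV text)."""
    contexts = [
        "Colab provides hosted notebooks. They run in the browser.",
        "   ",
        "CSV is a plain-text tabular format.",
    ]
    buffer = io.StringIO()
    written = write_qa_csv(build_qa_rows(contexts, placeholder_generator), buffer)
    return written, buffer.getvalue()


if __name__ == "__main__":
    written, csv_text = demo()
    print(written)
    print(csv_text)
```

Because rows are produced by a generator and written one at a time, only the current entry is held in memory, which is one simple way to keep the same code usable on larger datasets. To use a real model, replace `placeholder_generator` with a function of the same shape that prompts the model and parses its reply.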