I have a CSV file with approximately 55,000 companies (ID, name, location) and a carefully configured, tested prompt that classifies each company into one of 14 industry categories. Attached are a screenshot of part of the CSV file and a text file with the prompt.
Required:
Based on my CSV and prompt:
split the data into batches of 50 companies;
generate a file requests.jsonl in OpenAI Batch API format for POST /v1/responses; each line of the JSONL must contain my prompt plus 50 lines of companies.
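For reference, the generation step could look roughly like the Python sketch below. It assumes the CSV has a header row and that the prompt and the company rows are sent together as one text input (one possible layout; the prompt could instead go in the Responses API instructions field). The model name gpt-4.1-mini, the custom_id scheme batch-00001 and the helper names are placeholders, not requirements. At 50 rows per request, about 55,000 companies come to roughly 1,100 JSONL lines.

```python
import csv
import io
import json
from pathlib import Path

BATCH_SIZE = 50
MODEL = "gpt-4.1-mini"  # placeholder: replace with the model the prompt was tested on


def rows_to_csv_text(rows):
    """Serialize rows back to CSV text so names containing commas stay quoted."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().rstrip("\n")


def build_request_lines(prompt, rows, model=MODEL, batch_size=BATCH_SIZE):
    """Return one Batch API request (as a JSON string) per chunk of batch_size rows."""
    lines = []
    for n, start in enumerate(range(0, len(rows), batch_size), start=1):
        chunk = rows[start:start + batch_size]
        request = {
            "custom_id": f"batch-{n:05d}",  # hypothetical ID scheme; must be unique per line
            "method": "POST",
            "url": "/v1/responses",
            # assumption: prompt text first, then a blank line, then the company rows
            "body": {"model": model, "input": f"{prompt}\n\n{rows_to_csv_text(chunk)}"},
        }
        lines.append(json.dumps(request, ensure_ascii=False))
    return lines


def write_requests(csv_path, prompt_path, out_path="requests.jsonl"):
    """Read the CSV and the prompt, write requests.jsonl, return the number of requests."""
    prompt = Path(prompt_path).read_text(encoding="utf-8").strip()
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        next(reader)  # assumption: the first row is a header (ID, name, location)
        rows = [row for row in reader if row]
    lines = build_request_lines(prompt, rows)
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

Re-serializing each chunk with csv.writer, rather than joining fields with commas, keeps company names such as "Acme, Inc" quoted, so every company still occupies exactly one line of the input.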
I will upload requests.jsonl to OpenAI myself, run the batch, and send you the results file results.jsonl. After that, you will need to:
parse results.jsonl;
extract ID,ShortLabel pairs from the model responses (where ShortLabel is one of the 14 codes or Unknown);
return the final CSV with columns ID,ShortLabel;
separately flag any problematic or unreadable lines.
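The parsing side can be sketched the same way. This sketch assumes that the prompt makes the model answer with one ID,ShortLabel line per company (the real output format depends on the prompt, which is not shown here), that each line of the results file carries custom_id, response (with status_code and body) and error, and that the answer text sits in output items of type message with output_text parts. The helper names and the valid_codes argument (the 14 codes, not listed here) are hypothetical.

```python
import csv
import json


def extract_text(body):
    """Join the output_text parts of a /v1/responses response body."""
    parts = []
    for item in body.get("output") or []:
        if item.get("type") != "message":
            continue  # skip non-message items such as reasoning
        for part in item.get("content") or []:
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "\n".join(parts)


def parse_results(lines, valid_codes):
    """Return (pairs, problems) from the lines of results.jsonl.

    pairs: (ID, ShortLabel) tuples; problems: (custom_id or line, reason) tuples.
    """
    allowed = set(valid_codes) | {"Unknown"}
    pairs, problems = [], []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            problems.append((f"line {lineno}", "invalid JSON"))
            continue
        cid = record.get("custom_id") or f"line {lineno}"
        response = record.get("response") or {}
        if record.get("error") or response.get("status_code") != 200:
            problems.append((cid, "request failed"))
            continue
        body = response.get("body") or {}
        if body.get("status") != "completed":
            # e.g. truncated output: keep any readable lines but flag the batch
            problems.append((cid, f"response status: {body.get('status')}"))
        for out_line in extract_text(body).splitlines():
            out_line = out_line.strip()
            # assumption: one "ID,ShortLabel" line per company, optional header line
            if not out_line or out_line.replace(" ", "").lower() == "id,shortlabel":
                continue
            company_id, sep, label = out_line.partition(",")
            label = label.strip()
            if not sep or label not in allowed:
                problems.append((cid, f"unreadable: {out_line}"))
            else:
                pairs.append((company_id.strip(), label))
    return pairs, problems


def finalize(pairs, problems, expected_ids):
    """Drop duplicate and unknown IDs, and flag input IDs that never came back."""
    expected = set(expected_ids)
    labels, problems = {}, list(problems)
    for company_id, label in pairs:
        if company_id not in expected:
            problems.append((company_id, "ID not in input CSV"))
        elif company_id in labels:
            problems.append((company_id, "duplicate ID in results"))
        else:
            labels[company_id] = label
    for company_id in expected_ids:
        if company_id not in labels:
            problems.append((company_id, "missing from results"))
    return labels, problems


def write_csv(path, header, rows):
    """Write rows to a UTF-8 CSV file with the given header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

Typical use: open results.jsonl and pass the file object to parse_results together with the 14 codes, pass the IDs from the original CSV to finalize, then write labels.items() to the final CSV with header ID,ShortLabel and the problems list to a separate problems file using write_csv. Checking against the input IDs matters because a batch can fail or return fewer lines than it was sent, and those companies would otherwise silently disappear.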
Important conditions:
You will not need access to my OpenAI API key; I will run the batch myself.
The content of the prompt must not be changed; only minimal cosmetic formatting is allowed.
This is a one-time task: a single correct run on all 55,000 lines is sufficient.
Desired skills:
Experience with Python or Node.js.
Ability to work with JSONL files.
Hands-on experience with the OpenAI API is preferred, ideally with the Batch API.
In your response, please specify:
Which language you will use (Python or Node).
Whether you have experience specifically with the OpenAI API / Batch API.