AI Recruiting CLI

A command-line Python tool for a recruitment agency. Automates the entire vacancy processing cycle: from a raw employer text file to a row in Google Sheets.

Problem

The agency receives vacancy files from employers daily in arbitrary formats: TXT, CSV, DOCX. Each company has its own layout, its own language (Ukrainian, Polish, or Russian), and its own way of separating vacancies within a file. Operators used to copy the data into a spreadsheet by hand, which was slow and error-prone.

Solution

A 6-stage pipeline with interactive operator confirmation at each key step:

Incoming file → Split into blocks → Text cleanup → Gemini LLM → Excel → Google Sheets

Parsing — recognizes 6+ vacancy separation formats: emoji markers, empty lines, tab-digest in the first line. Removes duplicates across languages (companies send one vacancy in 3–4 languages at once).
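
To illustrate the block-splitting step, here is a minimal sketch covering just two of the formats mentioned above: emoji markers and empty lines. The specific marker emoji and the function name split_vacancies are assumptions made for this example, not the tool's actual rules.

```python
import re

# Hypothetical marker set: a line that starts with one of these emoji
# is assumed to open a new vacancy.
MARKER = re.compile(r"^(?:📌|🔹|✅)", re.MULTILINE)


def split_vacancies(text):
    """Split raw employer text into vacancy blocks."""
    text = text.replace("\r\n", "\n").strip()
    starts = [m.start() for m in MARKER.finditer(text)]
    if len(starts) >= 2:
        # Emoji-marker format: each marker opens a new block.
        # Any preamble before the first marker is kept as its own block.
        bounds = ([0] if starts[0] > 0 else []) + starts + [len(text)]
        blocks = [text[a:b] for a, b in zip(bounds, bounds[1:])]
    else:
        # Fallback: blocks separated by one or more empty lines.
        blocks = re.split(r"\n\s*\n", text)
    return [b.strip() for b in blocks if b.strip()]
```

A real splitter would try each known format in turn and let the operator confirm the result, as the pipeline description suggests.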

Cleanup — strips links, phone numbers, emails, and template tokens before sending to the LLM, reducing request cost.
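
The cleanup step could look roughly like the sketch below. All four patterns are illustrative heuristics, and the template-token syntax ({{...}} and [[...]]) is an assumption, since the source does not say what the tokens look like.

```python
import re

URL = re.compile(r"https?://\S+|www\.\S+")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# At least 9 digits, with at most two separator characters between digits,
# so a salary range such as "3000 - 4500" is left alone.
PHONE = re.compile(r"\+?\d(?:[ \-()]{0,2}\d){8,}")
# Hypothetical template-token syntax.
TOKEN = re.compile(r"\{\{.*?\}\}|\[\[.*?\]\]")


def clean(text):
    """Remove contact details and template tokens, then tidy whitespace."""
    for pattern in (URL, EMAIL, TOKEN, PHONE):
        text = pattern.sub(" ", text)
    text = re.sub(r"[ \t]+", " ", text)    # collapse runs of spaces
    text = re.sub(r" ?\n ?", "\n", text)   # trim spaces around newlines
    return text.strip()
```

Because the phone pattern is a heuristic, long digit sequences that are not phone numbers may also be removed; the thresholds would need tuning on real files.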

Structured Output — Gemini fills 13 fields via a Pydantic schema and response_schema. Response-to-object mapping is done by vacancy_id from the XML tag, not by index (protection against data loss on partial batch responses). Three parsing levels: response.parsed → model_validate_json() → JSON fallback.
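
The three parsing levels can be sketched as follows. This is an illustration under stated assumptions: only 3 of the 13 fields are shown, the field names and the Batch wrapper are hypothetical, and the third level is interpreted here as plain json.loads with per-item validation. The response object is simulated with SimpleNamespace; in google-genai the real response exposes parsed and text attributes.

```python
import json
from types import SimpleNamespace
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Vacancy(BaseModel):
    # Hypothetical subset of the 13 fields.
    vacancy_id: str
    title: str
    city: Optional[str] = None


class Batch(BaseModel):
    vacancies: List[Vacancy]


def parse_batch(response):
    """Return {vacancy_id: Vacancy}, trying three parsing levels in order."""
    parsed = getattr(response, "parsed", None)
    if isinstance(parsed, Batch):
        items = parsed.vacancies                      # level 1: SDK-parsed object
    else:
        text = getattr(response, "text", "") or ""
        try:
            items = Batch.model_validate_json(text).vacancies   # level 2
        except ValidationError:
            items = []                                # level 3: salvage valid items
            try:
                raw = json.loads(text)
            except json.JSONDecodeError:
                raw = {}
            entries = (raw.get("vacancies") or []) if isinstance(raw, dict) else []
            for obj in entries:
                try:
                    items.append(Vacancy.model_validate(obj))
                except ValidationError:
                    continue                          # skip malformed entries
    # Key by id, not by position, so a partial response cannot shift rows.
    return {v.vacancy_id: v for v in items}


# Simulated partial response: the second entry has no vacancy_id.
partial = SimpleNamespace(
    parsed=None,
    text='{"vacancies": [{"vacancy_id": "v2", "title": "Driver"}, {"title": "no id"}]}',
)
result = parse_batch(partial)
```

Here the whole-batch validation fails, and the fallback still recovers the one well-formed vacancy under its own id.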

Deduplication — SHA-256 for exact matches + MinHash (threshold 0.85) for fuzzy matches, stored in SQLite.
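
A rough sketch of the two-level check is shown below. To keep it standard-library only, the fuzzy level computes exact Jaccard similarity over word shingles with a linear scan, as a stand-in for MinHash (which estimates the same quantity and, via datasketch, would scale far better). The table layout, shingle size, and class name are assumptions.

```python
import hashlib
import re
import sqlite3


def normalize(text):
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text, k=3):
    """Set of k-word shingles of the normalized text."""
    words = normalize(text).split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class DedupStore:
    def __init__(self, path=":memory:", threshold=0.85):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS seen (sha TEXT PRIMARY KEY, body TEXT)"
        )
        self.threshold = threshold

    def check_and_add(self, text):
        """Return 'exact', 'fuzzy' or 'new'; only new texts are stored."""
        body = normalize(text)
        sha = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if self.db.execute("SELECT 1 FROM seen WHERE sha = ?", (sha,)).fetchone():
            return "exact"
        new = shingles(body)
        rows = self.db.execute("SELECT body FROM seen").fetchall()
        if any(jaccard(new, shingles(old)) >= self.threshold for (old,) in rows):
            return "fuzzy"
        self.db.execute("INSERT INTO seen VALUES (?, ?)", (sha, body))
        self.db.commit()
        return "new"
```

The cheap hash lookup runs first, so the similarity comparison is only reached for texts that are not byte-for-byte repeats after normalization.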

Excel — saves the result to .xlsx with a timestamp; the operator can edit it manually before uploading.

Google Sheets — appends rows via gspread (OAuth). Upload only happens after explicit operator confirmation (y/n).
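
The confirmation gate can be sketched as a small function that takes the uploader as a parameter. The function name and prompt text are hypothetical; with gspread, the append_rows argument could be a worksheet's append_rows method.

```python
def upload_if_confirmed(rows, append_rows, ask=input):
    """Call append_rows(rows) only if the operator answers 'y'."""
    answer = ask(f"Upload {len(rows)} rows to Google Sheets? (y/n): ")
    if answer.strip().lower() != "y":
        return False          # anything other than 'y' cancels the upload
    append_rows(rows)
    return True
```

Passing the uploader and the prompt function in as arguments keeps the gate testable without network access or a real terminal.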

Tech Stack

LLM: google-genai — Gemini Flash
Structured Output: pydantic v2 + response_schema
Console/UI: rich
Google Sheets: gspread + google-auth (OAuth)
Excel: openpyxl
Deduplication: datasketch MinHash + sqlite3
Config: pyyaml
API keys: keyring (Windows Credential Manager)
Logs: loguru
Retry: tenacity

On first launch, a setup wizard prompts for the Gemini API key and saves it to Windows Credential Manager via keyring — the key is never stored in project files.
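
A minimal sketch of that first-launch flow, assuming hypothetical service and entry names. keyring.get_password and keyring.set_password are the real keyring calls; the store parameter exists only so the logic can be exercised without a system credential backend.

```python
import getpass

SERVICE = "ai-recruiting-cli"    # hypothetical service name
USERNAME = "gemini_api_key"      # hypothetical entry name


def get_api_key(store=None, prompt=getpass.getpass):
    """Return the stored key, asking for it once if it is not saved yet."""
    if store is None:
        import keyring           # third-party; uses the OS credential store
        store = keyring
    key = store.get_password(SERVICE, USERNAME)
    if not key:
        key = prompt("Gemini API key: ").strip()
        if not key:
            raise ValueError("API key must not be empty")
        store.set_password(SERVICE, USERNAME, key)
    return key
```

On later launches the prompt is skipped because the key is already in the credential store.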

Notable Details

thinking_budget=1024 — Gemini's extended thinking mode is enabled to improve structured output accuracy.
Batching — vacancies are sent to the LLM in groups (XML tags vacancy id="..."); id-based mapping prevents vacancy loss on incomplete responses.
Cost tracking — counts tokens and USD per session, warns when the configured threshold is exceeded.
Portable — compiled into a .exe via PyInstaller for delivery to operators without a Python environment.
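
The batching detail above can be sketched as follows: each vacancy is wrapped in a tag carrying its id, and the results are matched back by that id so that missing vacancies are detected explicitly. The function names and the vacancy_id key in the returned records are assumptions for this example.

```python
from xml.sax.saxutils import escape, quoteattr


def build_batch_prompt(vacancies):
    """vacancies: {vacancy_id: cleaned_text} -> one XML-tagged prompt body."""
    return "\n".join(
        f"<vacancy id={quoteattr(vid)}>\n{escape(text)}\n</vacancy>"
        for vid, text in vacancies.items()
    )


def match_by_id(sent, returned):
    """Pair returned records with sent vacancies by id, never by position.

    returned: list of dicts, each expected to carry a 'vacancy_id' key.
    Returns (records_by_id, ids_missing_from_the_response).
    """
    by_id = {r["vacancy_id"]: r for r in returned if r.get("vacancy_id") in sent}
    missing = [vid for vid in sent if vid not in by_id]
    return by_id, missing
```

The missing list gives the caller something concrete to retry or to show the operator, instead of silently writing shifted rows.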

#python #automation #AI #LLM #Gemini #GoogleSheets #recruiting #CLI #pydantic #opensource
Work details
Budget 150 USD
Added 4 June
Freelancer
Anton P.
Ukraine, Kyiv

Available for hire
6 Safes completed
On the service 3 years