Eval-Lab: regression testing of prompts and models. Work from the portfolio of freelancer Dmytro, category AI & Machine Learning (№2051218)

Web dashboard for regression testing of prompts and models. It runs a test set through two models or prompts and compares the results on 4 sub-scores.

Technically interesting aspects:
— LLM-as-judge through 5 providers (OpenRouter, Anthropic via tool-use, Gemini, Groq, mock)
— 4 sub-scores for each case: correctness, relevance, completeness, prompt_quality
— Cap on the final score when the prompt is poor, so that a strong model cannot mask a weak prompt
— Per-provider throttling and retries with backoff that honor the Retry-After header
— Mock mode for running without API keys (CI-friendly, $0)
— Redaction of secrets in logs
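
The score cap from the list above can be illustrated with a minimal sketch. The page does not state the score scale, the aggregation rule, the threshold, or the cap value, so everything here is an assumption: scores are taken to lie in 0.0 to 1.0, the final score is taken to be the mean of the three answer sub-scores, and `PROMPT_QUALITY_THRESHOLD`, `CAPPED_MAX_SCORE`, `SubScores`, and `final_score` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical values: the source gives neither the scale nor the thresholds.
PROMPT_QUALITY_THRESHOLD = 0.5  # below this, the prompt counts as "poor"
CAPPED_MAX_SCORE = 0.6          # highest final score a poor prompt may receive


@dataclass(frozen=True)
class SubScores:
    """The four per-case sub-scores named in the project description."""
    correctness: float
    relevance: float
    completeness: float
    prompt_quality: float


def final_score(s: SubScores) -> float:
    """Average the three answer sub-scores, then cap the result if the prompt is poor."""
    answer_quality = (s.correctness + s.relevance + s.completeness) / 3
    if s.prompt_quality < PROMPT_QUALITY_THRESHOLD:
        # A strong model may still answer well; the cap keeps that from
        # hiding the weak prompt in the final number.
        return min(answer_quality, CAPPED_MAX_SCORE)
    return answer_quality
```

Under these assumptions, a perfect answer produced from a poor prompt is reported at the cap rather than at the top of the scale, while a poor answer is left unchanged by the cap.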

Stack: FastAPI, async SQLAlchemy, Alembic, httpx, Pydantic, vanilla JS, Docker.
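
As an illustration of the retry behaviour mentioned above (backoff plus Retry-After), here is a minimal sketch using only the standard library. It is not the project's code: `retry_delay`, `call_with_retry`, the set of retryable status codes, and the base and cap values are all assumptions. The `send` callable is a stand-in for a provider request; with httpx it could wrap a request and return `response.status_code` and `response.headers`.

```python
import asyncio
import random
from typing import Awaitable, Callable, Mapping, Optional, Tuple

RETRYABLE_STATUSES = {429, 500, 502, 503, 504}  # assumed set, not from the source

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def retry_delay(attempt: int, retry_after: Optional[str],
                base: float = 0.5, cap: float = 30.0, jitter: float = 0.0) -> float:
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        try:
            # Prefer the server's hint when it is given in seconds.
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    backoff = min(base * (2 ** attempt), cap)  # exponential backoff with a ceiling
    return backoff + random.uniform(0.0, jitter)


async def call_with_retry(
    send: Callable[[], Awaitable[Response]],
    max_attempts: int = 4,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Response:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers = await send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            return status, headers
        # A plain dict lookup is case-sensitive; httpx headers are not.
        await sleep(retry_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting, which fits the mock, no-API-key mode described in the list. Per-provider throttling would sit in front of `send` and is left out here.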

huggingface.co

Added 15 June

Dmytro Staroselskyi

Lvov