Correct the mistakes in the work of the AI agent with the vector database.
Hello.
I am creating an application. The system is built in such a way: The user makes a request in the application chat - the AI Agent searches for information in the database and provides a response. If the user's request goes beyond the database, then the agent refers to Gemini.
The database is Supabase. Data from two government websites is uploaded there via API.
A flow is built in the backend using n8n (the same agent). It runs through the database and provides answers and comments.
In Supabase, tables with data from 2 websites are created. They are combined into 1 table document_chunks.
The problem is that the AI Agent finds information only in 1 table and does not seem to see the data from the second one. Even if a request is made and the data that needs to be found is copied (we know that this data is there), the agent does not find it.
It is also necessary to build an algorithm for searching for the agent. The information base is huge, and I assume the agent does not know what is relevant and what is not. There is a lack of a "clusters" search system.
Regarding different metadata. this is not an error, but an architectural decision. One table stores two fundamentally different types of documents:
Legislation (Sejm) - has the structure "Article/Paragraph".
Court decisions (SAOS) - have the structure "Case Number/Court". The fact that some rows have certain fields while others are NULL is normal practice for mixed databases. This allows the Agent to understand whether it is citing a law or a court ruling. They cannot be unified because they are different entities.
Regarding "we record some vectors, we search for others".
Both for writing to the database (parsing) and for searching (agent), the same model is used — Google Gemini Embeddings.
The vector dimension is fixed — 768.
If the sizes were different, the database would have thrown a transaction error at the stage of writing or searching, and the system would not work at all.
Regarding "multi-stage search" - here I probably agree, because I found no other options, it probably needs to implement hybrid search
Vectors are formed by the Google Gemini Embeddings model via API.
Technically, this is implemented through the Google Gemini Embeddings node in n8n, which sends text to Google servers, receives a vector array embedding in response, and passes it for writing to the database.
The incoming request is transformed into a vector using the Google Gemini model.
A function is called in Supabase that compares the query vector with the document vectors in the database.
Results are sorted by similarity coefficient (from highest to lowest).
The database returns a set number of Top-K most relevant text fragments, regardless of whether it is Sejm law or SAOS court decision, which are then passed into the AI context.
Physically, the data is in the database. I checked this through direct queries to the database — records of SAOS court decisions are loaded correctly.
The agent does not see these decisions at the moment of forming the response. The AI agent cannot correctly match a human question with the legal texts of court decisions during the search. It sees Sejm laws but "skips" SAOS rulings, and this is the main problem.
All logic and structure of the project are implemented standardly within your Supabase account (PostgreSQL). There is nothing external or hidden there.
The search functions are standard SQL functions, they are located directly in the database schema.
The vector size is clearly defined in the data types of the columns in the document table.
Any developer with access to the database can obtain this information in a minute by simply reviewing the Schema structure and the list of functions through an SQL query or the Supabase interface.
Rows where sejm_id exists and saos_id is NULL are legislative acts. They come from the Polish Sejm database. Since this is a law, it has a number in the Sejm register, but it is not a court decision, so it cannot have saos_id.
Rows where saos_id exists and sejm_id is NULL are court decisions from the SAOS database. Each ruling has its unique number in the court system. But a ruling is not a law, the Sejm did not adopt it, so it does not have and cannot have sejm_id.
Щодо різної метадати. це не помилка, а архітектурне рішення. В одній таблиці зберігаються два принципово різні типи документів:
Законодавство (Sejm) - має структуру "Артикул/Параграф".
Судові рішення (SAOS) - мають структуру "Номер справи/Суд". Те, що в одних рядках є одні поля, а в інших вони NULL — це нормальна практика для змішаних баз даних. Це дозволяє Агенту розуміти, чи цитує він закон, чи вирок суду. Уніфікувати їх неможливо, бо це різні сутності.
Щодо «записуємо одні вектори, шукаємо інші».
І для запису в базу (парсинг), і для пошуку (агент) використовується одна й та сама модель — Google Gemini Embeddings.
Розмірність вектора фіксована — 768.
Якби розміри відрізнялися, база даних видала б помилку транзакції ще на етапі запису або пошуку, і система не працювала б взагалі.
Щодо "пошуку в декілька етапів" - а ось тут мабуть погоджуюсь, бо інших варіантів я не знайшов, напевно треба реалізовувати гібрідний пошук
Вектори формує модель Google Gemini Embeddings через API.
Технічно це реалізовано через ноду Google Gemini Embeddings у n8n вона відправляє текст на сервери Google, отримує у відповідь векторний масив embedding і передає його для запису в базу даних.
Вхідний запит перетворюється на вектор за допомогою моделі Google Gemini.
У Supabase викликається функція, яка порівнює вектор запиту з векторами документів у базі.
Результати сортуються за коефіцієнтом схожості (від найбільшого до найменшого).
База повертає встановлену кількість Top-K найбільш релевантних фрагментів тексту, незалежно від того, чи це закон Sejm, чи судове рішення SAOS, які далі передаються в контекст ШІ.
Фізично в базі дані є. Я перевірив це через прямі запити до бази даних — записи судових рішень SAOS завантажені коректно.
Агент не бачить цих рішень у момент формування відповіді. ШІ-агент при пошуку не може правильно зіставити людське питання із юридичними текстами судових рішень. Він бачить закони Sejm, але «пропускає» вироки SAOS і саме це головна проблема.
Вся логіка та структура проекту реалізована стандартно в межах вашого акаунта Supabase (PostgreSQL). Нічого зовнішнього чи прихованого там немає.
Функції пошуку це стандартні SQL-функції, вони знаходяться безпосередньо в схемі бази даних.
Розмір вектора цей параметр чітко визначений у типах даних колонок таблиці з документами.
Будь-який розробник, маючи доступ до бази, може отримати цю інформацію за хвилину, просто переглянувши структуру Schema та список функцій через SQL-запит або інтерфейс Supabase.
Рядки, де sejm_id є, а saos_id — NULL це законодавчі акти. Вони приходять з бази польського Сейму. Оскільки це закон, у нього є номер у реєстрі Сейму, але він не є судовим рішенням, тому він не може мати saos_id.
Рядки, де saos_id є, а sejm_id — NULL це судові рішення з бази SAOS. У кожного вироку є свій унікальний номер у системі судів. Але вирок — це не закон, Сейм його не ухвалював, тому він не має і не може мати sejm_id.
Applications 1
-
2 days266 USD
272 1 2 days266 USDGood morning,
I have analyzed the problem with your AI agent and identified the main cause — vector competition between two different types of documents in one space. The documents of the Sejm (structured articles) always win against the SAOS rulings (unstructured legal text) in standard vector searches.
Solution
Architecture with two independent search streams:
Query classifier — determines the type of search (law/precedent/both)
Parallel search — separate queries with optimal configuration
Intelligent merging — merging with cross-reference (rulings → laws)
… API cost optimization
Current system: ~$0.20-$0.25/query
New system: ~$0.03-$0.04/query
Reduction: 85% (with 1000 queries/month = savings of ~800 PLN/month)
Scope of work
I will perform remotely (9-10h):
Analysis and technical plan (2-3h)
SQL functions + configuration (2-2.5h)
Query classifier (2h)
Basic n8n workflow (3-3.5h)
You will perform independently (savings of ~4-5h):
Data migration (ready SQL script with instructions)
Import workflow + configuration of credentials
Testing according to the checklist
Fine-tuning parameters
You will receive: SQL scripts, n8n workflow JSON, step-by-step instructions, checklist, Telegram support
Pricing
Standard calculation: 1000-1350 PLN (9-10h × 25-30 EUR)
Proposal: 990 PLN
Discount of ~30% from the standard rate
Optionally: full "turnkey" implementation +200 PLN (TeamViewer debugging, extended testing)
If the proposal is OK, we can start with the analysis (2-3h) — you will receive a specific plan and confirmation of feasibility.
Best regards,
Eugeniusz
-
1 day135 USD
1562 7 0 1 day135 USDI am among the top 5 developers in the category of "Artificial Intelligence and Machine Learning" among ~2100 specialists on the platform.
I guarantee:
- Fast and high-quality task execution
- Strict adherence to deadlines
- Regular communication throughout the entire process
I would be happy to discuss the details of your project in private messages.
The price is conditional.
-
3 days135 USD
1539 4 0 1 3 days135 USDGood day. I looked at the description of the problem, a typical case.
It seems that part of the data is either not indexed due to embeddings or simply does not appear in the search. Plus, the agent currently lacks proper selection logic, so it gets lost in the large database and does not understand where to search for what.
I have worked with Supabase, vector search, and RAG systems. I can find out why the second part of the data is not being found. Rebuild the search logic. Make it so that Gemini connects only when there is truly no answer in the database.
As a result, the agent will start to consistently find data and respond adequately.
I am ready to look at your current implementation and suggest a solution.
-
15 days1345 USD
12784 4 2 15 days1345 USDHello,
I’m very interested in helping improve your AI Agent’s search capabilities and ensure it reliably retrieves both Sejm legislation and SAOS court decisions. I have extensive experience building and optimizing backend systems with large, multi-entity databases, including Supabase/PostgreSQL, and integrating vector-based AI search workflows.
I can implement a robust, hybrid search algorithm that accounts for multiple document types, improves relevance ranking across large datasets, and introduces clustering or entity-aware retrieval to ensure the AI Agent does not skip any source. I’m comfortable working with n8n workflows, embedding models like Google Gemini, and designing scalable solutions for high-volume, structured data.
I’m confident I can enhance the search logic, improve result accuracy, and ensure the AI Agent fully leverages your combined dataset.
Best regards,
… Jeo Vincent Carretas
-
5 days269 USD
3160 23 1 3 5 days269 USDHello, Maximilian! Understanding your situation, I propose an integrative approach to solving the problem with the AI agent. First, I will conduct a detailed audit of the existing flow in n8n and the structure of your Supabase database. Relying on my experience in system architecture design, I will quickly identify and resolve the issue with searching among "document_chunks." Additionally, I will develop an optimal clustering algorithm that will enhance the accuracy and relevance of the agent's results. I will configure the system for seamless and scalable operation. Shall we discuss the project further?
-
1 day135 USD
2236 46 0 1 day135 USDGood day. Without a real understanding of what you have done there, it is difficult to say the real deadlines and costs. Write to me in private messages, we will discuss in more detail. For now, just preliminary questions.
Are you sure that you have two tables? Or maybe you have two databases? From the picture, nothing is clear, except that there is one table. What does the column chunk_index mean in this table?
What is the dimensionality of the embeddings? What are the sizes of the chunks? Why sub-bases and not specialized solutions?