Google Sheets - formula (task to extract info from a junk application)
Looking for a good expert in regular expressions who could write a single expression (complex, of course) for all known cases.
Problem: there is no single format for application submissions and there won't be one - therefore, it is important to extract the type of vehicle from the most cluttered applications using a formula.
Task 1 - improve the formula (I provide the formula below) - its task is to determine what type of vehicle is needed from the stream of cluttered text of the application.
The formula that works (but does not account for all cases):
=ARRAYFORMULA(
IF(J2:J="";"";IF(
(
REGEXMATCH(LOWER(J2:J);"(^|[^0-9])10\s*т([^0-9]|$)|10т([^0-9]|$)|10\s*тон|десят")
*
REGEXMATCH(LOWER(J2:J);"гідроборт|гидроборт|(^|[^а-яіїєґa-z])гб([^а-яіїєґa-z]|$)|(^|[^a-z])gb([^a-z]|$)|г\s*/\s*б")
)>0;"10Т";
IF(
(
REGEXMATCH(LOWER(J2:J);"(^|[^0-9])5\s*т([^0-9]|$)|5т([^0-9]|$)|пятитон|5/7|5тгб")
*
REGEXMATCH(LOWER(J2:J);"гідроборт|гидроборт|(^|[^а-яіїєґa-z])гб([^а-яіїєґa-z]|$)|(^|[^a-z])gb([^a-z]|$)|г\s*/\s*б")
)
+
REGEXMATCH(LOWER(J2:J);"5т\s*3\s*гб|5т\s*з\s*гб|5\s*т\s*з\s*гб|5т\s*гб")
>0;"5Т";
IF(
REGEXMATCH(LOWER(J2:J);"верхн(є|е)\s*завантаж|верхн(яя|є)\s*загруз|верхняя\s*загруз|верх\s");"10Т";
IF(
REGEXMATCH(LOWER(J2:J);"(^|[^0-9])10\s*т([^0-9]|$)|10т([^0-9]|$)|10\s*тон|десят");"10Т";
IF(
REGEXMATCH(LOWER(J2:J);"(^|[^0-9])7\s*т([^0-9]|$)|7т([^0-9]|$)|7-10");"7Т";
IF(
REGEXMATCH(LOWER(J2:J);"(^|[^0-9])6\s*м([^0-9]|$)|6м([^0-9]|$)");"5Т";
IF(
REGEXMATCH(LOWER(J2:J);"(^|[^0-9])5\s*т([^0-9]|$)|5т([^0-9]|$)|пятитон|5/7|5тгб");"5Т";
IF(
REGEXMATCH(LOWER(J2:J);"(^|[^0-9])3\s*т([^0-9]|$)|3т([^0-9]|$)|тонік|3тгб");"3Т";
IF(
REGEXMATCH(LOWER(J2:J);"(?:^|[^0-9])8\s*м(?:[^0-9]|$)|(?:^|[^0-9])8м(?:[^0-9]|$)");"10Т";
IF(
REGEXMATCH(LOWER(J2:J);"(?:^|[^0-9])7\s*м(?:[^0-9]|$)|(?:^|[^0-9])7м(?:[^0-9]|$)");"7Т";
IF(
REGEXMATCH(LOWER(J2:J);"газел|газель|авто|будка|фургон|тент");
IF(
NOT(REGEXMATCH(LOWER(J2:J);"гідроборт|гидроборт|рокла"));"Газель"; // Exclude Gazelle if there is a гідроборт or рокла
IF(
REGEXMATCH(LOWER(J2:J);"бус|бусик|минибус|мінік|миник|мини|максі|макси|середн(ій|ый)\s*бус");"Бус"; // Exclude Bus if there is a гідроборт or рокла
IF(
REGEXMATCH(LOWER(J2:J);"авто\s*\+\s*2\s*вантажника|2\s*вантажника|авто\s*\+\s*вантажника|2\s*грузчика");" + грузчики"; // For loaders
IF(
REGEXMATCH(LOWER(J2:J);"авто\s*\+\s*гідроборт\s*\+\s*рокла|авто\s*гідроборт\s*рокла|авто\+гідроборт\+рокла");"3Т"; // For vehicles with гідроборт
IF(
REGEXMATCH(LOWER(J2:J);"Закрою\s*2200/600\s*\+200\s*гідроборт\s*\+\s*250\s*рокла");"5Т"; // For other cases
IF(
REGEXMATCH(LOWER(J2:J);"Закрою\s*2200/600\s*");"5Т";
IF(
REGEXMATCH(LOWER(J2:J);"Закрою\s*2800");"10Т"; // For 10Т
IF( REGEXMATCH(LOWER(J2:J);"авто\s*\+\s*2\s*вантажника|2\s*вантажника|авто\s*\+\s*вантажника|2\s*грузчика");" + грузчики"; // For loaders
IF(
AND(
REGEXMATCH(LOWER(J2:J);"гб|гиброборт|гідроборт")
;
NOT(OR(REGEXMATCH(LOWER(J2:J);"3т"); REGEXMATCH(LOWER(J2:J);"Газель")))
);"3Т" // For 3Т
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)
)

Client's review of cooperation with Andrii Andriiev
Google Sheets - formula (task to extract info from a junk application)I am grateful to Andrey for the quick and quality solution to my task, I recommend for collaboration!
Freelancer's review of cooperation with Mariya Dzyuba
Google Sheets - formula (task to extract info from a junk application)Great experience of collaboration. Maria clearly set the task regarding regular expressions and listened to the suggestion to change the formula architecture instead of piling up conditions. Quick review and payment. I recommend!
-
300 2 0 Hello, Maria. I reviewed your screenshot. The appearance of FALSE is a natural result of the current formula architecture. Nested conditions (IF) have "blind spots": if the text does not meet strict criteria (for example, there is "hydroboard," but there is no tonnage), the chain breaks and the formula fails.
My solution: I will rewrite this formula using modern functions LET and IFS. This will allow us to avoid nesting conditions within each other and establish a clear hierarchy of checks:
Normalization: First, we clean the text of junk.
Feature identification: We separately look for markers (Tonnage, Hydroboard, Body type).
Decision matrix: If "10t" — then 10T. If "Hydroboard" WITHOUT tonnage — then we assign a default class (for example, 5T or Van, as you say).
Result: You will receive a stable formula that does not produce FALSE, is easy to read, and that you can supplement with new trigger words on your own without breaking the entire system.
-
631 5 0 I can do it, contact me
-
986 12 1 I am ready to promptly and qualitatively complete your order. I have experience working with similar projects, always adhere to deadlines and technical specifications. I will be happy to collaborate!
-
973 17 0 Good day!
Can AI be tasked with parsing the application and extracting the car) Or can this comment be broken down into different columns altogether. It will work automatically.
-
Могу предположить, что для такой задачи правильнее использовать правила, в идеале с обучением и весами.
-
Думав над реалізацією Вашого проекту. Тут проблема в тому, що будь яка помилка в тексті - пропуск букви, опечатка - призводить до непрацездатності формули. А передбачити наперед, де, наприклад, в слові "гідроборт" зроблять помилку, досить проблематично. Тому найкращим рішенням буде все таки AI- сам буде шукати слова, вирази, інтерпретації слів, слова з помилками. І базу пошуку бота виправити буде значно легше ніж формулу. Якщо цікавить така реалізація - пишіть, проконсультую.
-
Current freelance projects in the category Databases & SQL
Refinement of 1C UT 11 for Zebra TSD (RDP): different sound signals when scanning
22 USD
Configuration: 1C UT 11 Address warehouse Zebra TC26 TSD Work via RDP Product scanning is performed in receiving, placement, picking documents, and other warehouse operations. Current problem: Warehouse workers operate through the Zebra TSD. When scanning, they do not always… C#, Databases & SQL ∙ 3 hours 57 minutes back ∙ 2 proposals |
Heal the 1C configuration
111 USD
Configuration of CRM & ERP SmartCeiling (2.8.26.0) Protection via Registration Code. Registered until the end of the year. Databases & SQL ∙ 19 hours 3 minutes back ∙ 7 proposals |
Need a 1C specialist for refinements and development.I am looking for a 1C specialist for freelance collaboration. I am currently working with a contractor who provides support and maintenance for the 1C system. However, due to the contractor's workload, there is a need for prompt execution of additional tasks, improvements, and… Databases & SQL ∙ 5 days 18 hours back ∙ 11 proposals |
Restoring Instagram after a banOn March 3, 2026, my Instagram was blocked due to a violation of rules. I believe there has been some mistake, as the page was personal with photos and videos of my personal trips. I did not post any comments anywhere. I have sent several appeals. I cannot download my files… Databases & SQL, Web Programming ∙ 6 days 5 hours back ∙ 6 proposals |
Eliminate the issue of incomplete data import from Excel files of Nova Poshta specifications into 1C:
45 USD
Goal: There is a processing in 1C for uploading the specifications of Nova Poshta. For unclear reasons, it has stopped loading some tabular data. We need to find the reason and eliminate the problem of incomplete data import from Excel files of Nova Poshta specifications into… System & Network Administration, Databases & SQL ∙ 9 days 12 hours back ∙ 16 proposals |