Randomly extract non-overlapping sets
Generate content for a word game. The input is data about relations between words (A-to-B strength). The output is sets of distantly related words.
For example:
GOOD: apple, airplane, dog, house
BAD: banana, cherry, peach, strawberry.
You do NOT need to speak very much English. This is purely data.
I have two source files: a list of ranked relationships between words, and a separate list of words which may also appear in the first file. This is real English word data, similar to a thesaurus. https://www.powerthesaurus.org/
The task is to randomly output sets of lines from the 2nd file which are NON-overlapping concepts.
An ideal algorithm would create a multidimensional mesh and then randomly extract distant nodes, i.e. output sets of words which are all distant in vector space. I don't know how to do that. See: https://dzone.com/articles/introduction-to-word-vectors
A non-ideal algorithm would randomly pull a line from file 2 and measure its similarity to the lines already output; if it is dissimilar, keep it and remove it from file 2. If the line is similar to too many of the tested lines, reject it and return it to file 2. Think of a "bag of coins": you keep randomly testing and replacing coins until they are all different.
No word pair should be more than 0.3 similar, and the total similarity of all words between sets should be <0.5.
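The "bag of coins" approach could be sketched roughly as below. This is only a sketch under assumptions, not a finished solution: the function names (make_similarity, draw_pack, draw_packs) are made up for illustration, raw scores are assumed to map to a 0..1 scale by dividing by 100 and capping at 1.0, "total similarity" is assumed to mean the sum of pairwise similarities inside one pack, and only the head word of each line is compared.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

Relations = Dict[str, Dict[str, float]]


def make_similarity(relations: Relations, scale: float = 100.0) -> Callable[[str, str], float]:
    """Build a symmetric 0..1 similarity from raw scores (the scaling is an assumption)."""
    def similarity(a: str, b: str) -> float:
        raw = max(relations.get(a, {}).get(b, 0.0),
                  relations.get(b, {}).get(a, 0.0))
        return min(raw / scale, 1.0)
    return similarity


def draw_pack(pool: Sequence[str], n: int, similarity: Callable[[str, str], float],
              pair_max: float = 0.3, total_max: float = 0.5,
              rng: Optional[random.Random] = None) -> Optional[List[str]]:
    """Greedily pick n mutually dissimilar words from pool, or return None."""
    rng = rng or random.Random()
    candidates = list(pool)
    rng.shuffle(candidates)
    pack: List[str] = []
    total = 0.0
    for cand in candidates:
        sims = [similarity(cand, kept) for kept in pack]
        if any(s > pair_max for s in sims):
            continue  # too close to a word already kept
        if total + sum(sims) >= total_max:
            continue  # would push the pack's total similarity too high
        pack.append(cand)
        total += sum(sims)
        if len(pack) == n:
            return pack
    return None


def draw_packs(pool: Sequence[str], n: int, y: int,
               similarity: Callable[[str, str], float],
               rng: Optional[random.Random] = None) -> List[List[str]]:
    """Draw up to y packs of n words; kept words are removed from the pool."""
    rng = rng or random.Random()
    remaining = list(pool)
    packs: List[List[str]] = []
    for _ in range(y):
        pack = draw_pack(remaining, n, similarity, rng=rng)
        if pack is None:
            break  # pool exhausted or too tangled to fill another pack
        packs.append(pack)
        used = set(pack)
        remaining = [w for w in remaining if w not in used]
    return packs


if __name__ == "__main__":
    demo_relations = {"aaa": {"aab": 100.0, "aac": 8.0}, "bbb": {"bba": 50.0}}
    sim = make_similarity(demo_relations)
    for pack in draw_packs(["aaa", "aab", "bbb", "bba", "ccc", "ddd"], 2, 3, sim):
        print(pack)
```

Each pack costs at most (pool size x N) similarity lookups, which should be manageable for ~15k lines and N around 25. A greedy pass like this can return fewer than Y packs if the remaining pool becomes too similar.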
Preferred programming languages: Ruby, Perl, Python.
Two input files:
A) relations.txt
#aaa [syn]: aab | aac; [syn-score]: 100.0 | 8.0;
#aab [syn]: aaa | aac; [syn-score]: 75.0 | 5.0;
#bbb [syn]: bba | bbc; [syn-score]: 50.0 | 4.3;
#bba [syn]: bbb | bbc; [syn-score]: 150.0 | 1.2;
#ccc [syn]: ccd | ccz; [syn-score]: 150.0 | 0.4;
... etc.
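For reference, lines in the relations.txt format shown above could be read with something like the following. It is a minimal sketch that assumes every line follows exactly the sample layout (a "#" head word, a [syn] list, a [syn-score] list, "|" separators, trailing ";"); the name parse_relations is hypothetical, and lines that do not match are simply skipped.

```python
import re
from typing import Dict, Iterable

# Pattern derived from the sample lines only; the real file may contain variations.
LINE_RE = re.compile(
    r"^#(?P<head>.+?)\s*\[syn\]:\s*(?P<syns>.*?);\s*"
    r"\[syn-score\]:\s*(?P<scores>.*?);\s*$"
)


def parse_relations(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Map each head word to {related word: raw score}."""
    relations: Dict[str, Dict[str, float]] = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that does not look like a relation line
        syns = [s.strip() for s in match.group("syns").split("|")]
        scores = [float(s) for s in match.group("scores").split("|")]
        relations[match.group("head")] = dict(zip(syns, scores))
    return relations


if __name__ == "__main__":
    sample = [
        "#aaa [syn]: aab | aac; [syn-score]: 100.0 | 8.0;",
        "#ccc [syn]: ccd | ccz; [syn-score]: 150.0 | 0.4;",
    ]
    print(parse_relations(sample))
```

Because the function takes any iterable of lines, an open file handle can be passed in directly, so the 300+ MB file is read line by line instead of being loaded whole.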
B) lists.txt
#aaa = aab | aac
#bbb = bbd | bba
#bba = bbd | bbx
#ccc = cca | ccz
#cca = ccd | cce
#ddd = dda | ddb
... etc.
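The lists.txt lines could be read in a similar way. Again this is just a sketch assuming the "#head = word | word" layout from the sample; parse_lists is a hypothetical name. The original line is kept alongside the parsed words so it can be written to the output unchanged.

```python
from typing import Iterable, List, Tuple

Entry = Tuple[str, str, List[str]]  # (original line, head word, member words)


def parse_lists(lines: Iterable[str]) -> List[Entry]:
    """Parse '#head = a | b' lines, keeping each original line verbatim."""
    entries: List[Entry] = []
    for line in lines:
        line = line.strip()
        if not line.startswith("#") or "=" not in line:
            continue  # ignore blank lines and anything not in the expected shape
        head, _, rest = line[1:].partition("=")
        members = [w.strip() for w in rest.split("|") if w.strip()]
        entries.append((line, head.strip(), members))
    return entries


if __name__ == "__main__":
    for entry in parse_lists(["#aaa = aab | aac", "#bbb = bbd | bba"]):
        print(entry)
```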
The real file A is 300+ MB, with 855k lines.
The real file B is ~15k lines.
I want to be able to set N, the number of sets in each pack, and Y, the number of packs. N will typically be around 25 sets, and Y will likely be 1000 packs.
Output, with N=2:
#aaa = aab | aac
#cca = ccd | cce

#bbb = bbd | bba
#ddd = dda | ddb

#bba = bbd | bbx
#ccc = cca | ccz
Output, with N=3:
#aaa = aab | aac
#ccc = cca | ccz
#bba = bbd | bbx

#bbb = bbd | bba
#ddd = dda | ddb
#cca = ccd | cce