How can I search for a string in a JSON file using a list of word embeddings and return the nearest occurrences?


I came across a piece of Python code that generates a file of embeddings (vectors that represent strings).

The file, generated with the "all-MiniLM-L6-v2" model, has this format:

[
   {
      "codigo":1,
      "descricao":"Alain Prost",
      "embedding":[
         -0.04376700147986412,
         0.08378474414348602,
         -0.044959407299757004,
         -0.06955558061599731,
         -0.0011182611342519522,
         0.10521695017814636,
         0.11189017444849014,
         0.1651790291070938,
         0.07515741139650345,
         0.05490146577358246,
         0.02417689561843872,
         -0.016437038779258728,
         0.010290289297699928,
         0.017122231423854828,
         -0.05169348418712616,
         -0.016834666952490807,
         -0.01511311624199152,
         0.007502275053411722,
         0.03960637003183365,
         0.013815234415233135,
         -0.05070938542485237,
         -0.056177735328674316,
         0.015933101996779442,
         -0.007893730886280537,
         1.4036894754099194e-05,
         -0.01063060574233532,
         0.05427253618836403,
         0.016765154898166656,
         0.04841822385787964,
         -0.02379232831299305,
         0.025293899700045586,
         -0.06888816505670547,
         -0.03624174743890762,
         -0.040663089603185654,
         -0.004510633181780577,
         -0.03612743690609932,
         -0.08588571101427078,
         -0.03383230045437813,
         -0.03971630707383156,
         0.0925847589969635,
         0.06980527937412262,
         0.011318318545818329,
         -0.14096367359161377,
         0.029876230284571648,
         -0.01633320190012455,
         -0.010737375356256962,
         0.04669718071818352,
         -0.014320306479930878,
         -0.05380765348672867,
         -0.01826721429824829,
         -0.0775720626115799,
         0.007413752842694521,
         0.010430709458887577,
         -0.07329824566841125,
         -0.038187265396118164,
         -0.02384389564394951,
         0.07746574282646179,
         0.02492334321141243,
         0.002449194435030222,
         -0.05240411311388016,
         0.020897606387734413,
         -0.01624673791229725,
         -0.06399786472320557,
         -0.03406109660863876,
         0.05889088287949562,
         0.045756977051496506,
         -0.08131976425647736,
         0.0538562573492527,
         -0.06892945617437363,
         0.04350525140762329,
         -0.05869260057806969,
         0.024457629770040512,
         0.0017231887904927135,
         0.041741617023944855,
         0.06515597552061081,
         -0.08843974024057388,
         -0.036975421011447906,
         -0.04383429139852524,
         -0.04289741814136505,
         -0.03480835258960724,
         0.04213075712323189,
         -0.0947691947221756,
         -0.10631424933671951,
         -0.05164273455739021,
         0.0527079738676548,
         -0.0026282896287739277,
         0.11123877763748169,
         -0.010186375118792057,
         0.004350247327238321,
         -0.09234373271465302,
         0.00022207570145837963,
         -0.036559659987688065,
         -0.05228490009903908,
         0.03234873339533806,
         -0.005511161405593157,
         0.04750655218958855,
         -0.08976765722036362,
         -0.005845387000590563,
         -0.02803802862763405,
         0.14588715136051178,
         -0.0012976604048162699,
         0.04080767557024956,
         0.04338463768362999,
         0.015407223254442215,
         -0.08320754021406174,
         0.037945766001939774,
         -0.017297346144914627,
         0.024563206359744072,
         0.04263288155198097,
         0.025433938950300217,
         -0.03403696045279503,
         -0.05286381393671036,
         -0.01756090484559536,
         -0.002016932936385274,
         0.0027279567439109087,
         0.047004375606775284,
         -0.04959726706147194,
         -0.015475046820938587,
         0.0725177600979805,
         -0.04801830276846886,
         0.048273105174303055,
         -0.029613768681883812,
         -0.05410566180944443,
         0.05482526868581772,
         0.0076617104932665825,
         0.073040671646595,
         -0.03162190690636635,
         -8.039190253239277e-34,
         -0.013159706257283688,
         -0.016090840101242065,
         0.07397063821554184,
         0.07282368093729019,
         -0.005004068370908499,
         0.0062707713805139065,
         -0.05940960720181465,
         -0.07829747349023819,
         -0.017122328281402588,
         -0.07634077966213226,
         -0.02839534729719162,
         -0.07541434466838837,
         0.011743525043129921,
         -0.026070842519402504,
         0.021514642983675003,
         0.03044724091887474,
         0.037806976586580276,
         0.03549019619822502,
         0.013167202472686768,
         -0.018708810210227966,
         0.007411877159029245,
         0.04208431392908096,
         -0.0017672213725745678,
         0.016767306253314018,
         0.042273279279470444,
         0.00972240325063467,
         0.09876655787229538,
         -0.013753202743828297,
         -0.039335619658231735,
         -0.030701594427227974,
         -0.006173287518322468,
         0.025760365650057793,
         -0.04054010286927223,
         0.056439004838466644,
         0.023311946541070938,
         -0.022928737103939056,
         -0.007852778770029545,
         -0.04520851746201515,
         0.045798882842063904,
         0.008332950063049793,
         0.005317758768796921,
         -0.021758222952485085,
         0.08777586370706558,
         -0.001095705316402018,
         0.008322017267346382,
         -0.047873519361019135,
         0.023781653493642807,
         0.05791536718606949,
         0.1103583350777626,
         -0.03695837780833244,
         0.03424883633852005,
         -0.0043442994356155396,
         -0.045328013598918915,
         -0.006399083416908979,
         -0.0022741626016795635,
         0.026356521993875504,
         -0.06595919281244278,
         0.01489550806581974,
         -0.00993384514003992,
         -0.004256079904735088,
         0.05318630486726761,
         0.03500215709209442,
         -0.030282488092780113,
         0.06818058341741562,
         -0.03611261025071144,
         -0.00042665813816711307,
         -0.03958318755030632,
         0.054165199398994446,
         0.03490123152732849,
         -0.027355331927537918,
         -0.1218971237540245,
         0.059496473520994186,
         0.11048189550638199,
         -0.044817615300416946,
         -0.045876920223236084,
         0.05318649485707283,
         -0.019234681501984596,
         0.025589890778064728,
         -0.09075476229190826,
         0.006619459483772516,
         -0.07048900425434113,
         0.002478431211784482,
         0.014732835814356804,
         0.015378294512629509,
         -0.010561746545135975,
         -0.044879332184791565,
         -0.0440324991941452,
         0.000804506300482899,
         0.04663644731044769,
         0.12025374174118042,
         0.02576148509979248,
         -0.006950514391064644,
         -0.008816791698336601,
         0.01322726346552372,
         -0.10207735002040863,
         6.758107581531859e-35,
         -0.04895230382680893,
         0.00044889742275699973,
         0.06258796155452728,
         0.05086054280400276,
         0.10057681798934937,
         -0.03941198065876961,
         0.021326975896954536,
         0.08152614533901215,
         -0.0004993032780475914,
         0.019457058981060982,
         0.09902072697877884,
         -0.06066109240055084,
         0.10520972311496735,
         -0.1180957779288292,
         -0.04043348878622055,
         0.13587746024131775,
         -0.011231197975575924,
         0.005684691481292248,
         -0.05967259034514427,
         -0.08215924352407455,
         0.024332145228981972,
         0.024530921131372452,
         0.031302567571401596,
         -0.04070316627621651,
         -0.12310207635164261,
         0.03254634514451027,
         0.11270913481712341,
         0.060394853353500366,
         -0.08383730798959732,
         -0.01133598294109106,
         -0.03808245062828064,
         -0.023190151900053024,
         -0.06691887974739075,
         0.013513924553990364,
         -0.05324095860123634,
         0.09535984694957733,
         -0.021769806742668152,
         0.06808806955814362,
         -0.0018341721734032035,
         0.08443459868431091,
         -0.04012518748641014,
         -0.009696738794445992,
         0.037875086069107056,
         -0.026477433741092682,
         0.07446243613958359,
         -0.06514057517051697,
         0.015685996040701866,
         -0.06705299019813538,
         0.024632146582007408,
         -0.014661968685686588,
         -0.018442410975694656,
         0.05574002489447594,
         -0.02014113776385784,
         -0.047132350504398346,
         0.0496378056704998,
         0.0052811079658567905,
         -0.03336593508720398,
         -0.002416495466604829,
         0.008500812575221062,
         0.07484209537506104,
         0.07398315519094467,
         0.056250426918268204,
         0.03129546344280243,
         0.0264076329767704,
         0.030829958617687225,
         -0.06896060705184937,
         -0.11525331437587738,
         -0.02287617139518261,
         0.014295394532382488,
         0.06505643576383591,
         0.08990739285945892,
         0.05023878812789917,
         -0.1306740790605545,
         0.005228940863162279,
         -0.02513446845114231,
         0.09248469024896622,
         -0.04951559379696846,
         0.07476413995027542,
         -0.02717839926481247,
         0.008030343800783157,
         -0.03858125954866409,
         -0.09855242073535919,
         -0.04341096431016922,
         0.01543387770652771,
         -0.024819210171699524,
         0.036512166261672974,
         -0.03962823003530502,
         -0.09858094900846481,
         0.0702538713812828,
         -0.04758270084857941,
         -0.0056264870800077915,
         -0.025418918579816818,
         0.04300766438245773,
         -0.05326545983552933,
         0.02151181921362877,
         -1.2410082739222617e-08,
         -0.022358816117048264,
         0.015648063272237778,
         -0.0415060892701149,
         -0.00010502521035959944,
         -0.0314381904900074,
         -0.06952173262834549,
         0.030622998252511024,
         -0.09376975148916245,
         -0.04358035698533058,
         0.004702138714492321,
         -0.04107971489429474,
         -0.015522287227213383,
         0.04647141695022583,
         -0.03630853071808815,
         0.07640153914690018,
         0.015367956832051277,
         0.0003513091360218823,
         0.07410185784101486,
         -0.024652114138007164,
         0.04225892946124077,
         0.005745219066739082,
         0.03425384312868118,
         -0.017282333225011826,
         -0.028105905279517174,
         -0.019109562039375305,
         -0.022345177829265594,
         0.04238805174827576,
         0.01908213645219803,
         0.004253830295056105,
         -0.004323870409280062,
         -0.00828507263213396,
         0.04277166351675987,
         0.01263809110969305,
         -0.08606499433517456,
         0.06635372340679169,
         0.09709060937166214,
         0.03835307061672211,
         0.05318101495504379,
         -0.0021448535844683647,
         0.0766974613070488,
         0.024480514228343964,
         -0.03913270682096481,
         0.004100404679775238,
         0.029588110744953156,
         0.006501220166683197,
         0.03766942396759987,
         0.0055293552577495575,
         -0.05407750979065895,
         0.003028532490134239,
         -0.004140743054449558,
         -0.0023235157132148743,
         0.05007375031709671,
         -0.01090778037905693,
         0.012557691894471645,
         0.018586203455924988,
         0.053417790681123734,
         -0.03843330964446068,
         0.003068356541916728,
         -0.07908729463815689,
         -0.01524473074823618,
         0.04108268767595291,
         -0.02860739268362522,
         0.06565400958061218,
         0.023170659318566322
      ]
   },
   {
      "codigo":2,
      "descricao":"Ayrton Senna",
      "embedding":[
         -0.11275111883878708,
         -0.04252505674958229,
         -0.009049834683537483,
         0.011212156154215336,
         -0.047949858009815216,
         0.030582023784518242,
         0.13628773391246796,
         -0.008150441572070122,
         -0.0001293766836170107,
         0.03802379593253136,
         0.072489432990551,
         -0.08784235268831253,
         -0.0781305655837059,
         0.06677593290805817,
         -0.06298733502626419,
         0.087885282933712,
         -0.053338438272476196,
         -0.013437110930681229,
         0.02285934053361416,
         -0.03463083133101463,
         -0.1208895593881607,
         0.035654135048389435,
         -0.0034052329137921333,
         0.02075120247900486,
         0.01327497884631157,
         -0.032590851187705994,
         0.004454594571143389,
         0.05418514460325241,
         -0.06094468757510185,
         -0.05599478632211685,
         -0.004106787499040365,
         -0.07678581774234772,
         0.04340159147977829,
         0.017842937260866165,
         0.02949387952685356,
         -0.007257427088916302,
         -0.0644332766532898,
         0.012047283351421356,
         0.014177532866597176,
         0.015570977702736855,
         0.007476386614143848,
         -0.01021003257483244,
         -0.024430135264992714,
         0.01893731951713562,
         -0.03585066273808479,
         -0.040841732174158096,
         0.02237538993358612,
         -0.06412603706121445,
         0.03432679921388626,
         0.0031201448291540146,
         -0.026181157678365707,
         -0.04635085165500641,
         -0.059544868767261505,
         -0.005927531514316797,
         -0.0033280153293162584,
         0.021542759612202644,
         -0.01260500680655241,
         0.033978041261434555,
         -0.03178206831216812,
         -0.025371814146637917,
         0.07174889743328094,
         -0.0024521711748093367,
         -0.09167266637086868,
         -0.046929117292165756,
         0.022732241079211235,
         0.02222401276230812,
         -0.024650216102600098,
         -0.04264489933848381,
         0.024509301409125328,
         -0.026767950505018234,
         0.09544091671705246,
         -0.06721024960279465,
         0.018102342262864113,
         -0.018531465902924538,
         -0.02721196413040161,
         0.005214688368141651,
         0.03094632364809513,
         -0.08467657119035721,
         0.006663993466645479,
         0.06828898191452026,
         -0.009517649188637733,
         -0.08511777967214584,
         -0.03374364972114563,
         -0.027803972363471985,
         0.023442445322871208,
         -0.0266878679394722,
         0.006919735576957464,
         0.010021806694567204,
         -0.036597177386283875,
         -0.00617715111002326,
         0.014031169936060905,
         0.0701993927359581,
         -0.0393521748483181,
         -0.007316326256841421,
         0.014301341958343983,
         0.02702433057129383,
         0.03956086188554764,
         0.060301244258880615,
         -0.055976178497076035,
         0.1338510662317276,
         0.001156043028458953,
         0.041097491979599,
         -0.14731338620185852,
         -0.0029199898708611727,
         -0.00013599869271274656,
         -0.0736226737499237,
         0.03325321152806282,
         -0.14085189998149872,
         0.03928329795598984,
         -0.011393381282687187,
         0.008337186649441719,
         0.022270601242780685,
         -0.06819078326225281,
         0.010874142870306969,
         -0.049424681812524796,
         0.019682565703988075,
         -0.010403553955256939,
         0.09375917166471481,
         0.02362806536257267,
         0.07171869277954102,
         0.020774055272340775,
         0.042299773544073105,
         -0.06543327867984772,
         0.11427047103643417,
         0.05618273466825485,
         -0.03619793802499771,
         -0.07144389301538467,
         5.301082792865114e-34,
         0.014501710422337055,
         -0.03433850780129433,
         0.008394746109843254,
         0.07597401738166809,
         0.10349489003419876,
         0.015405677258968353,
         -0.032848604023456573,
         -0.06884612143039703,
         -0.046885162591934204,
         -0.09671584516763687,
         -0.011314226314425468,
         -0.01856561005115509,
         -0.06512365490198135,
         -0.07238120585680008,
         -0.02506783977150917,
         -0.009671981446444988,
         -0.0677078366279602,
         -0.05653739720582962,
         -0.06995690613985062,
         -0.008146820589900017,
         -0.01214279793202877,
         0.059145353734493256,
         -0.00256781792268157,
         0.08436328917741776,
         -0.0045662252232432365,
         -0.07445189356803894,
         0.01798633486032486,
         0.060066550970077515,
         0.017383728176355362,
         0.04766349866986275,
         -0.015692079439759254,
         -0.04757498577237129,
         -0.02762548439204693,
         0.047303322702646255,
         0.07723086327314377,
         -0.07400372624397278,
         0.011420260183513165,
         -0.04891768470406532,
         -0.016991885378956795,
         0.026902154088020325,
         -0.04760833457112312,
         0.018312858417630196,
         -0.02989778108894825,
         0.0897020772099495,
         -0.04281701147556305,
         0.013710093684494495,
         0.0396006740629673,
         0.06410706043243408,
         0.08556067198514938,
         -0.04379606246948242,
         -0.07834725081920624,
         -0.06623218953609467,
         -0.030430499464273453,
         -0.005324682220816612,
         -0.034603726118803024,
         -0.062134772539138794,
         0.008219441398978233,
         0.04189149662852287,
         0.10299007594585419,
         0.021307796239852905,
         0.0607219822704792,
         -0.04500466585159302,
         -0.0028528186958283186,
         -0.06410374492406845,
         -0.0048947567120194435,
         0.028550991788506508,
         -0.021970335394144058,
         -0.006687256507575512,
         0.09578950703144073,
         -0.08069927245378494,
         0.002758170710876584,
         -0.026523113250732422,
         0.08033037930727005,
         0.013537789694964886,
         -0.03719128668308258,
         0.05603921413421631,
         0.020577840507030487,
         0.02021518349647522,
         -0.10423598438501358,
         -0.059956539422273636,
         -0.0928533598780632,
         -0.019149193540215492,
         0.008638947270810604,
         0.07607108354568481,
         0.023537373170256615,
         -0.03286019340157509,
         -0.029357632622122765,
         -0.06599190086126328,
         0.08896324038505554,
         -0.011197819374501705,
         0.019649725407361984,
         0.0985945537686348,
         0.006205311976373196,
         -0.13322098553180695,
         -0.015043631196022034,
         -1.1596729315441888e-34,
         -0.02202794700860977,
         0.022142373025417328,
         -0.0908736065030098,
         0.06232170760631561,
         0.02226484753191471,
         -0.03699196130037308,
         0.025422628968954086,
         0.03936171904206276,
         0.051816947758197784,
         0.01941952295601368,
         0.04169097915291786,
         -0.0668347030878067,
         0.028993966057896614,
         -0.04779044911265373,
         0.016057901084423065,
         0.11099212616682053,
         0.13915076851844788,
         0.04464653879404068,
         0.01808364875614643,
         0.0003248233115300536,
         -0.027428222820162773,
         0.03427209332585335,
         -0.11964283138513565,
         0.020802685990929604,
         -0.024637149646878242,
         0.04913446679711342,
         -0.03343263268470764,
         0.0007999022491276264,
         -0.0363985113799572,
         0.015618329867720604,
         -0.03916076198220253,
         -0.027130674570798874,
         0.030908452346920967,
         0.00839168019592762,
         -0.019726410508155823,
         0.06671995669603348,
         0.06294506788253784,
         -0.00662987632676959,
         -0.048772092908620834,
         0.10865209251642227,
         0.077969029545784,
         -0.03438835218548775,
         -0.016370991244912148,
         0.08795364946126938,
         -0.007750320713967085,
         -0.09498050808906555,
         -0.07556591928005219,
         0.10646194964647293,
         -0.0030609527602791786,
         -0.012251066043972969,
         0.05219857394695282,
         -0.03321979194879532,
         0.057967476546764374,
         -0.10663087666034698,
         0.032691169530153275,
         -0.009770980104804039,
         0.047311775386333466,
         -0.02411728724837303,
         0.05368872731924057,
         0.06182878091931343,
         0.07617446780204773,
         -0.05318167805671692,
         -0.033945482224226,
         0.03228505700826645,
         -0.007170077878981829,
         0.05959790572524071,
         -0.056909944862127304,
         -0.02985152043402195,
         0.006446316838264465,
         0.03801654651761055,
         0.012191289104521275,
         0.029834797605872154,
         -0.006095391698181629,
         -0.029733596369624138,
         -0.09887736290693283,
         0.009565076790750027,
         0.04332743212580681,
         0.042507629841566086,
         0.06287199258804321,
         -0.01998593844473362,
         -0.03811212256550789,
         -0.014080194756388664,
         0.039666227996349335,
         0.03266460821032524,
         0.07517889142036438,
         0.04624589905142784,
         0.05244888737797737,
         0.019929179921746254,
         0.02101832628250122,
         0.007519490085542202,
         0.06198029965162277,
         0.023592155426740646,
         0.04938758164644241,
         0.027339544147253036,
         -0.01008431427180767,
         -1.4285190808038806e-08,
         0.030376587063074112,
         -0.02963241934776306,
         -0.035167571157217026,
         0.02413598634302616,
         0.0570375956594944,
         0.007684706710278988,
         0.12187618762254715,
         -0.007570839952677488,
         0.029319867491722107,
         0.06720910966396332,
         0.024405328556895256,
         -0.011419138871133327,
         0.03922741860151291,
         0.024336550384759903,
         0.04098387807607651,
         0.03207016363739967,
         -0.008450492285192013,
         0.1041002869606018,
         -0.03652212396264076,
         0.010552185587584972,
         -0.049762122333049774,
         0.06643325090408325,
         -0.04128921404480934,
         -0.05123789608478546,
         -0.029389763250947,
         0.0248995590955019,
         -0.04405771195888519,
         0.1402818262577057,
         0.014684601686894894,
         -0.009909572079777718,
         0.010877342894673347,
         0.005315002519637346,
         0.00048737594624981284,
         -0.04477892816066742,
         -0.06588546186685562,
         0.005400381051003933,
         -0.02504221349954605,
         -0.010384864173829556,
         -0.02279285155236721,
         0.006243698764592409,
         -0.059665076434612274,
         0.024622157216072083,
         0.08627490699291229,
         0.044212888926267624,
         -0.02827167697250843,
         -0.019425155594944954,
         -0.022057976573705673,
         -0.03141951560974121,
         0.043426185846328735,
         0.018655214458703995,
         0.07349660992622375,
         0.028337983414530754,
         0.018872670829296112,
         0.07257463783025742,
         0.003528063651174307,
         -0.010571202263236046,
         -0.01876663975417614,
         0.02528848499059677,
         -0.13014712929725647,
         -0.061667099595069885,
         0.013025691732764244,
         0.00994929950684309,
         -0.007341751828789711,
         -0.06776775419712067
      ]
   }
]

I'm trying to get ChatGPT to understand what this is and to generate C# code that takes a string as a parameter and uses these vectors to find the nearest strings.

Example:

var drive = searchInEmbeddings("sena");

and get back Ayrton Senna.

Has anyone here already done something similar who could help me?

The Python code I mentioned is:

import json
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np


with open('embeddings.json', 'r') as f:
    data = json.load(f)


model = SentenceTransformer('all-MiniLM-L6-v2')


def find_most_similar(descriptions, query, top_n=3):
    query_embedding = model.encode([query])
    description_embeddings = np.array([desc['embedding'] for desc in descriptions])
    
    
    similarities = cosine_similarity(query_embedding, description_embeddings)[0]
    
    
    top_indices = np.argsort(similarities)[-top_n:][::-1]
    
    
    return [{"codigo": descriptions[i]['codigo'], "descricao": descriptions[i]['descricao']} for i in top_indices]


query = "sena"
top_matches = find_most_similar(data, query, top_n=3)
print("Question:", query, "\nAnswers: \nNearest occurrences:\n", top_matches)

And it returns:

$ python search.py
Question: sena
Answers:
Nearest occurrences:
 [{'codigo': 3, 'descricao': 'Ayrton Senna'}, {'codigo': 31, 'descricao': 'Niki Lauda'}, {'codigo': 21, 'descricao': 'Kimi Räikkönen'}]

Thanks for your help.

python c# .net word-embedding
1 Answer

If your question is "How can I get ChatGPT to write the C# code for this...", you can ask it:

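Please write a small C# function that takes as input a mapping from strings to word embeddings and a string "a", and returns the nearest neighbor of the string "a".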
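
To check whatever C# comes back, it may help to have the computation written out without library calls. The sketch below is in Python, like the code in the question, and uses only loops and arithmetic, so it should translate to C# almost line by line. The names `cosine_similarity` and `nearest_neighbors`, the dictionary shape, and the toy 3-dimensional vectors are assumptions made for illustration. It takes the query as a vector: a query string such as "sena" still has to be embedded with the same model that produced the file, and that step is outside this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(embeddings, query_vector, top_n=1):
    """Return the top_n keys whose vectors are most similar to query_vector.

    embeddings: dict mapping a string to its embedding (hypothetical shape).
    """
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in embeddings.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]


# Toy 3-dimensional vectors, not real model output.
toy = {
    "Alain Prost": [1.0, 0.0, 0.0],
    "Ayrton Senna": [0.0, 1.0, 0.0],
}
print(nearest_neighbors(toy, [0.1, 0.9, 0.0]))  # ['Ayrton Senna']
```

Scanning every record like this is linear in the number of entries, which is usually fine for a small list of names.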

As a follow-up, you can ask:

OK, now please write a C# function that reads the mapping from strings to word embeddings from an input file.
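
For comparison, here is roughly what that loading step looks like in Python for the file format shown in the question. This is a minimal sketch: the function names `parse_embeddings` and `load_embeddings` are made up for illustration, and it assumes each record has the `descricao` and `embedding` fields from the sample. A C# version would do the same thing with a JSON parser of your choice.

```python
import json


def parse_embeddings(json_text):
    """Turn the JSON array from the question into a string -> vector mapping.

    Assumes each record has "descricao" and "embedding" fields, as in the
    sample file. If two records share a description, the later one wins.
    """
    records = json.loads(json_text)
    return {record["descricao"]: record["embedding"] for record in records}


def load_embeddings(path):
    """Read the embeddings file at path (e.g. 'embeddings.json')."""
    with open(path, "r", encoding="utf-8") as f:
        return parse_embeddings(f.read())


# Tiny inline sample with 2-dimensional vectors, not real model output.
sample = '[{"codigo": 1, "descricao": "Alain Prost", "embedding": [0.5, -0.5]}]'
print(parse_embeddings(sample))  # {'Alain Prost': [0.5, -0.5]}
```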

For the first question, ChatGPT 3.5 will generate code that is not entirely correct, but it is easy to fix.
