简单地打印单词向量,而不是以数组的形式获取。

问题描述 投票:-1回答:1

一个非常简单的任务,但我似乎做不到。我想获得这样的矢量。

the -0.038194 -0.24487 0.72812 -0.39961 0.083172 0.043953 -0.39141 0.3344 -0.57545 0.087459 0.28787 -0.06731 0.30906 -0.26384 -0.13231 -0.20757 0.33395 -0.33848 -0.31743 -0.48336 0.1464 -0.37304 0.34577 0.052041 0.44946 -0.46971 0.02628 -0.54155 -0.15518 -0.14107 -0.039722 0.28277 0.14393 0.23464 -0.31021 0.086173 0.20397 0.52624 0.17164 -0.082378 -0.71787 -0.41531 0.20335 -0.12763 0.41367 0.55187 0.57908 -0.33477 -0.36559 -0.54857 -0.062892 0.26584 0.30205 0.99775 -0.80481 -3.0243 0.01254 -0.36942 2.2167 0.72201 -0.24978 0.92136 0.034514 0.46745 1.1079 -0.19358 -0.074575 0.23353 -0.052062 -0.22044 0.057162 -0.15806 -0.30798 -0.41625 0.37972 0.15006 -0.53212 -0.2055 -1.2526 0.071624 0.70565 0.49744 -0.42063 0.26148 -1.538 -0.30223 -0.073438 -0.28312 0.37104 -0.25217 0.016215 -0.017099 -0.38984 0.87424 -0.72569 -0.51058 -0.52028 -0.1459 0.8278 0.27062
, -0.10767 0.11053 0.59812 -0.54361 0.67396 0.10663 0.038867 0.35481 0.06351 -0.094189 0.15786 -0.81665 0.14172 0.21939 0.58505 -0.52158 0.22783 -0.16642 -0.68228 0.3587 0.42568 0.19021 0.91963 0.57555 0.46185 0.42363 -0.095399 -0.42749 -0.16567 -0.056842 -0.29595 0.26037 -0.26606 -0.070404 -0.27662 0.15821 0.69825 0.43081 0.27952 -0.45437 -0.33801 -0.58184 0.22364 -0.5778 -0.26862 -0.20425 0.56394 -0.58524 -0.14365 -0.64218 0.0054697 -0.35248 0.16162 1.1796 -0.47674 -2.7553 -0.1321 -0.047729 1.0655 1.1034 -0.2208 0.18669 0.13177 0.15117 0.7131 -0.35215 0.91348 0.61783 0.70992 0.23955 -0.14571 -0.37859 -0.045959 -0.47368 0.2385 0.20536 -0.18996 0.32507 -1.1112 -0.36341 0.98679 -0.084776 -0.54008 0.11726 -1.0194 -0.24424 0.12771 0.013884 0.080374 -0.35414 0.34951 -0.7226 0.37549 0.4441 -0.99059 0.61214 -0.35111 -0.83155 0.45293 0.082577
. -0.33979 0.20941 0.46348 -0.64792 -0.38377 0.038034 0.17127 0.15978 0.46619 -0.019169 0.41479 -0.34349 0.26872 0.04464 0.42131 -0.41032 0.15459 0.022239 -0.64653 0.25256 0.043136 -0.19445 0.46516 0.45651 0.68588 0.091295 0.21875 -0.70351 0.16785 -0.35079 -0.12634 0.66384 -0.2582 0.036542 -0.13605 0.40253 0.14289 0.38132 -0.12283 -0.45886 -0.25282 -0.30432 -0.11215 -0.26182 -0.22482 -0.44554 0.2991 -0.85612 -0.14503 -0.49086 0.0082973 -0.17491 0.27524 1.4401 -0.21239 -2.8435 -0.27958 -0.45722 1.6386 0.78808 -0.55262 0.65 0.086426 0.39012 1.0632 -0.35379 0.48328 0.346 0.84174 0.098707 -0.24213 -0.27053 0.045287 -0.40147 0.11395 0.0062226 0.036673 0.018518 -1.0213 -0.20806 0.64072 -0.068763 -0.58635 0.33476 -1.1432 -0.1148 -0.25091 -0.45907 -0.096819 -0.17946 -0.063351 -0.67412 -0.068895 0.53604 -0.87773 0.31802 -0.39242 -0.23394 0.47298 -0.028803
of -0.1529 -0.24279 0.89837 0.16996 0.53516 0.48784 -0.58826 -0.17982 -1.3581 0.42541 0.15377 0.24215 0.13474 0.41193 0.67043 -0.56418 0.42985 -0.012183 -0.11677 0.31781 0.054177 -0.054273 0.35516 -0.30241 0.31434 -0.33846 0.71715 -0.26855 -0.15837 -0.47467 0.051581 -0.33252 0.15003 -0.1299 -0.54617 -0.37843 0.64261 0.82187 -0.080006 0.078479 -0.96976 -0.57741 0.56491 -0.39873 -0.057099 0.19743 0.065706 -0.48092 -0.20125 -0.40834 0.39456 -0.02642 -0.11838 1.012 -0.53171 -2.7474 -0.042981 -0.74849 1.7574 0.59085 0.04885 0.78267 0.38497 0.42097 0.67882 0.10337 0.6328 -0.026595 0.58647 -0.44332 0.33057 -0.12022 -0.55645 0.073611 0.20915 0.43395 -0.012761 0.089874 -1.7991 0.084808 0.77112 0.63105 -0.90685 0.60326 -1.7515 0.18596 -0.50687 -0.70203 0.66578 -0.81304 0.18712 -0.018488 -0.26757 0.727 -0.59363 -0.34839 -0.56094 -0.591 1.0039 0.20664
to -0.1897 0.050024 0.19084 -0.049184 -0.089737 0.21006 -0.54952 0.098377 -0.20135 0.34241 -0.092677 0.161 -0.13268 -0.2816 0.18737 -0.42959 0.96039 0.13972 -1.0781 0.40518 0.50539 -0.55064 0.4844 0.38044 -0.0029055 -0.34942 -0.099696 -0.78368 1.0363 -0.2314 -0.47121 0.57126 -0.21454 0.35958 -0.48319 1.0875 0.28524 0.12447 -0.039248 -0.076732 -0.76343 -0.32409 -0.5749 -1.0893 -0.41811 0.4512 0.12112 -0.51367 -0.13349 -1.1378 -0.28768 0.16774 0.55804 1.5387 0.018859 -2.9721 -0.24216 -0.92495 2.1992 0.28234 -0.3478 0.51621 -0.43387 0.36852 0.74573 0.072102 0.27931 0.92569 -0.050336 -0.85856 -0.1358 -0.92551 -0.33991 -1.0394 -0.067203 -0.21379 -0.4769 0.21377 -0.84008 0.052536 0.59298 0.29604 -0.67644 0.13916 -1.5504 -0.20765 0.7222 0.52056 -0.076221 -0.15194 -0.13134 0.058617 -0.31869 -0.61419 -0.62393 -0.41548 -0.038175 -0.39804 0.47647 -0.15983
and -0.071953 0.23127 0.023731 -0.50638 0.33923 0.1959 -0.32943 0.18364 -0.18057 0.28963 0.20448 -0.5496 0.27399 0.58327 0.20468 -0.49228 0.19974 -0.070237 -0.88049 0.29485 0.14071 -0.1009 0.99449 0.36973 0.44554 0.28998 -0.1376 -0.56365 -0.029365 -0.4122 -0.25269 0.63181 -0.44767 0.24363 -0.10813 0.25164 0.46967 0.3755 -0.23613 -0.14129 -0.44537 -0.65737 -0.042421 -0.28636 -0.28811 0.063766 0.20281 -0.53542 0.41307 -0.59722 -0.38614 0.19389 -0.17809 1.6618 -0.011819 -2.3737 0.058427 -0.2698 1.2823 0.81925 -0.22322 0.72932 -0.053211 0.43507 0.85011 -0.42935 0.92664 0.39051 1.0585 -0.24561 -0.18265 -0.5328 0.059518 -0.66019 0.18991 0.28836 -0.2434 0.52784 -0.65762 -0.14081 1.0491 0.5134 -0.23816 0.69895 -1.4813 -0.2487 -0.17936 -0.059137 -0.08056 -0.48782 0.014487 -0.6259 -0.32367 0.41862 -1.0807 0.46742 -0.49931 -0.71895 0.86894 0.19539
in 0.085703 -0.22201 0.16569 0.13373 0.38239 0.35401 0.01287 0.22461 -0.43817 0.50164 -0.35874 -0.34983 0.055156 0.69648 -0.17958 0.067926 0.39101 0.16039 -0.26635 -0.21138 0.53698 0.49379 0.9366 0.66902 0.21793 -0.46642 0.22383 -0.36204 -0.17656 0.1748 -0.20367 0.13931 0.019832 -0.10413 -0.20244 0.55003 -0.1546 0.98655 -0.26863 -0.2909 -0.32866 -0.34188 -0.16943 -0.42001 -0.046727 -0.16327 0.70824 -0.74911 -0.091559 -0.96178 -0.19747 0.10282 0.55221 1.3816 -0.65636 -3.2502 -0.31556 -1.2055 1.7709 0.4026 -0.79827 1.1597 -0.33042 0.31382 0.77386 0.22595 0.52471 -0.034053 0.32048 0.079948 0.17752 -0.49426 -0.70045 -0.44569 0.17244 0.20278 0.023292 -0.20677 -1.0158 0.18325 0.56752 0.31821 -0.65011 0.68277 -0.86585 -0.059392 -0.29264 -0.55668 -0.34705 -0.32895 0.40215 -0.12746 -0.20228 0.87368 -0.545 0.79205 -0.20695 -0.074273 0.75808 -0.34243
a -0.27086 0.044006 -0.02026 -0.17395 0.6444 0.71213 0.3551 0.47138 -0.29637 0.54427 -0.72294 -0.0047612 0.040611 0.043236 0.29729 0.10725 0.40156 -0.53662 0.033382 0.067396 0.64556 -0.085523 0.14103 0.094539 0.74947 -0.194 -0.68739 -0.41741 -0.22807 0.12 -0.48999 0.80945 0.045138 -0.11898 0.20161 0.39276 -0.20121 0.31354 0.75304 0.25907 -0.11566 -0.029319 0.93499 -0.36067 0.5242 0.23706 0.52715 0.22869 -0.51958 -0.79349 -0.20368 -0.50187 0.18748 0.94282 -0.44834 -3.6792 0.044183 -0.26751 2.1997 0.241 -0.033425 0.69553 -0.64472 -0.0072277 0.89575 0.20015 0.46493 0.61933 -0.1066 0.08691 -0.4623 0.18262 -0.15849 0.020791 0.19373 0.063426 -0.31673 -0.48177 -1.3848 0.13669 0.96859 0.049965 -0.2738 -0.035686 -1.0577 -0.24467 0.90366 -0.12442 0.080776 -0.83401 0.57201 0.088945 -0.42532 -0.018253 -0.079995 -0.28581 -0.01089 -0.4923 0.63687 0.23642
" -0.30457 -0.23645 0.17576 -0.72854 -0.28343 -0.2564 0.26587 0.025309 -0.074775 -0.3766 -0.057774 0.12159 0.34384 0.41928 -0.23236 -0.31547 0.60939 0.25117 -0.68667 0.70873 1.2162 -0.1824 -0.48442 -0.33445 0.30343 1.086 0.49992 -0.20198 0.27959 0.68352 -0.33566 -0.12405 0.059656 0.33617 0.37501 0.56552 0.44867 0.11284 -0.16196 -0.94346 -0.67961 0.18581 0.060653 0.43776 0.13834 -0.48207 -0.56141 -0.25422 -0.52445 0.097003 -0.48925 0.19077 0.21481 1.4969 -0.86665 -3.2846 0.56854 0.41971 1.2294 0.78522 -0.29369 0.63803 -1.5926 -0.20437 1.5306 0.13548 0.50722 0.18742 0.48552 -0.28995 0.19573 0.0046515 0.092879 -0.42444 0.64987 0.52839 0.077908 0.8263 -1.2208 -0.34955 0.49855 -0.64155 -0.72308 0.26566 -1.3643 -0.46364 -0.52048 -1.0525 0.22895 -0.3456 -0.658 -0.16735 0.35158 0.74337 0.26074 0.061104 -0.39079 -0.84557 -0.035432 0.17036
's 0.58854 -0.2025 0.73479 -0.68338 -0.19675 -0.1802 -0.39177 0.34172 -0.60561 0.63816 -0.26695 0.36486 -0.40379 -0.1134 -0.58718 0.2838 0.8025 -0.35303 0.30083 0.078935 0.44416 -0.45906 0.79294 0.50365 0.32805 0.28027 -0.4933 -0.38482 -0.039284 -0.2483 -0.1988 1.1469 0.13228 0.91691 -0.36739 0.89425 0.5426 0.61738 -0.62205 -0.31132 -0.50933 0.23335 1.0826 -0.044637 -0.12767 0.27628 -0.032617 -0.27397 0.77764 -0.50861 0.038307 -0.33679 0.42344 1.2271 -0.53826 -3.2411 0.42626 0.025189 1.3948 0.65085 0.03325 0.37141 0.4044 0.35558 0.98265 -0.61724 0.53901 0.76219 0.30689 0.33065 0.30956 -0.15161 -0.11313 -0.81281 0.6145 -0.44341 -0.19163 -0.089551 -1.5927 0.37405 0.85857 0.54613 -0.31928 0.52598 -1.4802 -0.97931 -0.2939 -0.14724 0.25803 -0.1817 1.0149 0.77649 0.12598 0.54779 -1.0316 0.064599 -0.37523 -0.94475 0.61802 0.39591
for -0.14401 0.32554 0.14257 -0.099227 0.72536 0.19321 -0.24188 0.20223 -0.89599 0.15215 0.035963 -0.59513 -0.051635 -0.014428 0.35475 -0.31859 0.76984 -0.087369 -0.24762 0.65059 -0.15138 -0.42703 0.18813 0.091562 0.15192 0.11303 -0.15222 -0.62786 -0.23923 0.096009 -0.46147 0.41526 -0.30475 0.1371 0.16758 0.53301 -0.043658 0.85924 -0.41192 -0.21394 -0.51228 -0.31945 0.12662 -0.3151 0.0031429 0.27129 0.17328 -1.3159 -0.42414 -0.69126 0.019017 -0.13375 -0.096057 1.7069 -0.65291 -2.6111 0.26518 -0.61178 2.095 0.38148 -0.55823 0.2036 -0.33704 0.37354 0.6951 -0.001637 0.81885 0.51793 0.27746 -0.37177 -0.43345 -0.42732 -0.54912 -0.30715 0.18101 0.2709 -0.29266 0.30834 -1.4624 -0.18999 0.92277 -0.099217 -0.25165 0.49197 -1.525 0.15326 0.2827 0.12102 -0.36766 -0.61275 -0.18884 0.10907 0.12315 0.090066 -0.65447 -0.17252 2.6336e-05 0.25398 1.1078 -0.073074

这是文本文件的完整链接 这样你就可以看到完整的格式。https:/www.kaggle.comterenceliu4444glove6b100dtxt这是我的代码。

with codecs.open('data/{}.tsv'.format(lcode), 'w', 'utf-8') as fout:
        for i, word in enumerate(model.index2word):
            fout.write(u"{}\t{}\t{}\n".format(str(i), word.encode('utf8').decode('utf8'),
                                              np.array_str(model[word])
                                              ))

我的输出是这样的

the [ 0.28177965 -1.3835016  -0.85463244  0.5744817  -0.42041674  0.4850773
 -0.18238722  0.9088641   1.6516979  -0.24690722  0.5303408  -0.8106607
  0.27385864  0.6186187  -2.061754    1.2491482   0.44255176 -0.25498274
 -0.11942661 -0.1751283   0.2187617  -1.2942451  -0.79252934  1.8655167
 -1.4975996  -0.02266688  0.26935738 -0.36034968 -1.5055205   0.0860498
  1.0129709   1.1270534  -1.3774556  -0.02182451 -0.52671534 -0.7581365
 -0.16326018 -0.2763609   0.5690212  -1.355627    0.43560007  2.4623177
 -0.46482357  0.85816354 -0.5735287  -0.99033487  0.646639   -0.18928614
 -0.6105273  -0.94887084 -0.39465773  0.38946334 -0.5338978  -0.0211645
 -0.06462063  1.1689087  -0.88438195 -0.60245454  1.0320075   0.75902534
 -1.9108475  -0.8921491   0.57644296  1.8618042  -0.5125161  -1.4219466
  0.45342374  0.25558227  1.0577608   0.48511812  0.76758397 -1.0726306
  1.5792096   0.01924564  1.8321682  -0.4707404  -0.41836467  0.07758982
  0.50893927 -0.71105474 -0.33766833  1.4899743   0.60877067 -0.09521568
  0.6654671  -0.0286361  -0.17863822  0.8811929  -1.330545   -1.104361
  0.51000476  0.2639544   1.2233694  -0.10699744 -1.1367066   0.6225027
  0.5847332  -0.03609625  2.3312287   0.1025821 ]
a   [ 1.0829129   0.84877855 -1.1785074   0.13858096  2.008711    0.44480678
  0.41152284 -0.9221507   1.5342509   0.8937895  -0.12867515  1.2286083
 -1.6460084   0.96246207  0.11606621 -0.7079361   0.7204446  -2.17121
  0.21708168 -1.029137   -0.53540015  0.40489924 -0.52271795  1.7237337
 -1.3921518  -1.4322941   1.392808    0.7498414  -1.4813395   1.655896
  1.0292306  -0.10302904 -0.09161732  0.9659639   0.13209064 -0.5149641
 -0.11515223 -1.6309028  -1.1918032   0.34248984  0.6209429   1.0181456
 -0.65688735  0.80660087 -0.6315423  -0.68773484 -0.44171524 -0.8294182
  0.62340856 -1.0040073   0.40221572 -0.30175862  0.02053229  0.31205446
 -0.16386059  0.18476132  0.18067418 -0.28932625  1.0893115   0.11680666
  0.1104597  -0.30494598 -0.06541535  0.75524884  3.3038845  -0.5918715
  1.0957772  -0.51271206  1.3486993   0.6190552  -1.365369   -2.9811475
  1.3973937   1.4510086   0.45045042  0.61286205  1.7809817  -0.639005
 -0.22986257 -1.4068168   0.34073976  0.38807136 -0.10908178 -0.9710727
 -0.2207968   0.66323316  2.2619925   1.8806032  -0.06102083  0.86097974
 -0.07785034  0.3742449   1.800688   -0.92509884  0.1773087   0.38380435
  0.44551063  0.5976865   1.8766458   0.23904268]

我试着从代码中删除数组 但还是不能以那种格式打印出那些单词向量. 顺便说一下,我使用Genism来获取向量。

python word2vec
1个回答
0
投票

这看起来就像已经被写出来的格式(减去一个前导行)一样,由 .save_word2vec_format() 办法 gensim 字向量类。

你应该尝试使用它,也许用 write_first_line 参数为 False,或者干脆在事后编辑文件。比如说

model.save_word2vec_format(your_filename, binary=False, write_first_line=False)

(我还想说的是: 你的示例格式和这种方法, 只会在字段之间使用单个空格, 而不是**制表符, 所以你现有的文件需要的是 .tsv 为'tab-separated-values',会引起误解。)

© www.soinside.com 2019 - 2024. All rights reserved.