Is there a way to scrape an iframe from Etherscan.io using R and rvest?


I am trying to scrape information from the following URL: https://etherscan.io/token/0xdac17f958d2ee523a2206206994597c13d831ec7#balances

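Specifically, I am interested in the table in the lower half of the page. I want the value behind the "Copy Address" button in each row (shown in a screenshot in the original post):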

My attempt essentially boils down to this:

library(rvest)
library(dplyr)

page = read_html("https://etherscan.io/token/0xdac17f958d2ee523a2206206994597c13d831ec7#balances")
x = page %>% html_nodes(".js-clipboard") %>% html_attr("data-clipboard-text")

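I understand that the table is an iframe, which is essentially a separate HTML document, but I am having trouble getting at that document's URL. Even when I inspect the page and manually extract the iframe URL (/token/generic-tokenholders2?m=dim&a=0xdac17f958d2ee523a2206206994597c13d831ec7&s=39025187376288180&sid=e88ba71b362fc00233af8a8db211da32&p=1), I still cannot get the content I want.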

My guess is that I am not digging deep enough into the HTML structure to find what I want. Any advice or help would be greatly appreciated.

html r web-scraping iframe rvest
1 Answer

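It turns out the table is embedded as a separate HTML document, and the addresses in it are not loaded with JavaScript, so rvest can read them directly from that document's URL!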
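
For illustration, here is a minimal sketch of how such an embedded document's URL could be located with rvest rather than copied from the browser inspector. It parses a small inline HTML string standing in for the outer page; the markup, the iframe id, and the shortened query string are assumptions made for the example, not Etherscan's actual source. On the live site the src attribute may be filled in by a script, in which case it would not appear in the static HTML.

```r
library(rvest)
library(xml2)

# Hypothetical stand-in for the outer token page (not Etherscan's real markup)
outer_html <- '
<html><body>
  <h1>Token</h1>
  <iframe id="tokenholders" src="/token/generic-tokenholders2?a=0xdac17f958d2ee523a2206206994597c13d831ec7&amp;p=1"></iframe>
</body></html>'

outer_page <- read_html(outer_html)

# Collect the src attribute of every iframe in the document
# (html_elements() is the current name for html_nodes())
iframe_src <- outer_page |>
  html_elements("iframe") |>
  html_attr("src")

# Resolve the relative path against the site root to get a fetchable URL
iframe_url <- url_absolute(iframe_src, "https://etherscan.io/")
print(iframe_url)
```

The resulting URL can then be passed to read_html() in the same way as the hard-coded url below.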

library(rvest)

url = "https://etherscan.io/token/generic-tokenholders2?&a=0xdac17f958d2ee523a2206206994597c13d831ec7&s=39025187376288180&p=1" 

# the html_nodes gets the elements with class js-clipboard, then html_attr gets the data-clipboard-text attribute for each of those
# it would be straightforward to get the names too, if you want. But also, a good challenge for you, too! :-D
url |> read_html() |> html_nodes(".js-clipboard") |> html_attr("data-clipboard-text")

Output:

 [1] "0xF977814e90dA44bFA03b6295A0616a897441aceC"
 [2] "0x47ac0Fb4F2D84898e4D9E7b4DaB3C24507a6D503"
 [3] "0xA7A93fd0a276fc1C0197a5B5623eD117786eeD06"
 [4] "0xcEe284F754E854890e311e3280b767F80797180d"
 [5] "0xD6216fC19DB775Df9774a6E33526131dA7D19a2c"
 [6] "0x40ec5B33f54e0E8A33A975908C5BA1c14e5BbbDf"
 [7] "0x5754284f345afc66a98fbB0a0Afe71e0F007B949"
 [8] "0x3CC936b795A188F0e246cBB2D74C5Bd190aeCF18"
 [9] "0x28C6c06298d514Db089934071355E5743bf21d60"
[10] "0xc5451b523d5FFfe1351337a221688a62806ad91a"
[11] "0x6Fb624B48d9299674022a23d92515e76Ba880113"
[12] "0x461249076B88189f8AC9418De28B365859E46BfD"
[13] "0xc708A1c712bA26DC618f972ad7A187F76C8596Fd"
[14] "0x69a722f0B5Da3aF02b4a205D6F0c285F4ed8F396"
[15] "0x42436286A9c8d63AAfC2eEbBCA193064d68068f2"
[16] "0xCbA38020cd7B6F51Df6AFaf507685aDd148F6ab6"
[17] "0x5a52E96BAcdaBb82fd05763E25335261B270Efcb"
[18] "0x89e51fA8CA5D66cd220bAed62ED01e8951aa7c40"
[19] "0x0D0707963952f2fBA59dD06f2b425ace40b492Fe"
[20] "0xf59869753f41Db720127Ceb8DbB8afAF89030De4"
[21] "0x65A0947BA5175359Bb457D3b34491eDf4cBF7997"
[22] "0xe9172Daf64b05B26eb18f07aC8d6D723aCB48f99"
[23] "0x4D19C0a5357bC48be0017095d3C871D9aFC3F21d"
[24] "0x5C52cC7c96bDE8594e5B77D5b76d042CB5FaE5f2"
[25] "0x99C9fc46f92E8a1c0deC1b1747d010903E884bE1"
[26] "0x0162Cd2BA40E23378Bf0FD41f919E1be075f025F"
[27] "0x68841a1806fF291314946EebD0cdA8b348E73d6D"
[28] "0x96FDC631F02207B72e5804428DeE274cF2aC0bCD"
[29] "0xBDa23B750dD04F792ad365B5F2a6F1d8593796f2"
[30] "0x7eb6c83AB7D8D9B8618c0Ed973cbEF71d1921EF2"
[31] "0x06d3a30cBb00660B85a30988D197B1c282c6dCB6"
[32] "0x3D55CCb2a943d88D39dd2E62DAf767C69fD0179F"
[33] "0x9723b6d608D4841eB4Ab131687a5D4764eb30138"
[34] "0x313Eb1C5e1970EB5CEEF6AEbad66b07c7338d369"
[35] "0x276cdBa3a39aBF9cEdBa0F1948312c0681E6D5Fd"
[36] "0x5041ed759Dd4aFc3a72b8192C143F72f4724081A"
[37] "0xc7C8f8284c5360D0086a2f0A05BdD07AFdE23246"
[38] "0xbEbc44782C7dB0a1A60Cb6fe97d0b483032FF1C7"
[39] "0xee5B5B923fFcE93A870B3104b7CA09c3db80047A"
[40] "0xEEA81C4416d71CeF071224611359F6F99A4c4294"
[41] "0xDD47B8411c2fe553fDDBE8E43099ab5C89B0bB25"
[42] "0x55C11477577636024F8c4e776CdA758c6f81cDaf"
[43] "0x4Ee7bBc295A090aD0F6db12fe7eE4dC8de896400"
[44] "0x77134cbC06cB00b66F4c7e623D5fdBF6777635EC"
[45] "0x8A446971dbB112f3be15bc38C14D44B94D9E94b9"
[46] "0x661Be0562b31E9E8DdC2A7c93803005A1C71D749"
[47] "0x2060ca911d52Ac785484B332480B7329765268aA"
[48] "0x82E1d4DDd636857Ebcf6a0e74B9b0929C158D7FB"
[49] "0x3fe705e2FFcaEe8d7287de047DeF35Db3e794C76"
[50] "0x8558FE88F8439dDcd7453ccAd6671Dfd90657a32"
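
As the comment in the code above hints, the holder names can be collected in much the same way. The sketch below is one possible approach, run against a tiny inline table so that it works offline. The table layout and the "Label A" / "Label B" names are hypothetical placeholders, and the real page's markup may differ, so the `td a` selector would likely need adjusting. Only the two addresses are taken from the output above.

```r
library(rvest)

# Hypothetical stand-in for one page of the holders table
holders_html <- '
<table>
  <tr><th>Rank</th><th>Address</th></tr>
  <tr><td>1</td><td><a href="#">Label A</a>
    <a class="js-clipboard" data-clipboard-text="0xF977814e90dA44bFA03b6295A0616a897441aceC"></a></td></tr>
  <tr><td>2</td><td><a href="#">Label B</a>
    <a class="js-clipboard" data-clipboard-text="0x47ac0Fb4F2D84898e4D9E7b4DaB3C24507a6D503"></a></td></tr>
</table>'

rows <- read_html(holders_html) |> html_elements("tr")

# html_element() (singular) returns one result per row, missing where a row
# has no match, so the two vectors stay aligned row by row
address <- rows |>
  html_element(".js-clipboard") |>
  html_attr("data-clipboard-text")
label <- rows |>
  html_element("td a") |>
  html_text2()

holders <- data.frame(label = label, address = address)

# Drop rows without a copy button, such as the header row
holders <- holders[!is.na(holders$address), ]
print(holders)
```

Working row by row keeps each name next to its own address even when some rows lack one of the two elements, which two independent html_elements() calls would not guarantee.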