Simulating scrolling in RSelenium or in Selenium for Python

Question (votes: 0, answers: 1)

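I am trying to scrape this website. You need to click the magnifying-glass icon in the search bar to see the records I want to extract. The problem is that the site is dynamic, and I need to scroll many times to load the whole page before I can extract the content with rvest or BeautifulSoup. However, none of the scrolling methods I found in other threads have worked for me so far.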

I would appreciate a solution in either R or Python, using any package or library.

I tried:

remDr$executeScript("window.scrollTo(0,document.body.scrollHeight);")

where remDr is the page after clicking the magnifying-glass icon.

I also tried defining the search results element by inspecting the page and extracting an XPath that leads to the list of items:

search_results <- remDr$findElement( using = 'xpath', '//*[@id="search-feature-container"]/div[2]/div[2]/div[3]/div[2]/div[1]' )

Then I ran this line, but nothing scrolled at all :(

search_results$sendKeysToElement(list(key = "down"))
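
For reference, a common pattern on infinite-scroll pages is to repeat the scroll until the page height stops growing, rather than scrolling once. The sketch below is in Python and assumes only a Selenium-style driver object that exposes execute_script; the function name and its parameters are hypothetical. It would not help if the results sit inside an inner scrollable container rather than the window, which may be why the single scrollTo call above had no visible effect.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=50):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is assumed to be a Selenium WebDriver (anything with an
    execute_script method works). Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the page time to load more items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded by the last scroll
        last_height = new_height
    return rounds
```

The max_rounds cap is there so the loop terminates even on a page that keeps growing indefinitely.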

python r selenium-webdriver web-scraping rselenium
1 answer
0 votes

The data is loaded into the page dynamically through XHR calls, which you can see in the Network tab of your browser's developer tools.

Here is one way to get all the study data:

import requests
import pandas as pd

# Show every column and the full cell contents when printing
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

s = requests.Session()
s.headers.update(headers)

frames = []
# The index holds 7K+ studies; request them 1000 at a time via $top and $skip
for x in range(0, 8000, 1000):
    r = s.get(f'https://vivli-prod-cus-srch.search.windows.net/indexes/studies/docs?api-key=C8237BFE70B9CC48489DC7DD84D88379&api-version=2016-09-01&$top=1000&$skip={x}&search=*&$filter=assignedAppType%20eq%20%27Default%27&$count=true&facet=studyDesign&facet=locationsOfStudySites,count:300,sort:value&facet=sponsorType&facet=contributorType&facet=sponsorName,count:500,sort:value&facet=studyType&facet=actualEnrollment,interval:100')
    r.raise_for_status()  # stop early on an HTTP error instead of parsing a bad response
    frames.append(pd.json_normalize(r.json()['value']))

# Combine all pages into one DataFrame
big_df = pd.concat(frames, ignore_index=True)
print(big_df)
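
The loop above hard-codes an upper bound of 8000, which would silently miss records if the index grows. One possible alternative, sketched here under assumptions, is to keep requesting pages until a page comes back shorter than the page size. The helper name fetch_all_pages, its parameters, and the max_skip safety cap are hypothetical, and the stand-in data source at the bottom replaces the real HTTP call so the sketch runs offline.

```python
def fetch_all_pages(fetch_page, page_size=1000, max_skip=100_000):
    """Collect records page by page until a short (or empty) page is returned.

    `fetch_page(skip, top)` is a caller-supplied function (hypothetical here)
    that returns a list of records for the given offset and page size.
    `max_skip` is a safety cap so the loop always terminates.
    """
    records = []
    for skip in range(0, max_skip, page_size):
        page = fetch_page(skip, page_size)
        records.extend(page)
        if len(page) < page_size:
            break  # last page reached
    return records


# Stand-in data source so the sketch runs without network access
fake_index = list(range(2500))


def fake_fetch(skip, top):
    return fake_index[skip:skip + top]


print(len(fetch_all_pages(fake_fetch)))  # 2500
```

With the session from the answer, fetch_page could be a small function that calls s.get with the given $top and $skip values and returns r.json()['value']; the combined list can then be passed to pd.json_normalize in one step.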

Terminal output (first two rows only; the DataFrame has 7K+ records):

    @search.score   id  title   sponsorProtocolId   orgId   orgCode orgName irpOrgName  sponsorName overrideDisplayDefaults nctId   secondaryIds    acronym participantTermCodes    participantTerms    interventionTermCodes   interventionTerms   outcomeTermCodes    outcomeTerms    searchParticipantTermCodes  searchOutcomeTermCodes  searchInterventionTermCodes actualEnrollment    locationsOfStudySites   studyType   studyDesign principalInvestigator   studyStartDate  studyEndDate    sponsorType contributorType studyDoi    phase   conditions  interventionNames   outcomeNames    extractedConditions extractedInterventions  antimicrobials  groupingsOfResistancePatterns   organisms   specimenSources sampleTimes countries   regions yearsDataCollected  containsPediatrics  containsGenotype    assignedAppType numberOfIsolates    program lastUpdatedDate
0   1.0 abd778c4-21ed-4063-9e34-e3e7b177db18    A Randomized, Double-Blind, Parallel-Group, Dose-Response Study to Evaluate the Efficacy and Safety of Two Doses of Topiramate Compared to Placebo and Propranolol in the Prophylaxis of Migraine   CR003205    d1bd067d-3e2d-43b5-80f1-6235e85c2876    JNJ Johnson & Johnson   Yoda Project    Johnson & Johnson Pharmaceutical Research & Development, L.L.C. N   NCT00236561 []      [lr5qxyw6ww35, kk05h7rpym8w, kk05h7rpym8x, kk05h7rpym8y, kk05h7rpym8z, kk05h7rpym90, kk05h7rpym91, r4hp3896n2zy]    [Male and Female, Child 6-12 years, Adolescent 13-18 years, Young Adult 19-24 years, Adult 19-44 years, Middle Aged 45-64 years, Aged 65-79 years, Migraine]    [kn3ptfq7c6lz, r4hp0qywwn28, 11g43clqdpk96, r4hp0r5sbtj7, q25gz0m8n54j, r4hp0r2dwmn5]   [Pharmacological, Topiramate, Oral, Propranolol, No active treatment, Placebo]  [q25g9q497cwj, r4hp3896n2zy, r4hp5zkjq0c3, ZxM7N2m9kOhRe2]  [Physiological or clinical, Migraine, Evaluating Response To Treatment, Assessment Of Quality Of Life]  [lr5qxyw6ww35, kk05h7rpym8w, pwhpjmwdbgkh, kk05h7rpym8x, kk05h7rpym8y, pwhpjmwdbgkg, kk05h7rpym8z, kk05h7rpym90, kk05h7rpym91, pwhpjmwdbgkf, r4hp3896n2zy, r4hp3p8ymhbg, r4hp38gs74r1, r4hp3885vk99, r4hp38mgkgb9, r4hp39w4k8tw, r4hp38c875ch, r4hp38mgkgj7, r4hp38xpp96f, r4hp3853gyf1, r4hp38l4pbqh, r4hp39krwnf7, r4hp38qpgvxq, r4hp387wrzbr, r4hp38mrn1cp, r4hp39tp4ckr, r4hp38819rxs, r4hp39mjd4qj, r4hp39cb1vjv]  [q25g9q497cwj, r4hp3896n2zy, r4hp3p8ymhbg, r4hp38gs74r1, r4hp3885vk99, r4hp38mgkgb9, r4hp39w4k8tw, r4hp38c875ch, r4hp38mgkgj7, r4hp38xpp96f, r4hp3853gyf1, r4hp38l4pbqh, r4hp39krwnf7, r4hp38qpgvxq, r4hp387wrzbr, r4hp38mrn1cp, r4hp39tp4ckr, r4hp38819rxs, r4hp39mjd4qj, r4hp39cb1vjv, r4hp5zkjq0c3, r4hp5zjccp22, r4hp5zjccp1z, r4hp5zm4npzj, r4hp5zhs6j1c, zPNWxozYM3fxBr, r4hp5zjng89p, r4hp5yw4mj85, ZxM7N2m9kOhRe2, 3BgZRR0YwkHzkP]  [kn3ptfq7c6lz, r4hp0qywwn28, r4hp13n1ty7w, r4hp13rf9486, 11g43clqdpk96, r4hp0r5sbtj7, zrcts8tmxp0g, r4hp13n1ty7r, r4hp13mrrc91, 
r4hp13mrrc8c, r4hp13mgns4j, r4hp13mrrc83, q25gz0m8n54j, r4hp0r2dwmn5, PXmmxKGR3ocNEg]   786 []  Interventional  ParallelGroup       2001-04-01T00:00:00Z    2002-12-31T00:00:00Z    Industry    Unassigned  https://doi.org/10.25934/00004657   Phase3  [Migraine]  [Topiramate, Propranolol, Placebo]  [Migraine, Evaluating Response To Treatment, Migraine, Assessment Of Quality Of Life]   [Migraine, Common Migraine, Classic Migraine, Headache] [topiramate, propranolol]   []  []  []  []  []  []  []  []  None    None    Default 0       
1   1.0 48c15b9e-76d7-45cc-a044-6c253da74ac1    A Phase 3, Randomized, Open-label, Parallel-group, Multicenter Trial to Evaluate the Safety and Efficacy of Infliximab (REMICADE®) in Pediatric Subjects With Moderately to Severely Active Ulcerative Colitis  CR012388    d1bd067d-3e2d-43b5-80f1-6235e85c2876    JNJ Johnson & Johnson   Yoda Project    Centocor, Inc.  N   NCT00336492 [C0168T72]      [lr5qxyw6ww35, kk05h7rpym8v, kk05h7rpym8w, kk05h7rpym8x, r4hp3q5y2klm]  [Male and Female, Child, Preschool 2-5 years, Child 6-12 years, Adolescent 13-18 years, Acute Ulcerative Colitis]   [kn3ptfq7c6lz, r4hp13l4sngc, 11g43clqdpk72] [Pharmacological, Infliximab, Intravenous]  [q25g9q497cwj, r4hp5zkjq0c3, r4hp5zfl2n7g]  [Physiological or clinical, Evaluating Response To Treatment, Activity Analysis]    [lr5qxyw6ww35, kk05h7rpym8v, pwhpjmwdbgkh, kk05h7rpym8w, kk05h7rpym8x, r4hp3q5y2klm, r4hp384nvkyl, r4hp39vkd3t3, r4hp39lc1tgs, r4hp39hf705k, r4hp39kgt2jy, r4hp38mgkgb9, r4hp39w4k8tw, r4hp38c875ch, r4hp38mgkgj7, r4hp38jd5vlp, r4hp38gxry6k, r4hp38bczf6g, r4hp38yky17z, r4hp38z9n01d, r4hp39qlrrnf, r4hp381fy5cs, r4hp381fy5cw, r4hp393pwqm9, r4hp39mjd4qj, r4hp3b0d86ss, r4hp39znk89c, r4hp39b989y6, r4hp38mb0nx2, r4hp39ys9j31, r4hp39ln4dll, r4hp39krwnf7, r4hp39l6j13q, r4hp38hhy39p, r4hp381fy5cl, r4hp38jtt70k, r4hp38f9t7jr, r4hp39zj0gsm, r4hp38nsfl6q, r4hp38n1qmfs, r4hp39ln4dhj, r4hp39j9gr79, r4hp38jp8fh0, r4hp38y8vgc2, r4hp39v3rr24, r4hp3b0twljt, r4hp38819rv0, r4hp3pdb2p7r, r4hp39hf702g, eM3W2jDdq4CnoM]  [q25g9q497cwj, r4hp5zkjq0c3, r4hp5zjccp22, r4hp5zjccp1z, r4hp5zm4npzj, r4hp5zhs6j1c, zPNWxozYM3fxBr, r4hp5zjng89p, r4hp5yw4mj85, r4hp5zfl2n7g, r4hp5yxm1fj5, r4hp5yq9rf4h, r4hp5z5crc2v, r4hp5zbhq1cb, r4hp5z0tyv1k, r4hp5yvdxkr1, r4hp5zjccp2h, r4hp5zkzbcvf]  [kn3ptfq7c6lz, r4hp13l4sngc, r4hp13nhg9tp, YgJdXZMgAyT4za, r4hp13n1ty7z, r4hp13qznrsn, 3r0XoawY07FG2Z, 11g43clqdpk72, PNz3A1OgQesRKw, 11g43clqdpk4z, r4hp5z5nty2h, r4hp5zj2934z, r4hp5zhs6j1c, zPNWxozYM3fxBr]  60  [United States, Canada, 
Belgium, Denmark, Netherlands]  Interventional  ParallelGroup       2006-09-01T00:00:00Z    2010-04-30T00:00:00Z    Industry    Unassigned  https://doi.org/10.25934/00004723   Phase3  [Acute Ulcerative Colitis]  [Infliximab]    [Evaluating Response To Treatment, Activity Analysis]   [Ulcerative Colitis]    [infliximab]    []  []  []  []  []  []  []  []  None    None    Default 0       

Relevant documentation: pandas, requests
