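Questions tagged xpath

The primary purpose of XPath is to address parts of an XML document. It also provides basic facilities for manipulating strings, numbers, and booleans. XPath uses a compact, non-XML syntax. XPath operates on the abstract, logical structure of an XML document rather than on its surface syntax.

My previous question was answered here: How to separate XML tags in FreeMarker in a body function. However, my XML has namespaces, and when I try to add the namespace in the template, it fails to parse...

xml xpath freemarker

Answers: 1 Votes: 0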

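I have an HTML page like the one below:
    <body>
      <div>
        <a class="btn btn-info" href="a.php">A</a>
        <a class="btn btn-info" href="b.php">B</a>
        <a class="btn btn-info" href="c.php">C</a>
        <a class="btn btn-info" href="d.php">D</a>
      </div>
      <a class="btn btn-info" href="f.php">F</a>
    </body>
I need to select the last link, F. I tried the following:
    link = driver.find_element_by_xpath("//a[last()]")
This selects D. I also tried this:
    email = driver.find_element_by_xpath("/body//a[last()]")
This time the element could not be located. How can I get the last link here in a simple way?
To get the last element, try the following XPath. The predicate in //a[last()] is applied per parent, so it matches both D (the last a inside the div) and F (the last a inside body), and find_element returns the first match in document order. Wrapping the path in parentheses applies last() to the whole result set instead.
    link = driver.find_element_by_xpath("(//a)[last()]")
Use the driver.findElements() method and put the return value into an array. The last link will be the last element of the array. I suggest the following (for various reasons): call yourArray = driver.findElements(By.tagName("a")); for each element, check that the href attribute is not null with element.getAttribute("href") != null, and if it is not null, update the variable myIndex. The element at index myIndex of yourArray will be the one you are looking for.
Adding to the accepted answer by @KunduK (since I cannot comment yet): if you want to click the last element, with newer Selenium versions it would be driver.find_element("xpath", "(//a)[last()]").click().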
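The difference between a per-parent last() and a last() over the whole result set can be reproduced without a browser. The following is a minimal sketch using Python's standard xml.etree.ElementTree, which supports only a subset of XPath (it has no parenthesized expressions, so the `(//a)[last()]` form is emulated here by indexing the full list). The variable names are illustrative and not taken from the question.

```python
import xml.etree.ElementTree as ET

# The page from the question, which happens to be well-formed XML.
page = """<body>
  <div>
    <a class="btn btn-info" href="a.php">A</a>
    <a class="btn btn-info" href="b.php">B</a>
    <a class="btn btn-info" href="c.php">C</a>
    <a class="btn btn-info" href="d.php">D</a>
  </div>
  <a class="btn btn-info" href="f.php">F</a>
</body>"""

root = ET.fromstring(page)

# a[last()] is evaluated per parent: the last <a> child of <div> and the
# last <a> child of <body> both match.
per_parent_last = [a.text for a in root.findall(".//a[last()]")]

# Emulation of (//a)[last()]: collect every <a> in document order first,
# then take the final item of that single list.
overall_last = root.findall(".//a")[-1].text

print(per_parent_last)
print(overall_last)
```

Because a single-element lookup returns the first match in document order, the per-parent form yields D, which is consistent with what the asker observed.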

python selenium xpath

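Answers: 3 Votes: 0

Modify pom.xml with Ansible

I want to modify a pom.xml file on the fly so that Maven uses locally installed JARs for a specific groupId. So I need to: change/set the scope of every matching dependency to system. Add...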

xml maven xpath ansible pom.xml

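Answers: 1 Votes: 0

Getting "Object doesn't support property or method 'evaluate'" in Selenium | IE

I am automating a web application in the IE (Internet Explorer) browser. When I try to find any element on the page after it has loaded, I get the following error: selenium.common.exceptions.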

selenium-webdriver xpath internet-explorer internet-explorer-11

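Answers: 1 Votes: 0

Selenium WebDriver FindElements(By.XPath()) does not return any data

I am trying to reference a link element in the HTML below. The element I am trying to reference is the one that contains "WANT_TO_FIND_THIS_LINK". Finding the main div with this works fine: var productHol...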

c# selenium-webdriver xpath selenium-chromedriver

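Answers: 1 Votes: 0

xpath(unknown, character varying) does not exist

I am trying to fetch some data from the PostgreSQL table events_data. Table public.incidents_data: Column | Type | Modifiers | Storage | Stats...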

postgresql xpath

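Answers: 1 Votes: 0

I am having trouble finding an element with Selenium

I am trying to find an element that sits fairly deep inside the HTML. I have tried the XPath, class, link text, and CSS methods, and none of them work. I always get the same error: selenium.common.excepti...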

python html selenium-webdriver xpath

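Answers: 1 Votes: 0

Parsing XML in PowerShell: escaping various symbols in XPath

I need to parse a VS2022 .vbproj file to find the details of a build configuration. In PowerShell, I load the file and try to access the necessary nodes. But I am stumped by misleading PS error reporting...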

xml powershell visual-studio xpath

Answers: 1 Votes: 0

Unable to access img inside a div using BeautifulSoup

I am trying to access the SRC of an image using BeautifulSoup in Python. This is how the image is nested:
    <div class="artistAndEventInfo-7c13900b">
      <a class="artistAndEventInfo-48455a81" href="https://www.bandsintown.com/a/11985-perkele?came_from=257&utm_medium=web&utm_source=artist_event_page&utm_campaign=artist">
        <img src="https://assets.bandsintown.com/images/fallbackImage.png" alt="">
      </a>
I tried three approaches.
1: The logic is that I select the parent div of the image in question and then the child img inside it:
    image = soup.select_one('[class^=artistAndEventInfo-7c13900b] img')
    print "band image", image
This prints "none" (it should output the SRC).
2: Using a more explicit nth-of-type approach:
    image = soup.select_one('[class^=artistAndEventInfo-7c13900b] :nth-of-type(1) img')
But the output is still "none".
3: I also tried Selenium:
    driver.find_element_by_xpath("//div[@class^=artistAndEventInfo-48455a81']")
This gives me the error:
    selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: Unable to locate an element with the xpath expression //div[@class^=artistAndEventInfo-7c13900b']/img because of the following error: SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//div[@class^=artistAndEventInfo-7c13900b']/img' is not a valid XPath expression.
    (Session info: chrome=74.0.3729.157) (Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Mac OS X 10.11.6 x86_64)
Why does my code not work in any of these cases?

Your XPath has an error: //div[@class^=artistAndEventInfo-7c13900b']/img should be //div[@class='artistAndEventInfo-7c13900b']/img. The ^= operator belongs to CSS selectors and is not XPath syntax, and the opening quote is missing. If you want to get the src of the image, use the code below with the corrected XPath:
    print(driver.find_element_by_xpath("//div[@class='artistAndEventInfo-7c13900b']//img").get_attribute("src"))
If you want to use options 1 and 2, make sure you read the src attribute like this:
    print image['src']

With BeautifulSoup, you can do this:
    from bs4 import BeautifulSoup
    html = '''
    <div class="artistAndEventInfo-7c13900b">
      <a class="artistAndEventInfo-48455a81" href="https://www.bandsintown.com/a/11985-perkele?came_from=257&utm_medium=web&utm_source=artist_event_page&utm_campaign=artist">
        <img src="https://assets.bandsintown.com/images/fallbackImage.png" alt="">
      </a>
    '''
    soup = BeautifulSoup(html,'html5lib')
    img = soup.find('img')
    src = img['src']
    print(src)

The class attribute value of your div tag may be dynamic. Instead of using the full class attribute value, you can try the approach below:
    from bs4 import BeautifulSoup
    html='''<div class="artistAndEventInfo-7c13900b">
      <a class="artistAndEventInfo-48455a81" href="https://www.bandsintown.com/a/11985-perkele?came_from=257&utm_medium=web&utm_source=artist_event_page&utm_campaign=artist">
        <img src="https://assets.bandsintown.com/images/fallbackImage.png" alt="">
      </a>'''
    soup=BeautifulSoup(html,'lxml')
    image = soup.select_one('div[class^=artistAndEventInfo-] img')
    print(image['src'])
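In XPath 1.0 the prefix match that CSS writes as `[class^=...]` is expressed with the starts-with() function, for example `//div[starts-with(@class, 'artistAndEventInfo-')]//img`. The sketch below emulates that query with Python's standard html.parser only, so it is not the BeautifulSoup or Selenium approach from the answers. The class name PrefixImgFinder and the shortened href are assumptions made for illustration.

```python
from html.parser import HTMLParser

class PrefixImgFinder(HTMLParser):
    """Collect src values of <img> tags nested in a <div> whose class
    attribute starts with a given prefix (hypothetical helper)."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.depth = 0      # > 0 while inside a matching <div>
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Track nesting so inner divs do not end the match early.
            if self.depth or (attrs.get("class") or "").startswith(self.prefix):
                self.depth += 1
        elif tag == "img" and self.depth and attrs.get("src"):
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

html_doc = """
<div class="artistAndEventInfo-7c13900b">
  <a class="artistAndEventInfo-48455a81" href="https://www.bandsintown.com/a/11985-perkele">
    <img src="https://assets.bandsintown.com/images/fallbackImage.png" alt="">
  </a>
</div>
<img src="outside.png" alt="">
"""

finder = PrefixImgFinder("artistAndEventInfo-")
finder.feed(html_doc)
print(finder.sources)
```

Matching on the stable prefix rather than the full class value follows the same reasoning as the last answer: the hashed suffix may change between page builds.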

python html selenium-webdriver xpath beautifulsoup

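Answers: 4 Votes: 0

How to find an element in a web game

I am trying to create a macro for a web game using Python (pyautogui, selenium). The macro needs to read an element in the web game, but it seems the element cannot be read with the normal meta...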

python xpath macros element

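Answers: 1 Votes: 0

Python Selenium - get an element by XPath and access it

I am using Python and Selenium to scrape a web page, and in some cases I cannot get it to work. I want to access the element with the text "PInt", which is the second link in the code below. The XPath...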

python selenium-webdriver web-scraping xpath

Answers: 4 Votes: 0

Convert XML tables to a tibble in R

I am trying to convert XML tables with varying numbers of rows and columns into a data frame. I can do this with well-formed, predictable tables like these two:
    <table xml:id="a">
      <row role="label"> <cell cols="2">Stuff</cell> </row>
      <row> <cell>Thing</cell> <cell>1</cell> </row>
      <row> <cell>Another thing</cell> <cell>2</cell> </row>
    </table>
    <table xml:id="b">
      <row role="label"> <cell cols="2">Nonsense</cell> </row>
      <row> <cell>Thing</cell> <cell>3</cell> </row>
      <row> <cell>Anything</cell> <cell>2</cell> </row>
      <row> <cell>Another thing</cell> <cell>2</cell> </row>
    </table>
I can turn them into a tibble like this:
    # A tibble: 5 × 4
      id    label    cell.1        cell.2
      <chr> <chr>    <chr>          <dbl>
    1 a     Stuff    Thing              1
    2 a     Stuff    Another thing      2
    3 b     Nonsense Thing              3
    4 b     Nonsense Anything           2
    5 b     Nonsense Another thing      2
using this code:
    x <- "table.xml"
    file <- read_xml(x)
    cells <- file %>% xml_find_all(".//cell")
    output <- lapply(cells, function(d){
      id <- d %>% xml_find_first(".//parent::row/parent::table") %>% xml_attr("id")
      label <- d %>% xml_find_first(".//parent::row/preceding-sibling::row[@role='label']") %>% xml_text()
      cell.1 <- d %>% xml_find_first(".//parent::row/cell") %>% xml_text()
      cell.2 <- d %>% xml_find_all(".//following-sibling::cell") %>% xml_double()
      tibble(id, label, cell.1, cell.2)
    })
    answer <- do.call(rbind, output)
However, this approach relies on the attribute being supplied (@role='label'), on a consistent number of cells, and so on. I need to run this script on a bunch of irregularly formatted XML tables. If I add an extra cell to one of the rows in the example above, my approach fails. I suspect I may be going about this the wrong way. For example, could I do this with xml2::as_list()? My attempts have not succeeded so far.

Here is one possible approach using html_table from the rvest package together with tidyverse transformations. Note: the XML you posted is not valid.
    ### Packages
    library(xml2)
    library(rvest)
    library(stringr)
    library(dplyr)
    library(purrr)
    ### Parse the XML and transform the result as character
    a=read_xml("C:/Users/YourName/Downloads/YourFile.xml")
    b=as.character(a)
    ### Replace the content of the XML to conform the tables to HTML tables structure
    b=str_replace_all(b,"cell cols","td colspan")
    b=str_replace_all(b,"row","tr")
    b=str_replace_all(b,"cell","td")
    ### Parse the result of the transformation
    c=read_xml(b)
    ### Get all ids of the tables
    attr=html_elements(c,xpath = "//table") %>% html_attrs() %>% unlist()
    ### Get all the tables
    temp=c%>% html_elements(xpath = "//table") %>% html_table()
    ### Declare a function to transform the tables
    ### Transform the last column from character to numeric
    transform=function(x,y){x %>% slice(-1) %>% mutate(id=y, label=x[1,1][[1]],.before=1, X2=as.numeric(X2))}
    ### Apply the function
    done=map2(.x = temp,.y = attr,.f = transform)
    ### Stack the tables and rename the columns
    end=bind_rows(done) %>% rename_with(.fn = ~str_replace(.x,"X","cell."), .cols = starts_with("X"))
Output:
    # A tibble: 5 × 4
      id    label    cell.1        cell.2
      <chr> <chr>    <chr>          <dbl>
    1 a     Stuff    Thing              1
    2 a     Stuff    Another thing      2
    3 b     Nonsense Thing              3
    4 b     Nonsense Anything           2
    5 b     Nonsense Another thing      2
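One way to cope with rows that have a varying number of cells is to iterate per row instead of per cell and number the cells as they are found. The following is a minimal sketch in Python with the standard xml.etree.ElementTree, not the R approach from the question or the answer. It assumes the tables are wrapped in a single root element (named tables here, which is an invention) so that the input is well-formed, and it adds an extra third cell to one row to show the irregular case.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

doc = """<tables>
<table xml:id="a">
  <row role="label"><cell cols="2">Stuff</cell></row>
  <row><cell>Thing</cell><cell>1</cell></row>
  <row><cell>Another thing</cell><cell>2</cell><cell>extra</cell></row>
</table>
<table xml:id="b">
  <row role="label"><cell cols="2">Nonsense</cell></row>
  <row><cell>Thing</cell><cell>3</cell></row>
</table>
</tables>"""

def table_rows(root):
    """Return one dict per data row, with as many cell.N keys as the row has cells."""
    records = []
    for table in root.iter("table"):
        # The label row is optional; fall back to None when it is missing.
        label_cell = table.find("row[@role='label']/cell")
        label = label_cell.text if label_cell is not None else None
        for row in table.findall("row"):
            if row.get("role") == "label":
                continue
            record = {"id": table.get(XML_ID), "label": label}
            for i, cell in enumerate(row.findall("cell"), start=1):
                record[f"cell.{i}"] = cell.text
            records.append(record)
    return records

records = table_rows(ET.fromstring(doc))
for record in records:
    print(record)
```

Rows with fewer cells simply lack the higher-numbered keys, so a tabular library can fill the gaps with missing values when the records are stacked.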

r xml xpath tidyverse xml2

Answers: 1 Votes: 0

In Python Selenium WebDriver, I want to extract only the content of the li and not the content of the span

Here is the sample HTML document:
    <div id="div1">
      <ul class="lists">
        <li class="listItem1"> List content <span class="span1">span text</span> </li>
      </ul>
    </div>
I only want to extract "List content". If I do this:
    elementList = driver.find_element(By.XPATH, "/div/ul/li")
    elementList.text
I get: List content span text
If I do this:
    elementList = driver.find_element(By.XPATH, "/div/ul/li[1]")
I still get: List content span text
What should I do to get "List content" without the span text?

    elementList = driver.find_element(By.XPATH, "//li[@class='listItem1']").text
    span_text = driver.find_element(By.XPATH, "//span[@class='span1']").text
    elementList = elementList.replace(span_text, "")
This may be a bit long-winded, but I think it should do the trick.

Try the text() option, which returns the text present in the li tag:
    elementList = driver.find_element(By.XPATH, "//div/ul/li/text()")
or
    elementList = driver.find_element(By.XPATH, "normalize-space(//div/ul/li/text())")
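The idea behind the text() answer is that an element's own text nodes are separate from the text of its children. As far as I know, Selenium's find_element expects the XPath to resolve to an element, so a path ending in text() is usually rejected there; the sketch below therefore illustrates the distinction with Python's standard xml.etree.ElementTree on the snippet from the question rather than with a live driver. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

snippet = """<div id="div1">
  <ul class="lists">
    <li class="listItem1">
      List content
      <span class="span1">span text</span>
    </li>
  </ul>
</div>"""

root = ET.fromstring(snippet)
li = root.find("./ul/li[@class='listItem1']")

# Element.text holds only the text before the first child element,
# which corresponds to the first text() node of the <li>.
own_text = (li.text or "").strip()

# itertext() walks the whole subtree, comparable to the combined text
# that the asker got back for the <li>.
full_text = " ".join("".join(li.itertext()).split())

print(own_text)
print(full_text)
```

With a real driver, the string-replacement approach from the first answer stays within element lookups and so avoids the text-node restriction.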

python html selenium-webdriver xpath xpath-2.0

Answers: 2 Votes: 0

Why does last() select two tags in the following XML?

Given the following XML: Hi! This is xpather beta... This web app enables you to query XML/HTML documents...

xml xpath

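Answers: 1 Votes: 0

Waiting using a path (Puppeteer)

I am using Puppeteer 22.6.0 with NodeJS for web scraping. I am trying to pause the script until a specific h1 element is visible. The problem is that there are multiple h1 elements on the page, and the only...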

javascript node.js web-scraping xpath puppeteer

Answers: 1 Votes: 0

How do I add a comment to an XPath?

For example, I have an XPath and want to add a comment next to it to identify it. /html/body/div/table/tr/td/a{this is a link}
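XPath 1.0 has no comment syntax, while XPath 2.0 and later accept comments written as `(: ... :)`. When the engine in use only understands 1.0, one option is to keep the annotated form in the source and strip the comments before evaluation. The following is a minimal sketch under that assumption; the helper name strip_xpath_comments is hypothetical, and the regular expression is deliberately simplified (it does not handle nested comments or a `(:` that appears inside a string literal).

```python
import re

# Simplified pattern: matches one non-nested (: ... :) comment at a time.
_XPATH_COMMENT = re.compile(r"\(:.*?:\)", re.DOTALL)

def strip_xpath_comments(expr: str) -> str:
    """Remove XPath 2.0 style comments so a 1.0 engine can evaluate the rest."""
    return _XPATH_COMMENT.sub("", expr).strip()

annotated = "/html/body/div/table/tr/td/a (: this is a link :)"
plain = strip_xpath_comments(annotated)
print(plain)
```

A simpler alternative is to keep the note outside the expression entirely, for example as a comment in the host language next to the XPath string.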