Get text from a tag with Beautiful Soup, excluding its siblings


I need to get the text from the <small> tag (the product name), excluding the quantity and value, but I have not found any code that does this.

I tried using soup.find and soup.select on the following example HTML:

<tbody>
       <tr>
        <td class="no-border" colspan="2">
         <small>
          1:   EFURIX CREM 15 GR S VALEA
          <span class="pull-right">
          </span>
         </small>
         <small>
          1,00 x R$22,50
         </small>
         <td class="no-border text-right">
          <small>
           R$22,50
          </small>
         </td>
        </td>
       </tr>
       <tr>
        <td class="no-border" colspan="2">
         <small>
          2:   ASDRON XPE FR 100ML
          <span class="pull-right">
          </span>
         </small>
         <small>
          1,00 x R$50,32
         </small>
         <td class="no-border text-right">
          <small>
           R$50,32
          </small>
         </td>
        </td>
       </tr>
       <tr>
        <td class="no-border" colspan="2">
         <small>
          3:   DIAD  0,75MGC/ 2 COMP
          <span class="pull-right">
          </span>
         </small>
         <small>
          1,00 x R$5,00
         </small>
         <td class="no-border text-right">
          <small>
           R$5,00
          </small>
         </td>
        </td>
       </tr>
      </tbody>
python web-scraping beautifulsoup
1 Answer

You can use a CSS selector with pseudo-classes to always get the first <small>:

soup.select('tr > td:first-of-type > small:first-of-type')
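
As a minimal, self-contained sketch of what that selector matches, here is one row of the question's markup parsed with the built-in html.parser (an assumption, since the question does not say which parser is used; the variable names are only illustrative). With this parser the nested <td> stays inside the first one, so the price <small> is not a direct match either:

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (one row only).
html = """
<tbody>
 <tr>
  <td class="no-border" colspan="2">
   <small>
    1:   EFURIX CREM 15 GR S VALEA
    <span class="pull-right">
    </span>
   </small>
   <small>
    1,00 x R$22,50
   </small>
   <td class="no-border text-right">
    <small>
     R$22,50
    </small>
   </td>
  </td>
 </tr>
</tbody>
"""

soup = BeautifulSoup(html, "html.parser")

# Only the first <small> inside the first <td> of each row is matched;
# the quantity and price <small> tags are skipped.
matches = soup.select("tr > td:first-of-type > small:first-of-type")
texts = [tag.get_text(strip=True) for tag in matches]
print(texts)
```

The matched text still carries the "1:" item prefix, which is why an extra step is needed to isolate the product name.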

To get all the titles from your example, use a list comprehension:

[title.get_text(strip=True).split('   ')[-1] for title in soup.select('tr > td:first-of-type > small:first-of-type')]

Result:

['EFURIX CREM 15 GR S VALEA', 'ASDRON XPE FR 100ML', 'DIAD  0,75MGC/ 2 COMP']
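
Splitting on three spaces depends on the exact spacing in the sample. As a hedged alternative, assuming every product line starts with an item number followed by a colon, the prefix could be removed with a regular expression instead. The product_names helper and the simplified sample rows (with sibling cells rather than nested ones) are hypothetical, introduced only for this sketch:

```python
import re
from bs4 import BeautifulSoup


def product_names(html):
    """Return product names from rows shaped like the question's sample."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for small in soup.select("tr > td:first-of-type > small:first-of-type"):
        text = small.get_text(strip=True)
        # Drop a leading "<number>:" prefix and the whitespace after it.
        names.append(re.sub(r"^\d+:\s*", "", text))
    return names


# Simplified rows for illustration; not the question's exact markup.
sample = """
<tr><td><small>1:   EFURIX CREM 15 GR S VALEA<span class="pull-right"></span></small>
<small>1,00 x R$22,50</small></td>
<td><small>R$22,50</small></td></tr>
<tr><td><small>3:   DIAD  0,75MGC/ 2 COMP<span class="pull-right"></span></small>
<small>1,00 x R$5,00</small></td>
<td><small>R$5,00</small></td></tr>
"""

names = product_names(sample)
print(names)
```

Anchoring the pattern at the start of the string keeps digits and colons inside the product name itself untouched.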