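I need to extract the text of the first <small> in each row (the product name), without the quantity and price, but I haven't found any code that does this.
I tried using soup.find and soup.select on the following sample HTML: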
<tbody>
<tr>
<td class="no-border" colspan="2">
<small>
1: EFURIX CREM 15 GR S VALEA
<span class="pull-right">
</span>
</small>
<small>
1,00 x R$22,50
</small>
<td class="no-border text-right">
<small>
R$22,50
</small>
</td>
</td>
</tr>
<tr>
<td class="no-border" colspan="2">
<small>
2: ASDRON XPE FR 100ML
<span class="pull-right">
</span>
</small>
<small>
1,00 x R$50,32
</small>
<td class="no-border text-right">
<small>
R$50,32
</small>
</td>
</td>
</tr>
<tr>
<td class="no-border" colspan="2">
<small>
3: DIAD 0,75MGC/ 2 COMP
<span class="pull-right">
</span>
</small>
<small>
1,00 x R$5,00
</small>
<td class="no-border text-right">
<small>
R$5,00
</small>
</td>
</td>
</tr>
</tbody>
Use a CSS selector to always get the first <small> in each row:
soup.select('tr > td:first-of-type > small:first-of-type')
To get all the titles from your example, use a list comprehension:
[title.get_text(strip=True).split(': ')[-1] for title in soup.select('tr > td:first-of-type > small:first-of-type')]
Result:
['EFURIX CREM 15 GR S VALEA', 'ASDRON XPE FR 100ML', 'DIAD 0,75MGC/ 2 COMP']
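For completeness, here is a minimal self-contained sketch of the approach above. It assumes the markup is parsed with Python's built-in html.parser and uses only the first two rows of the sample. The variable names (html, soup, titles) are illustrative, and passing maxsplit=1 to split is a small precaution of mine in case a product name itself contains ": ".

```python
from bs4 import BeautifulSoup

# First two rows of the sample from the question, markup kept as posted
html = """
<tbody>
<tr>
<td class="no-border" colspan="2">
<small>
1: EFURIX CREM 15 GR S VALEA
<span class="pull-right">
</span>
</small>
<small>
1,00 x R$22,50
</small>
<td class="no-border text-right">
<small>
R$22,50
</small>
</td>
</td>
</tr>
<tr>
<td class="no-border" colspan="2">
<small>
2: ASDRON XPE FR 100ML
<span class="pull-right">
</span>
</small>
<small>
1,00 x R$50,32
</small>
<td class="no-border text-right">
<small>
R$50,32
</small>
</td>
</td>
</tr>
</tbody>
"""

soup = BeautifulSoup(html, "html.parser")

# First <small> inside the first <td> of every row
first_smalls = soup.select("tr > td:first-of-type > small:first-of-type")

# Drop the leading "N: " index and keep only the product name
titles = [s.get_text(strip=True).split(": ", 1)[-1] for s in first_smalls]

print(titles)
```

If a row might not contain a <small> at all, select simply skips it, so the list can end up shorter than the number of rows.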