I'm having trouble with web scraping in Python

Question

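I'm very new to coding, and I'm trying to write code that pulls the current Litecoin price from coinmarketcap. However, I can't get it to work; it prints an empty list.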

import urllib
import re

htmlfile = urllib.urlopen('https://coinmarketcap.com/currencies/litecoin/')

htmltext = htmlfile.read()

regex = 'span class="text-large2" data-currency-value="">$304.08</span>'

pattern = re.compile(regex)

price = re.findall(pattern, htmltext)

print(price)

The output is "[]". The problem is probably something small, but I'd really appreciate your help.

Answer 1

Regular expressions are generally not the best tool for processing HTML. I'd suggest looking at something like BeautifulSoup.

For example:

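import urllib  # Python 2; on Python 3 use urllib.request.urlopen instead
import bs4

f = urllib.urlopen("https://coinmarketcap.com/currencies/litecoin/")
# Name the parser explicitly so bs4 does not have to guess one.
soup = bs4.BeautifulSoup(f, "html.parser")
# Find the first element, of any tag, that carries a data-currency-value attribute.
print(soup.find(attrs={"data-currency-value": True}).text)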

This currently prints "299.97".

For a case this simple, this may not perform as well as using re. However, see "Using regular expressions to parse HTML: why not?"
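
The parser-based idea can also be tried without a third-party package or a network request. What follows is only a minimal sketch using the Python 3 standard-library html.parser module on a hard-coded snippet. The PriceParser class and the sample markup are hypothetical, made up for illustration; this is a stand-in for the BeautifulSoup approach, not a reproduction of it, and the live page's markup may differ.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real page may look different.
SAMPLE_HTML = '<p>Price: <span class="text-large2" data-currency-value>299.97</span> USD</p>'


class PriceParser(HTMLParser):
    """Collect the text of every element that has a data-currency-value attribute."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a bare attribute has value None.
        if any(name == "data-currency-value" for name, _ in attrs):
            self._inside = True
            self.values.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a matching element.
        if self._inside:
            self.values[-1] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends collection (no nested-tag support).
        self._inside = False


parser = PriceParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.values)  # ['299.97']
```

Because the match is on the attribute rather than on the exact text of the tag, the sketch keeps working if the class name or the price changes, which is the main argument for a parser over a regex here.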

Answer 2

You need to change your regex and add a group in parentheses to capture the value.

To match something like: <span class="text-large2" data-currency-value>300.59</span>, you need this regex:

regex = 'span class="text-large2" data-currency-value>(.*?)</span>'

The (.*?) group captures the number.

You get:

['300.59']
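
To make the capture-group fix easy to try offline, here is a minimal sketch that runs the pattern against a hard-coded snippet instead of the live page. The sample markup and the extract_prices helper are assumptions for illustration only. The last lines also show one reason the pattern in the question can never match: an unescaped "$" is a regex anchor, not a literal dollar sign.

```python
import re

# Hypothetical sample markup modeled on the span quoted above; the live page may differ.
SAMPLE_HTML = '<div><span class="text-large2" data-currency-value>300.59</span></div>'

# The parentheses form a capture group, so findall returns only the price text.
PRICE_PATTERN = re.compile(r'span class="text-large2" data-currency-value>(.*?)</span>')


def extract_prices(html):
    """Return every captured price string found in html."""
    return PRICE_PATTERN.findall(html)


print(extract_prices(SAMPLE_HTML))  # ['300.59']

# The question's pattern finds nothing even in text identical to itself,
# because "$" asserts end-of-string instead of matching a dollar sign.
QUESTION_PATTERN = 'span class="text-large2" data-currency-value="">$304.08</span>'
unescaped_hits = re.findall(QUESTION_PATTERN, QUESTION_PATTERN)
escaped_hits = re.findall(re.escape(QUESTION_PATTERN), QUESTION_PATTERN)
print(unescaped_hits)      # []
print(len(escaped_hits))   # 1
```

Hard-coding the price in the pattern would also stop matching as soon as the price changes, which is why the capture group replaces it.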
