Returning title text with LXML

Question (votes: 0, answers: 1)

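I'm doing a school project using LXML and its .xpath function to try to get the titles of the top videos from a YouTube search of your choice. My problem is that when it loops through the top 5 and returns each video's title, I can't seem to get the actual title no matter what I do. I tried /text(), /string and /title/text(), because the text I want is in the title, but everything I try just returns an empty list [].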

Here is my Python code:

from lxml import html
import requests

string = input("Enter what you want to search up on Youtube: \n")
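string = string.replace(" ", "+")  # str.replace returns a new string, so assign it back
page = requests.get('https://www.youtube.com/results?search_query=' + string)
tree = html.fromstring(page.content)
for x in range(1, 6):  # XPath positions start at 1, not 0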
  v = tree.xpath('/html/body/ytd-app/div/ytd-page-manager/ytd-search/div[1]/ytd-two-column-search-results-renderer/div/ytd-section-list-renderer/div[2]/ytd-item-section-renderer/div[3]/ytd-video-renderer[1]/div[1]/div/div[' + str(x) + ']/div/h3/a')
  print(v)
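Separately from the empty results, the search string deserves care: str.replace returns a new string rather than changing the original, so its result has to be assigned back, and a plain space-to-plus swap does not escape characters such as & or #. A minimal standard-library sketch; the helper names search_url and search_url_encoded are hypothetical, and only the base URL comes from the question:

```python
from urllib.parse import urlencode

# Base URL taken from the question; the helpers below are hypothetical.
BASE_URL = "https://www.youtube.com/results"


def search_url(query: str) -> str:
    # str.replace returns a NEW string; `query` itself is left unchanged
    plus_joined = query.replace(" ", "+")
    return BASE_URL + "?search_query=" + plus_joined


def search_url_encoded(query: str) -> str:
    # urlencode percent-encodes reserved characters and turns spaces into "+"
    return BASE_URL + "?" + urlencode({"search_query": query})


if __name__ == "__main__":
    print(search_url("rainbow six"))
    print(search_url_encoded("rock & roll"))
```

Passing params={"search_query": query} to requests.get performs equivalent encoding, so the URL does not have to be assembled by hand at all.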

This is what I get back:

Enter what you want to search up on Youtube:
rainbow
[]
[]
[]
[]
[]
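One contributor to the empty lists has nothing to do with YouTube: XPath positional predicates count from 1, so an index produced by range(5) runs 0 to 4, the first pass asks for position 0 (which never matches), and the fifth item is never requested. A small self-contained illustration with lxml on made-up markup (not YouTube's); item_text is a hypothetical helper:

```python
from lxml import html

# Made-up markup for illustration only
PAGE = "<ul><li>first</li><li>second</li><li>third</li></ul>"


def item_text(page_html: str, position: int) -> list:
    tree = html.document_fromstring(page_html)
    # XPath positions are 1-based: [1] is the first <li>, [0] matches nothing
    return [str(t) for t in tree.xpath("//li[" + str(position) + "]/text()")]


if __name__ == "__main__":
    print(item_text(PAGE, 0))  # []
    print(item_text(PAGE, 1))  # ['first']
    # Loop over positions 1..3 rather than 0..2
    for position in range(1, 4):
        print(position, item_text(PAGE, position))
```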

Here is the HTML of the element whose TITLE TEXT I want to extract:

<a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer" title="Hide and Seek in Rainbow Six Siege... Let's Go!!" href="/watch?v=g8MM_RS7zmw" aria-label="Hide and Seek in Rainbow Six Siege... Let's Go!! by Get_Flanked 8 hours ago 21 minutes 54,654 views">
                Hide and Seek in Rainbow Six Siege... Let's Go!!
              </a>
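For reference, the title lives in an attribute, so the XPath for it is //a[@id="video-title"]/@title rather than /title/text(), which would look for a child element named title. A minimal sketch that runs lxml on the snippet above as a static string (no network access); the helper names are hypothetical, and it only works when the element is actually present in the HTML being parsed, which, per the answer, is not the case for the raw page requests downloads from YouTube:

```python
from lxml import html

# The element from the question, parsed as a static string
SNIPPET = """
<a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer"
   title="Hide and Seek in Rainbow Six Siege... Let's Go!!" href="/watch?v=g8MM_RS7zmw">
                Hide and Seek in Rainbow Six Siege... Let's Go!!
              </a>
"""


def titles_from_attribute(page_html: str) -> list:
    tree = html.document_fromstring(page_html)
    # /@title selects the attribute value; /title would look for a <title> child
    return [str(t) for t in tree.xpath('//a[@id="video-title"]/@title')]


def titles_from_text(page_html: str) -> list:
    tree = html.document_fromstring(page_html)
    # text_content() keeps the surrounding whitespace, so strip it
    return [a.text_content().strip() for a in tree.xpath('//a[@id="video-title"]')]


if __name__ == "__main__":
    print(titles_from_attribute(SNIPPET))
    print(titles_from_text(SNIPPET))
```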

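This is my first time posting one of these, and I'm just a student, so please go easy on me if I formatted it incorrectly or did something wrong. Thanks for your help!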

python lxml
1 Answer

0 votes

Consider using the YouTube Data API; Google does provide Python libraries for it.
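A minimal sketch of the Data API route, assuming the google-api-python-client package is installed and you have an API key. search().list with part="snippet" and type="video" is the API's search endpoint; the helper names are hypothetical, and the response parsing is split out so it can be checked without network access. The html.unescape call is a precaution, on the assumption that titles may arrive HTML-escaped (for example &#39;):

```python
from html import unescape


def titles_from_search_response(response: dict) -> list:
    # Each video result carries its title under item["snippet"]["title"];
    # unescape in case the title arrives HTML-escaped
    return [
        unescape(item["snippet"]["title"])
        for item in response.get("items", [])
        if item.get("id", {}).get("kind") == "youtube#video"
    ]


def top_video_titles(query: str, api_key: str, count: int = 5) -> list:
    # Imported here so the parsing helper above works without the package
    from googleapiclient.discovery import build

    youtube = build("youtube", "v3", developerKey=api_key)
    response = youtube.search().list(
        q=query, part="snippet", type="video", maxResults=count
    ).execute()
    return titles_from_search_response(response)
```

Usage, with network access and a real key in place of the placeholder: top_video_titles("rainbow", "YOUR_API_KEY") returns up to five titles without any HTML parsing.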

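Otherwise, if you want to use some kind of scraper, you need one that can execute JavaScript. requests only downloads the HTML text and does not run any JavaScript, and YouTube builds its search results with JavaScript, so the elements your XPath points at are not in page.content at all.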
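To see why lxml comes back empty, consider a page whose link is created by a script: lxml parses only the markup it is given and never runs the script, so the XPath finds nothing, just as in the question. Both pages below are made-up stand-ins, not YouTube's actual HTML, and video_titles is a hypothetical helper:

```python
from lxml import html

# Made-up page: the link only exists after a browser runs the script
STATIC_PAGE = """
<html><body>
  <div id="contents"></div>
  <script>
    var a = document.createElement("a");
    a.id = "video-title";
    a.title = "Some video";
    document.getElementById("contents").appendChild(a);
  </script>
</body></html>
"""

# What the same page would contain after a browser executed the script
RENDERED_PAGE = """
<html><body>
  <div id="contents"><a id="video-title" title="Some video"></a></div>
</body></html>
"""


def video_titles(page_html: str) -> list:
    tree = html.document_fromstring(page_html)
    return [str(t) for t in tree.xpath('//a[@id="video-title"]/@title')]


if __name__ == "__main__":
    print(video_titles(STATIC_PAGE))    # [] : the script was never run
    print(video_titles(RENDERED_PAGE))  # ['Some video']
```

A browser-driving tool such as Selenium hands you the equivalent of RENDERED_PAGE, which is why the same XPath starts working there.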

For example, Selenium:

import selenium.webdriver
from selenium.webdriver.common.by import By

options = selenium.webdriver.FirefoxOptions()
options.add_argument("--headless")

# Selenium 4 uses options=; the older firefox_options= keyword was deprecated and later removed
driver = selenium.webdriver.Firefox(options=options)
# Wait up to 10 seconds for elements to appear while the page's JavaScript renders them
driver.implicitly_wait(10)

driver.get('https://www.youtube.com/results?search_query=montypython')

# Two equivalent ways to find the <a id="video-title"> elements
# (find_elements_by_xpath / find_elements_by_id no longer exist in current Selenium 4)
print([x.text for x in driver.find_elements(By.XPATH, '//*[@id="video-title"]')])
print([x.text for x in driver.find_elements(By.ID, 'video-title')])
print(dir(driver))  # lists everything the driver object offers

# How to get HTML tag attributes, for example href
first = driver.find_elements(By.ID, 'video-title')[0]
print(first.get_attribute("href"))

>>> [x.get_attribute('title') for x in driver.find_elements(By.ID, 'video-title')]
['Monty Python And The Holy Grail 1975 HD', 'Monty Python and the Holy Grail', "Monty Python's - The Funniest Joke in the World (la blague qui tue)", 'Argument', 'Monty Python - The Black Knight - Tis But A Scratch', 'Monty Python- Cheese Shop', 'Monty Python: The Parrot Sketch & The Lumberjack Song movie versions HQ', 'Biggus Dickus - Monty Python, Life of Brian.', 'Monty Python - Bridge of Death', 'Life of Brian 1979 (sub indo)', 'John Cleese - How To Irritate People 1968', 'Monty Python and The Holy Grail - Black Knight HD', 'Eric Idle - "Always Look On The Bright Side Of Life" - STEREO HQ', 'Monty pythons, Mr creosote, Full version,', 'Monty Python   Ministry of Silly Walks NL', 'Monty Python - careers advice', 'Monty Python and the Holy Grail - Bunny Attack Scene (HD)', 'Monty Python Society For Putting Things On Top of Other Things', 'Monty Python - Constitutional Peasants Scene (HD)']

See also: https://stackoverflow.com/help/how-to-ask and https://stackoverflow.com/tour

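Even when your question shows some effort and is clear, it may or may not get an answer; that depends on whether others can understand what is being asked and have the time to answer it.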

© www.soinside.com 2019 - 2024. All rights reserved.