Scrapy project won't run - python


I'm trying to run my Scrapy project, but I keep getting an error. I've included the traceback below. Any help would be greatly appreciated!

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/bin/scrapy", line 8, in <module>
    sys.exit(execute())
             ^^^^^^^^^
  File "/Users/username/Development/code/company/redditScraping/redditScraping/spiders/redditSpider.py", line 39
    break
    ^^^^^
SyntaxError: 'break' outside loop

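The error keeps saying that 'break' is outside a loop on line 39, but there is nothing written on line 39. I have changed my code, and it still reports the 'break' on line 39. I even tried committing my code, and it still says the same thing.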

Here is the code I wrote:

import scrapy
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

class RedditSpider(scrapy.Spider):
    name = "reddit"
    start_urls = [
        'https://www.reddit.com/r/all/'
    ]
    custom_settings = {
        'DOWNLOAD_DELAY': 2.0
    }

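    def parse(self, response):
        # Log progress instead of yielding a bare string; Scrapy expects items or requests from parse.
        self.logger.info("before username for loop")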
        for link in response.css('._3ryJoIoycVkA88fy40qNJc'):
            username = link.css('::attr(href)').get()
            if username:
                yield {'username': username}
                break
        for title in response.css('._eYtD2XCVieq6emjKBH3m::text').extract():
            yield {'title': title}
            break
        for link in response.css('.SQnoC3ObvgnGjWt90zD9Z._2INHSNB8V5eaWp4P0rY_mE'):
            post_link = link.css('::attr(href)').get()
            if post_link:
                yield {'post_link': post_link}
                break
        self.scroll()


    def scroll(self):
        driver = webdriver.Chrome('/usr/local/bin/chromedriver')
        driver.get('https://www.reddit.com/r/all/')

        time.sleep(3)

        element = driver.find_element("tag name", "body")
        while True:
            element.send_keys(Keys.PAGE_DOWN)
            time.sleep(3)
            print("before if statement in scroll method")
            # Stop scrolling once the marker element is no longer on the page.
            if len(driver.find_elements("css selector", '._3JgI-GOrkmyIeDeyzXdyUD')) == 0:
                break
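
The code as pasted above has every 'break' inside a for or while loop, so the traceback may be describing a different version of the file than the one shown here. One possible explanation, which the post does not confirm, is that the copy of redditSpider.py on disk still contains an older or unsaved edit. A minimal sketch for checking what Python actually sees is to compile the source and report where the SyntaxError lands. The helper names `find_syntax_error` and `check_file` and the `broken` snippet are hypothetical, made up for illustration.

```python
from pathlib import Path


def find_syntax_error(source, filename="<string>"):
    """Return (line number, message) for the first SyntaxError, or None if the source compiles."""
    try:
        # compile() runs the full compiler, which is the stage that rejects a stray 'break'.
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return exc.lineno, exc.msg
    return None


def check_file(path):
    """Compile the file exactly as it exists on disk, without importing or running it."""
    return find_syntax_error(Path(path).read_text(encoding="utf-8"), filename=path)


# Hypothetical snippet: the 'break' is dedented one level too far,
# so it sits in the function body rather than inside the for loop.
broken = (
    "def parse(items):\n"
    "    for item in items:\n"
    "        yield item\n"
    "    break\n"
)

print(find_syntax_error(broken))
```

Calling `check_file` with the path from the traceback (the redditSpider.py under spiders/) would show whether the saved file still has a 'break' on line 39. If it does, the editor buffer and the file on disk have diverged, or Scrapy is loading a different copy of the project.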

python web-scraping scrapy