I'm on a Windows machine, and I want to set up a new Scrapy crawler in Visual Studio Code. The crawler itself works fine, but I want to debug the code, so I added this to my launch.json file:
{
    "name": "Scrapy with Integrated Terminal/Console",
    "type": "python",
    "request": "launch",
    "stopOnEntry": true,
    "pythonPath": "${config:python.pythonPath}",
    "program": "C:/Users/neo/.virtualenvs/Gers-Crawler-77pVkqzP/Scripts/scrapy.exe",
    "cwd": "${workspaceRoot}",
    "args": [
        "crawl",
        "amazon",
        "-o",
        "amazon.json"
    ],
    "console": "integratedTerminal",
    "env": {},
    "envFile": "${workspaceRoot}/.env",
    "debugOptions": [
        "RedirectOutput"
    ]
}
However, I can't hit any breakpoints. PS: I took the JSON config from here: http://www.stevetrefethen.com/blog/debugging-a-python-scrapy-project-in-vscode
Create a runner.py module:
import os
from scrapy.cmdline import execute

os.chdir(os.path.dirname(os.path.realpath(__file__)))

try:
    execute(
        [
            'scrapy',
            'crawl',
            'SPIDER NAME',
            '-o',
            'out.json',
        ]
    )
except SystemExit:
    pass
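With this runner in place, the debug configuration can point at the script itself rather than at scrapy.exe, so breakpoints in the spider code are hit in-process. A minimal launch.json entry might look like the sketch below (the entry name and the runner path are illustrative; adjust them to your project layout):

```json
{
    "name": "Python: Scrapy runner",
    "type": "python",
    "request": "launch",
    "program": "${workspaceRoot}/runner.py",
    "cwd": "${workspaceRoot}",
    "console": "integratedTerminal"
}
```

Because execute() runs the crawl in the same Python process the debugger attached to, no special arguments are needed.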
I got this working. The easiest way is to create a runner script, runner.py:
import scrapy
from scrapy.crawler import CrawlerProcess
from g4gscraper.spiders.g4gcrawler import G4GSpider

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'FEED_FORMAT': 'json',
    'FEED_URI': 'data.json'
})

process.crawl(G4GSpider)
process.start()  # the script will block here until the crawling is finished
Then I set breakpoints inside the spider and launched the debugger on this file. Reference: https://doc.scrapy.org/en/latest/topics/practices.html
There is no need to modify launch.json; the default "Python: Current File (Integrated Terminal)" configuration works perfectly. For Python 3 projects, remember to place the runner.py file at the same level as the scrapy.cfg file (that is, the project root).
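For reference, that default configuration looks roughly like this in launch.json (exact field values may vary slightly between versions of the VS Code Python extension):

```json
{
    "name": "Python: Current File (Integrated Terminal)",
    "type": "python",
    "request": "launch",
    "program": "${file}",
    "console": "integratedTerminal"
}
```

With runner.py open as the active editor, this launches the crawl under the debugger.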
The runner.py code is the same as @naqushab's above. Note process.crawl(ClassName), where ClassName is the spider class you want to set breakpoints in.