Debugging a Scrapy project in Visual Studio Code

Question (votes: 0, answers: 3)

I'm on a Windows machine and have written a new Scrapy crawler in Visual Studio Code. The crawler works fine, but I want to debug the code, so I added this to my launch.json file:

{
    "name": "Scrapy with Integrated Terminal/Console",
    "type": "python",
    "request": "launch",
    "stopOnEntry": true,
    "pythonPath": "${config:python.pythonPath}",
    "program": "C:/Users/neo/.virtualenvs/Gers-Crawler-77pVkqzP/Scripts/scrapy.exe",
    "cwd": "${workspaceRoot}",
    "args": [
        "crawl",
        "amazon",
        "-o",
        "amazon.json"
    ],
    "console": "integratedTerminal",
    "env": {},
    "envFile": "${workspaceRoot}/.env",
    "debugOptions": [
        "RedirectOutput"
    ]
}

However, none of my breakpoints are hit. PS: I took the JSON snippet from here: http://www.stevetrefethen.com/blog/debugging-a-python-scrapy-project-in-vscode

python python-3.x visual-studio scrapy visual-studio-code
3 Answers
7
votes
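  1. Inside your Scrapy project folder, create a runner.py module with the following content:

         import os
         from scrapy.cmdline import execute

         # Run from the folder that contains runner.py so Scrapy finds the project.
         os.chdir(os.path.dirname(os.path.realpath(__file__)))

         try:
             execute(
                 [
                     'scrapy',
                     'crawl',
                     'SPIDER NAME',
                     '-o',
                     'out.json',
                 ]
             )
         except SystemExit:
             # execute() finishes by calling sys.exit(); ignore it.
             pass

  2. Place a breakpoint on the line you want to debug.
  3. Run runner.py with the VS Code debugger.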
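
The runner above hard-codes the spider name and output file. A minimal sketch of the same idea with those two values as parameters is shown below. The helper names `build_crawl_argv` and `debug_crawl` are made up for illustration; only `scrapy.cmdline.execute` comes from the answer.

```python
def build_crawl_argv(spider_name, output_path=None):
    """Return the argv list that mirrors `scrapy crawl <spider> [-o <file>]`."""
    argv = ["scrapy", "crawl", spider_name]
    if output_path is not None:
        argv += ["-o", output_path]
    return argv


def debug_crawl(spider_name, output_path=None):
    """Run the crawl inside the current Python process (hypothetical helper)."""
    # Imported lazily so build_crawl_argv() stays usable without Scrapy installed.
    from scrapy.cmdline import execute

    try:
        execute(build_crawl_argv(spider_name, output_path))
    except SystemExit:
        # execute() ends by calling sys.exit(); swallow it so the
        # debug session finishes cleanly.
        pass
```

With this in runner.py, a call such as `debug_crawl("amazon", "amazon.json")` at the bottom of the file would correspond to the `crawl amazon -o amazon.json` arguments from the question. Because the spider then runs in the same process the debugger launched, breakpoints in the spider code can be hit.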

0
votes

I got it working. The simplest way is to create a runner script, runner.py:

import scrapy
from scrapy.crawler import CrawlerProcess

from g4gscraper.spiders.g4gcrawler import G4GSpider

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'FEED_FORMAT': 'json',
    'FEED_URI': 'data.json'
})

process.crawl(G4GSpider)
process.start() # the script will block here until the crawling is finished

Then I added breakpoints in the spider and started debugging this file. Reference: https://doc.scrapy.org/en/latest/topics/practices.html
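
The script above passes its own settings dictionary, so the project's settings.py is not loaded. A possible variant, sketched here under the assumption of Scrapy 2.1 or later (where the `FEEDS` setting is available), loads the project settings first and adds the JSON export on top. `json_feed_settings` and `run_spider` are hypothetical names.

```python
def json_feed_settings(output_path):
    """Build a FEEDS setting that exports scraped items to a JSON file."""
    return {"FEEDS": {output_path: {"format": "json"}}}


def run_spider(spider, output_path="data.json"):
    """Run one spider in-process so breakpoints inside it can be hit."""
    # Imported lazily so json_feed_settings() works without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    settings = get_project_settings()  # loads the project's settings module
    settings.update(json_feed_settings(output_path))

    process = CrawlerProcess(settings)
    process.crawl(spider)  # a spider class, or its name as a string
    process.start()        # blocks until the crawl finishes
```

Calling `run_spider(G4GSpider)` from runner.py would then behave much like the original script, except that pipelines, middlewares and other options from the project settings also apply.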


-1
votes

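There is no need to modify launch.json; the default "Python: Current File (Integrated Terminal)" configuration works perfectly. For Python 3 projects, remember to place runner.py at the same level as the scrapy.cfg file (that is, in the project root).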
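
If it is unclear which folder is the project root, the location of scrapy.cfg can be checked from Python. The sketch below uses only the standard library; `find_scrapy_root` is a hypothetical helper, not part of Scrapy.

```python
import os


def find_scrapy_root(start_dir):
    """Walk upward from start_dir and return the first folder containing scrapy.cfg."""
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(current, "scrapy.cfg")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent
```

For example, `find_scrapy_root(os.path.dirname(os.path.realpath(__file__)))` inside runner.py should return the folder runner.py itself lives in when the file is placed as this answer recommends. If it returns a different folder, imports of the project package from runner.py may fail unless that folder is also on `sys.path`.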

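The runner.py code is the same as @naqushab's above. Note the call process.crawl(ClassName), where ClassName is the spider class in which you want to set breakpoints.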
