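I have built a scraper with Puppeteer and Node.js, and now I want to dockerize it. I have tried several approaches, but Puppeteer fails when it tries to launch the browser for scraping.
My current basic Dockerfile, without Puppeteer or any other dependencies, is below. I have tried updating it in various ways (adding Chrome, adding Puppeteer), but none of them worked: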
# Use Node.js runtime as the base image
FROM node:18
# Set the working directory in the container
WORKDIR /usr/src/app
# Copy package.json and package-lock.json to the working directory
COPY package*.json ./
# Install dependencies
RUN npm install
# Copy the rest of the application code
COPY . .
# Expose the port the app runs on
EXPOSE 8080
# Command to run the application
CMD ["node", "scraper.js"]
Code: the snippet that launches the browser
// Launch browser
const browser = await launch({ headless: true, defaultViewport: null });
Can anyone help me? How can I fix this so that it works properly?
Everything I tried ended with this error:
Error during scraping:
Error: Failed to launch the browser process!
web-crawler-1 | rosetta error: failed to open elf at /lib64/ld-linux-x86-64.so.2
web-crawler-1 |
web-crawler-1 |
web-crawler-1 |
web-crawler-1 | TROUBLESHOOTING: https://pptr.dev/troubleshooting
web-crawler-1 |
web-crawler-1 | at Interface.onClose (file:///usr/src/app/node_modules/@puppeteer/browsers/lib/esm/launch.js:301:24)
web-crawler-1 | at Interface.emit (node:events:529:35)
web-crawler-1 | at Interface.close (node:internal/readline/interface:534:10)
web-crawler-1 | at Socket.onend (node:internal/readline/interface:260:10)
web-crawler-1 | at Socket.emit (node:events:529:35)
web-crawler-1 | at endReadableNT (node:internal/streams/readable:1400:12)
web-crawler-1 | at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
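This solution worked for me.
To run Puppeteer inside a Docker container, you should install Google Chrome manually because, unlike the Chromium package provided by Debian, Chrome only offers the latest stable version.
Install the browser in the Dockerfile: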
FROM node:slim AS app
# We don't need the standalone Chromium
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
# Install Google Chrome Stable and fonts
# Note: this installs the necessary libs to make the browser work with Puppeteer.
RUN apt-get update && apt-get install curl gnupg -y \
&& curl --location --silent https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor -o /usr/share/keyrings/google-chrome.gpg \
&& sh -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/google-chrome.gpg] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install google-chrome-stable -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Install your app here...
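Also, if you are on an ARM-based CPU (Apple M1) like I am, you should pass the --platform linux/amd64 argument when building the Docker image. The rosetta error in the question points to an x86-64 browser binary being run inside an arm64 image.
Build command: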
docker build --platform linux/amd64 -t <image-name> .
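To confirm that the image really runs as amd64, one option is to check the architecture Node.js reports inside the container before launching the browser. This is only a minimal sketch, assuming the Chrome in the image is the x86-64 build from the [arch=amd64] repository above; checkArch is a hypothetical helper name, not part of Puppeteer.

```javascript
// Hypothetical helper: returns a warning message when the container is not
// running as x86-64, or null when the architecture matches.
// Assumption: the Chrome installed in the image is the amd64 build.
function checkArch(arch = process.arch) {
  // Node.js reports x86-64 as "x64" and 64-bit ARM as "arm64".
  if (arch === "x64") {
    return null;
  }
  return (
    `Container architecture is "${arch}", but Chrome is built for x64; ` +
    "rebuild the image with --platform linux/amd64."
  );
}

// Print the warning at startup so the mismatch is visible in the logs.
const warning = checkArch();
if (warning) {
  console.warn(warning);
}
```

Calling this at the top of scraper.js makes an architecture mismatch show up as a readable message in the container logs.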
Note: after updating the Dockerfile, make sure to update the Puppeteer script as well: when launching the browser, set executablePath to the path of the Chrome we just installed in the image.
const browser = await launch({
  headless: true,
  defaultViewport: null,
  executablePath: '/usr/bin/google-chrome',
  args: ['--no-sandbox'],
});
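If the Chrome path might differ between environments, the launch options can be built in one place. This is a sketch, assuming the path from the Dockerfile above as the default; buildLaunchOptions is a hypothetical helper, and it reads PUPPETEER_EXECUTABLE_PATH explicitly, which is the same variable name Puppeteer's configuration documents for overriding the browser path.

```javascript
// Hypothetical helper: builds the options object passed to launch().
// Assumption: Chrome lives at /usr/bin/google-chrome unless overridden.
function buildLaunchOptions(env = process.env) {
  return {
    headless: true,
    defaultViewport: null,
    // Prefer an explicit override, fall back to the path installed in the image.
    executablePath: env.PUPPETEER_EXECUTABLE_PATH || "/usr/bin/google-chrome",
    // Chrome refuses to start as root with its sandbox enabled, and root is
    // the default user in this image unless a USER instruction is added.
    args: ["--no-sandbox"],
  };
}

// Usage, with launch imported from puppeteer as in the snippet above:
// const browser = await launch(buildLaunchOptions());
```

Keeping the options in a function also makes it easy to run the same script outside Docker by setting the environment variable to a local Chrome path.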