I'm having trouble reaching a ScrapyRT service running on specific ports inside a Kubernetes pod. My setup is a Kubernetes cluster with a pod running a Scrapy application; the application uses ScrapyRT to listen for incoming requests on designated ports, and those requests are meant to trigger the spider bound to the corresponding port.
Although the Kubernetes Service is set up correctly and references the Scrapy pod, I still cannot receive any incoming requests on that pod. My understanding of Kubernetes networking is that you should create a Service first and then the pod, with the Service providing pod-to-pod communication and external access. Is that correct?
The relevant configuration is as follows:
scrapy-pod Dockerfile:
# Use Ubuntu as the base image
FROM ubuntu:latest
# Avoid prompts from apt
ENV DEBIAN_FRONTEND=noninteractive
# Update package repository and install Python, pip, and other utilities
RUN apt-get update && \
apt-get install -y curl software-properties-common iputils-ping net-tools dnsutils vim build-essential python3 python3-pip && \
rm -rf /var/lib/apt/lists/*
# Install nvm (Node Version Manager) - EXPRESS
ENV NVM_DIR /usr/local/nvm
ENV NODE_VERSION 16.20.1
RUN mkdir -p $NVM_DIR
RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
# Install Node.js and npm - EXPRESS
RUN . "$NVM_DIR/nvm.sh" && nvm install $NODE_VERSION && nvm alias default $NODE_VERSION && nvm use default
# Add Node and npm to path so the commands are available - EXPRESS
ENV NODE_PATH $NVM_DIR/versions/node/v$NODE_VERSION/lib/node_modules
ENV PATH $NVM_DIR/versions/node/v$NODE_VERSION/bin:$PATH
# Install Yarn - EXPRESS
RUN npm install --global yarn
# Set the working directory in the container to /usr/src/app
WORKDIR /usr/src/app
# Copy the current directory contents into the container at /usr/src/app
COPY . .
# Install any needed packages specified in requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
# Copy the start_services.sh script into the container
COPY start_services.sh /start_services.sh
# Make the script executable
RUN chmod +x /start_services.sh
# Install any needed packages specified in package.json using Yarn - EXPRESS
RUN yarn install
# Expose all the necessary ports
EXPOSE 14805 14807 12085 14806 13905 12080 14808 8000
# Define environment variable - EXPRESS
ENV NODE_ENV production
# Run the script when the container starts
CMD ["/start_services.sh"]
start_services.sh:
#!/bin/bash
# Start ScrapyRT instances on different ports
scrapyrt -p 14805 &
scrapyrt -p 14807 &
scrapyrt -p 12085 &
scrapyrt -p 14806 &
scrapyrt -p 13905 &
scrapyrt -p 12080 &
scrapyrt -p 14808 &
# Keep the container running since the ScrapyRT processes are in the background
tail -f /dev/null
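One thing worth double-checking in this script: ScrapyRT's command line takes an -i flag for the address to bind, and it defaults to localhost, so the listeners above may only accept connections from inside the container itself. A variant that binds all interfaces (assuming the default -i behavior is in effect) would be:

```shell
#!/bin/bash
# Bind each ScrapyRT instance to all interfaces, not just localhost,
# so the listeners are reachable from other pods via the pod IP / Service
scrapyrt -i 0.0.0.0 -p 14805 &
scrapyrt -i 0.0.0.0 -p 14807 &
scrapyrt -i 0.0.0.0 -p 12085 &
scrapyrt -i 0.0.0.0 -p 14806 &
scrapyrt -i 0.0.0.0 -p 13905 &
scrapyrt -i 0.0.0.0 -p 12080 &
scrapyrt -i 0.0.0.0 -p 14808 &

# Keep the container running since the ScrapyRT processes are in the background
tail -f /dev/null
```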
Service YAML file:
apiVersion: v1
kind: Service
metadata:
  name: scrapy-service
spec:
  selector:
    app: scrapy-pod
  ports:
    - name: port-14805
      protocol: TCP
      port: 14805
      targetPort: 14805
    - name: port-14807
      protocol: TCP
      port: 14807
      targetPort: 14807
    - name: port-12085
      protocol: TCP
      port: 12085
      targetPort: 12085
    - name: port-14806
      protocol: TCP
      port: 14806
      targetPort: 14806
    - name: port-13905
      protocol: TCP
      port: 13905
      targetPort: 13905
    - name: port-12080
      protocol: TCP
      port: 12080
      targetPort: 12080
    - name: port-14808
      protocol: TCP
      port: 14808
      targetPort: 14808
    - name: port-8000
      protocol: TCP
      port: 8000
      targetPort: 8000
  type: ClusterIP
Deployment YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scrapy-deployment
  labels:
    app: scrapy-pod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: scrapy-pod
  template:
    metadata:
      labels:
        app: scrapy-pod
    spec:
      containers:
        - name: scrapy-pod
          image: mydockerhub/privaterepository-scrapy:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 14805
            - containerPort: 14806
            - containerPort: 14807
            - containerPort: 12085
            - containerPort: 13905
            - containerPort: 12080
            - containerPort: 8000
          envFrom:
            - secretRef:
                name: scrapy-env-secret
            - secretRef:
                name: express-env-secret
      imagePullSecrets:
        - name: my-docker-credentials
scrapy-pod logs in a PowerShell terminal:
> k logs scrapy-deployment-56b9d66858-p59gs -f
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Log opened.
2024-01-09 21:53:27+0000 [-] Site starting on 12080
2024-01-09 21:53:27+0000 [-] Site starting on 14808
2024-01-09 21:53:27+0000 [-] Site starting on 14805
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f4cbdf44d60>
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7fef9b620a00>
2024-01-09 21:53:27+0000 [-] Site starting on 13905
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Site starting on 14807
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f0892ff4df0>
2024-01-09 21:53:27+0000 [-] Site starting on 14806
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f00d3b99000>
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7fba9e321180>
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7f1782514f10>
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Site starting on 12085
2024-01-09 21:53:27+0000 [-] Starting factory <twisted.web.server.Site object at 0x7fb2054cd060>
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
2024-01-09 21:53:27+0000 [-] Running with reactor: AsyncioSelectorReactor.
Problem: despite these configurations, no requests seem to reach the Scrapy pod. The kubectl logs show the ScrapyRT instances starting successfully on the designated ports, yet when I send requests from a separate debug pod running a Python Jupyter Notebook, they succeed against other pods but fail against the Scrapy pod.
Question: how can I successfully connect to the Scrapy pod, and what could be blocking requests from reaching it?
Any insights or suggestions would be greatly appreciated.
A few things to try -
1. Check whether the Service's selector field matches the labels in the deployment YAML (scrapy-deployment). The labels must be identical for the pod to be selected correctly.
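Assuming the names used above (Service scrapy-service, label app: scrapy-pod) and a placeholder spider name your-spider, the match can be verified, and the Service exercised from the debug pod, roughly like this:

```shell
# List the pods that carry the label the Service selects;
# no results means the selector and the pod labels do not match
kubectl get pods -l app=scrapy-pod

# A correctly matched Service lists the pod's IP:port pairs here;
# an empty ENDPOINTS column is the classic sign of a selector mismatch
kubectl get endpoints scrapy-service

# From inside the debug pod, call ScrapyRT through the Service's DNS name
# ("your-spider" and the url parameter are placeholders)
curl "http://scrapy-service:14805/crawl.json?spider_name=your-spider&url=https://example.com"
```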
2. Your Service type is ClusterIP, which means it is an internal-only service. That will not work if you need external access, so double-check it; try changing the type to NodePort or LoadBalancer to get external access.
Please let me know whether the troubleshooting above helps.
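If external access is the goal, a NodePort variant for one of the ports might look like the sketch below (the name scrapy-service-nodeport and the nodePort value 30805 are arbitrary choices; nodePort must fall within the cluster's service node port range, 30000-32767 by default):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: scrapy-service-nodeport
spec:
  type: NodePort
  selector:
    app: scrapy-pod
  ports:
    - name: port-14805
      protocol: TCP
      port: 14805        # in-cluster port, as before
      targetPort: 14805  # container port
      nodePort: 30805    # reachable at <node-ip>:30805 from outside the cluster
```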