Docker容器中本地模式的PySpark的Bizarre异常

问题描述 投票:1回答:1

我要执行的test.py中有一个Spark应用程序。简而言之,我已经安装了[[Spark 2.3.0,然后想要执行test.py。当我在我的显影机Mac Book上进行操作时,一切都很好。但是,当尝试在Mac BookDocker容器中做同样的事情时,我会遇到以下异常,我从Google中搜索了可能的提示而没有取得成功。

到目前为止收集的线索

    在我的笔记本电脑上运行
  1. test.py

而没有Docker绝对没问题。保存时似乎与记录号有关。I.在三个动作中,所有动作都有相似的逻辑。在
  • Docker
  • 容器中,第一个动作执行得很好。但是,当Spark尝试将结果保存到AWS S3时,下两个每次都会失败。二。如果我在最后两个动作(例如5000)中将数据帧限制为输出,则程序将执行得很好。 Spark应用程序的逻辑

    从MySQL或S3检索数据,进行一些计算,然后将结果存储到

    AWS S3

    异常消息

    Exception happened during processing of request from ('127.0.0.1', 52218) 2018-04-24 06:17:19,572 678 py4j.java_gateway INFO:Error while receiving. Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 1062, in send_command raise Py4JNetworkError("Answer from Java side is empty") py4j.protocol.Py4JNetworkError: Answer from Java side is empty 2018-04-24 06:17:19,626 678 root ERROR:Exception while sending command. Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 1062, in send_command raise Py4JNetworkError("Answer from Java side is empty") py4j.protocol.Py4JNetworkError: Answer from Java side is empty During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 908, in send_command response = connection.send_command(command) File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 1067, in send_command "Error while receiving", e, proto.ERROR_ON_RECEIVE) py4j.protocol.Py4JNetworkError: Error while receiving Traceback (most recent call last): Traceback (most recent call last): File "py/apmain.py", line 36, in <module> File "/usr/lib64/python3.6/socketserver.py", line 317, in _handle_request_noblock self.process_request(request, client_address) File "/usr/lib64/python3.6/socketserver.py", line 348, in process_request self.finish_request(request, client_address) File "/usr/lib64/python3.6/socketserver.py", line 361, in finish_request self.RequestHandlerClass(request, client_address, self) File "/usr/lib64/python3.6/socketserver.py", line 696, in __init__ self.handle() File "/opt/spark/python/pyspark/accumulators.py", line 235, in handle num_updates = read_int(self.rfile) File "/opt/spark/python/pyspark/serializers.py", line 685, in read_int raise EOFError EOFError ---------------------------------------- main() File "py/apmain.py", line 30, in main engine_main.main(sys.argv) File "/opt/ap2126/py/dtt/ml/framework/engine_main.py", line 46, in main raise e File "/opt/ap2126/py/dtt/ml/framework/engine_main.py", line 36, in main engine.execute(tags) File "/opt/ap2126/py/dtt/ml/framework/engine.py", line 152, in execute passing_datas = layer_handler.execute(passing_datas, tags, is_load_cache=True) File "/opt/ap2126/py/dtt/ml/framework/layer_handler.py", line 319, in execute is_load_cache) File "/opt/ap2126/py/dtt/ml/framework/layer_handler.py", line 87, in _execute_workers_as_sequence output_data = worker_handler.worker.do_job(input_datas, results) File "/opt/ap2126/py/ap2126/label_extraction.py", line 498, in do_job self._get_lbls_from_structured_job('mysql') File "/opt/ap2126/py/ap2126/label_extraction.py", line 583, in _get_lbls_from_structured_job .json(job_save_temp_path) File "/opt/spark/python/pyspark/sql/readwriter.py", line 775, in json self._jwrite.json(path) File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 1160, in __call__ answer, self.gateway_client, self.target_id, self.name) File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco return f(*a, **kw) File "/usr/lib/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value format(target_id, ".", name)) py4j.protocol.Py4JError: An error occurred while calling o818.json 2018-04-24 06:17:23,151 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,152 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,153 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,154 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,155 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,156 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,156 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,158 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,159 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,159 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,216 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,217 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,218 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,219 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,220 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,222 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused 2018-04-24 06:17:23,223 678 py4j.java_gateway ERROR:An error occurred while trying to connect to the Java server (127.0.0.1:38899) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 852, in _get_connection connection = self.deque.pop() IndexError: pop from an empty deque During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/py4j/java_gateway.py", line 990, in start self.socket.connect((self.address, self.port)) ConnectionRefusedError: [Errno 111] Connection refused
    Dockerfile的一部分

    . . FROM centos:centos7 . . RUN yum -y install https://centos7.iuscommunity.org/ius-release.rpm && \ yum -y install python36u python36u-pip python36u-devel && \ python3.6 -m pip install --upgrade pip && \ echo "export PYTHONIOENCODING=utf-8" >> ~/.bashrc && \ echo "alias python='python3'" >> ~/.bashrc && \ # Installation of the Java Runtime Environment su -c "yum -y install java-1.8.0-openjdk" && \ yum -y install mlocate; updatedb && \ echo "export JAVA_HOME=\"$(locate bin/java | grep jvm | sed 's+/bin/java++g')\"" >> ~/.bashrc && \ # Installation of Spark for running `pyspark` with success curl -s http://ftp.twaren.net/Unix/Web/apache/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz | tar -zx -C /opt/ && \ ln -s /opt/spark-2.3.0-bin-hadoop2.7 /opt/spark && \ echo "export SPARK_HOME=/opt/spark >> ~/.bashrc"; source ~/.bashrc && \ echo "export PYTHONPATH="$SPARK_HOME"/python/" >> ~/.bashrc; source ~/.bashrc && \ echo "export PYTHONPATH="$PYTHONPATH":./py/" >> ~/.bashrc && \ echo "export PYSPARK_PYTHON=/usr/bin/python3" >> ~/.bashrc; source ~/.bashrc && \ # For the interaction with AWS S3 on Spark curl -s http://central.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar -o ${SPARK_HOME}/jars/hadoop-aws-2.7.3.jar && \ curl -s http://central.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar -o ${SPARK_HOME}/jars/aws-java-sdk-1.7.4.jar . . .
    我如何运行Docker

    docker run -it --rm ${image id}

    macos docker pyspark
    1个回答
    0
    投票
    这是由于运行中的容器的内存限制,docker守护进程杀死了pyspark进程的同伴Java进程。 python对应项检测到套接字关闭并抛出EOF。尝试重新连接到无效进程时,将启动其他异常。

    您应该在运行容器(https://docs.docker.com/config/containers/resource_constraints/)时检查进程的内存消耗(特别是实际上执行大多数spark操作的java进程,并根据需要调整内存设置(--memory,-memory-swap)。 。

    © www.soinside.com 2019 - 2024. All rights reserved.