I created an AWS Glue notebook and ran step 1 in a single cell:
%glue_version 3.0
%worker_type G.1X
%number_of_workers 5
%%configure
{
"region": "ap-xxxxxxx-2",
"iam_role": "arn:aws:iam::xxxxxxx:role/IAM_Glue_Role"
}
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
The output looks like this:
Current idle_timeout is 2880 minutes.
idle_timeout has been set to 2880 minutes.
Setting Glue version to: 3.0
Previous worker type: G.1X
Setting new worker type to: G.1X
Previous number of workers: 5
Setting new number of workers to: 5
But I also get this error:
The following exception was encountered while parsing the configurations provided: invalid syntax (<unknown>, line 6)
Traceback (most recent call last):
File "/home/jupyter-user/.local/lib/python3.7/site-packages/aws_glue_interactive_sessions_kernel/glue_pyspark/GlueKernel.py", line 436, in configure
configs = ast.literal_eval(configs_json)
File "/usr/lib64/python3.7/ast.py", line 46, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/usr/lib64/python3.7/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 7
import sys
^
SyntaxError: invalid syntax
Please help me solve this problem.
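The problem is that %%configure is a cell magic: everything that follows it in the same cell is treated as its configuration. As the traceback shows, the kernel passes that text to ast.literal_eval, which fails when it reaches import sys, so your Spark code is never interpreted as code.
Run the configuration (below) as its own cell in the notebook, then run the Spark code (from import sys through job = Job(glueContext)) as a separate cell.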
%glue_version 3.0
%worker_type G.1X
%number_of_workers 5
%%configure
{
"region": "ap-xxxxxxx-2",
"iam_role": "arn:aws:iam::xxxxxxx:role/IAM_Glue_Role"
}
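To see why the original cell failed, the parsing step from the traceback can be reproduced outside Glue with plain Python. This is only a sketch: parse_configure_body is a hypothetical helper written for illustration, not part of the Glue kernel, and it assumes the kernel hands the whole cell body after %%configure to ast.literal_eval, as the traceback suggests.

```python
import ast

# Body of the %%configure cell when it contains only the settings dictionary.
config_only = """{
"region": "ap-xxxxxxx-2",
"iam_role": "arn:aws:iam::xxxxxxx:role/IAM_Glue_Role"
}"""

# Body of the %%configure cell when Python code follows the dictionary,
# as in the original single-cell version.
config_plus_code = config_only + "\nimport sys\n"


def parse_configure_body(body):
    """Hypothetical helper mimicking the ast.literal_eval call seen in the traceback."""
    try:
        return ast.literal_eval(body)
    except SyntaxError as exc:
        # literal_eval only accepts a single literal expression,
        # so a trailing statement such as "import sys" is rejected.
        return f"SyntaxError: {exc.msg}"


print(parse_configure_body(config_only))       # parsed into a dict
print(parse_configure_body(config_plus_code))  # reports a syntax error
```

The first call returns a dictionary with the two settings, while the second one fails at the import statement, which matches the error in the question. Keeping the dictionary as the only content after %%configure avoids this.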