Convert a Spark DataFrame to an awsglue DynamicFrame

Question (votes: 0, answers: 3)

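I am trying to convert my Spark DataFrame to a DynamicFrame so that I can write it out in glueparquet format, but I get the error

'DataFrame' object has no attribute 'fromDF'

My code uses Spark DataFrames heavily. Is there a way to convert a Spark DataFrame to a DynamicFrame so that I can write it as glueparquet? If so, please provide an example and point out what I am doing wrong below.

Code: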

# importing libraries

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

glueContext = GlueContext(SparkContext.getOrCreate())

# updated 11/19/19 for error caused in error logging function

spark = glueContext.spark_session

from pyspark.sql import Window
from pyspark.sql.functions import col
from pyspark.sql.functions import first
from pyspark.sql.functions  import date_format
from pyspark.sql.functions import lit,StringType
from pyspark.sql.types import *
from pyspark.sql.functions import substring, length, min,when,format_number,dayofmonth,hour,dayofyear,month,year,weekofyear,date_format,unix_timestamp


base_pth='s3://test/'

bckt_pth1=base_pth+'test_write/glueparquet/'


test_df=glueContext.create_dynamic_frame.from_catalog(
                 database='test_inventory',
                 table_name='inventory_tz_inventory').toDF()

test_df.fromDF(test_df, glueContext, "test_nest")


glueContext.write_dynamic_frame.from_options(frame = test_nest,
                                             connection_type = "s3",
                                             connection_options = {"path": bckt_pth1+'inventory'},
                                             format = "glueparquet")

Error:

'DataFrame' object has no attribute 'fromDF'
Traceback (most recent call last):
  File "/mnt/yarn/usercache/livy/appcache/application_1574556353910_0001/container_1574556353910_0001_01_000001/pyspark.zip/pyspark/sql/dataframe.py", line 1300, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'fromDF'
apache-spark pyspark aws-glue
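3 Answers
34 votes

fromDF is a class function, so it has to be called on the DynamicFrame class rather than on a DataFrame instance. Here is how to convert a DataFrame to a DynamicFrame: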

from awsglue.dynamicframe import DynamicFrame

# Call fromDF on the DynamicFrame class and keep the returned frame
test_nest = DynamicFrame.fromDF(test_df, glueContext, "test_nest")

AWS documentation
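
The error in the question comes from where the method lives: fromDF is defined on the DynamicFrame class, not on Spark DataFrame instances. The sketch below illustrates this with two stand-in classes. They are hypothetical, written only for this illustration, and are not the real pyspark or awsglue types, so the snippet can be run without a Glue environment.

```python
# Stand-in classes (hypothetical, NOT the real pyspark/awsglue types).
# They mimic only the relevant shape: fromDF is a classmethod of DynamicFrame.


class DataFrame:
    """Minimal stand-in for a Spark DataFrame."""

    def __init__(self, rows):
        self.rows = rows


class DynamicFrame:
    """Minimal stand-in for a Glue DynamicFrame."""

    def __init__(self, rows, name):
        self.rows = rows
        self.name = name

    @classmethod
    def fromDF(cls, dataframe, glue_ctx, name):
        # glue_ctx is ignored in this stand-in; the real method needs it.
        return cls(dataframe.rows, name)

    def toDF(self):
        return DataFrame(self.rows)


df = DataFrame([{"id": 1}, {"id": 2}])

# Calling fromDF on the DataFrame instance fails, as in the question.
try:
    df.fromDF(df, None, "test_nest")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Calling it on the DynamicFrame class works and returns a new frame.
test_nest = DynamicFrame.fromDF(df, None, "test_nest")
print(error_message)
print(test_nest.name, len(test_nest.rows))
```

The same pattern applies to the real classes: the result of DynamicFrame.fromDF has to be assigned to a variable, which is then passed as the frame argument of the write call.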


13 votes

To round out the answers for Scala users, here is how to convert a Spark DataFrame to a DynamicFrame (the fromDF method does not exist in the Scala API of DynamicFrame):

import com.amazonaws.services.glue.DynamicFrame  
val dynamicFrame = DynamicFrame(df, glueContext)

Hope this helps!


0 votes

Import the DynamicFrame class:

from awsglue.dynamicframe import DynamicFrame

# Convert from a Spark DataFrame to a Glue DynamicFrame
dyfCustomersConvert = DynamicFrame.fromDF(df, glueContext, "convert")

# Show the converted Glue DynamicFrame
dyfCustomersConvert.show()
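
Putting the conversion together with the write call from the question might look like the following sketch. The helper name write_dataframe_as_glueparquet and its parameters are assumptions made for illustration. The awsglue import is placed inside the function because that module is normally only available in the Glue runtime, so the definition itself loads anywhere but the call only works inside a Glue job.

```python
def write_dataframe_as_glueparquet(df, glue_context, path, name="converted"):
    """Convert a Spark DataFrame to a DynamicFrame and write it to S3.

    Hypothetical helper for illustration: df is a Spark DataFrame,
    glue_context is an existing GlueContext, path is an S3 URI.
    """
    # Imported lazily: awsglue is normally only present in the Glue runtime.
    from awsglue.dynamicframe import DynamicFrame

    # fromDF is called on the class and returns a new DynamicFrame.
    dynamic_frame = DynamicFrame.fromDF(df, glue_context, name)

    # Same write call as in the question, now given the converted frame.
    glue_context.write_dynamic_frame.from_options(
        frame=dynamic_frame,
        connection_type="s3",
        connection_options={"path": path},
        format="glueparquet",
    )
    return dynamic_frame
```

With the variables from the question, a call could look like write_dataframe_as_glueparquet(test_df, glueContext, bckt_pth1 + 'inventory', 'test_nest').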
