How to save a DataFrame as a table in a Databricks database


I am using the code below to save an MS SQL table to a Databricks table.

driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

database_host = "myservername"
database_port = "1433" # update if you use a non-default port
database_name = "mydbname"
table = "mytablename"
user = "username"
password = "password"

url = f"jdbc:sqlserver://{database_host}:{database_port};database={database_name}"

remote_table = (spark.read
  .format("jdbc")
  .option("driver", driver)
  .option("url", url)
  .option("dbtable", table)
  .option("user", user)
  .option("password", password)
  .load()
)
display(remote_table)


import pandas as pd
pandas_df = remote_table.toPandas()
pandas_df


df1 = pd.DataFrame(pandas_df)
spark_df = spark.createDataFrame(df1)


spark_df.write.saveAsTable("databricks_database_name.test")


%sql
select * from edl_dev_app_ent_deat_src.test2

After the last SQL cell I get the error below. How do I save the table to a Databricks table?

IllegalStateException: Couldn't find UMihandled#13586 in [Id#13513, RunId#13514, ParentDisplayValue#13515, ParentLink#13516, MadeSla#13517, UServiceofferingDisplayValue#13518, UServiceofferingLink#13519, WatchList#13520, ScCatalog#13521, SnEsignDocument#13522, UponReject#13523, SysUpdatedOn#13524, TaskEffectiveNumber#13525, UMultipleComments#13526, ApprovalHistory#13527, Skills#13528, Number#13529, SysUpdatedBy#13530, OpenedByDisplayValue#13531, OpenedByLink#13532, UserInput#13533, SysCreatedOn#13534, SysDomainDisplayValue#13535, SysDomainLink#13536, State#13537, RouteReason#13538, SysCreatedBy#13539, Order#13540, CalendarStc#13541, ClosedAt#13542, CmdbCiDisplayValue#13543, CmdbCiLink#13544, CmdbCiBusinessApp#13545, Contract#13546, Impact#13547, Active#13548, WorkNotesList#13549, BusinessServiceDisplayValue#13550, BusinessServiceLink#13551, Priority#13552, SysDomainPath#13553, TimeWorked#13554, ExpectedStart#13555, OpenedAt#13556, BusinessDuration#13557, GroupList#13558, WorkEnd#13559, ApprovalSet#13560, WorkNotes#13561, UniversalRequest#13562, RequestDisplayValue#13563, RequestLink#13564, ShortDescription#13565, CorrelationDisplay#13566, WorkStart#13567, AssignmentGroupDisplayValue#13568, AssignmentGroupLink#13569, AdditionalAssigneeList#13570, Description#13571, 阿特#13572, CalendarDuration#13573, CloseNotes#13574, ServiceOfferingDisplayValue#13575, ServiceOfferingLink#13576, SysClassName#13577, ClosedByDisplayValue#13578, ClosedByLink#13579, FollowUp#13580, URptDuration#13581, SysId#13582, ContactType#13583, SnEsignEsignatureConfiguration#13584, Urgency#13585, Company#13587, ReassignmentCount#13588, ActivityDue#13589, AssignedToDisplayValue#13590, AssignedToLink#13591, Comments#13592, Approval#13593, SlaDue#13594, CommentsAndWorkNotes#13595, DueDate#13596, SysModCount#13597, RequestItemDisplayValue#13598, RequestItemLink#13599, UEbondSrCreate#13600, SysTags#13601, CatItemDisplayValue#13603, CatItemLink#13604, Escalation#13605, UponApproval#13606, CorrelationId#13607, Location#13608, VarRequestedFor#13609, VarProjectTitle#13610, VarPrimaryContact#13611, VarApplication#13612, VarWhichSiteTemplateIsIt#13613, VarDescriptionOfRequest#13614, VarSolutionURL#13615, VarHowAutomatedIsYourCurrentProcess#13616, VarWhatValueWouldAutomatingYourProcessProvide#13617, VarHowManyFTEsDoesYourCurrentProcessRequire#13618, VarHowManyEWsDoesYourCurrentProcessRequire#13619, VarHowMuchTimeDoesYourCurrentProcessRequire#13620, VarWhatIsYourAnnualizedProcessFrequency#13621, VarAreYouEstimatingAnyCostSavings#13622, VarLBenefitAnalysis#13623, VarLProductivityHoursSaved#13624, VarLComplianceRiskPercentage#13625, VarLDidYouAttachTheScreenshotOfTheErrorOrIssue#13626, VarLPointOfContact#13627, VarLOverviewOfTheRequest#13628, VarLStakeholders#13629, VarLRequirements#13630, VarLPermissions#13631, VarLDidYouAttachThe需求详情#13632]

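I also tried going through the pandas DataFrame, but that did not work either. After running the code below, the SQL SELECT on the table fails with the same error.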

spark.createDataFrame(pandas_df).write.saveAsTable("edl_dev_app_ent_deat_src.test")

/databricks/spark/python/pyspark/sql/pandas/conversion.py:626: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  [(c, t) for (_, c), t in zip(pdf_slice.iteritems(), arrow_types)]

databricks
1 Answer

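From your question it is not clear what the error is or which line raises it. remote_table is already a Spark DataFrame, yet you convert it to a pandas DataFrame and then back to a Spark DataFrame before writing it to a Delta table. That round trip is unnecessary: you can write remote_table to the Delta table directly.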

A complete example looks like this.

# Placeholders: replace the <...> values with your own.
server_name = "<server_name>"
database_name = "<database_name>"
dbtable = "<table_name>"
user = "<user>"
password = "<password>"
jdbcurl = (
    f"jdbc:sqlserver://{server_name}.database.windows.net:1433;database={database_name}"
    f";user={user}@{server_name};password={password}"
    ";encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
)

remote_table = (spark.read
  .format("jdbc")
  .option("url", jdbcurl)
  .option("dbtable", dbtable)
  .option("user", user)
  .option("password", password)
  .load()
)
remote_table.show(5)
remote_table.write.saveAsTable("edl_dev_app_ent_deat_src.test2")
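
The same read-then-write flow could also be wrapped in two small functions, which keeps the URL assembly separate from the copy step. This is only a sketch under a few assumptions: the names build_sqlserver_jdbc_url and copy_jdbc_table are hypothetical helpers (not part of Spark or Databricks), the credentials are passed only through the "user" and "password" options rather than inside the URL, and mode="overwrite" is my choice for replacing an existing target table, not something the question specifies.

```python
def build_sqlserver_jdbc_url(server_name, database_name, port=1433, login_timeout=30):
    """Assemble an Azure SQL JDBC URL from its parts (hypothetical helper)."""
    host = f"{server_name}.database.windows.net"
    properties = {
        "database": database_name,
        "encrypt": "true",
        "trustServerCertificate": "false",
        "hostNameInCertificate": "*.database.windows.net",
        "loginTimeout": str(login_timeout),
    }
    # JDBC connection properties are appended as ;key=value pairs.
    suffix = "".join(f";{key}={value}" for key, value in properties.items())
    return f"jdbc:sqlserver://{host}:{port}{suffix};"


def copy_jdbc_table(spark, jdbc_url, source_table, target_table,
                    user, password, mode="overwrite"):
    """Read a table over JDBC and save it as a table, with no pandas round trip."""
    remote_table = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", source_table)
        .option("user", user)
        .option("password", password)
        .load()
    )
    # Assumption: "overwrite" replaces any existing table with the same name.
    remote_table.write.mode(mode).saveAsTable(target_table)
    # Read the saved table back so the caller can inspect what was written.
    return spark.table(target_table)
```

In a Databricks notebook, where spark is already defined, a call might look like copy_jdbc_table(spark, build_sqlserver_jdbc_url("<server_name>", "<database_name>"), "<table_name>", "edl_dev_app_ent_deat_src.test2", "<user>", "<password>"). If the server expects the user in the form user@server_name, as in the URL above, pass it that way in the user argument.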