Spark SQL Connection

Problem description

I am trying to connect Spark to an Oracle database from PySpark, but I am running into a driver error. Could someone help me? I am new to Spark and have just started learning. Below is my code,

import pyspark

sc = pyspark.SparkContext('local[*]')
SqlContext = pyspark.SQLContext(sc)
Driver = 'C:\Hadoop\drivers\ojdbc14.jar'
OracleConnection = 'jdbc:oracle:thin:hr/hr@localhost:1521/xe'
Query = 'select * from employees'
OrcDb = SqlContext.read.format('jdbc') \
    .option('url', OracleConnection) \
    .option('dbtable', Query) \
    .option('driver', Driver) \
    .load()

OrcDb.printSchema()

Below is the error,

Traceback (most recent call last):
  File "C:/Users/Macaulay/PycharmProjects/Spark/SparkSqlOracle.py", line 8, in <module>
    OrcDb = SqlContext.read.format('jdbc') \
  File "C:\Hadoop\Spark\spark-3.0.0-preview2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\readwriter.py", line 166, in load
  File "C:\Hadoop\Spark\spark-3.0.0-preview2-bin-hadoop2.7\python\lib\py4j-0.10.8.1-src.zip\py4j\java_gateway.py", in __call__
  File "C:\Hadoop\Spark\spark-3.0.0-preview2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\sql\utils.py", line 98, in deco
  File "C:\Hadoop\Spark\spark-3.0.0-preview2-bin-hadoop2.7\python\lib\py4j-0.10.8.1-src.zip\py4j\protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o29.load.
: java.lang.ClassNotFoundException: C:\Hadoop\drivers\ojdbc14.jar
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:45)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1(JDBCOptions.scala:99)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1$adapted(JDBCOptions.scala:99)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$Lambda$729/1345147223.apply(Unknown Source)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:99)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:35)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:240)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:229)
    at org.apache.spark.sql.DataFrameReader$$Lambda$719/1893144191.apply(Unknown Source)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:229)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:179)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Unknown Source)

oracle pyspark pyspark-sql ojdbc
1 Answer
Found the issue.

The JDBC driver jar should be placed in Spark's jars directory ($SPARK_HOME/jars), and instead of the path to the jar, the driver option must be given the driver's fully qualified class name. Passing the jar path made Spark try to load a class literally named C:\Hadoop\drivers\ojdbc14.jar, which is what produced the ClassNotFoundException. This approach solved the problem.

Below is the code,

import pyspark
from pyspark.sql.session import SparkSession

sc = pyspark.SparkContext('local[*]')
SqlContext = pyspark.SQLContext(sc)
spark = SparkSession(sc)

Driver = 'oracle.jdbc.driver.OracleDriver'  # the driver's class name, not the path to the jar
OracleConnection = 'jdbc:oracle:thin:@//localhost:1521/xe'
User = 'hr'
Password = 'hr'
Query = '(select * from employees) emp'  # a query passed via dbtable must be parenthesized and aliased

OrcDb = spark.read.format('jdbc') \
    .option('url', OracleConnection) \
    .option('dbtable', Query) \
    .option('user', User) \
    .option('password', Password) \
    .option('driver', Driver) \
    .load()

OrcDb.printSchema()
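If copying the jar into Spark's jars directory is inconvenient, the jar can also be handed to Spark when the session is built. Below is a minimal sketch, assuming the jar still sits at the path from the question (C:/Hadoop/drivers/ojdbc14.jar) and the same hr/hr schema; it relies only on the standard spark.jars configuration, not on anything specific to this answer,

from pyspark.sql import SparkSession

# Build the session with the Oracle driver jar on the classpath.
# The jar path is the one from the question; adjust it for your machine.
spark = SparkSession.builder \
    .master('local[*]') \
    .config('spark.jars', 'C:/Hadoop/drivers/ojdbc14.jar') \
    .getOrCreate()

OrcDb = spark.read.format('jdbc') \
    .option('url', 'jdbc:oracle:thin:@//localhost:1521/xe') \
    .option('dbtable', '(select * from employees) emp') \
    .option('user', 'hr') \
    .option('password', 'hr') \
    .option('driver', 'oracle.jdbc.driver.OracleDriver') \
    .load()

OrcDb.printSchema()

The same effect is available from the command line via spark-submit --jars; either way, the driver option still takes the class name oracle.jdbc.driver.OracleDriver.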
