I am trying to use Gobblin to pull data from MySQL into HDFS. I run mysql-to-gobblin.pull with the following steps:
1) Start Hadoop: sbin\start-all.cmd
2) Start the MySQL service: sudo service mysql start
3) Set GOBBLIN_WORK_DIR: export GOBBLIN_WORK_DIR=/mnt/c/users/name/incubator-gobblin/GOBBLIN_WORK_DIR
4) Set GOBBLIN_JOB_CONFIG_DIR: export GOBBLIN_JOB_CONFIG_DIR=/mnt/c/users/name/incubator-gobblin/GOBBLIN_JOB_CONFIG_DIR
5) Start standalone: bin/gobblin.sh service standalone start --jars /mnt/C/Users/name/incubator-gobblin/build/gobblin-sql/libs/gobblin-sql-0.15.0.jar
This gives the following error:

ERROR [JobScheduler-0] org.apache.gobblin.scheduler.JobScheduler$NonScheduledJobRunner 637 - Failed to run job GobblinMySql
org.apache.gobblin.runtime.JobException: Failed to run job GobblinMySql
Caused by: java.lang.ClassNotFoundException: org.apache.gobblin.source.extractor.extract.jdbc.MysqlSource
Here is the mysql-to-gobblin.pull file:
# Job properties
job.name=GobblinMySql
job.group=MySql
job.description=Data pull from MySql

# Extract properties
extract.table.type=snapshot_only
extract.table.name=user
# Property to consider the extract as full dump
extract.is.full=true

# Source properties - source class to extract data from Mysql Source
source.class=org.apache.gobblin.source.extractor.extract.jdbc.MysqlSource
source.max.number.of.partitions=1
source.querybased.partition.interval=1
source.querybased.is.compression=true
source.querybased.watermark.type=timestamp

# Converter properties - Record from mysql source will be processed by the below series of converters
converter.classes=gobblin.converter.avro.JsonIntermediateToAvroConverter
# date columns format
converter.avro.timestamp.format=yyyy-MM-dd HH:mm:ss'.0'
converter.avro.date.format=yyyy-MM-dd
converter.avro.time.format=HH:mm:ss

# Qualitychecker properties
qualitychecker.task.policies=gobblin.policies.count.RowCountPolicy,gobblin.policies.schema.SchemaCompatibilityPolicy
qualitychecker.task.policy.types=OPTIONAL,OPTIONAL

# Publisher properties
data.publisher.type=gobblin.publisher.BaseDataPublisher

source.querybased.schema=praveen_schema
source.entity=user
source.querybased.extract.type=snapshot

writer.builder.class=org.apache.gobblin.writer.SimpleDataWriterBuilder
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt

data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher

mr.job.max.mappers=1
metrics.reporting.file.enabled=true
metrics.log.dir=/gobblin-kafka/metrics
metrics.reporting.file.suffix=txt
bootstrap.with.offset=earliest

fs.uri=hdfs://localhost:9000
writer.fs.uri=hdfs://localhost:9000
state.store.fs.uri=hdfs://localhost:9000

mr.job.root.dir=/gobblin-kafka/working
state.store.dir=/gobblin-kafka/state-store
task.data.root.dir=/jobs/kafkaetl/gobblin/gobblin-kafka/task-data
data.publisher.final.dir=/gobblintest/job-output
I am running this command from the
/mnt/c/users/name/incubator-gobblin/build/gobblin-distribution/distributions/gobblin-dist
directory.
What changes do I need to make here? How can I fix this?
The solution is to add the jar (or dependency) that actually contains the class org.apache.gobblin.source.extractor.extract.jdbc.MysqlSource to the classpath, which gets rid of Caused by: java.lang.ClassNotFoundException: org.apache.gobblin.source.extractor.extract.jdbc.MysqlSource. The launch command in step 5 only puts gobblin-sql on --jars, and MysqlSource does not live in that jar.
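As a sketch, assuming MysqlSource is packaged in the gobblin-core jar of your build (the paths and version below mirror the question's layout and are assumptions; adjust them to your checkout), the standalone launch would pass both jars as a comma-separated --jars list:

```shell
# Hypothetical paths based on the question's layout; adjust to your build.
GOBBLIN_HOME=/mnt/c/users/name/incubator-gobblin
SQL_JAR="$GOBBLIN_HOME/build/gobblin-sql/libs/gobblin-sql-0.15.0.jar"
# Assumption: MysqlSource ships in gobblin-core, so that jar must be listed too.
CORE_JAR="$GOBBLIN_HOME/build/gobblin-core/libs/gobblin-core-0.15.0.jar"

# --jars takes a comma-separated list; print the full command to run
# from the gobblin-dist directory.
echo bin/gobblin.sh service standalone start --jars "$SQL_JAR,$CORE_JAR"
```

If the class still cannot be found, run `jar tf` on each candidate jar and grep for MysqlSource to confirm which jar actually contains it before adding it to --jars.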