Spark NLP error when using Scala

Question

I am a beginner with spark-nlp and I am working through the examples from johnsnowlabs. I am using Scala in Databricks.

When I follow the example below,

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler().
    setInputCol("text").
    setOutputCol("document")

val regexTokenizer = new Tokenizer().
    setInputCols(Array("sentence")).
    setOutputCol("token")
val sentenceDetector = new SentenceDetector().
    setInputCols(Array("document")).
    setOutputCol("sentence")

val finisher = new Finisher()
    .setInputCols("token")
    .setIncludeMetadata(true)


finisher.withColumn("newCol", explode(arrays_zip($"finished_token", $"finished_ner")))

I get the following error when running the last line:

command-786892578143744:2: error: value withColumn is not a member of com.johnsnowlabs.nlp.Finisher
finisher.withColumn("newCol", explode(arrays_zip($"finished_token", $"finished_ner")))

What could be causing this?

When I ran the example with only that line left out, I added the following extra lines of code:

val pipeline = new Pipeline().
    setStages(Array(
        documentAssembler,
        sentenceDetector,
        regexTokenizer,
        finisher
    ))

val data1 = Seq("hello, this is an example sentence").toDF("text")

pipeline.fit(data1).transform(data1).toDF("text")

I get a different error when running the last line:

java.lang.IllegalArgumentException: requirement failed: The number of columns doesn't match.

Can anyone help me resolve this?

Thanks

Answer 1

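I think you have two problems. 1. First, you are trying to apply withColumn to an annotator (the Finisher), whereas it should be applied to a DataFrame. 2. I think the second error comes from the toDF() call after the transform. With column names, toDF() needs one name per column of the DataFrame, and you are only providing one. You may not need that toDF() at all.

Alberto.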
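The two points above can be sketched as follows. This is a minimal sketch, not the original example's intended code: it assumes a Databricks notebook where a SparkSession named `spark` already exists, it sets `setOutputAsArray(true)` explicitly so that `finished_token` is an array column, and it only explodes `finished_token`, because the pipeline in the question has no NER stage and therefore would not produce a `finished_ner` column. The names `result`, `exploded` and `renamed` are illustrative.

```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.functions.explode
import spark.implicits._ // assumes `spark` is the notebook's SparkSession

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = new SentenceDetector()
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val regexTokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")

val finisher = new Finisher()
  .setInputCols("token")
  .setOutputAsArray(true) // assumption: keep finished_token as an array
  .setIncludeMetadata(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  regexTokenizer,
  finisher
))

val data1 = Seq("hello, this is an example sentence").toDF("text")

// transform() already returns a DataFrame, so no toDF("text") is needed here.
val result = pipeline.fit(data1).transform(data1)

// Point 1: withColumn is called on the DataFrame, not on the Finisher.
val exploded = result.withColumn("token", explode($"finished_token"))
exploded.show(false)

// Point 2: if renaming via toDF is really wanted, pass one name per column.
val renamed = result.toDF(result.columns.map(c => s"out_$c"): _*)
```

To zip tokens with NER tags as in the failing line, the pipeline would first need a stage that produces NER annotations, and the Finisher would need that column among its inputs; only then would `arrays_zip($"finished_token", $"finished_ner")` have both columns to work with.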
