在Spark Streaming中禁用适用于AWS Kinesis的CloudWatch

问题描述 投票:4回答:1

我想知道如果有可能吗?

这是代码:numStreams我通过使用AmazonKinesisClient API得到它

 // Create the Kinesis DStreams
    List<JavaDStream<byte[]>> streamsList = new ArrayList<>(numStreams);
    for (int i = 0; i < numStreams; i++) {
      streamsList.add(
              KinesisUtils.createStream(jssc, kinesisAppName, streamName, endpointUrl, regionName,
              InitialPositionInStream.TRIM_HORIZON, kinesisCheckpointInterval,
              StorageLevel.MEMORY_AND_DISK_2(),accessesKey,secretKey)
      );
    }

我尝试查看API,但我找不到任何禁用Apache Streaming CloudWatch的参考。

这是我尝试摆脱的警告:

17/01/23 17:46:29 WARN CWPublisherRunnable:无法将16个基准发布到CloudWatch com.amazonaws.AmazonServiceException:用户:arn:aws:iam ::: user / Kinesis_Service无权执行:cloudwatch:PutMetricData(Service :AmazonCloudWatch;状态代码:403;错误代码:AccessDenied;请求ID:*****)at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1377)at com.amazonaws.http.AmazonHttpClient.executeOneRequest( AmazonHttpClient.java:923)com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:701)at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:453)at com.amazonaws.http.AmazonHttpClient.executeWithTimer (AmazonHttpClient.java:415)com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:364)位于com.amazonaws.services的com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:984)。 cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:954)at at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.putMetricData(AmazonCloudWatchClient.java:853)位于com.amazonaws.services.kinesis的com.amazonaws.services.kinesis.metrics.impl.DefaultCWMetricsPublisher.publishMetrics(DefaultCWMetricsPublisher.java:63)。 metrics.impl.CWPublisherRunnable.runOnce(CWPublisherRunnable.java:144)位于java.lang.Thread.run的com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable.run(CWPublisherRunnable.java:90)(未知来源)

apache-spark spark-streaming amazon-kinesis
1个回答
1
投票

前言:我知道这是一个古老的问题,但只是面对这一点所以为任何遇到Spark <= 2.3.3问题的人发布解决方案

在构建客户端时,可以使用withMetrics方法在KCL(Kinesis客户端)库级别禁用Cloudwatch度量报告。

不幸的是,Spark KinesisInputDStream方法没有提供更改此设置的方法,并且更糟糕的是,默认级别为“DETAILED”,每10秒发送10个度量标准。

我为禁用它而采取的方法是从KinesisInputDStream的方法cloudWatchCredentials提供无效凭据。 IE:.cloudWatchCredentials(SparkAWSCredentials.builder.basicCredentials("DISABLED", "DISABLED").build())

然后是每个tick的CloudWatchAsyncClient日志警告问题,我通过在spark log4j.properties config中设置以下内容来禁用它:

# Set Kinesis logging metrics to Warn - Since we intentionally provide
# wrong credentials in order to disable cloudwatch logging. Bad credential
# warning are logged at WARN level - so we still get errors.
log4j.logger.com.amazonaws.services.kinesis.metrics=ERROR

这将仅抑制metrics包类的警告(例如您提到的那个),但是如果需要,则不会抑制错误。

这远不是一个理想的解决方案,但这使我们在部署现有Spark版本时部署了解决方案。

接下来的步骤:打开Spark的票证,以便他们可以允许我们为下一个版本禁用它。

编辑 - 创建:https://issues.apache.org/jira/browse/SPARK-27420用于跟踪

© www.soinside.com 2019 - 2024. All rights reserved.