Running Spark Streaming Jobs on a Kerberos-Enabled Cluster

Use the following steps to run a Spark Streaming job on a Kerberos-enabled cluster.

Select or create a user account to use as the Kerberos principal.

This should not be the kafka or spark service account.
Generate a keytab for the user.
Create a Java Authentication and Authorization Service (JAAS) login configuration file, for example key.conf.
Add configuration settings that specify the user keytab.
The keytab and configuration files are distributed to the containers as YARN local resources. Because they are placed in the working directory of each Spark YARN container, specify the keytab location as a relative path, such as ./v.keytab.

The following example specifies keytab location ./v.keytab for principal vagrant@EXAMPLE.COM:
```
KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="./v.keytab"
   storeKey=true
   useTicketCache=false
   serviceName="kafka"
   principal="vagrant@EXAMPLE.COM";
};
```
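If you prefer to generate the JAAS file from a script rather than edit it by hand, a minimal sketch could look like the following. The variable names (PRINCIPAL, KEYTAB, JAAS_FILE) are assumptions made for illustration, and the values simply repeat the example above; substitute your own principal and keytab file name.

```shell
#!/bin/sh
# Hypothetical values taken from the example above; replace with your own.
PRINCIPAL="vagrant@EXAMPLE.COM"
KEYTAB="./v.keytab"      # relative path, resolved inside the YARN container
JAAS_FILE="key.conf"

# Write the JAAS login configuration. The here-document delimiter is
# unquoted, so $KEYTAB and $PRINCIPAL are expanded by the shell.
cat > "$JAAS_FILE" <<EOF
KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="$KEYTAB"
   storeKey=true
   useTicketCache=false
   serviceName="kafka"
   principal="$PRINCIPAL";
};
EOF

# Show the two lines that most often contain mistakes.
grep -E 'keyTab|principal' "$JAAS_FILE"
```

Keeping the principal and keytab name in variables makes it harder for the file and the later spark-submit options to drift apart.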

In your spark-submit command, pass the JAAS configuration file and keytab as local resource files using the --files option, and add the JAAS configuration file to the JVM options specified for the driver and executor:

spark-submit \
    --files key.conf#key.conf,v.keytab#v.keytab \
    --driver-java-options "-Djava.security.auth.login.config=./key.conf" \
    --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key.conf" \
...
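The same JAAS file name appears in three places in this command, so a typo in any one of them is easy to miss. As a rough sketch, the Kerberos-related options can be assembled from two variables and printed as a dry run before submitting. The variable names are assumptions for illustration, and the script deliberately leaves out the application-specific options that the command above elides.

```shell
#!/bin/sh
# Hypothetical file names from the example above; replace with your own.
JAAS_FILE="key.conf"
KEYTAB="v.keytab"
JVM_OPT="-Djava.security.auth.login.config=./${JAAS_FILE}"

# Warn early if a file that --files must ship is missing locally.
for f in "$JAAS_FILE" "$KEYTAB"; do
    [ -r "$f" ] || echo "warning: $f is not readable in $(pwd)" >&2
done

# Collect the Kerberos-related options as positional parameters so that
# each option value stays a single, correctly quoted argument.
set -- \
    --files "${JAAS_FILE}#${JAAS_FILE},${KEYTAB}#${KEYTAB}" \
    --driver-java-options "$JVM_OPT" \
    --conf "spark.executor.extraJavaOptions=$JVM_OPT"

# Dry run: print one argument per line for inspection.
printf '%s\n' "$@"
```

To submit for real, run spark-submit "$@" followed by your own application options in place of the final printf line.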

Pass any relevant Kafka security options to your streaming application.
For example, the KafkaWordCount example accepts PLAINTEXTSASL as the last option in the command line:
```
KafkaWordCount /vagrant/spark-examples.jar c6402:2181 abc ts 1 PLAINTEXTSASL
```
