1. Spark 2.2 memory usage calculation formula

https://blog.csdn.net/lingbo229/article/details/80914283

2. Spark on YARN memory allocation

This article is reposted from: http://blog.javachen.com/2015/06/09/memory-in-spark-on-yarn.html?utm_source=tuicool

This article addresses the problem of running out of memory when Spark runs in yarn-cluster mode.

When running in yarn-cluster mode, pay attention to the setting of yarn.app.mapreduce.am.resource.mb, which defaults to 1536 MB.

This article looks at how memory is allocated under the Spark on YARN deployment mode. Since I have not studied the Spark source code in depth, I can only use the logs to find the relevant source code and work out why things behave the way they do.

Background

Depending on where the driver of a Spark application runs, Spark on YARN has two modes: yarn-client mode and yarn-cluster mode.

When a Spark job runs on YARN, each Spark executor runs as a YARN container, and Spark can run multiple tasks inside the same container.

The following figure (taken from the web) shows how a job executes in yarn-cluster mode:

For the configuration parameters related to Spark on YARN, refer to the Spark configuration documentation. This article focuses on memory allocation, so only the following memory-related parameters need attention:

  • spark.driver.memory: default value 512m
  • spark.executor.memory: default value 512m
  • spark.yarn.am.memory: default value 512m
  • spark.yarn.executor.memoryOverhead: defaults to executorMemory * 0.07, with a minimum of 384
  • spark.yarn.driver.memoryOverhead: defaults to driverMemory * 0.07, with a minimum of 384
  • spark.yarn.am.memoryOverhead: defaults to AM memory * 0.07, with a minimum of 384

Note:

  • --executor-memory/spark.executor.memory controls the executor heap size, but the JVM itself also needs some extra memory, e.g. for interned Strings and direct byte buffers. The spark.yarn.XXX.memoryOverhead properties determine the additional memory requested from YARN for each executor, driver, or AM, with a default of max(384, 0.07 * spark.executor.memory); see the sketch after these notes.
  • Configuring executors with too much memory often leads to long GC pauses; 64 GB is a recommended upper limit for executor memory.
  • The HDFS client has performance problems under a large number of concurrent threads. A rough estimate is that at most 5 parallel tasks per executor are enough to saturate the write bandwidth.
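The overhead rule above can be made concrete with a small sketch. This is not Spark source code, just an illustration assuming the 0.07 factor and the 384 MB floor used in Spark 1.3:

    // Minimal sketch of the memoryOverhead rule described above (assumed factor 0.07, floor 384 MB).
    object OverheadSketch {
      val MemoryOverheadFactor = 0.07
      val MemoryOverheadMinMb  = 384

      def overheadMb(memoryMb: Int): Int =
        math.max((MemoryOverheadFactor * memoryMb).toInt, MemoryOverheadMinMb)

      def main(args: Array[String]): Unit = {
        println(overheadMb(512))    // AM default 512m      -> 384
        println(overheadMb(3072))   // executor-memory=3g   -> 384
        println(overheadMb(92160))  // executor-memory=90g  -> 6451 (~6.3 GB)
      }
    }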

In addition, because the job is submitted to YARN, several YARN parameters also matter; see the documentation on YARN memory and CPU configuration:

  • yarn.app.mapreduce.am.resource.mb: the amount of memory the AM can request, default 1536 MB
  • yarn.nodemanager.resource.memory-mb: the maximum memory a NodeManager can offer, default 8192 MB
  • yarn.scheduler.minimum-allocation-mb: the minimum resource the scheduler allocates to a container, default 1024 MB
  • yarn.scheduler.maximum-allocation-mb: the maximum resource the scheduler allocates to a container, default 8192 MB

Test

The Spark test cluster consists of:

  • master: 64 GB RAM, 16 CPU cores
  • worker: 128 GB RAM, 32 CPU cores
  • worker: 128 GB RAM, 32 CPU cores
  • worker: 128 GB RAM, 32 CPU cores
  • worker: 128 GB RAM, 32 CPU cores

Note: the YARN cluster is deployed on top of the Spark cluster, with a NodeManager running on every worker node, and the YARN cluster is configured as follows:

    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>106496</value> <!-- 104G -->
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>106496</value>
    </property>
    <property>
      <name>yarn.app.mapreduce.am.resource.mb</name>
      <value>2048</value>
    </property>

Set Spark's log level to DEBUG and set log4j.logger.org.apache.hadoop to WARN to cut down unnecessary output, by editing /etc/spark/conf/log4j.properties:

    # Set everything to be logged to the console
    log4j.rootCategory=DEBUG, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.eclipse.jetty=WARN
    log4j.logger.org.apache.hadoop=WARN
    log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

Next, run a test program, using the bundled SparkPi example. The tests below mainly cover client mode; for cluster mode, the same process applies. Run the following command:

    spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-client \
        --num-executors 4 \
        --driver-memory 2g \
        --executor-memory 3g \
        --executor-cores 4 \
        /usr/lib/spark/lib/spark-examples-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar \
        100000

Watch the output log (irrelevant lines are omitted):

    15/06/08 13:57:01 INFO SparkContext: Running Spark version 1.3.0
    15/06/08 13:57:02 INFO SecurityManager: Changing view acls to: root
    15/06/08 13:57:02 INFO SecurityManager: Changing modify acls to: root
    15/06/08 13:57:03 INFO MemoryStore: MemoryStore started with capacity 1060.3 MB
    15/06/08 13:57:04 DEBUG YarnClientSchedulerBackend: ClientArguments called with: --arg bj03-bi-pro-hdpnamenn:51568 --num-executors 4 --num-executors 4 --executor-memory 3g --executor-memory 3g --executor-cores 4 --executor-cores 4 --name Spark Pi
    15/06/08 13:57:04 DEBUG YarnClientSchedulerBackend: [actor] handled message (24.52531 ms) ReviveOffers from Actor[akka://sparkDriver/user/CoarseGrainedScheduler#864850679]
    15/06/08 13:57:05 INFO Client: Requesting a new application from cluster with 4 NodeManagers
    15/06/08 13:57:05 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (106496 MB per container)
    15/06/08 13:57:05 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
    15/06/08 13:57:05 INFO Client: Setting up container launch context for our AM
    15/06/08 13:57:07 DEBUG Client: ===============================================================================
    15/06/08 13:57:07 DEBUG Client: Yarn AM launch context:
    15/06/08 13:57:07 DEBUG Client: user class: N/A
    15/06/08 13:57:07 DEBUG Client: env:
    15/06/08 13:57:07 DEBUG Client: CLASSPATH -> <CPS>/__spark__.jar<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOOP_HDFS_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>:/usr/lib/spark/lib/spark-assembly.jar::/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*:/usr/lib/flume-ng/lib/*:/usr/lib/paquet/lib/*:/usr/lib/avro/lib/*
    15/06/08 13:57:07 DEBUG Client: SPARK_DIST_CLASSPATH -> :/usr/lib/spark/lib/spark-assembly.jar::/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*:/usr/lib/flume-ng/lib/*:/usr/lib/paquet/lib/*:/usr/lib/avro/lib/*
    15/06/08 13:57:07 DEBUG Client: SPARK_YARN_CACHE_FILES_FILE_SIZES -> 97237208
    15/06/08 13:57:07 DEBUG Client: SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1433742899916_0001
    15/06/08 13:57:07 DEBUG Client: SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE
    15/06/08 13:57:07 DEBUG Client: SPARK_USER -> root
    15/06/08 13:57:07 DEBUG Client: SPARK_YARN_MODE -> true
    15/06/08 13:57:07 DEBUG Client: SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1433743027399
    15/06/08 13:57:07 DEBUG Client: SPARK_YARN_CACHE_FILES -> hdfs://mycluster:8020/user/root/.sparkStaging/application_1433742899916_0001/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar#__spark__.jar
    15/06/08 13:57:07 DEBUG Client: resources:
    15/06/08 13:57:07 DEBUG Client: __spark__.jar -> resource { scheme: "hdfs" host: "mycluster" port: 8020 file: "/user/root/.sparkStaging/application_1433742899916_0001/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar" } size: 97237208 timestamp: 1433743027399 type: FILE visibility: PRIVATE
    15/06/08 13:57:07 DEBUG Client: command:
    15/06/08 13:57:07 DEBUG Client: /bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp '-Dspark.eventLog.enabled=true' '-Dspark.executor.instances=4' '-Dspark.executor.memory=3g' '-Dspark.executor.cores=4' '-Dspark.driver.port=51568' '-Dspark.serializer=org.apache.spark.serializer.KryoSerializer' '-Dspark.driver.appUIAddress=http://bj03-bi-pro-hdpnamenn:4040' '-Dspark.executor.id=<driver>' '-Dspark.kryo.classesToRegister=scala.collection.mutable.BitSet,scala.Tuple2,scala.Tuple1,org.apache.spark.mllib.recommendation.Rating' '-Dspark.driver.maxResultSize=8g' '-Dspark.jars=file:/usr/lib/spark/lib/spark-examples-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar' '-Dspark.driver.memory=2g' '-Dspark.eventLog.dir=hdfs://mycluster:8020/user/spark/applicationHistory' '-Dspark.app.name=Spark Pi' '-Dspark.fileserver.uri=http://X.X.X.X:49172' '-Dspark.tachyonStore.folderName=spark-81ae0186-8325-40f2-867b-65ee7c922357' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.deploy.yarn.ExecutorLauncher --arg 'bj03-bi-pro-hdpnamenn:51568' --executor-memory 3072m --executor-cores 4 --num-executors 4 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
    15/06/08 13:57:07 DEBUG Client: ===============================================================================

From the line "Will allocate AM container, with 896 MB memory including 384 MB overhead" we can see that the AM takes 896 MB of memory; after subtracting the 384 MB of overhead, only 512 MB remains, which is the default value of spark.yarn.am.memory. We can also see that the YARN cluster has 4 NodeManagers and that each container can use at most 106496 MB of memory.

The "Yarn AM launch context" starts a Java process with a 512m JVM heap, as shown by /bin/java -server -Xmx512m.
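As a quick check of the arithmetic, a sketch using the defaults described earlier (not Spark source code):

    val amMemoryMb   = 512                                       // spark.yarn.am.memory default
    val amOverheadMb = math.max((0.07 * amMemoryMb).toInt, 384)  // = 384 (the 384 MB floor wins)
    val amRequestMb  = amMemoryMb + amOverheadMb                 // = 896, matching the log line above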

Why is the default value used here? Look at the code that prints the log line above, in org.apache.spark.deploy.yarn.Client:

    private def verifyClusterResources(newAppResponse: GetNewApplicationResponse): Unit = {
      val maxMem = newAppResponse.getMaximumResourceCapability().getMemory()
      logInfo("Verifying our application has not requested more than the maximum " +
        s"memory capability of the cluster ($maxMem MB per container)")
      val executorMem = args.executorMemory + executorMemoryOverhead
      if (executorMem > maxMem) {
        throw new IllegalArgumentException(s"Required executor memory (${args.executorMemory}" +
          s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster!")
      }
      val amMem = args.amMemory + amMemoryOverhead
      if (amMem > maxMem) {
        throw new IllegalArgumentException(s"Required AM memory (${args.amMemory}" +
          s"+$amMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster!")
      }
      logInfo("Will allocate AM container, with %d MB memory including %d MB overhead".format(
        amMem,
        amMemoryOverhead))
    }

args.amMemory comes from the ClientArguments class, which validates the input arguments:

    private def validateArgs(): Unit = {
      if (numExecutors <= 0) {
        throw new IllegalArgumentException(
          "You must specify at least 1 executor!\n" + getUsageMessage())
      }
      if (executorCores < sparkConf.getInt("spark.task.cpus", 1)) {
        throw new SparkException("Executor cores must not be less than " +
          "spark.task.cpus.")
      }
      if (isClusterMode) {
        for (key <- Seq(amMemKey, amMemOverheadKey, amCoresKey)) {
          if (sparkConf.contains(key)) {
            println(s"$key is set but does not apply in cluster mode.")
          }
        }
        amMemory = driverMemory
        amCores = driverCores
      } else {
        for (key <- Seq(driverMemOverheadKey, driverCoresKey)) {
          if (sparkConf.contains(key)) {
            println(s"$key is set but does not apply in client mode.")
          }
        }
        sparkConf.getOption(amMemKey)
          .map(Utils.memoryStringToMb)
          .foreach { mem => amMemory = mem }
        sparkConf.getOption(amCoresKey)
          .map(_.toInt)
          .foreach { cores => amCores = cores }
      }
    }

From the code above, when isClusterMode is true, args.amMemory takes the value of driverMemory; otherwise it is read from spark.yarn.am.memory, falling back to the default of 512m when that property is not set. isClusterMode is true when userClass is non-null (def isClusterMode: Boolean = userClass != null), i.e. the arguments must contain --class, and the log below shows that the arguments passed to ClientArguments do not include that parameter.

    15/06/08 13:57:04 DEBUG YarnClientSchedulerBackend: ClientArguments called with: --arg bj03-bi-pro-hdpnamenn:51568 --num-executors 4 --num-executors 4 --executor-memory 3g --executor-memory 3g --executor-cores 4 --executor-cores 4 --name Spark Pi

Therefore, to control the memory the AM requests, either use cluster mode, or in client mode set the spark.yarn.am.memory property explicitly with --conf, for example:

    spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-client \
        --num-executors 4 \
        --driver-memory 2g \
        --executor-memory 3g \
        --executor-cores 4 \
        --conf spark.yarn.am.memory=1024m \
        /usr/lib/spark/lib/spark-examples-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar \
        100000

Open the YARN web UI and you can see that:

a. The Spark Pi application started 5 containers, using 18 GB of memory and 5 CPU cores.

b. YARN started one container for the AM, using 2048 MB of memory.

c. YARN started 4 containers to run tasks, each using 4096 MB of memory.

Why 2G + 4G * 4 = 18G? The first container requests only 2 GB of memory because our program asked for just 512m for the AM, and yarn.scheduler.minimum-allocation-mb forces a minimum allocation of 2 GB. As for the remaining containers, we set executor-memory to 3 GB, so why does each container use 4096 MB?

To find the pattern, I ran several more tests, recording the container memory allocated per executor for executor-memory values of 3G, 4G, 5G, and 6G:

  • executor-memory=3g: 2G + 4G * 4 = 18G
  • executor-memory=4g: 2G + 6G * 4 = 26G
  • executor-memory=5g: 2G + 6G * 4 = 26G
  • executor-memory=6g: 2G + 8G * 4 = 34G

To answer this, I looked at the source code and, following the class chain org.apache.spark.deploy.yarn.ApplicationMaster -> YarnRMClient -> YarnAllocator, found the following snippet in YarnAllocator:

    // Executor memory in MB.
    protected val executorMemory = args.executorMemory
    // Additional memory overhead.
    protected val memoryOverhead: Int = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
      math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))
    // Number of cores per executor.
    protected val executorCores = args.executorCores
    // Resource capability requested for each executors
    private val resource = Resource.newInstance(executorMemory + memoryOverhead, executorCores)

Since I did not look into the YARN source code in detail, my guess is that the container size is derived from executorMemory + memoryOverhead, with the rough rule that each container must be an integer multiple of yarn.scheduler.minimum-allocation-mb. With executor-memory=3g, executorMemory + memoryOverhead is 3G + 384M = 3456M, so the requested container size is yarn.scheduler.minimum-allocation-mb * 2 = 4096m = 4G; the other cases follow the same pattern.

Note:

  • Yarn always rounds up memory requirement to multiples of yarn.scheduler.minimum-allocation-mb, which by default is 1024 or 1GB.
  • Spark adds an overhead to SPARK_EXECUTOR_MEMORY/SPARK_DRIVER_MEMORY before asking Yarn for the amount.

Also pay attention to how memoryOverhead is computed: when executorMemory is large, memoryOverhead grows with it and is no longer 384m, and the container request grows accordingly. For example, with executorMemory set to 90G, memoryOverhead is math.max(0.07 * 90G, 384m) = 6.3G, and the corresponding container request is 98G.
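The rounding rule guessed above can be written down as a small sketch (again an illustration, not YARN source code) that reproduces the container sizes observed in the tests:

    // Overhead as before: max(0.07 * memory, 384 MB).
    def overheadMb(memoryMb: Int): Int =
      math.max((0.07 * memoryMb).toInt, 384)

    // Round the request (memory + overhead) up to a multiple of yarn.scheduler.minimum-allocation-mb.
    def containerMb(memoryMb: Int, minAllocationMb: Int = 2048): Int = {
      val requested = memoryMb + overheadMb(memoryMb)
      ((requested + minAllocationMb - 1) / minAllocationMb) * minAllocationMb
    }

    containerMb(3 * 1024)   // 3456 MB  -> 4096 MB   (4G per executor container, as observed)
    containerMb(4 * 1024)   // 4480 MB  -> 6144 MB   (6G)
    containerMb(5 * 1024)   // 5504 MB  -> 6144 MB   (6G)
    containerMb(6 * 1024)   // 6574 MB  -> 8192 MB   (8G)
    containerMb(90 * 1024)  // 98611 MB -> 100352 MB (~98G)
    containerMb(512)        // AM: 512 + 384 = 896 MB -> 2048 MB (the 2G container discussed below)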

Coming back to why the AM container was given 2 GB: 512 + 384 = 896, which is less than 2 GB, so 2 GB is allocated. You can observe this again after setting spark.yarn.am.memory.

Open the Spark web UI at http://ip:4040 to see the memory usage of the driver and the executors:

The screenshot shows each executor using 1566.7 MB of memory. How is that number computed? Following the article Spark on Yarn: Where Have All the Memory Gone?, totalExecutorMemory is computed as:

    //yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala
    val MEMORY_OVERHEAD_FACTOR = 0.07
    val MEMORY_OVERHEAD_MIN = 384
    //yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
    protected val memoryOverhead: Int = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
      math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))
    ......
    val totalExecutorMemory = executorMemory + memoryOverhead
    numPendingAllocate.addAndGet(missing)
    logInfo(s"Will allocate $missing executor containers, each with $totalExecutorMemory MB " +
      s"memory including $memoryOverhead MB overhead")

Here executor-memory is set to 3G, so memoryOverhead is math.max(0.07 * 3072, 384) = 384, and the maximum available storage memory is computed by the following code:

    //core/src/main/scala/org/apache/spark/storage/BlockManager.scala
    /** Return the total amount of storage memory available. */
    private def getMaxMemory(conf: SparkConf): Long = {
      val memoryFraction = conf.getDouble("spark.storage.memoryFraction", 0.6)
      val safetyFraction = conf.getDouble("spark.storage.safetyFraction", 0.9)
      (Runtime.getRuntime.maxMemory * memoryFraction * safetyFraction).toLong
    }

That is, with executor-memory set to 3G, the executor's storage memory is roughly 3072m * 0.6 * 0.9 = 1658.88m. Note that the calculation actually multiplies by Runtime.getRuntime.maxMemory, which is somewhat smaller than 3072m.

In the screenshot the driver uses 1060.3 MB; driver-memory is 2G here, so the driver's storage memory is roughly 2048m * 0.6 * 0.9 = 1105.92m. Again, the calculation actually multiplies by Runtime.getRuntime.maxMemory, which is smaller than 2048m.
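A small sketch of that estimate, using the configured heap size as a stand-in for Runtime.getRuntime.maxMemory (which is in fact slightly smaller, hence the lower numbers in the UI):

    // Storage memory ≈ heap * spark.storage.memoryFraction * spark.storage.safetyFraction.
    def storageMemoryMb(heapMb: Double,
                        memoryFraction: Double = 0.6,
                        safetyFraction: Double = 0.9): Double =
      heapMb * memoryFraction * safetyFraction

    storageMemoryMb(3072)  // ≈ 1658.88 MB for --executor-memory 3g (the UI reports 1566.7 MB)
    storageMemoryMb(2048)  // ≈ 1105.92 MB for --driver-memory 2g   (the UI reports 1060.3 MB)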

Now look at how the CoarseGrainedExecutorBackend process is launched on a worker node:

    $ jps
    46841 Worker
    21894 CoarseGrainedExecutorBackend
    9345
    21816 ExecutorLauncher
    43369
    24300 NodeManager
    38012 JournalNode
    36929 QuorumPeerMain
    22909 Jps
    $ ps -ef|grep 21894
    nobody 21894 21892 99 17:28 ? 00:04:49 /usr/java/jdk1.7.0_71/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms3072m -Xmx3072m -Djava.io.tmpdir=/data/yarn/local/usercache/root/appcache/application_1433742899916_0069/container_1433742899916_0069_01_000003/tmp -Dspark.driver.port=60235 -Dspark.yarn.app.container.log.dir=/data/yarn/logs/application_1433742899916_0069/container_1433742899916_0069_01_000003 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@bj03-bi-pro-hdpnamenn:60235/user/CoarseGrainedScheduler --executor-id 2 --hostname X.X.X.X --cores 4 --app-id application_1433742899916_0069 --user-class-path file:/data/yarn/local/usercache/root/appcache/application_1433742899916_0069/container_1433742899916_0069_01_000003/__app__.jar

Each CoarseGrainedExecutorBackend process is given 3072m of memory. To inspect the JVM of each executor, you can enable JMX by adding the following line to /etc/spark/conf/spark-defaults.conf:

    spark.executor.extraJavaOptions -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false

Then use jconsole to monitor the JVM heap, which makes it easier to tune memory sizes.

Summary

To sum up: in client mode, the AM container's memory is determined by spark.yarn.am.memory plus spark.yarn.am.memoryOverhead; the memory requested for each executor container is the executor memory plus spark.yarn.executor.memoryOverhead; and the storage memory of the driver and the executors is roughly their heap size (driver memory and executor memory respectively) multiplied by 0.6 * 0.9 = 0.54. In YARN, the memory requested for a container is rounded up to an integer multiple of yarn.scheduler.minimum-allocation-mb.

The following diagram shows the memory layout of Spark on YARN; the image comes from How-to: Tune Your Apache Spark Jobs (Part 2).

For the analysis of cluster mode, follow the same process as above. I hope this article helps.
