Spark 2.x (57): User capacity has reached its maximum limit

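Background:

The cluster currently has 43 nodes, each with 24 VCores and 64G of memory.

YARN configuration:

yarn.scheduler.minimum-allocation-mb	Minimum memory a single container can request: 1G
yarn.scheduler.maximum-allocation-mb	Maximum memory a single container can request: 51G
yarn.nodemanager.resource.cpu-vcores	Total number of virtual CPU cores available to the NodeManager: 21 vcores
yarn.nodemanager.resource.memory-mb	Maximum memory available on each node; the two ResourceManager values above should not exceed it: 51G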

Already running: 34 apps (each with 7g of driver memory, 1 executor, and 20g of executor memory)

Also running: 1 distribution app (6g of driver memory, 25 executors, 8g of memory per executor)

Total memory in use: 8g*25 + 6g + 34*(20g+7g) = 1124g
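The sum above can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and it counts just the configured driver and executor memory, ignoring any per-container overhead that YARN may add on top.

```python
# Memory requested by the running applications, in GB (figures from the text above).
small_apps = 34          # apps with one executor each
small_driver_gb = 7
small_executor_gb = 20

dist_driver_gb = 6       # the single distribution app
dist_executors = 25
dist_executor_gb = 8

small_total = small_apps * (small_driver_gb + small_executor_gb)
dist_total = dist_driver_gb + dist_executors * dist_executor_gb
total_gb = small_total + dist_total

print(f"34 small apps:    {small_total} GB")
print(f"distribution app: {dist_total} GB")
print(f"total:            {total_gb} GB")
```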

Problem description:

When a new application was then started, YARN reported the following error:

[Wed Jul  :: + ] Application is Activated, waiting for resources to be assigned for AM.

User capacity has reached its maximum limit. Details:

AM Partition = <DEFAULT_PARTITION> ;

Partition Resource = <memory:2285568MB(2232G), vCores=> ;

Queue's Absolute capacity = 50.0% ;

Queue's Absolute used capacity = 50.31362% ;

Queue's Absolute max capacity=100.0% ;

Queue's capacity (absolute resource) = <memory:1142784MB(1116G), vCores:490> ;

Queue's used capacity (absolute resource) = <memory:1149952MB(1123G), vCores:70> ;

Queue's max capacity (absolute resource) = <memory:2285568MB(2232G), vCores:981> ;

The error message says that the user capacity has reached its maximum limit.
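The percentages in the diagnostics can be reproduced from the absolute memory figures. The sketch below uses only the MB values quoted in the message; the helper name `pct` is made up for illustration.

```python
# Values copied from the ResourceManager diagnostics above, in MB.
partition_mb = 2285568        # Partition Resource
queue_capacity_mb = 1142784   # Queue's capacity (absolute resource)
queue_used_mb = 1149952       # Queue's used capacity (absolute resource)

def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

abs_capacity = pct(queue_capacity_mb, partition_mb)
abs_used = pct(queue_used_mb, partition_mb)
over_mb = queue_used_mb - queue_capacity_mb

print(f"absolute capacity:      {abs_capacity:.5f}%")
print(f"absolute used capacity: {abs_used:.5f}%")
print(f"usage above configured capacity: {over_mb} MB")
```

The queue is therefore already slightly above its configured 50% share, which is the point at which the user limit starts to block new ApplicationMasters.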

Check which scheduler YARN uses in conf/yarn-site.xml:

yarn.resourcemanager.scheduler.class=>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler

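The queues are configured in conf/capacity-scheduler.xml as follows:

# Maximum share of resources that can be used to run ApplicationMasters, which controls the number of concurrently active applications. Default is 10%.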

yarn.scheduler.capacity.maximum-am-resource-percent=0.4

# Maximum number of applications that can be running or pending in the system at the same time. Default is 10000.

yarn.scheduler.capacity.maximum-applications=10000

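# Number of missed scheduling opportunities after which the scheduler attempts to schedule rack-local containers. Usually tied to the number of nodes in the cluster. Default is 40 (roughly the number of nodes in one rack).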

yarn.scheduler.capacity.node-locality-delay=40

yarn.scheduler.capacity.queue-mappings-override.enable=false

# Resource calculator. The default is org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, which considers memory only. DominantResourceCalculator considers both memory and CPU.

yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator

yarn.scheduler.capacity.root.accessible-node-labels=*

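# ACL that controls who can administer the queue. Administrators can control all applications in the queue. This ACL is also inherited by child queues.

# Note: the ACL format is "user1,user2 group1,group2". A value of * means anyone; a single space means no one is allowed. Default is *.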

yarn.scheduler.capacity.root.acl_administer_queue=*

# ACL that controls who can submit applications to the queue. A user who can submit to a queue can also submit to its child queues.

yarn.scheduler.capacity.root.acl_submit_applications=*

yarn.scheduler.capacity.root.capacity=100

yarn.scheduler.capacity.root.default.acl_submit_applications=*

yarn.scheduler.capacity.root.default.capacity=0

yarn.scheduler.capacity.root.default.maximum-capacity=20

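# Minimum percentage of queue resources each user gets when there is contention. For example, with a value of 25: if two users submit applications, neither can use more than 50%; with three users, none can use more than 33%.

# With four users, none can use more than 25%; a fifth user's applications have to wait. Default is 100, which imposes no user limit.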

yarn.scheduler.capacity.root.default.priority=0

yarn.scheduler.capacity.root.default.state=RUNNING

yarn.scheduler.capacity.root.default.user-limit-factor=1

yarn.scheduler.capacity.root.dx.acl_administer_queue=*

yarn.scheduler.capacity.root.dx.acl_submit_applications=*

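# The queue's capacity as a percentage. When the cluster is busy, each queue should receive its configured share; when the cluster is idle, the queue's unused resources can be used by other queues. The capacities of all queues at the same level must add up to 100%.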

yarn.scheduler.capacity.root.dx.capacity=50

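# Upper limit on the queue's resource usage. A queue can borrow idle resources when the cluster has spare capacity, and this parameter caps how much it may use in total. Default is -1, which disables the limit.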

yarn.scheduler.capacity.root.dx.maximum-capacity=100 

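# Minimum percentage of queue resources each user gets when there is contention. For example, with a value of 25: if two users submit applications, neither can use more than 50%; with three users, none can use more than 33%.

# With four users, none can use more than 25%; a fifth user's applications have to wait. Default is 100, which imposes no user limit.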

yarn.scheduler.capacity.root.dx.minimum-user-limit-percent=100

yarn.scheduler.capacity.root.dx.ordering-policy=fifo

yarn.scheduler.capacity.root.dx.priority=0

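# Queue state: RUNNING or STOPPED. If a queue is STOPPED, new applications cannot be submitted to it or to its child queues. Likewise, if root is set to STOPPED,

# no application can be submitted to the cluster at all. Existing applications are allowed to finish, so a queue can be drained gracefully.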

yarn.scheduler.capacity.root.dx.state=RUNNING

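# Multiple of the queue's configured capacity that a single user may use. With 1, a user can use at most the queue's configured capacity; with 0.5, at most half of it.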

yarn.scheduler.capacity.root.dx.user-limit-factor=1

yarn.scheduler.capacity.root.dc.acl_administer_queue=*

yarn.scheduler.capacity.root.dc.acl_submit_applications=*

yarn.scheduler.capacity.root.dc.capacity=50

yarn.scheduler.capacity.root.dc.maximum-capacity=50

yarn.scheduler.capacity.root.dc.minimum-user-limit-percent=100

yarn.scheduler.capacity.root.dc.ordering-policy=fifo

yarn.scheduler.capacity.root.dc.priority=0

yarn.scheduler.capacity.root.dc.state=RUNNING

yarn.scheduler.capacity.root.dc.user-limit-factor=1

yarn.scheduler.capacity.root.priority=0

yarn.scheduler.capacity.root.queues=default,dx,dc

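Problem analysis:

The confusing part:

# Upper limit on the queue's resource usage. A queue can borrow idle resources when the cluster has spare capacity, and this parameter caps how much it may use in total. Default is -1, which disables the limit.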

yarn.scheduler.capacity.root.dx.maximum-capacity=100

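At first glance this parameter suggests that the dx queue may grow to 100% of the cluster, but the error message above shows that this is not what happens.

Instead, the dx queue was allowed to use at most 50%, and once it reached that point YARN reported that the capacity limit had been reached rather than expanding the queue automatically.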

`yarn.scheduler.capacity.A.capacity`	Guaranteed minimum capacity of queue A, as a percentage (the capacities of all queues sum to 100). Official description: Queue capacity in percentage (%) as a float (e.g. 12.5). The sum of capacities for all queues, at each level, must be equal to 100. Applications in the queue may consume more resources than the queue's capacity if there are free resources, providing elasticity.
`yarn.scheduler.capacity.A.maximum-capacity`	Maximum capacity queue A can obtain (not guaranteed; borrowed from other queues when the cluster is idle). Official description: Maximum queue capacity in percentage (%) as a float. This limits the elasticity for applications in the queue. Defaults to -1 which disables it.
`yarn.scheduler.capacity.A.minimum-user-limit-percent`	Minimum-capacity control for a single user in queue A (the per-user cap when resources are scarce). For example, with a value of 25, when 5 users submit applications at the same time, each user can get no more than 25% of the resources. Official description: Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is demand for resources. The user limit can vary between a minimum and maximum value. The former (the minimum value) is set to this property value and the latter (the maximum value) depends on the number of users who have submitted applications. For e.g., suppose the value of this property is 25. If two users have submitted applications to a queue, no single user can use more than 50% of the queue resources. If a third user submits an application, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queue's resources. A value of 100 implies no user limits are imposed. The default is 100. Value is specified as an integer.
`yarn.scheduler.capacity.A.user-limit-factor`	Maximum-capacity control for a single user in queue A (the per-user cap when resources are idle). For example, with a value of 1.5, a single user can obtain at most min(1.5 * capacity, maximum-capacity). Official description: The multiple of the queue capacity which can be configured to allow a single user to acquire more resources. By default this is set to 1 which ensures that a single user can never take more than the queue's configured capacity irrespective of how idle the cluster is. Value is specified as a float.
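The minimum-user-limit-percent example from the official description can be written out as a small function. This is a simplified model of the documented behaviour, not the scheduler's actual code, and the function name `user_limit_percent` is made up for illustration: resources are split evenly among the users with demand, but no user's share drops below the configured minimum.

```python
def user_limit_percent(active_users, minimum_user_limit_percent):
    """Approximate per-user share, in % of queue resources, under contention.

    Simplified model: an even split among active users, floored at the
    configured minimum-user-limit-percent.
    """
    if active_users < 1:
        raise ValueError("active_users must be >= 1")
    even_share = 100.0 / active_users
    return max(even_share, float(minimum_user_limit_percent))

# Reproduce the example from the description with a value of 25.
for n in range(1, 6):
    print(f"{n} user(s): {user_limit_percent(n, 25):.1f}% each")
```

With the value of 100 used for the dx and dc queues in this post, the function returns 100 for any number of users, which corresponds to "no user limits are imposed".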


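From the configuration and the descriptions above we get:
latest-maximum-capacity (the maximum capacity the queue can actually reach for a single user)
=min(capacity*user-limit-factor,maximum-capacity)
=min(50*1,100)=50%
Parameters in the formula:
1) capacity: yarn.scheduler.capacity.root.dx.capacity=50
2) maximum-capacity: yarn.scheduler.capacity.root.dx.maximum-capacity=100
3) user-limit-factor: yarn.scheduler.capacity.root.dx.user-limit-factor=1
This result matches the error message above, and it explains why a single user's applications in the dx queue could use only 50% of the total capacity.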
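The formula can be evaluated directly and converted to memory to compare against the log. The sketch below assumes the min(...) reading used in this post; `effective_user_limit` is an illustrative name, not a YARN API.

```python
def effective_user_limit(capacity, user_limit_factor, maximum_capacity):
    """Largest share of the cluster (in %) one user can hold in a queue,
    using min(capacity * user-limit-factor, maximum-capacity)."""
    return min(capacity * user_limit_factor, maximum_capacity)

# dx queue settings from capacity-scheduler.xml
limit_pct = effective_user_limit(capacity=50, user_limit_factor=1, maximum_capacity=100)

partition_mb = 2285568  # "Partition Resource" from the error message
limit_mb = partition_mb * limit_pct / 100

print(f"effective limit: {limit_pct}% = {limit_mb:.0f} MB")
```

The resulting memory figure equals the "Queue's capacity (absolute resource)" value in the diagnostics, and the used capacity reported there is already above it.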

Solution:

Modify the queue configuration
To change the queue or scheduler configuration, edit:

vi $HADOOP_CONF_DIR/capacity-scheduler.xml

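# The queue's capacity as a percentage. When the cluster is busy, each queue should receive its configured share; when the cluster is idle, the queue's unused resources can be used by other queues. The capacities of all queues at the same level must add up to 100%.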

yarn.scheduler.capacity.root.dx.capacity=95

yarn.scheduler.capacity.root.dc.capacity=5
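capacity-scheduler.xml is a Hadoop configuration file, so key=value pairs like the two above are written there as <property> elements with a <name> and a <value>. The sketch below renders them in that shape; `to_hadoop_xml` is a made-up helper, and in practice you would edit the existing entries in the file rather than generate a new one.

```python
import xml.etree.ElementTree as ET

def to_hadoop_xml(settings):
    """Render key/value settings as Hadoop <property> elements."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")

changes = {
    "yarn.scheduler.capacity.root.dx.capacity": 95,
    "yarn.scheduler.capacity.root.dc.capacity": 5,
}
xml_text = to_hadoop_xml(changes)
print(xml_text)
```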

Or

raise the value of yarn.scheduler.capacity.root.dx.user-limit-factor.

vi $HADOOP_CONF_DIR/capacity-scheduler.xml

# Multiple of the queue's configured capacity that a single user may use. With 2, a single user can use up to twice the queue's capacity, still capped by maximum-capacity.

yarn.scheduler.capacity.root.dx.user-limit-factor=2
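Under the same min(capacity*user-limit-factor, maximum-capacity) reading used in the analysis, the two options can be compared side by side. The option labels and the helper name below are illustrative only.

```python
def effective_user_limit(capacity, user_limit_factor, maximum_capacity):
    """min(capacity * user-limit-factor, maximum-capacity), in % of the cluster."""
    return min(capacity * user_limit_factor, maximum_capacity)

# dx queue settings before the change and under each proposed fix.
options = {
    "current": dict(capacity=50, user_limit_factor=1, maximum_capacity=100),
    "raise capacity to 95": dict(capacity=95, user_limit_factor=1, maximum_capacity=100),
    "raise user-limit-factor to 2": dict(capacity=50, user_limit_factor=2, maximum_capacity=100),
}

limits = {name: effective_user_limit(**cfg) for name, cfg in options.items()}
for name, limit in limits.items():
    print(f"{name}: {limit}%")
```

Raising the capacity also changes the guaranteed share of the other queues, while raising user-limit-factor only lets a single user borrow more when the cluster has idle resources.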

After making the change, run the following command:

$HADOOP_YARN_HOME/bin/yarn rmadmin -refreshQueues

Notes:

Queues cannot be deleted, only added.

Updated queue settings must be valid values.

The capacities of queues at the same level must add up to 100%.
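The last rule is easy to check before running refreshQueues. The sketch below works on a plain dict of properties and assumes the queue names default, dx and dc used by the property keys in this post; `check_sibling_capacities` is a hypothetical helper, not part of Hadoop.

```python
def check_sibling_capacities(props, parent="root"):
    """Return (ok, total) for the capacities of the direct children of `parent`."""
    prefix = f"yarn.scheduler.capacity.{parent}."
    children = [c.strip() for c in props[prefix + "queues"].split(",")]
    total = sum(float(props[f"{prefix}{child}.capacity"]) for child in children)
    return abs(total - 100.0) < 1e-6, total

# Capacities after applying the first fix (dx=95, dc=5).
props = {
    "yarn.scheduler.capacity.root.queues": "default,dx,dc",
    "yarn.scheduler.capacity.root.default.capacity": "0",
    "yarn.scheduler.capacity.root.dx.capacity": "95",
    "yarn.scheduler.capacity.root.dc.capacity": "5",
}

ok, total = check_sibling_capacities(props)
print("valid" if ok else "invalid", f"(sum = {total})")
```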

Once these queue properties are set, they are visible in the web UI at: xxx:8088/scheduler
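The same information is also exposed by the ResourceManager REST API under /ws/v1/cluster/scheduler. The sketch below assumes the CapacityScheduler response layout as I understand it (scheduler, schedulerInfo, queues, queue, with fields such as queueName and absoluteUsedCapacity); verify the field names against your Hadoop version. The sample payload is hand-written from the figures in this post, not real output, and `fetch_scheduler` is defined but not called here.

```python
import json
from urllib.request import urlopen

def fetch_scheduler(rm_address):
    """GET the scheduler info from the ResourceManager, e.g. rm_address='host:8088'."""
    with urlopen(f"http://{rm_address}/ws/v1/cluster/scheduler") as resp:
        return json.load(resp)

def summarize_queues(payload):
    """Map each top-level queue name to (absolute capacity, used, max), in %."""
    info = payload["scheduler"]["schedulerInfo"]
    queues = info.get("queues", {}).get("queue", [])
    return {
        q["queueName"]: (
            q.get("absoluteCapacity"),
            q.get("absoluteUsedCapacity"),
            q.get("absoluteMaxCapacity"),
        )
        for q in queues
    }

# Hand-written sample shaped like the response (assumption, not captured output).
sample = {"scheduler": {"schedulerInfo": {"queueName": "root", "queues": {"queue": [
    {"queueName": "dx", "absoluteCapacity": 50.0,
     "absoluteUsedCapacity": 50.31362, "absoluteMaxCapacity": 100.0},
]}}}}

summary = summarize_queues(sample)
for name, (cap, used, max_cap) in summary.items():
    print(f"{name}: capacity={cap}% used={used}% max={max_cap}%")
```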

For more details and the authoritative reference on the YARN Capacity Scheduler, see the official documentation: "Hadoop: Capacity Scheduler".