TensorFlow node placement (device assignment of nodes) algorithm: simple placer

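The device assignment algorithm currently used in TensorFlow v0.9 is the simple placer. It is simpler to implement than the cost model algorithm described in the white paper. The simple placer prefers the /gpu:0 device, but it does not support multi-GPU assignment.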

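The cost model described in the white paper can balance device assignment against device resource cost and data transfer cost. It is partially implemented in v0.9 but not yet available for use; see core/graph/costmodel.cc.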

The simple placer is implemented in core/common_runtime/simple_placer.cc, which contains the core device assignment logic.

An excerpt from the test file core/common_runtime/simple_placer_test.cc follows:

 ////////////////////////////////////////////////////////////////////////////////
 //
 // A SimplePlacerTest method has three phases:
 //
 // 1. Build a TensorFlow graph, with no (or partial) device assignments.
 // 2. Attempt to compute a placement using the SimplePlacer.
 // 3. EITHER: test that the constraints implied by the graph are respected;
 //    or that an appropriate error was reported.
 //
 ////////////////////////////////////////////////////////////////////////////////
 class SimplePlacerTest : public ::testing::Test {
  protected:
   SimplePlacerTest() {
     // Build a set of 10 GPU and 10 CPU devices.
     // NOTE: this->local_devices_ owns the device objects;
     // this->devices_ contains borrowed pointers to the device
     // objects.
     for (int i = 0; i < 10; ++i) {    // adds 10 fake CPU and 10 fake GPU devices
       local_devices_.emplace_back(FakeDevice::MakeCPU(
           strings::StrCat("/job:a/replica:0/task:0/cpu:", i)));
       devices_.AddDevice(local_devices_.back().get());
       // Insert the GPUs in reverse order.
       local_devices_.emplace_back(FakeDevice::MakeGPU(
           strings::StrCat("/job:a/replica:0/task:0/gpu:", 9 - i)));
       devices_.AddDevice(local_devices_.back().get());
     }
   }
   ...
 };

 ...

 // Test that a graph with no constraints will successfully assign nodes to the
 // "best available" device (i.e. prefer GPU over CPU).
 TEST_F(SimplePlacerTest, TestNoConstraints) {
   Graph g(OpRegistry::Global());
   {  // Scope for temporary variables used to construct g.
      // The graph structure is described with a GraphDefBuilder.
     GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
     Node* input = ops::SourceOp("TestInput", b.opts().WithName("in"));
     ops::UnaryOp("TestRelu", ops::NodeOut(input, 0), b.opts().WithName("n1"));
     ops::UnaryOp("TestRelu", ops::NodeOut(input, 0), b.opts().WithName("n2"));
     TF_EXPECT_OK(BuildGraph(b, &g));   // BuildGraph writes the GraphDefBuilder's graph into the Graph
   }
   TF_EXPECT_OK(Place(&g));   // Place assigns the nodes of the graph to devices from the device list
   // Expected: the input node is on a CPU while n1 and n2 are on a GPU,
   // so GPU has higher priority than CPU.
   EXPECT_DEVICE_TYPE(g, "in", DEVICE_CPU);
   EXPECT_DEVICE_TYPE(g, "n1", DEVICE_GPU);
   EXPECT_DEVICE_TYPE(g, "n2", DEVICE_GPU);
 }
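The "prefer GPU over CPU" expectation in the test can be illustrated with a small, self-contained sketch. It assumes that candidate devices are ordered by a type priority (GPU before CPU) and then by name, and that the first entry wins. ToyDevice, TypePriority and PickFirstDevice are hypothetical names made up for this illustration; they are not part of TensorFlow.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a device; not the real TensorFlow Device class.
struct ToyDevice {
  std::string name;  // e.g. "/gpu:0"
  std::string type;  // "GPU" or "CPU"
};

// Assumed priority for this sketch: a lower value is preferred.
inline int TypePriority(const std::string& type) {
  return type == "GPU" ? 0 : 1;
}

// Sorts the candidates by (type priority, name) and returns the first name,
// so the same device is chosen on every run.
inline std::string PickFirstDevice(std::vector<ToyDevice> candidates) {
  assert(!candidates.empty());
  std::sort(candidates.begin(), candidates.end(),
            [](const ToyDevice& a, const ToyDevice& b) {
              const int pa = TypePriority(a.type);
              const int pb = TypePriority(b.type);
              if (pa != pb) return pa < pb;
              return a.name < b.name;
            });
  return candidates.front().name;
}
```

With the candidates /cpu:0, /gpu:1 and /gpu:0, this sketch returns /gpu:0, which matches the observation that an unconstrained node lands on the first GPU.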

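Here BuildGraph writes the graph structure defined in the GraphDefBuilder object into a Graph, and Place assigns the nodes of the graph to devices from the device list. The core of the device assignment algorithm is in SimplePlacer::Run.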

 // Builds the given graph, and (if successful) indexes the node
 // names for use in placement, and later lookup.
 Status BuildGraph(const GraphDefBuilder& builder, Graph* out_graph) {
   TF_RETURN_IF_ERROR(builder.ToGraph(out_graph));
   nodes_by_name_.clear();
   for (Node* node : out_graph->nodes()) {
     nodes_by_name_[node->name()] = node->id();
   }
   return Status::OK();
 }
 // Invokes the SimplePlacer on "graph". If no DeviceSet is specified, the
 // placement will use the default DeviceSet (of 10 CPU and 10 GPU devices).
 //
 // REQUIRES: "*graph" was produced by the most recent call to BuildGraph.
 Status Place(Graph* graph, DeviceSet* devices, SessionOptions* options) {
   SimplePlacer placer(graph, devices, options);
   return placer.Run();
 }

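SimplePlacer::Run() is defined in core/common_runtime/simple_placer.cc. Its implementation has four steps.

Steps 1 and 2: iterate over the nodes of the graph and add each node (excluding the source and sink nodes) to a ColocationGraph object.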

 // 1. First add all of the nodes. Note that steps (1) and (2)
 // require two passes over the nodes because the graph (and hence
 // the constraints) may not be acyclic.
 // (Author's question: can the graph contain cycles here?)
 for (Node* node : graph_->nodes()) {
   // Skip the source and sink nodes.
   if (!node->IsOp()) { continue; }
   status = colocation_graph.AddNode(*node);
   if (!status.ok()) return AttachDef(status, node->def());
 }
 // 2. Enumerate the constraint edges, and use them to update the disjoint node set.
 //    (A disjoint set, also known as union-find, is a tree-based data
 //    structure that tracks non-overlapping sets of nodes.)
 ...

 ColocationGraph maintains the connected components of a colocation constraint graph, and uses this information to assign a satisfying device placement to the nodes of the graph.

 The implementation uses the union-find algorithm to maintain the connected components efficiently and incrementally as edges (implied by ColocationGraph::ColocateNodes() invocations) are added.

 Reference: the Wikipedia article on the disjoint-set (union-find) data structure
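To make the union-find idea concrete, here is a minimal sketch of a disjoint-set structure over integer node ids. It only tracks which nodes belong to the same colocation group; the real ColocationGraph additionally has to reconcile the device constraints of the groups it merges, which is omitted here. ToyColocationSets and its methods are hypothetical names for illustration, not TensorFlow APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Minimal disjoint-set (union-find) over node ids 0..n-1.
class ToyColocationSets {
 public:
  explicit ToyColocationSets(std::size_t n) : parent_(n), rank_(n, 0) {
    std::iota(parent_.begin(), parent_.end(), 0);  // every node starts alone
  }

  // Returns the representative of x's set, shortening the path on the way
  // (path halving).
  int Find(int x) {
    assert(x >= 0 && static_cast<std::size_t>(x) < parent_.size());
    while (parent_[x] != x) {
      parent_[x] = parent_[parent_[x]];
      x = parent_[x];
    }
    return x;
  }

  // Merges the sets containing a and b (union by rank).
  void Colocate(int a, int b) {
    int ra = Find(a);
    int rb = Find(b);
    if (ra == rb) return;
    if (rank_[ra] < rank_[rb]) std::swap(ra, rb);
    parent_[rb] = ra;
    if (rank_[ra] == rank_[rb]) ++rank_[ra];
  }

  bool SameGroup(int a, int b) { return Find(a) == Find(b); }

 private:
  std::vector<int> parent_;
  std::vector<int> rank_;
};
```

Each Colocate call plays the role of adding one constraint edge. Groups can therefore be built incrementally in any order, which is why cycles in the constraint graph cause no trouble for this structure.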

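Step 3: as the code below shows, the source and sink nodes (which are placed on the CPU) are skipped, and nodes that already have an assigned device are not reassigned. For the remaining nodes, two heuristics refine the choice of device; see Heuristic A and Heuristic B.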

 // 3. For each node, assign a device based on the constraints in the
 // disjoint node set.
 std::vector<Device*> devices;
 std::vector<Node*> second_pass;
 for (Node* node : graph_->nodes()) {
   // Skip the source and sink nodes.
   if (!node->IsOp()) {
     continue;
   }
   // Skip nodes that already have an assigned name.
   if (!node->assigned_device_name().empty()) {
     continue;
   }
   // Heuristic A: prefer to place "generators" with their only
   // consumers.
   //
   // If this is a node with no inputs and a single (non-ref)
   // consumer, we save this for a second pass, so that the
   // consumer's placement is chosen.
   if (IsGeneratorNode(node)) {    // generator node: no input, one output, not a reference-type node
     second_pass.push_back(node);
     continue;
   }
   status = colocation_graph.GetDevicesForNode(node, &devices);
   ...
   // Returns the first device in sorted devices list so we will always
   // choose the same device.
   //
   // TODO(vrv): Factor this assignment out into a pluggable
   // algorithm, so that SimplePlacer is responsible for enforcing
   // preconditions and we can experiment with other algorithms when
   // given a choice of devices. Once we have a better idea of the
   // types of heuristics we want to use and the information needed
   // to perform good placement we can add an interface for this.
   string assigned_device = devices[0]->name();
   // Heuristic B: If the node only operates on metadata, not data,
   // then it is desirable to place that metadata node with its
   // input.
   if (IsMetadataNode(node)) {
     // Make sure that the input device type is in the list of supported
     // device types for this node.
     const Node* input = (*node->in_edges().begin())->src();
     // TODO(vrv): if the input is empty, consider postponing this
     // node's assignment to the second pass, so that we handle the
     // case where a metadata node's input comes from a backedge
     // of a loop.
     const string& input_device_name = input->assigned_device_name();
     if (CanAssignToDevice(input_device_name, devices)) {
       assigned_device = input_device_name;
     }
   }
   // Assign assigned_device to the node. Generator nodes matching
   // Heuristic A get no device in step 3; that happens in step 4.
   AssignAndLog(assigned_device, node);
 }

 bool IsGeneratorNode(const Node* node) {
   return node->num_inputs() == 0 && node->num_outputs() == 1 &&
          node->out_edges().size() == 1 && !IsRefType(node->output_type(0));
 }

 bool IsMetadataNode(const Node* node) {
   const string& node_type = node->type_string();
   return (node_type == "Size" || node_type == "Shape" || node_type == "Rank");
 }
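Heuristic B can be isolated in a few lines. The sketch below assumes that devices are identified by plain name strings and that the candidate list is already sorted by preference. ToyCanAssign and PlaceMetadataNode are hypothetical helpers that only mirror the role of CanAssignToDevice in the excerpt; their signatures are simplifications.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns true if `name` is one of the candidate device names.
inline bool ToyCanAssign(const std::string& name,
                         const std::vector<std::string>& candidates) {
  return std::find(candidates.begin(), candidates.end(), name) !=
         candidates.end();
}

// Chooses a device for a metadata node such as Shape, Size or Rank.
// Default: the first (most preferred) candidate. If the input's device is
// also a valid candidate, follow the input instead.
inline std::string PlaceMetadataNode(
    const std::vector<std::string>& sorted_candidates,
    const std::string& input_device) {
  assert(!sorted_candidates.empty());
  std::string assigned = sorted_candidates.front();
  if (ToyCanAssign(input_device, sorted_candidates)) {
    assigned = input_device;
  }
  return assigned;
}
```

For candidates /gpu:0 and /cpu:0, an input living on /cpu:0 pulls the metadata node onto /cpu:0. An input with no assigned device (an empty name) leaves the default /gpu:0 in place. The likely motivation is that a Shape-like op placed next to its input does not need the whole tensor copied to another device.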

Step 4: assign devices to the generator nodes that step 3 deferred.

// 4. Perform a second pass assignment for those nodes explicitly skipped during the first pass.

...
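Since the step 4 code is elided above, the following is only a sketch of how the two passes could fit together on a toy graph. It assumes that a deferred generator node takes over its single consumer's device in the second pass, as the Heuristic A comment suggests, and that every other unassigned node gets one default device. ToyNode, ToyIsGenerator and ToyPlace are hypothetical, and the reference-type check and the candidate-device check are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy node; not the TensorFlow Node class.
struct ToyNode {
  std::string name;
  int num_inputs;
  std::vector<std::size_t> consumers;  // indices of nodes reading this output
  std::string assigned_device;         // empty until placed
};

// Simplified generator test: no inputs and exactly one consumer.
inline bool ToyIsGenerator(const ToyNode& n) {
  return n.num_inputs == 0 && n.consumers.size() == 1;
}

// default_device stands in for the first entry of the sorted device list.
inline void ToyPlace(std::vector<ToyNode>& nodes,
                     const std::string& default_device) {
  std::vector<std::size_t> second_pass;
  // First pass: place everything except generators.
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (!nodes[i].assigned_device.empty()) continue;  // already pinned
    if (ToyIsGenerator(nodes[i])) {
      second_pass.push_back(i);  // defer until the consumer is placed
      continue;
    }
    nodes[i].assigned_device = default_device;
  }
  // Second pass: each generator follows its only consumer.
  for (std::size_t i : second_pass) {
    const std::size_t c = nodes[i].consumers.front();
    assert(c < nodes.size());
    const std::string& consumer_device = nodes[c].assigned_device;
    nodes[i].assigned_device =
        consumer_device.empty() ? default_device : consumer_device;
  }
}
```

In a graph where a constant feeds a single op that is pinned to /cpu:0, the constant ends up on /cpu:0 as well, even though the default device is /gpu:0. Deferring the generator is what makes this possible, because its consumer's device is only known after the first pass.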

Partial references:

http://bettercstomorrow.com/2016/07/14/distributed-tensorflow-internal-architecture-summary/

http://bettercstomorrow.com/2016/07/06/distributed-tensorflow-internal-architecture-6/ (in Korean)

"TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems"
