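Part IV
Implement handling of worker failures.
When a worker fails to process a task, the master should reassign that task to another worker.
RPC failures are ambiguous: the worker's reply may have been lost, or the worker may still be processing the task while the master's RPC has already timed out. Either way the master may hand the same task to a second worker, so two workers can end up holding the same task and both produce result files. Task output therefore has to be atomic. The current lab framework cannot guarantee atomic task output; it only handles the case where a failed worker produces no output at all.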
 
Test command
$ go test -run Failure
master@master:~/study/6.824/src/mapreduce$
master@master:~/study/6.824/src/mapreduce$ go test -run Failure
2019/03/24 08:59:25 rpc.Register: method "CleanupFiles" has 1 input parameters; needs exactly three
2019/03/24 08:59:25 rpc.Register: method "Lock" has 1 input parameters; needs exactly three
2019/03/24 08:59:25 rpc.Register: method "Unlock" has 1 input parameters; needs exactly three
2019/03/24 08:59:25 rpc.Register: method "Wait" has 1 input parameters; needs exactly three
2019/03/24 08:59:25 rpc.Register: method "Lock" has 1 input parameters; needs exactly three
2019/03/24 08:59:25 rpc.Register: method "Unlock" has 1 input parameters; needs exactly three
/var/tmp/824-1000/mr2499-master: Starting Map/Reduce task test
2019/03/24 08:59:25 rpc.Register: method "Lock" has 1 input parameters; needs exactly three
2019/03/24 08:59:25 rpc.Register: method "Unlock" has 1 input parameters; needs exactly three
Schedule: 20 mapPhase tasks (10 I/Os)
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #0 on file 824-mrinput-0.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #1 on file 824-mrinput-1.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #1 done
/var/tmp/824-1000/mr2499-worker1: mapPhase task #0 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #3 on file 824-mrinput-3.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #2 on file 824-mrinput-2.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #2 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #4 on file 824-mrinput-4.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #3 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #5 on file 824-mrinput-5.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #5 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #6 on file 824-mrinput-6.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #4 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #7 on file 824-mrinput-7.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #6 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #8 on file 824-mrinput-8.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #7 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #9 on file 824-mrinput-9.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #9 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #10 on file 824-mrinput-10.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #8 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #11 on file 824-mrinput-11.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #10 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #12 on file 824-mrinput-12.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #11 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #13 on file 824-mrinput-13.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #12 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #14 on file 824-mrinput-14.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #13 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #15 on file 824-mrinput-15.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #14 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #16 on file 824-mrinput-16.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #15 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #17 on file 824-mrinput-17.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #16 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #18 on file 824-mrinput-18.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #17 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #19 on file 824-mrinput-19.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #18 done
/var/tmp/824-1000/mr2499-worker1: mapPhase task #19 done
Schedule: mapPhase done
Schedule: 10 reducePhase tasks (20 I/Os)
/var/tmp/824-1000/mr2499-worker1: given reducePhase task #1 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker1: reducePhase task #1 done
/var/tmp/824-1000/mr2499-worker1: given reducePhase task #2 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker1: reducePhase task #2 done
/var/tmp/824-1000/mr2499-worker1: given reducePhase task #3 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker1: reducePhase task #3 done
/var/tmp/824-1000/mr2499-worker1: given reducePhase task #4 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker1: reducePhase task #4 done
/var/tmp/824-1000/mr2499-worker1: given reducePhase task #5 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker1: reducePhase task #5 done
/var/tmp/824-1000/mr2499-worker1: given reducePhase task #6 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker1: reducePhase task #6 done
/var/tmp/824-1000/mr2499-worker1: given reducePhase task #7 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker1: reducePhase task #7 done
/var/tmp/824-1000/mr2499-worker1: given reducePhase task #8 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker1: reducePhase task #8 done
/var/tmp/824-1000/mr2499-worker1: given reducePhase task #9 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker1: reducePhase task #9 done
/var/tmp/824-1000/mr2499-worker1: given reducePhase task #0 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker1: reducePhase task #0 done
Schedule: reducePhase done
Master: RPC /var/tmp/824-1000/mr2499-worker0 shutdown error
Merge: read mrtmp.test-res-0
Merge: read mrtmp.test-res-1
Merge: read mrtmp.test-res-2
Merge: read mrtmp.test-res-3
Merge: read mrtmp.test-res-4
Merge: read mrtmp.test-res-5
Merge: read mrtmp.test-res-6
Merge: read mrtmp.test-res-7
Merge: read mrtmp.test-res-8
Merge: read mrtmp.test-res-9
/var/tmp/824-1000/mr2499-master: Map/Reduce task completed
2019/03/24 08:59:25 rpc.Register: method "CleanupFiles" has 1 input parameters; needs exactly three
2019/03/24 08:59:25 rpc.Register: method "Lock" has 1 input parameters; needs exactly three
2019/03/24 08:59:25 rpc.Register: method "Unlock" has 1 input parameters; needs exactly three
2019/03/24 08:59:25 rpc.Register: method "Wait" has 1 input parameters; needs exactly three
/var/tmp/824-1000/mr2499-master: Starting Map/Reduce task test
Schedule: 20 mapPhase tasks (10 I/Os)
2019/03/24 08:59:25 rpc.Register: method "Lock" has 1 input parameters; needs exactly three
2019/03/24 08:59:25 rpc.Register: method "Unlock" has 1 input parameters; needs exactly three
2019/03/24 08:59:25 rpc.Register: method "Lock" has 1 input parameters; needs exactly three
2019/03/24 08:59:25 rpc.Register: method "Unlock" has 1 input parameters; needs exactly three
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #0 on file 824-mrinput-0.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #1 on file 824-mrinput-1.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #0 done
/var/tmp/824-1000/mr2499-worker0: mapPhase task #1 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #2 on file 824-mrinput-2.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #3 on file 824-mrinput-3.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #2 done
/var/tmp/824-1000/mr2499-worker0: mapPhase task #3 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #4 on file 824-mrinput-4.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #5 on file 824-mrinput-5.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #4 done
/var/tmp/824-1000/mr2499-worker0: mapPhase task #5 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #6 on file 824-mrinput-6.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #7 on file 824-mrinput-7.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #7 done
/var/tmp/824-1000/mr2499-worker0: mapPhase task #6 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #8 on file 824-mrinput-8.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #9 on file 824-mrinput-9.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #8 done
/var/tmp/824-1000/mr2499-worker0: mapPhase task #9 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #10 on file 824-mrinput-10.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #11 on file 824-mrinput-11.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #10 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #12 on file 824-mrinput-12.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #11 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #13 on file 824-mrinput-13.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #12 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #14 on file 824-mrinput-14.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #13 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #15 on file 824-mrinput-15.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #14 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #16 on file 824-mrinput-16.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #15 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #17 on file 824-mrinput-17.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #16 done
/var/tmp/824-1000/mr2499-worker1: given mapPhase task #18 on file 824-mrinput-18.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker0: mapPhase task #17 done
/var/tmp/824-1000/mr2499-worker0: given mapPhase task #19 on file 824-mrinput-19.txt (nios: 10)
/var/tmp/824-1000/mr2499-worker1: mapPhase task #18 done
/var/tmp/824-1000/mr2499-worker0: mapPhase task #19 done
Schedule: mapPhase done
Schedule: 10 reducePhase tasks (20 I/Os)
2019/03/24 08:59:26 rpc.Register: method "Lock" has 1 input parameters; needs exactly three
2019/03/24 08:59:26 rpc.Register: method "Lock" has 1 input parameters; needs exactly three
2019/03/24 08:59:26 rpc.Register: method "Unlock" has 1 input parameters; needs exactly three
2019/03/24 08:59:26 rpc.Register: method "Unlock" has 1 input parameters; needs exactly three
/var/tmp/824-1000/mr2499-worker3: given reducePhase task #2 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker2: given reducePhase task #3 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker3: reducePhase task #2 done
/var/tmp/824-1000/mr2499-worker2: reducePhase task #3 done
/var/tmp/824-1000/mr2499-worker3: given reducePhase task #5 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker2: given reducePhase task #4 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker3: reducePhase task #5 done
/var/tmp/824-1000/mr2499-worker3: given reducePhase task #6 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker2: reducePhase task #4 done
/var/tmp/824-1000/mr2499-worker2: given reducePhase task #7 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker3: reducePhase task #6 done
/var/tmp/824-1000/mr2499-worker3: given reducePhase task #8 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker2: reducePhase task #7 done
/var/tmp/824-1000/mr2499-worker2: given reducePhase task #9 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker3: reducePhase task #8 done
/var/tmp/824-1000/mr2499-worker3: given reducePhase task #0 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker2: reducePhase task #9 done
/var/tmp/824-1000/mr2499-worker2: given reducePhase task #1 on file (nios: 20)
/var/tmp/824-1000/mr2499-worker3: reducePhase task #0 done
/var/tmp/824-1000/mr2499-worker2: reducePhase task #1 done
Schedule: reducePhase done
Master: RPC /var/tmp/824-1000/mr2499-worker1 shutdown error
Master: RPC /var/tmp/824-1000/mr2499-worker0 shutdown error
Merge: read mrtmp.test-res-0
Merge: read mrtmp.test-res-1
Merge: read mrtmp.test-res-2
Merge: read mrtmp.test-res-3
Merge: read mrtmp.test-res-4
Merge: read mrtmp.test-res-5
Merge: read mrtmp.test-res-6
Merge: read mrtmp.test-res-7
Merge: read mrtmp.test-res-8
Merge: read mrtmp.test-res-9
/var/tmp/824-1000/mr2499-master: Map/Reduce task completed
PASS
ok _/home/master/study/6.824/src/mapreduce 2.845s
master@master:~/study/6.824/src/mapreduce$
 
schedule.go

package mapreduce

import (
	"fmt"
	"strings"
	"sync"
	"time"
)

//
// schedule() starts and waits for all tasks in the given phase (mapPhase
// or reducePhase). the mapFiles argument holds the names of the files that
// are the inputs to the map phase, one per map task. nReduce is the
// number of reduce tasks. the registerChan argument yields a stream
// of registered workers; each item is the worker's RPC address,
// suitable for passing to call(). registerChan will yield all
// existing registered workers (if any) and new ones as they register.
//
func schedule(jobName string, mapFiles []string, nReduce int, phase jobPhase, registerChan chan string) {
	var ntasks int
	var n_other int // number of inputs (for reduce) or outputs (for map)
	switch phase {
	case mapPhase:
		ntasks = len(mapFiles)
		n_other = nReduce
	case reducePhase:
		ntasks = nReduce
		n_other = len(mapFiles)
	}

	fmt.Printf("Schedule: %v %v tasks (%d I/Os)\n", ntasks, phase, n_other)

	// All ntasks tasks have to be scheduled on workers. Once all tasks
	// have completed successfully, schedule() should return.
	//
	// Your code here (Part III, Part IV).
	//

	// No tasks: return immediately.
	if ntasks <= 0 {
		return
	}
	// Completion sentinel. It must be negative so that it can never
	// collide with a task index.
	finishFlag := -2
	// Queue of workers that are free to take another task.
	usableWorkers := make(chan string)
	// Queue of tasks waiting to be run.
	waitTaskChan := make(chan int, ntasks)
	// Fill the task queue.
	for k := 0; k < ntasks; k++ {
		waitTaskChan <- k
	}
	// A tracking goroutine waits until every task has completed and then
	// notifies the main goroutine, which blocks reading pending tasks.
	handleCount := sync.WaitGroup{}
	handleCount.Add(ntasks)
	go func() {
		handleCount.Wait()
		waitTaskChan <- finishFlag
	}()

	// Take pending tasks one at a time.
	for taskIndex := range waitTaskChan {
		// The sentinel means that all tasks have completed.
		if taskIndex == finishFlag {
			break
		}
		// Open question: how to block on both channels so that waiting
		// for a worker does not spin?
		for {
			var work string
			// A task is pending, so get an available worker.
			select {
			// Newly registered workers should ideally be read first
			// (select picks at random among the ready cases).
			case work = <-registerChan:
			case work = <-usableWorkers:
			}
			if strings.TrimSpace(work) != "" {
				// Run the task in its own goroutine.
				go func(index int) {
					rqArs := new(DoTaskArgs)
					rqArs.JobName = jobName
					if phase == mapPhase {
						rqArs.File = mapFiles[index]
					}
					rqArs.Phase = phase
					rqArs.TaskNumber = index
					rqArs.NumOtherPhase = n_other
					handleResult := false
					defer func() {
						if handleResult {
							// The worker succeeded: decrement the completion
							// count and return the worker to the free queue.
							handleCount.Done()
							usableWorkers <- work
						} else {
							// The worker failed: requeue the task.
							//usableWorkers <- work // should a failed worker be kept out of the free queue?
							waitTaskChan <- index
						}
					}()
					handleResult = call(work, "Worker.DoTask",
						rqArs, nil)
				}(taskIndex)
				break
			} else {
				// Yield the CPU briefly before retrying.
				time.Sleep(time.Second)
			}
		}
	}
	fmt.Printf("Schedule: %v done\n", phase)
}
 
