HiBench Growth Notes (10): Analyzing the Source of execute_with_log.py
#!/usr/bin/env python2
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys, os, subprocess
from terminalsize import get_terminal_size
from time import time, sleep
import re
import fnmatch
def load_colors():
    color_script_fn = os.path.join(os.path.dirname(__file__), "color.enabled.sh")
    with open(color_script_fn) as f:
        return dict([(k,v.split("'")[1].replace('\e[', "\033[")) for k,v in [x.strip().split('=') for x in f.readlines() if x.strip() and not x.strip().startswith('#')]])
Color=load_colors()
if int(os.environ.get("HIBENCH_PRINTFULLLOG", 0)):
    Color['ret'] = os.linesep
else:
    Color['ret']='\r'
tab_matcher = re.compile("\t")
tabstop = 8
def replace_tab_to_space(s):
    def tab_replacer(match):
        pos = match.start()
        length = pos % tabstop
        if not length: length += tabstop
        return " " * length
    return tab_matcher.sub(tab_replacer, s)
class _Matcher:
    hadoop = re.compile(r"^.*map\s*=\s*(\d+)%,\s*reduce\s*=\s*(\d+)%.*$")
    hadoop2 = re.compile(r"^.*map\s+\s*(\d+)%\s+reduce\s+\s*(\d+)%.*$")
    spark = re.compile(r"^.*finished task \S+ in stage \S+ \(tid \S+\) in.*on.*\((\d+)/(\d+)\)\s*$")
    def match(self, line):
        for p in [self.hadoop, self.hadoop2]:
            m = p.match(line)
            if m:
                return (float(m.groups()[0]) + float(m.groups()[1]))/2
        for p in [self.spark]:
            m = p.match(line)
            if m:
                return float(m.groups()[0]) / float(m.groups()[1]) * 100
matcher = _Matcher()
def show_with_progress_bar(line, progress, line_width):
    """
    Show text with progress bar.
    @progress:0-100
    @line: text to show
    @line_width: width of screen
    """
    pos = int(line_width * progress / 100)
    if len(line) < line_width:
        line = line + " " * (line_width - len(line))
    line = "{On_Yellow}{line_seg1}{On_Blue}{line_seg2}{Color_Off}{ret}".format(
        line_seg1 = line[:pos], line_seg2 = line[pos:], **Color)
    sys.stdout.write(line)
def execute(workload_result_file, command_lines):
    proc = subprocess.Popen(" ".join(command_lines), shell=True, bufsize=1, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    count = 100
    last_time=0
    log_file = open(workload_result_file, 'w')
    # see http://stackoverflow.com/a/4417735/1442961
    lines_iterator = iter(proc.stdout.readline, b"")
    for line in lines_iterator:
        count += 1
        if count > 100 or time()-last_time>1: # refresh terminal size for 100 lines or each seconds
            count, last_time = 0, time()
            width, height = get_terminal_size()
            width -= 1
        try:
            line = line.rstrip()
            log_file.write(line+"\n")
            log_file.flush()
        except KeyboardInterrupt:
            proc.terminate()
            break
        line = line.decode('utf-8')
        line = replace_tab_to_space(line)
        #print "{Red}log=>{Color_Off}".format(**Color), line
        lline = line.lower()
        def table_not_found_in_log(line):
            table_not_found_pattern = "*Table * not found*"
            regex = fnmatch.translate(table_not_found_pattern)
            reobj = re.compile(regex)
            if reobj.match(line):
                return True
            else:
                return False
        def database_default_exist_in_log(line):
            database_default_already_exist = "Database default already exists"
            if database_default_already_exist in line:
                return True
            else:
                return False
        def uri_with_key_not_found_in_log(line):
            uri_with_key_not_found = "Could not find uri with key [dfs.encryption.key.provider.uri]"
            if uri_with_key_not_found in line:
                return True
            else:
                return False
        if ('error' in lline) and lline.lstrip() == lline:
            # Bypass hive 'error's and KeyProviderCache error.
            # The checks are case-sensitive, so each one is called on the original line.
            bypass_error_condition = (table_not_found_in_log(line)
                                      or database_default_exist_in_log(line)
                                      or uri_with_key_not_found_in_log(line))
            if not bypass_error_condition:
                COLOR = "Red"
                sys.stdout.write((u"{%s}{line}{Color_Off}{ClearEnd}\n" % COLOR).format(line=line,**Color).encode('utf-8'))
        else:
            if len(line) >= width:
                line = line[:width-4]+'...'
            progress = matcher.match(lline)
            if progress is not None:
                show_with_progress_bar(line, progress, width)
            else:
                sys.stdout.write(u"{line}{ClearEnd}{ret}".format(line=line, **Color).encode('utf-8'))
        sys.stdout.flush()
    print
    log_file.close()
    try:
        proc.wait()
    except KeyboardInterrupt:
        proc.kill()
        return 1
    return proc.returncode
def test_progress_bar():
    for i in range(101):
        show_with_progress_bar("test progress : %d" % i, i, 80)
        sys.stdout.flush()
        sleep(0.05)
if __name__=="__main__":
    sys.exit(execute(workload_result_file=sys.argv[1],
                     command_lines=sys.argv[2:]))
#    test_progress_bar()
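
To see how `_Matcher` turns a log line into a progress percentage, the sketch below reuses the three regular expressions from the listing in a standalone function. The function name `progress_of` and the sample log lines are made up for illustration, and the sketch targets Python 3 rather than the Python 2 of the original script. As in `execute`, each line is lowercased before matching.

```python
import re

# Patterns copied from _Matcher in the listing above.
HADOOP = re.compile(r"^.*map\s*=\s*(\d+)%,\s*reduce\s*=\s*(\d+)%.*$")
HADOOP2 = re.compile(r"^.*map\s+\s*(\d+)%\s+reduce\s+\s*(\d+)%.*$")
SPARK = re.compile(
    r"^.*finished task \S+ in stage \S+ \(tid \S+\) in.*on.*\((\d+)/(\d+)\)\s*$")


def progress_of(line):
    """Return a progress value (0-100), or None if the line carries none.

    Hypothetical helper: it mirrors _Matcher.match but is not part of HiBench.
    """
    line = line.lower()
    # Hadoop reports map and reduce percentages; the script averages them.
    for pattern in (HADOOP, HADOOP2):
        m = pattern.match(line)
        if m:
            return (float(m.group(1)) + float(m.group(2))) / 2
    # Spark reports finished/total task counts at the end of the line.
    m = SPARK.match(line)
    if m:
        return float(m.group(1)) / float(m.group(2)) * 100
    return None


if __name__ == "__main__":
    # Illustrative log lines, not taken from a real HiBench run.
    samples = [
        "INFO mapreduce.Job:  map 50% reduce 0%",
        "map = 100%,  reduce = 50%",
        "INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 7) in 120 ms on host1 (4/8)",
        "Starting job",
    ]
    for s in samples:
        print(progress_of(s), "<-", s)
```

Averaging the map and reduce percentages means a Hadoop job with map at 100% and reduce at 0% is drawn as a half-filled bar, which is the same value `show_with_progress_bar` receives in the original script.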