Compiling Impala and extending Impala's SQL parser module
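I have built Impala before, but every time I forget how to do it and have to look up the files that need to be downloaded all over again.
The build script is buildall.sh.
To build only the frontend, run:
buildall.sh -fe_only
During the build, some files are downloaded from S3. Because those servers are overseas, the downloads are very slow. You can instead download the files on your local machine through a proxy (such as Shadowsocks) and then upload them to the build server.
Which files does the build download?
Edit bin/bootstrap_toolchain.py and find the following function: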
def wget_and_unpack_package(download_path, file_name, destination, wget_no_clobber):
  print "URL {0}".format(download_path)
  print "Downloading {0} to {1}".format(file_name, destination)
  # --no-clobber avoids downloading the file if a file with the name already exists
  sh.wget(download_path, directory_prefix=destination, no_clobber=wget_no_clobber)
  print "Extracting {0}".format(file_name)
  sh.tar(z=True, x=True, f=os.path.join(destination, file_name), directory=destination)
  sh.rm(os.path.join(destination, file_name))
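Comment out the last four lines so that nothing is actually downloaded. Here they are disabled by wrapping them in a triple-quoted string: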
def wget_and_unpack_package(download_path, file_name, destination, wget_no_clobber):
  print "URL {0}".format(download_path)
  print "Downloading {0} to {1}".format(file_name, destination)
  # --no-clobber avoids downloading the file if a file with the name already exists
  """
  sh.wget(download_path, directory_prefix=destination, no_clobber=wget_no_clobber)
  print "Extracting {0}".format(file_name)
  sh.tar(z=True, x=True, f=os.path.join(destination, file_name), directory=destination)
  sh.rm(os.path.join(destination, file_name))
  """
Then find this function:
def bootstrap(toolchain_root, packages):
  """Downloads and unpacks each package in the list `packages` into `toolchain_root` if it
  doesn't exist already.
  """
  if not try_get_platform_release_label():
    check_custom_toolchain(toolchain_root, packages)
    return
  # Detect the compiler
  compiler = "gcc-{0}".format(os.environ["IMPALA_GCC_VERSION"])
  for p in packages:
    pkg_name, pkg_version = unpack_name_and_version(p)
    if check_for_existing_package(toolchain_root, pkg_name, pkg_version, compiler):
      continue
    if pkg_name != "kudu" or os.environ["KUDU_IS_SUPPORTED"] == "true":
      download_package(toolchain_root, pkg_name, pkg_version, compiler)
    else:
      build_kudu_stub(toolchain_root, pkg_version, compiler)
    write_version_file(toolchain_root, pkg_name, pkg_version, compiler,
        get_platform_release_label())
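Comment out the last statement, the write_version_file call. This prevents a version file from being recorded for packages that were never actually downloaded. If such a file were written, check_for_existing_package would treat those packages as already present on the next run: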
def bootstrap(toolchain_root, packages):
  """Downloads and unpacks each package in the list `packages` into `toolchain_root` if it
  doesn't exist already.
  """
  if not try_get_platform_release_label():
    check_custom_toolchain(toolchain_root, packages)
    return
  # Detect the compiler
  compiler = "gcc-{0}".format(os.environ["IMPALA_GCC_VERSION"])
  for p in packages:
    pkg_name, pkg_version = unpack_name_and_version(p)
    if check_for_existing_package(toolchain_root, pkg_name, pkg_version, compiler):
      continue
    if pkg_name != "kudu" or os.environ["KUDU_IS_SUPPORTED"] == "true":
      download_package(toolchain_root, pkg_name, pkg_version, compiler)
    else:
      build_kudu_stub(toolchain_root, pkg_version, compiler)
    """
    write_version_file(toolchain_root, pkg_name, pkg_version, compiler,
        get_platform_release_label())
    """
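After these changes, running buildall.sh prints the files it needs (the "URL" and "Downloading" lines) instead of fetching them. Download those files and upload them to the toolchain folder, which is the destination shown in the "Downloading" lines. When the upload is finished, restore the code you commented out.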
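Copying the printed URLs by hand is tedious. Below is a minimal Java sketch that collects them from a saved log. It assumes the output of buildall.sh was captured in a file (for example with `./buildall.sh -fe_only 2>&1 | tee build.log`; the file name is only an illustration). It relies solely on the `print "URL {0}"` line in wget_and_unpack_package. The class name `ToolchainUrls` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: pulls the download URLs out of a saved buildall.sh log.
// It only depends on lines of the form "URL <address>", which
// bin/bootstrap_toolchain.py prints before each download.
class ToolchainUrls {
  static final String PREFIX = "URL ";

  // Returns each URL once, in the order it first appears in the log.
  static List<String> extractUrls(List<String> logLines) {
    Set<String> urls = new LinkedHashSet<>();
    for (String line : logLines) {
      String trimmed = line.trim();
      if (trimmed.startsWith(PREFIX)) {
        String url = trimmed.substring(PREFIX.length()).trim();
        if (!url.isEmpty()) {
          urls.add(url);
        }
      }
    }
    return new ArrayList<>(urls);
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java ToolchainUrls <build.log>");
      System.exit(2);
    }
    List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
    // One URL per line, ready to feed to a download tool on a machine with fast access.
    for (String url : extractUrls(lines)) {
      System.out.println(url);
    }
  }
}
```

Duplicates are dropped because the same archive would otherwise be fetched twice if the build was run more than once into the same log.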
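Speeding up frontend builds
Run the following command to skip the tests and build only the frontend code:
./buildall.sh -skiptests -fe_only
Once this command has succeeded, open infra/python/deps/pip_download.py and find the following code: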
def download_package(pkg_name, pkg_version):
  '''Download the required package. Sometimes the download can be flaky, so we use the
  retry decorator.'''
  pkg_type = 'sdist' # Don't download wheel archives for now
  # This JSON endpoint is not provided by PyPI mirrors so we always need to get this
  # from pypi.python.org.
  pkg_info = json.loads(urlopen('https://pypi.python.org/pypi/%s/json' % pkg_name).read())
  downloader = URLopener()
  for pkg in pkg_info['releases'][pkg_version]:
    if pkg['packagetype'] == pkg_type:
      filename = pkg['filename']
      expected_md5 = pkg['md5_digest']
      if os.path.isfile(filename) and check_md5sum(filename, expected_md5):
        print "File with matching md5sum already exists, skipping %s" % filename
        return True
      pkg_url = "{0}/packages/{1}".format(PYPI_MIRROR, pkg['path'])
      print "Downloading %s from %s" % (filename, pkg_url)
      downloader.retrieve(pkg_url, filename)
      actual_md5 = md5(open(filename).read()).hexdigest()
      if check_md5sum(filename, expected_md5):
        return True
      else:
        print "MD5 mismatch in file %s." % filename
        return False
  print "Could not find archive to download for %s %s %s" % (
      pkg_name, pkg_version, pkg_type)
  sys.exit(1)
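This function queries pypi.python.org for each third-party package, downloads it, and then verifies its MD5 checksum, which is very slow. Once the packages are already in place from the successful run above, you can disable the entire body and return True immediately: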
def download_package(pkg_name, pkg_version):
  return True
  '''Download the required package. Sometimes the download can be flaky, so we use the
  retry decorator.'''
  """
  pkg_type = 'sdist' # Don't download wheel archives for now
  # This JSON endpoint is not provided by PyPI mirrors so we always need to get this
  # from pypi.python.org.
  pkg_info = json.loads(urlopen('https://pypi.python.org/pypi/%s/json' % pkg_name).read())
  downloader = URLopener()
  for pkg in pkg_info['releases'][pkg_version]:
    if pkg['packagetype'] == pkg_type:
      filename = pkg['filename']
      expected_md5 = pkg['md5_digest']
      if os.path.isfile(filename) and check_md5sum(filename, expected_md5):
        print "File with matching md5sum already exists, skipping %s" % filename
        return True
      pkg_url = "{0}/packages/{1}".format(PYPI_MIRROR, pkg['path'])
      print "Downloading %s from %s" % (filename, pkg_url)
      downloader.retrieve(pkg_url, filename)
      actual_md5 = md5(open(filename).read()).hexdigest()
      if check_md5sum(filename, expected_md5):
        return True
      else:
        print "MD5 mismatch in file %s." % filename
        return False
  print "Could not find archive to download for %s %s %s" % (
      pkg_name, pkg_version, pkg_type)
  sys.exit(1)
  """
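Modifying the frontend lexer and parser sources
Impala uses JFlex for lexical analysis and java_cup for parsing.
java_cup represents each scanned token with the class java_cup.runtime.Symbol. Its left field holds the token's line number and its right field holds the token's column number. Line and column numbers are an inconvenient way to locate a token within the SQL string, so I wanted the start and end character offsets of the current token to be available instead.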
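To see why offsets are handier, consider what it takes to turn the 1-based (line, column) pair stored in left/right back into a character index. The sketch below is illustrative only. It assumes `\n` is the only line terminator, whereas JFlex also treats `\r\n` and a few other characters as line breaks. The class and method names are hypothetical.

```java
// Hypothetical illustration: converting a (line, column) position, as stored in
// Symbol.left / Symbol.right by the original newToken, back into a character offset.
// Assumes '\n' is the only line terminator.
class LineColumn {
  // line and column are 1-based, matching yyline + 1 and yycolumn + 1.
  static int toOffset(String sql, int line, int column) {
    int offset = 0;
    for (int current = 1; current < line; current++) {
      int newline = sql.indexOf('\n', offset);
      if (newline < 0) {
        throw new IllegalArgumentException("line " + line + " is past the end of the input");
      }
      offset = newline + 1; // first character of the next line
    }
    return offset + column - 1;
  }

  public static void main(String[] args) {
    String sql = "select a\nfrom t";
    int offset = toOffset(sql, 2, 6); // the table name "t"
    System.out.println(offset + " -> " + sql.charAt(offset));
  }
}
```

Every lookup has to rescan the text up to the requested line. Having the scanner record the offset once avoids that work.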
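To get these offsets, edit fe/src/main/jflex/sql-scanner.flex and add the %char option. This makes JFlex record, in yychar, the character offset from the beginning of the input to the start of the current token. Then change newToken to: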
private ExtendSymbol newToken(int id, Object value) {
  String text = yytext();
  return new ExtendSymbol(id, yyline+1, yycolumn+1, value,
      this.yychar, this.yychar + text.length(), text);
}
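The effect of %char can be imitated outside JFlex. In this sketch, a hypothetical toy tokenizer keeps a running count of consumed characters, the role yychar plays in the generated scanner. It records each token's [start, end) span the same way the new newToken does (end = start + text.length()), so `sql.substring(start, end)` gives back the token text. The tokenizer only splits on whitespace, identifiers, and single-character punctuation. It is not Impala's lexer.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for the JFlex-generated SqlScanner (hypothetical, for illustration).
class OffsetScanner {
  static final class Token {
    final String text;
    final int start; // like yychar at the start of the match
    final int end;   // start + text.length(), exclusive

    Token(String text, int start) {
      this.text = text;
      this.start = start;
      this.end = start + text.length();
    }
  }

  static List<Token> scan(String sql) {
    List<Token> tokens = new ArrayList<>();
    int pos = 0; // running character count, the role yychar plays with %char
    while (pos < sql.length()) {
      char c = sql.charAt(pos);
      if (Character.isWhitespace(c)) {
        pos++;
      } else if (Character.isLetterOrDigit(c) || c == '_') {
        int begin = pos;
        while (pos < sql.length()
            && (Character.isLetterOrDigit(sql.charAt(pos)) || sql.charAt(pos) == '_')) {
          pos++;
        }
        tokens.add(new Token(sql.substring(begin, pos), begin));
      } else {
        tokens.add(new Token(String.valueOf(c), pos)); // single-character punctuation
        pos++;
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    String sql = "select a,\n  b from t";
    for (Token t : scan(sql)) {
      // The [start, end) span recovers the token text regardless of line breaks.
      System.out.println(t.text + " [" + t.start + ", " + t.end + ") "
          + sql.substring(t.start, t.end).equals(t.text));
    }
  }
}
```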
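With this change, SqlScanner knows each token's position in the Reader. We also want Symbol itself to carry the position information, so add a subclass. It is placed in the java_cup.runtime package, presumably so that it can call Symbol's package-private (int, int) constructor: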
package java_cup.runtime;
import java_cup.runtime.Symbol;
public class ExtendSymbol extends Symbol {
  public int start = -1;
  public int end = -1;
  public String text;
  public ExtendSymbol(int id, int left, int right, Object value,
      int start, int end, String text) {
    super(id, left, right, value);
    this.start = start;
    this.end = end;
    this.text = text;
  }
  public ExtendSymbol(int id, ExtendSymbol left, ExtendSymbol right, Object value) {
    this(id, left.left, right.right, value, left.start, right.end, null);
  }
  public ExtendSymbol(int id, ExtendSymbol left, ExtendSymbol right) {
    this(id, left, right, null);
  }
  public ExtendSymbol(int id, int left, int right, Object value) {
    this(id, left, right, value, -1, -1, null);
  }
  public ExtendSymbol(int id, Object o) {
    this(id, -1, -1, o);
  }
  public ExtendSymbol(int id, int left, int right) {
    this(id, left, right, (Object)null);
  }
  public ExtendSymbol(int sym_num) {
    super(sym_num, -1);
  }
  ExtendSymbol(int sym_num, int state) {
    super(sym_num, state);
  }
}
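The key constructor is ExtendSymbol(int id, ExtendSymbol left, ExtendSymbol right, Object value). The new symbol starts where its leftmost child starts and ends where its rightmost child ends. The sketch below reproduces only that rule with a hypothetical `Span` class, so it runs without CUP on the classpath. It is not part of the author's code.

```java
// Hypothetical, CUP-free mirror of ExtendSymbol's start/end bookkeeping.
class Span {
  final int start; // inclusive character offset
  final int end;   // exclusive character offset

  Span(int start, int end) {
    this.start = start;
    this.end = end;
  }

  // Same rule as this(id, left.left, right.right, value, left.start, right.end, null):
  // a reduced symbol covers everything from its leftmost to its rightmost child.
  static Span merge(Span left, Span right) {
    return new Span(left.start, right.end);
  }

  String textIn(String sql) {
    return sql.substring(start, end);
  }

  public static void main(String[] args) {
    String sql = "select a from t";
    Span from = new Span(9, 13);   // token "from"
    Span table = new Span(14, 15); // token "t"
    Span fromClause = merge(from, table);
    System.out.println(fromClause.textIn(sql)); // from t
  }
}
```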
Then add a symbol factory class:
package java_cup.runtime;
import java_cup.runtime.Symbol;
import java_cup.runtime.SymbolFactory;
public class ExtendSymbolFactory implements SymbolFactory {
  @Override
  public ExtendSymbol newSymbol(String name, int id, Symbol left, Symbol right, Object value) {
    return new ExtendSymbol(id, (ExtendSymbol) left, (ExtendSymbol) right, value);
  }
  @Override
  public ExtendSymbol newSymbol(String name, int id, Symbol left, Symbol right) {
    return new ExtendSymbol(id, (ExtendSymbol) left, (ExtendSymbol) right);
  }
  @Override
  public ExtendSymbol newSymbol(String name, int id, Object o) {
    return new ExtendSymbol(id, o);
  }
  @Override
  public ExtendSymbol newSymbol(String name, int id) {
    return new ExtendSymbol(id);
  }
  @Override
  public ExtendSymbol startSymbol(String name, int id, int state) {
    return new ExtendSymbol(id, state);
  }
}
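Adding position information to syntax blocks is more complicated. The main change is a new base class for the classes in org/apache/impala/analysis. It holds the position information and the child syntax blocks: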
package org.apache.impala.analysis;
import java.util.List;
public class SyntaxBlock {
  public int startPosition = -1;
  public int endPosition = -1;
  public List<SyntaxBlock> subBlocks;
  public SyntaxBlock() {
  }
  public SyntaxBlock(int startPosition, int endPosition) {
    this.startPosition = startPosition;
    this.endPosition = endPosition;
  }
  public SyntaxBlock(int startPosition, int endPosition, List<SyntaxBlock> subBlocks) {
    this.startPosition = startPosition;
    this.endPosition = endPosition;
    this.subBlocks = subBlocks;
  }
}
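Once every block carries startPosition, endPosition, and subBlocks, the original SQL text of any node can be recovered with `substring`. The following sketch walks such a tree depth first. It declares a trimmed copy of SyntaxBlock, named `Block` here as a hypothetical stand-in, so that it compiles on its own. It defensively skips null children and blocks without a recorded position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Trimmed, hypothetical copy of SyntaxBlock so the example is self-contained.
class Block {
  int startPosition = -1;
  int endPosition = -1;
  List<Block> subBlocks;

  Block(int startPosition, int endPosition, Block... children) {
    this.startPosition = startPosition;
    this.endPosition = endPosition;
    this.subBlocks = children.length == 0 ? null : Arrays.asList(children);
  }

  // Depth-first walk that collects each block's source text, indented by depth.
  static void collect(Block block, String sql, int depth, List<String> out) {
    if (block == null || block.startPosition < 0) {
      return; // nothing to show for this block
    }
    StringBuilder indent = new StringBuilder();
    for (int i = 0; i < depth; i++) {
      indent.append("  ");
    }
    out.add(indent + sql.substring(block.startPosition, block.endPosition));
    if (block.subBlocks != null) {
      for (Block child : block.subBlocks) {
        collect(child, sql, depth + 1, out);
      }
    }
  }

  public static void main(String[] args) {
    String sql = "select a from t";
    Block select = new Block(0, 8, new Block(7, 8));   // select a
    Block from = new Block(9, 15, new Block(14, 15));  // from t
    Block stmt = new Block(0, 15, select, from);
    List<String> lines = new ArrayList<>();
    collect(stmt, sql, 0, lines);
    lines.forEach(System.out::println);
  }
}
```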
Then add a subclass, ObjectSyntaxBlock, to hold syntax blocks whose values are of types such as String, Object, HashMap, ArrayList, or Pair:
package org.apache.impala.analysis;
import java.util.List;
public class ObjectSyntaxBlock<T> extends SyntaxBlock {
  public T objectValue;
  public ObjectSyntaxBlock() {
  }
  public ObjectSyntaxBlock(T objectValue) {
    this.objectValue = objectValue;
  }
  public ObjectSyntaxBlock(int startPosition, int endPosition, T objectValue) {
    super(startPosition, endPosition);
    this.objectValue = objectValue;
  }
  public ObjectSyntaxBlock(int startPosition, int endPosition, List<SyntaxBlock> subBlocks, T objectValue) {
    super(startPosition, endPosition, subBlocks);
    this.objectValue = objectValue;
  }
  public T getObjectValue() {
    return objectValue;
  }
}
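The wrapper is needed because a value such as a `List<String>` of identifiers cannot carry positions itself. This sketch uses trimmed, hypothetical copies of the two classes. It shows that the span lives on the wrapper, while grammar actions reach the original value through objectValue, the way `path.objectValue` is used in the rewritten table_ref rule.

```java
import java.util.Arrays;
import java.util.List;

// Trimmed, hypothetical copies of SyntaxBlock / ObjectSyntaxBlock.
class WrapDemo {
  static class PosBlock {
    int startPosition = -1;
    int endPosition = -1;

    PosBlock(int startPosition, int endPosition) {
      this.startPosition = startPosition;
      this.endPosition = endPosition;
    }
  }

  static class ObjectPosBlock<T> extends PosBlock {
    final T objectValue;

    ObjectPosBlock(int startPosition, int endPosition, T objectValue) {
      super(startPosition, endPosition);
      this.objectValue = objectValue;
    }

    T getObjectValue() {
      return objectValue;
    }
  }

  public static void main(String[] args) {
    String sql = "select * from db.tbl";
    // A path such as db.tbl is a plain List<String>; the wrapper adds the span.
    ObjectPosBlock<List<String>> path =
        new ObjectPosBlock<>(14, 20, Arrays.asList("db", "tbl"));
    List<String> parts = path.objectValue; // what the grammar action passes on
    System.out.println(parts + " at " + sql.substring(path.startPosition, path.endPosition));
  }
}
```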
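The hardest step is modifying sql-parser.cup. Every grammar rule has to be changed, which takes a great deal of time.
The difference from the original concerns nonterminal types. If a nonterminal's type is String, Object, HashMap, ArrayList, Pair, an enum, or any class outside the org.apache.impala.analysis package, it must be wrapped in ObjectSyntaxBlock. Such types do not extend SyntaxBlock and therefore cannot hold position information.
For example,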
nonterminal List<UnionOperand> values_operand_list;
nonterminal TDescribeOutputStyle describe_output_style;
must be changed to:
nonterminal ObjectSyntaxBlock<List<UnionOperand>> values_operand_list;
nonterminal ObjectSyntaxBlock<TDescribeOutputStyle> describe_output_style;
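Because the declarations follow a fixed pattern, part of this rewrite can be scripted. The sketch below is hypothetical tooling, not something from the Impala repository. It wraps the declared type in `ObjectSyntaxBlock<...>` unless the raw type is in a caller-supplied set of classes that already extend SyntaxBlock. You still have to decide which classes belong in that set. Declarations that span several lines or carry trailing comments are left untouched.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for editing nonterminal declarations in sql-parser.cup.
class NonterminalRewriter {
  // nonterminal <type> <name>[, <name>...];
  private static final Pattern DECL = Pattern.compile(
      "^(\\s*)nonterminal\\s+(.+?)\\s+([A-Za-z_]\\w*(?:\\s*,\\s*[A-Za-z_]\\w*)*)\\s*;\\s*$");

  static String rewrite(String line, Set<String> syntaxBlockTypes) {
    Matcher m = DECL.matcher(line);
    if (!m.matches()) {
      return line; // not a single-line, typed nonterminal declaration
    }
    String type = m.group(2).trim();
    int generic = type.indexOf('<');
    String rawType = generic < 0 ? type : type.substring(0, generic).trim();
    if (syntaxBlockTypes.contains(rawType) || rawType.equals("ObjectSyntaxBlock")) {
      return line; // already carries positions, or was rewritten before
    }
    return m.group(1) + "nonterminal ObjectSyntaxBlock<" + type + "> " + m.group(3) + ";";
  }

  public static void main(String[] args) {
    Set<String> blocks = new HashSet<>(Arrays.asList("TableRef", "SelectStmt", "Expr"));
    System.out.println(rewrite("nonterminal List<UnionOperand> values_operand_list;", blocks));
    System.out.println(rewrite("nonterminal TDescribeOutputStyle describe_output_style;", blocks));
    System.out.println(rewrite("nonterminal TableRef table_ref;", blocks));
  }
}
```

Skipping types that are already ObjectSyntaxBlock makes the rewrite safe to run more than once over the same file.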
A rule definition such as
table_ref ::=
  dotted_path:path
  {: RESULT = new TableRef(path, null); :}
  | dotted_path:path alias_clause:alias
  {: RESULT = new TableRef(path, alias); :}
  | LPAREN query_stmt:query RPAREN alias_clause:alias
  {: RESULT = new InlineViewRef(alias, query); :}
  ;
must be changed to:
table_ref ::=
  dotted_path:path
  {:
    ExtendSymbol _0_symbol = (ExtendSymbol) CUP$SqlParser$stack.peek();
    RESULT = new TableRef(path.objectValue, null);
    RESULT.startPosition = _0_symbol.start;
    RESULT.endPosition = _0_symbol.end;
    RESULT.subBlocks = Lists.newArrayList(
        (SyntaxBlock) _0_symbol.value
    );
  :}
  | dotted_path:path alias_clause:alias
  {:
    ExtendSymbol _1_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 1);
    ExtendSymbol _0_symbol = (ExtendSymbol) CUP$SqlParser$stack.peek();
    RESULT = new TableRef(path.objectValue, alias.objectValue);
    RESULT.startPosition = _1_symbol.start;
    RESULT.endPosition = _0_symbol.end;
    RESULT.subBlocks = Lists.newArrayList(
        (SyntaxBlock) _1_symbol.value,
        (SyntaxBlock) _0_symbol.value
    );
  :}
  | LPAREN query_stmt:query RPAREN alias_clause:alias
  {:
    ExtendSymbol _3_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 3);
    ExtendSymbol _2_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 2);
    ExtendSymbol _1_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 1);
    ExtendSymbol _0_symbol = (ExtendSymbol) CUP$SqlParser$stack.peek();
    RESULT = new InlineViewRef(alias.objectValue, query);
    RESULT.startPosition = _3_symbol.start;
    RESULT.endPosition = _0_symbol.end;
    RESULT.subBlocks = Lists.newArrayList(
        (SyntaxBlock) _3_symbol.value,
        (SyntaxBlock) _2_symbol.value,
        (SyntaxBlock) _1_symbol.value,
        (SyntaxBlock) _0_symbol.value
    );
  :}
  ;
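The `_k_symbol` names count from the top of the parser stack. When an action runs, the rightmost right-hand-side symbol sits at index CUP$SqlParser$top, which is why `peek()` works for `_0_symbol`. The symbol k places to its left is at `CUP$SqlParser$top - k`. The following sketch simulates this with a plain `java.util.Stack`, using the four symbols of the `LPAREN query_stmt RPAREN alias_clause` alternative. The stack holds made-up strings rather than real ExtendSymbol objects, and the entries below them are only illustrative.

```java
import java.util.Stack;

// Simulation of how a CUP action locates its right-hand-side symbols.
class StackIndexDemo {
  // Equivalent of CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - k),
  // where CUP$SqlParser$top is the index of the top element (size - 1).
  static <T> T fromTop(Stack<T> stack, int k) {
    int top = stack.size() - 1;
    return stack.elementAt(top - k);
  }

  public static void main(String[] args) {
    Stack<String> stack = new Stack<>();
    stack.push("$START");        // parser start symbol at the bottom
    stack.push("KW_FROM");       // illustrative symbol of an enclosing rule
    stack.push("LPAREN");        // _3_symbol
    stack.push("query_stmt");    // _2_symbol
    stack.push("RPAREN");        // _1_symbol
    stack.push("alias_clause");  // _0_symbol
    for (int k = 3; k >= 0; k--) {
      System.out.println("_" + k + "_symbol = " + fromTop(stack, k));
    }
    // The whole alternative spans from _3_symbol.start to _0_symbol.end.
  }
}
```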
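Classes such as SelectStmt are made to extend SyntaxBlock directly or indirectly, so their position information can be set directly without a wrapper. For example, the rule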
select_stmt ::=
    select_clause:selectList
  {:
    RESULT = new SelectStmt(selectList, null, null, null, null, null, null);
  :}
  |
    select_clause:selectList
    from_clause:fromClause
    where_clause:wherePredicate
    group_by_clause:groupingExprs
    having_clause:havingPredicate
    opt_order_by_clause:orderByClause
    opt_limit_offset_clause:limitOffsetClause
  {:
    RESULT = new SelectStmt(selectList, fromClause, wherePredicate, groupingExprs,
        havingPredicate, orderByClause, limitOffsetClause);
  :}
  ;
becomes:
select_stmt ::=
    select_clause:selectList
  {:
    ExtendSymbol _0_symbol = (ExtendSymbol) CUP$SqlParser$stack.peek();
    RESULT = new SelectStmt(selectList, null, null, null, null, null, null);
    RESULT.startPosition = _0_symbol.start;
    RESULT.endPosition = _0_symbol.end;
    RESULT.subBlocks = Lists.newArrayList(
        (SyntaxBlock) _0_symbol.value
    );
  :}
  |
    select_clause:selectList
    from_clause:fromClause
    where_clause:wherePredicate
    group_by_clause:groupingExprs
    having_clause:havingPredicate
    opt_order_by_clause:orderByClause
    opt_limit_offset_clause:limitOffsetClause
  {:
    ExtendSymbol _6_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 6);
    ExtendSymbol _5_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 5);
    ExtendSymbol _4_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 4);
    ExtendSymbol _3_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 3);
    ExtendSymbol _2_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 2);
    ExtendSymbol _1_symbol = (ExtendSymbol) CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - 1);
    ExtendSymbol _0_symbol = (ExtendSymbol) CUP$SqlParser$stack.peek();
    RESULT = new SelectStmt(selectList, fromClause, wherePredicate, groupingExprs.objectValue,
        havingPredicate, orderByClause.objectValue, limitOffsetClause);
    RESULT.startPosition = _6_symbol.start;
    RESULT.endPosition = _0_symbol.end;
    RESULT.subBlocks = Lists.newArrayList(
        (SyntaxBlock) _6_symbol.value,
        (SyntaxBlock) _5_symbol.value,
        (SyntaxBlock) _4_symbol.value,
        (SyntaxBlock) _3_symbol.value,
        (SyntaxBlock) _2_symbol.value,
        (SyntaxBlock) _1_symbol.value,
        (SyntaxBlock) _0_symbol.value
    );
  :}
  ;
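Here groupingExprs.objectValue and orderByClause.objectValue are needed because group_by_clause and opt_order_by_clause are now ObjectSyntaxBlock types. Previously their values could be passed directly. Now .objectValue is required to reach the object inside the wrapper.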
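Every alternative repeats the same bookkeeping, so one way to reduce the manual work is to generate it. The following is a hypothetical helper, not part of the author's repository. Given the number n of right-hand-side symbols, it emits the `_k_symbol` declarations and the position and subBlocks assignments in the form used above. The `RESULT = new ...` line differs per rule and is still written by hand. The helper assumes the generated parser class is SqlParser, as the `CUP$SqlParser$...` names above indicate.

```java
// Hypothetical generator for the repetitive part of a rewritten CUP action.
class ActionTemplate {
  static String symbolDecls(int n) {
    if (n < 1) {
      throw new IllegalArgumentException("a production needs at least one symbol");
    }
    StringBuilder sb = new StringBuilder();
    for (int k = n - 1; k >= 1; k--) {
      sb.append("ExtendSymbol _").append(k).append("_symbol = (ExtendSymbol) ")
          .append("CUP$SqlParser$stack.elementAt(CUP$SqlParser$top - ").append(k).append(");\n");
    }
    sb.append("ExtendSymbol _0_symbol = (ExtendSymbol) CUP$SqlParser$stack.peek();\n");
    return sb.toString();
  }

  static String positionAssignments(int n) {
    if (n < 1) {
      throw new IllegalArgumentException("a production needs at least one symbol");
    }
    StringBuilder sb = new StringBuilder();
    sb.append("RESULT.startPosition = _").append(n - 1).append("_symbol.start;\n");
    sb.append("RESULT.endPosition = _0_symbol.end;\n");
    sb.append("RESULT.subBlocks = Lists.newArrayList(\n");
    for (int k = n - 1; k >= 0; k--) {
      sb.append("    (SyntaxBlock) _").append(k).append("_symbol.value")
          .append(k > 0 ? ",\n" : "\n");
    }
    sb.append(");\n");
    return sb.toString();
  }

  public static void main(String[] args) {
    int n = 2; // e.g. dotted_path:path alias_clause:alias
    System.out.print(symbolDecls(n));
    System.out.println("RESULT = new TableRef(path.objectValue, alias.objectValue);");
    System.out.print(positionAssignments(n));
  }
}
```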
GitHub repository: Impala syntax-block position library