[Hive - LanguageManual] XPathUDF
Documentation for Built-In User-Defined Functions Related To XPath
UDFs
xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number, xpath_string, xpath_boolean
- Functions for parsing XML data using XPath expressions.
- Since version: 0.6.0
Overview
The xpath family of UDFs wraps the Java XPath library, javax.xml.xpath, provided by the JDK. The library is based on the XPath 1.0 specification. Please refer to http://java.sun.com/javase/6/docs/api/javax/xml/xpath/package-summary.html for detailed information on the Java XPath library.
All functions follow the form xpath_*(xml_string, xpath_expression_string). The XPath expression string is compiled and cached. It is reused if the expression in the next input row matches the previous one; otherwise, it is recompiled. So the XML string is parsed for every input row, but in the vast majority of use cases the XPath expression is compiled once and reused.
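The compile-and-cache behaviour described above can be sketched with the JDK's own javax.xml.xpath API. This is a minimal illustration under stated assumptions, not Hive's source: the class CachedXPathEvaluator and its methods are hypothetical names, and it assumes one evaluator instance that sees input rows one at a time and caches only the most recently used expression.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch: caches only the most recently compiled XPath expression.
final class CachedXPathEvaluator {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String lastPath;          // path string that lastExpr was compiled from
    private XPathExpression lastExpr; // compiled form of lastPath
    private int compileCount;         // number of compilations, for illustration only

    String evalString(String xml, String path) {
        try {
            // Recompile only when the path differs from the previous row's path.
            if (lastExpr == null || !path.equals(lastPath)) {
                lastExpr = xpath.compile(path);
                lastPath = path;
                compileCount++;
            }
            // A new InputSource per call: the XML text is parsed for every row.
            return (String) lastExpr.evaluate(
                    new InputSource(new StringReader(xml)), XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate '" + path + "'", e);
        }
    }

    int compileCount() {
        return compileCount;
    }
}
```

Evaluating 'a/b' against two different XML rows compiles the expression once; switching to 'a/c' on the next row triggers a second compilation, while every row's XML is parsed afresh.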
Backward axes are supported. For example:
> select xpath('<a><b id="1"><c/></b><b id="2"><c/></b></a>', '/descendant::c/ancestor::b/@id') from t1 limit 1;
["1","2"]
Each function returns a specific Hive type given the XPath expression:
- xpath returns a Hive array of strings.
- xpath_string returns a string.
- xpath_boolean returns a boolean.
- xpath_short returns a short integer.
- xpath_int returns an integer.
- xpath_long returns a long integer.
- xpath_float returns a floating point number.
- xpath_double, xpath_number return a double-precision floating point number (xpath_number is an alias for xpath_double).
The UDFs are schema-agnostic: no XML validation is performed. However, malformed XML (e.g., <a><b>1</b></aa>) causes a runtime exception to be thrown.
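Both halves of that statement can be observed directly in the JDK library the UDFs wrap: well-formed XML with no schema or DTD evaluates normally, while the malformed example fails to parse. The helper WellFormedCheck.evaluates is a hypothetical name, and it returns false instead of propagating the exception only so the behaviour is easy to check; the JDK's default parser may also log the parse error to standard error.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: reports whether the XPath engine accepts the input.
final class WellFormedCheck {
    // Returns true when `xml` parses (no validation) and `path` evaluates.
    static boolean evaluates(String xml, String path) {
        try {
            XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.STRING);
            return true;
        } catch (XPathExpressionException e) {
            // Thrown for malformed XML (the parse error is the cause) or an invalid path.
            return false;
        }
    }
}
```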
Following are specifics on each xpath UDF variant.
xpath
The xpath() function always returns a Hive array of strings. If the expression results in a non-text value (e.g., another XML node), the function returns an empty array. There are two primary uses for this function: to get a list of node text values or to get a list of attribute values.
Examples:
Expression that selects only element (non-text) nodes:
> select xpath('<a><b>b1</b><b>b2</b></a>', 'a/*') from src limit 1;
[]
Get a list of node text values:
> select xpath('<a><b>b1</b><b>b2</b></a>', 'a/*/text()') from src limit 1;
["b1","b2"]
Get a list of values for attribute 'id':
> select xpath('<a><b id="foo">b1</b><b id="bar">b2</b></a>', '//@id') from src limit 1;
["foo","bar"]
Get a list of node texts for nodes where the 'class' attribute equals 'bb':
> SELECT xpath('<a><b class="bb">b1</b><b>b2</b><b>b3</b><c class="bb">c1</c><c>c2</c></a>', 'a/*[@class="bb"]/text()') FROM src LIMIT 1;
["b1","c1"]
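The array-of-strings behaviour, including the empty result for element nodes, can be approximated in Java by evaluating the expression as a node set and keeping only nodes that carry a node value. This sketch assumes Hive does something equivalent; XPathListSketch is a hypothetical name. In the W3C DOM, getNodeValue() returns null for element nodes and the text for text and attribute nodes, which is why 'a/*' yields nothing while 'a/*/text()' and '//@id' yield values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of xpath(): node set -> list of node values.
final class XPathListSketch {
    static List<String> xpath(String xml, String path) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Element nodes have a null node value and are skipped;
                // text and attribute nodes contribute their value.
                String value = nodes.item(i).getNodeValue();
                if (value != null) {
                    out.add(value);
                }
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate '" + path + "'", e);
        }
    }
}
```

The same helper also handles the backward-axis example from the overview, since the JDK engine supports the ancestor axis.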
xpath_string
The xpath_string() function returns the text of the first matching node.
Get the text for node 'a/b':
> SELECT xpath_string('<a><b>bb</b><c>cc</c></a>', 'a/b') FROM src LIMIT 1;
bb
Get the text for node 'a'. Because 'a' has child nodes with text, the result is the concatenation of the text of its children:
> SELECT xpath_string('<a><b>bb</b><c>cc</c></a>', 'a') FROM src LIMIT 1;
bbcc
A non-matching expression returns an empty string:
> SELECT xpath_string('<a><b>bb</b><c>cc</c></a>', 'a/d') FROM src LIMIT 1;
Get the text of the first node that matches '//b':
> SELECT xpath_string('<a><b>b1</b><b>b2</b></a>', '//b') FROM src LIMIT 1;
b1
Get the text of the second matching node:
> SELECT xpath_string('<a><b>b1</b><b>b2</b></a>', 'a/b[2]') FROM src LIMIT 1;
b2
Get the text of the first node that has an attribute 'id' with the value 'b_2':
> SELECT xpath_string('<a><b>b1</b><b id="b_2">b2</b></a>', 'a/b[@id="b_2"]') FROM src LIMIT 1;
b2
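In javax.xml.xpath terms, these results match evaluating with XPathConstants.STRING, which applies the XPath 1.0 string() conversion: the string-value of the first node in document order (for an element, the concatenation of its descendant text) or the empty string for an empty node set. A minimal sketch, assuming Hive uses this conversion; XPathStringSketch is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch of xpath_string() using XPath 1.0 string() semantics.
final class XPathStringSketch {
    static String xpathString(String xml, String path) {
        try {
            return (String) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate '" + path + "'", e);
        }
    }
}
```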
xpath_boolean
Returns true if the XPath expression evaluates to true, or if a matching node is found.
Match found:
> SELECT xpath_boolean('<a><b>b</b></a>', 'a/b') FROM src LIMIT 1;
true
No match found:
> SELECT xpath_boolean('<a><b>b</b></a>', 'a/c') FROM src LIMIT 1;
false
Expression evaluates to true:
> SELECT xpath_boolean('<a><b>b</b></a>', 'a/b = "b"') FROM src LIMIT 1;
true
Expression evaluates to false:
> SELECT xpath_boolean('<a><b>10</b></a>', 'a/b < 10') FROM src LIMIT 1;
false
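The boolean results line up with XPath 1.0 boolean() conversion, which javax.xml.xpath applies for XPathConstants.BOOLEAN: a node set is true when it is non-empty, and a comparison yields its own truth value. A sketch assuming Hive relies on this conversion; XPathBooleanSketch is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch of xpath_boolean() using XPath 1.0 boolean() semantics.
final class XPathBooleanSketch {
    static boolean xpathBoolean(String xml, String path) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate '" + path + "'", e);
        }
    }
}
```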
xpath_short, xpath_int, xpath_long
These functions return an integer value, or zero if no match is found or if the match is non-numeric.
Mathematical operations are supported. If the result overflows the return type, the maximum value for that type is returned.
No match:
> SELECT xpath_int('<a>b</a>', 'a = 10') FROM src LIMIT 1;
0
Non-numeric match:
> SELECT xpath_int('<a>this is not a number</a>', 'a') FROM src LIMIT 1;
0
> SELECT xpath_int('<a>this 2 is not a number</a>', 'a') FROM src LIMIT 1;
0
Adding values:
> SELECT xpath_int('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/*)') FROM src LIMIT 1;
15
> SELECT xpath_int('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/b)') FROM src LIMIT 1;
7
> SELECT xpath_int('<a><b class="odd">1</b><b class="even">2</b><b class="odd">4</b><c>8</c></a>', 'sum(a/b[@class="odd"])') FROM src LIMIT 1;
5
Overflow:
> SELECT xpath_int('<a><b>2000000000</b><c>40000000000</c></a>', 'a/b * a/c') FROM src LIMIT 1;
2147483647
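The zero and overflow results are consistent with evaluating the expression as an XPath number (a Java double) and then narrowing it, assuming the conversion is a plain Java cast: the Java Language Specification maps NaN to 0 and clamps out-of-range doubles to the int or long bounds. The sketch below covers only xpath_int and xpath_long under that assumption; XPathIntegerSketch is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch of xpath_int()/xpath_long(): XPath number, then narrowing.
final class XPathIntegerSketch {
    // XPath 1.0 number(): non-numeric text becomes NaN, false becomes 0.
    private static double number(String xml, String path) {
        try {
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate '" + path + "'", e);
        }
    }

    // Java narrowing maps NaN to 0 and clamps to Integer.MIN_VALUE/MAX_VALUE.
    static int xpathInt(String xml, String path) {
        return (int) number(xml, path);
    }

    // Same rule for long, clamping to Long.MIN_VALUE/MAX_VALUE.
    static long xpathLong(String xml, String path) {
        return (long) number(xml, path);
    }
}
```

With this rule, the overflow example saturates at Integer.MAX_VALUE for xpath_int and at Long.MAX_VALUE for xpath_long, since 8.0E19 exceeds both ranges.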
xpath_float, xpath_double, xpath_number
Similar to xpath_short, xpath_int and xpath_long, but with floating-point semantics. Non-matches result in zero; however, non-numeric matches result in NaN. Note that xpath_number() is an alias for xpath_double().
No match:
> SELECT xpath_double('<a>b</a>', 'a = 10') FROM src LIMIT 1;
0.0
Non-numeric match:
> SELECT xpath_double('<a>this is not a number</a>', 'a') FROM src LIMIT 1;
NaN
A very large number:
> SELECT xpath_double('<a><b>2000000000</b><c>40000000000</c></a>', 'a/b * a/c') FROM src LIMIT 1;
8.0E19
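Here the XPath number is returned without integer narrowing, so NaN survives and large products keep their magnitude. A sketch assuming xpath_double returns the javax.xml.xpath NUMBER result directly and xpath_float casts it to float; XPathDoubleSketch is a hypothetical name, and an xpath_number alias would simply call the same method.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch of xpath_double()/xpath_float().
final class XPathDoubleSketch {
    // XPath 1.0 number() result as a Java double; NaN is preserved.
    static double xpathDouble(String xml, String path) {
        try {
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate '" + path + "'", e);
        }
    }

    // Float variant: same evaluation, narrowed to single precision.
    static float xpathFloat(String xml, String path) {
        return (float) xpathDouble(xml, path);
    }
}
```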