ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs:...
```
: jdbc:hive2://master01.hadoop.dtmobile.cn:1> select * from cell_random_grid_tmp2 limit 1;
INFO  : Compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5):
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:grid_row_id, type:,), comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.045 seconds
INFO  : Executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5):
INFO  : Completed executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.001 seconds
INFO  : OK
Error: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://master01.hadoop.dtmobile.cn:8020/user/hive/warehouse/capacity.db/cell_random_grid_tmp2/part-00000-82a689a5-7c2a-48a0-ab17-8bf04c963ea6-c000.snappy.parquet (state=,code=0)
: jdbc:hive2://master01.hadoop.dtmobile.cn:1>
```
The data was written to Hive with Spark 2.3 Spark SQL via saveAsTable(). When Spark SQL writes to Hive this way, it saves the data as Snappy-compressed Parquet by default. After the write completed, querying the table through Hive beeline failed with the error above, while the same query through Spark ran fine.
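For context, a minimal sketch of the kind of write involved. The table and column names are taken from the error log above; the decimal type and its precision are assumptions for illustration, since decimals are what trigger the convention mismatch discussed below:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("write-hive-parquet")
  .enableHiveSupport()
  .getOrCreate()

// A decimal column: one of the types whose physical Parquet encoding
// differs between Spark's standard convention and Hive's.
val df = spark.sql(
  "SELECT CAST(id AS DECIMAL(18,4)) AS grid_row_id FROM range(10)")

// In Spark 2.3, saveAsTable() writes Snappy-compressed Parquet by default.
df.write.mode("overwrite").saveAsTable("capacity.cell_random_grid_tmp2")
```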
The same problem turned up on Stack Overflow. The root cause given there is as follows:
This issue is caused by the different Parquet conventions used in Hive and Spark. In Hive, the decimal datatype is represented as fixed bytes (INT 32). In Spark 1.4 and later, the default convention is the standard Parquet representation for the decimal data type. Under the standard Parquet representation, the underlying physical type changes with the precision of the column.
e.g. DECIMAL can be used to annotate the following types:
- int32: for 1 <= precision <= 9
- int64: for 1 <= precision <= 18; precision < 10 will produce a warning
Hence this issue happens only with datatypes that have different representations under the different Parquet conventions. If the datatype is DECIMAL(10,3), both conventions represent it as INT32, so we won't face an issue. If you are not aware of the internal representation of the datatypes, it is safe to read with the same convention that was used for writing. With Hive, you do not have the flexibility to choose the Parquet convention; with Spark, you do.
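One way to check which convention a given file actually uses is to dump its physical Parquet schema from the file footer. Here is a sketch using the parquet-hadoop classes bundled with Spark 2.3 (runnable from spark-shell; the path is the file from the error above, and readFooter is assumed to be available in the bundled parquet-mr version):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader

// Read only the footer metadata and print the physical schema, which
// shows whether a decimal column is stored as int32/int64 (standard
// convention) or as a fixed-length byte array (legacy/Hive convention).
val footer = ParquetFileReader.readFooter(
  new Configuration(),
  new Path("hdfs://master01.hadoop.dtmobile.cn:8020/user/hive/warehouse/" +
    "capacity.db/cell_random_grid_tmp2/" +
    "part-00000-82a689a5-7c2a-48a0-ab17-8bf04c963ea6-c000.snappy.parquet"))
println(footer.getFileMetaData.getSchema)
```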
Solution: the convention Spark uses to write Parquet data is configurable, determined by the property spark.sql.parquet.writeLegacyFormat. Its default value is false. If it is set to true, Spark will use the same convention as Hive when writing Parquet data, which resolves the issue.
So I set spark.sql.parquet.writeLegacyFormat = true, and the problem was solved.
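As a sketch, the flag can be set when the session is built, or equivalently passed to spark-submit as --conf spark.sql.parquet.writeLegacyFormat=true:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-compatible-parquet")
  .enableHiveSupport()
  // Write decimals (and other affected types) in the legacy,
  // Hive-compatible Parquet representation.
  .config("spark.sql.parquet.writeLegacyFormat", "true")
  .getOrCreate()
```

Note that the option only affects files written after it takes effect; a table already written in the standard format has to be rewritten before Hive can read it.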
Looking this parameter (spark.sql.parquet.writeLegacyFormat) up in the Spark 2.3 source code:
In package org.apache.spark.sql.internal, the default Spark SQL configuration in SQLConf.scala describes it as follows:
```scala
val PARQUET_WRITE_LEGACY_FORMAT = buildConf("spark.sql.parquet.writeLegacyFormat")
  .doc("Whether to be compatible with the legacy Parquet format adopted by Spark 1.4 and prior " +
    "versions, when converting Parquet schema to Spark SQL schema and vice versa.")
  .booleanConf
  .createWithDefault(false)
```
As you can see, the default value is false.
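Because it is an ordinary SQL conf, it can also be inspected and flipped on a live session, e.g. in spark-shell:

```scala
// Prints "false" on a stock session, matching createWithDefault(false).
println(spark.conf.get("spark.sql.parquet.writeLegacyFormat"))

// Takes effect for writes issued after this point in the same session.
spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")
```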
ParquetWriteSupport.scala, in package org.apache.spark.sql.execution.datasources.parquet, is documented as follows:
```scala
/**
 * A Parquet [[WriteSupport]] implementation that writes Catalyst [[InternalRow]]s as Parquet
 * messages. This class can write Parquet data in two modes:
 *
 *  - Standard mode: Parquet data are written in standard format defined in parquet-format spec.
 *  - Legacy mode: Parquet data are written in legacy format compatible with Spark 1.4 and prior.
 *
 * This behavior can be controlled by SQL option `spark.sql.parquet.writeLegacyFormat`. The value
 * of this option is propagated to this class by the `init()` method and its Hadoop configuration
 * argument.
 */
```