ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs:...
: jdbc:hive2://master01.hadoop.dtmobile.cn:1> select * from cell_random_grid_tmp2 limit 1; INFO : Compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5): INFO : Semantic Analysis Completed INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:grid_row_id, type:,), comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.045 seconds INFO : Executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5): INFO : Completed executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.001 seconds INFO : OK Error: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at in file hdfs://master01.hadoop.dtmobile.cn:8020/user/hive/warehouse/capacity.db/cell_random_grid_tmp2/part-00000-82a689a5-7c2a-48a0-ab17-8bf04c963ea6-c000.snappy.parquet (state=,code=0) : jdbc:hive2://master01.hadoop.dtmobile.cn:1>
通过spark2.3 sparksql执行写数据到hive,saveAsTable(),sparksql写数据到hive时候,默认是保存为parquet+snappy的数据。在数据保存完成之后,通过hive beeline查询,报错如上。但是通过spark查询,执行正常。
在stackoverflow上找到同样的问题:
根本原因如下:
This issue is caused because of different parquet conventions used in Hive and Spark. In Hive, the decimal datatype is represented as fixed bytes (INT 32). In Spark 1.4 or later the default convention is to use the Standard Parquet representation for decimal data type. As per the Standard Parquet representation based on the precision of the column datatype, the underlying representation changes.
eg: DECIMAL can be used to annotate the following types: int32: for 1 <= precision <= 9 int64: for 1 <= precision <= 18; precision < 10 will produce a warning
Hence this issue happens only with the usage of datatypes which have different representations in the different Parquet conventions. If the datatype is DECIMAL (10,3), both the conventions represent it as INT32, hence we won't face an issue. If you are not aware of the internal representation of the datatypes it is safe to use the same convention used for writing while reading. With Hive, you do not have the flexibility to choose the Parquet convention. But with Spark, you do.
Solution: The convention used by Spark to write Parquet data is configurable. This is determined by the property spark.sql.parquet.writeLegacyFormat The default value is false. If set to "true", Spark will use the same convention as Hive for writing the Parquet data. This will help to solve the issue.
所以尝试调整参数 spark.sql.parquet.writeLegacyFormat = true,问题解决。
到spark2.3源代码中查找该参数(spark.sql.parquet.writeLegacyFormat):
package org.apache.spark.sql.internal 中 关于sparksql的默认配置 SQLConf.scala中相关描述如下
val PARQUET_WRITE_LEGACY_FORMAT = buildConf("spark.sql.parquet.writeLegacyFormat") .doc("Whether to be compatible with the legacy Parquet format adopted by Spark 1.4 and prior " + "versions, when converting Parquet schema to Spark SQL schema and vice versa.") .booleanConf .createWithDefault(false)
可以看到默认值为false
在 package org.apache.spark.sql.execution.datasources.parquet 的关于ParquetWriteSupport.scala 的描述如下:
/** * A Parquet [[WriteSupport]] implementation that writes Catalyst [[InternalRow]]s as Parquet * messages. This class can write Parquet data in two modes: * * - Standard mode: Parquet data are written in standard format defined in parquet-format spec. * - Legacy mode: Parquet data are written in legacy format compatible with Spark 1.4 and prior. * * This behavior can be controlled by SQL option `spark.sql.parquet.writeLegacyFormat`. The value * of this option is propagated to this class by the `init()` method and its Hadoop configuration * argument. */
ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs:...的更多相关文章
- python3: error while loading shared libraries: libpython3.5m.so.1.0: cannot open shared object file: No such file or directory
安装python3遇到报错: wget https://www.python.org/ftp/python/3.5.2/Python-3.5.2.tgz ./configure --prefix=/u ...
- svnadmin:error while loading shared libraries: libaprutil-1.so.0:cannot open shared object file: No such file or directory
wdcp下安装svn后一直提示 svnadmin:error while loading shared libraries: libaprutil-1.so.0:cannot open shared ...
- 动态链接库找不到 : error while loading shared libraries: libgsl.so.0: cannot open shared object file: No such file or directory
问题: 运行gsl(GNU scientific Library)的函数库,用 gcc erf.c -I/usr/local/include -L/usr/local/lib64 -L/usr/loc ...
- 解决libpython2.6.so.1.0: cannot open shared object file
文章解决的问题:安装nginx中需要Python2.6的支持,下面介绍如何安装Python2.6,并建立lib的连接. 问题展示:error while loading shared librarie ...
- ./filezilla: error while loading shared libraries: libpng12.so.0: cannot open shared object file: No such file or directory
opensuse系统 在filezilla官网下载压缩文件解压运行后报 ./filezilla: error while loading shared libraries: libpng12.so.0 ...
- error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file
安装rac10g,出现例如以下错误: [root@rac2 oracle]# /u01/product/crs/root.sh WARNING: directory '/u01/product' is ...
- tensorflow-gpu版本出现libcublas.so.8.0:cannot open shared object file
文章主要参考以下博客https://www.aliyun.com/zixun/wenji/1289957.html 在利用GPU加速tensorflow时,出现了libcublas.so.8.0:ca ...
- ubuntu下tensorflow 报错 libcusolver.so.8.0: cannot open shared object file: No such file or directory
解决方法1. 在终端执行: export LD_LIBRARY_PATH=”$LD_LIBRARY_PATH:/usr/local/cuda/lib64” export CUDA_HOME=/usr/ ...
- 〖Android〗arm-linux-androideabi-gdb报 libpython2.6.so.1.0: cannot open shared object file错误的解决方法
执行: prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.6/bin/arm-linux-androideabi-gdb out/target/p ...
随机推荐
- golang在多个go routine中进行map或者slice操作应该注意的对象。
因为golang的map和列表切片都是引用类型,且非线程安全的,所以在多个go routine中进行读写操作的时候,会产生“map read and map write“的panic错误. 某一些类型 ...
- 【TensorFlow 3】mnist数据集:与Keras对比
在TF1.8之后Keras被当作为一个内置API:tf.keras. 并且之前的下载语句会报错. mnist = input_data.read_data_sets('MNIST_data',one_ ...
- 2019牛客暑期多校训练营(第四场)K.number
>传送门< 题意:给你一个字符串s,求出其中能整除300的子串个数(子串要求是连续的,允许前面有0) 思路: >动态规划 记f[i][j]为右端点满足mod 300 = j的子串个数 ...
- jenkins下使用python虚拟环境
jenkins下使用python虚拟环境碰到的一些坑: 1. 构建使用window批处理 - 坑1 c: cd c:\xxxxx\xxxxx\scripts activate c: cd c:\xxx ...
- React Hooks 深入系列 —— 设计模式
本文是 React Hooks 深入系列的后续.此篇详细介绍了 Hooks 相对 class 的优势所在, 并介绍了相关 api 的设计思想, 同时对 Hooks 如何对齐 class 的生命周期钩子 ...
- ImageView 使用详解
极力推荐文章:欢迎收藏 Android 干货分享 阅读五分钟,每日十点,和您一起终身学习,这里是程序员Android 本篇文章主要介绍 Android 开发中的部分知识点,通过阅读本篇文章,您将收获以 ...
- 改 Anaconda Jupyter Notebook 开发文件保存目录
1.打开cmd,输入命令找到配置文件路径 jupyter notebook --generate-config 2.打开 jupyter_notebook_config.py 修改配置 c.Noteb ...
- kubernetes lowB安装方式
kubernetes离线安装包,仅需三步 基础环境 关闭防火墙 selinux $ systemctl stop firewalld && systemctl disable fire ...
- ASP.NET Core 框架本质学习
本文作为学习过程中的一个记录. 学习文章地址: https://www.cnblogs.com/artech/p/inside-asp-net-core-framework.html 一. ASP.N ...
- Vue创建项目配置
前言 安装VS Code,开始vue的学习及编程,但是总是遇到各种各样的错误,控制台语法错误,格式错误.一股脑的袭来,感觉创建个项目怎么这个麻烦.这里就讲一下vue的安装及创建. 安装环境 当然第一步 ...