: jdbc:hive2://master01.hadoop.dtmobile.cn:1> select * from cell_random_grid_tmp2 limit 1;
INFO : Compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5):
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:grid_row_id, type:, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.045 seconds
INFO : Executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5):
INFO : Completed executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.001 seconds
INFO : OK
Error: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://master01.hadoop.dtmobile.cn:8020/user/hive/warehouse/capacity.db/cell_random_grid_tmp2/part-00000-82a689a5-7c2a-48a0-ab17-8bf04c963ea6-c000.snappy.parquet (state=,code=0)
: jdbc:hive2://master01.hadoop.dtmobile.cn:1>

The data was written to Hive from Spark 2.3 with SparkSQL's saveAsTable(); when SparkSQL writes to Hive this way, it saves the data as parquet + snappy by default. After the save completed, querying the table through hive beeline failed with the error above, yet the same query through Spark ran fine.
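For reference, a minimal sketch of the kind of write that produces this situation. The DataFrame contents and the grid_value column are hypothetical (only grid_row_id appears in the log above); the decimal column is what matters, since that is the type whose Parquet encoding differs between the two conventions:

import org.apache.spark.sql.SparkSession

object WriteGridTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("write-cell-grid")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // A decimal column with precision <= 18 is exactly the case where the
    // standard and legacy Parquet conventions disagree.
    val df = Seq((1L, BigDecimal("12.345")))
      .toDF("grid_row_id", "grid_value")
      .selectExpr("grid_row_id", "CAST(grid_value AS DECIMAL(10,3)) AS grid_value")

    // saveAsTable() writes parquet + snappy by default in Spark 2.x.
    df.write.mode("overwrite").saveAsTable("capacity.cell_random_grid_tmp2")
  }
}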

The same problem turns up on Stack Overflow:

The root cause is as follows:

This issue is caused by the different Parquet conventions used in Hive and Spark. In Hive, the decimal datatype is represented as fixed-length bytes (FIXED_LEN_BYTE_ARRAY). In Spark 1.4 and later, the default convention is the standard Parquet representation for the decimal data type, under which the underlying physical type changes with the precision of the column.
e.g., DECIMAL can be used to annotate the following types: int32 for 1 <= precision <= 9; int64 for 1 <= precision <= 18 (precision < 10 will produce a warning).

Hence the issue arises only with datatypes that have different representations under the two Parquet conventions. If the datatype is DECIMAL(20,3), both conventions represent it as FIXED_LEN_BYTE_ARRAY and there is no problem; but for precision 18 or below, the standard convention uses int32/int64 while Hive expects fixed-length bytes. If you are not aware of the internal representation of a datatype, it is safest to read with the same convention that was used for writing. With Hive you have no choice of Parquet convention, but with Spark you do.
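To make the cut-offs concrete, here is a small illustrative helper. This is not Spark code, just a restatement of the mapping described above:

// Illustrative only: which physical Parquet type each convention picks
// for a DECIMAL(precision, scale) column.
def decimalPhysicalType(precision: Int, legacy: Boolean): String =
  if (legacy) "FIXED_LEN_BYTE_ARRAY"        // Hive-style, all precisions
  else if (precision <= 9) "INT32"          // standard convention
  else if (precision <= 18) "INT64"         // standard convention
  else "FIXED_LEN_BYTE_ARRAY"               // both conventions agree here

// decimalPhysicalType(10, legacy = false) == "INT64"
// decimalPhysicalType(10, legacy = true)  == "FIXED_LEN_BYTE_ARRAY"  -> mismatch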

Solution: the convention Spark uses to write Parquet data is configurable, controlled by the property spark.sql.parquet.writeLegacyFormat. Its default value is false. If set to true, Spark writes Parquet data with the same convention as Hive, which resolves the issue.

So we set spark.sql.parquet.writeLegacyFormat = true, and the problem was solved.
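A sketch of applying the fix, assuming it is set before the write (df as in the sketch above); it can equally be passed to spark-submit via --conf:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("write-cell-grid")
  .enableHiveSupport()
  // Write Parquet in the legacy, Hive-compatible layout.
  .config("spark.sql.parquet.writeLegacyFormat", "true")
  .getOrCreate()

// Or on an already-running session, before the write:
spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")

df.write.mode("overwrite").saveAsTable("capacity.cell_random_grid_tmp2")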

Looking this parameter (spark.sql.parquet.writeLegacyFormat) up in the Spark 2.3 source code:

In package org.apache.spark.sql.internal, SQLConf.scala, which holds SparkSQL's default configuration, describes it as follows:

  val PARQUET_WRITE_LEGACY_FORMAT = buildConf("spark.sql.parquet.writeLegacyFormat")
    .doc("Whether to be compatible with the legacy Parquet format adopted by Spark 1.4 and prior " +
      "versions, when converting Parquet schema to Spark SQL schema and vice versa.")
    .booleanConf
    .createWithDefault(false)

As the snippet shows, the default value is false.
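You can confirm the effective value at runtime; for a key that has not been set explicitly, spark.conf.get returns the default:

// Prints "false" unless the property was set elsewhere
// (spark-defaults.conf, --conf, or spark.conf.set).
println(spark.conf.get("spark.sql.parquet.writeLegacyFormat"))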

And in package org.apache.spark.sql.execution.datasources.parquet, the doc comment on ParquetWriteSupport.scala reads:

/**
 * A Parquet [[WriteSupport]] implementation that writes Catalyst [[InternalRow]]s as Parquet
 * messages.  This class can write Parquet data in two modes:
 *
 *  - Standard mode: Parquet data are written in standard format defined in parquet-format spec.
 *  - Legacy mode: Parquet data are written in legacy format compatible with Spark 1.4 and prior.
 *
 * This behavior can be controlled by SQL option `spark.sql.parquet.writeLegacyFormat`.  The value
 * of this option is propagated to this class by the `init()` method and its Hadoop configuration
 * argument.
 */
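To verify which convention a particular file was written with, you can dump its physical schema from the Parquet footer. A sketch using the parquet-hadoop API bundled with Spark 2.3 (this readFooter overload is deprecated in later Parquet releases but present here); the path is the one from the error message:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.format.converter.ParquetMetadataConverter
import org.apache.parquet.hadoop.ParquetFileReader

val footer = ParquetFileReader.readFooter(
  new Configuration(),
  new Path("hdfs://master01.hadoop.dtmobile.cn:8020/user/hive/warehouse/" +
    "capacity.db/cell_random_grid_tmp2/" +
    "part-00000-82a689a5-7c2a-48a0-ab17-8bf04c963ea6-c000.snappy.parquet"),
  ParquetMetadataConverter.NO_FILTER)

// A legacy-format file shows decimals as fixed_len_byte_array;
// a standard-format file shows int32/int64 for precision <= 18.
println(footer.getFileMetaData.getSchema)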
