A Deep Dive into Parsing TIFF in Java

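While reading a TIFF file sent by a customer recently, the underlying library unexpectedly failed with the error: bandOffsets.length is wrong! Since the message was raised inside the TIFF read call, I dug into the low-level code that reads TIFF files.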

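In an earlier article, "Reading TIFF in Java with GeoTools", I briefly covered the GeoTools code for reading TIFF. Digging deeper, it turns out that the class doing the real work behind the scenes is TIFFImageReader from imageio-ext.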

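Most Java developers are familiar with ImageIO. ImageIO-Ext is an extension library for it, and its source code is available on GitHub. It is a very capable library and a great help when reading and writing all kinds of raster data in Java.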

Before going further, we need to understand the structure of a TIFF file. This article explains it well: "TIFF File Structure Explained in Detail", https://blog.csdn.net/oYinHeZhiGuang/article/details/121710467

Now let's look at the TIFF reading code in imageio-ext. The main class is TIFFImageReader, so let's see how a Java program reads a TIFF file through it.

Constructor:

public TIFFImageReader(ImageReaderSpi originatingProvider) {

        super(originatingProvider);

 }

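The class is instantiated with an ImageReaderSpi. This SPI (service provider interface) pattern is used by many open-source Java projects. Here we simply use the TIFFImageReaderSpi class.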
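To make the SPI relationship concrete, here is a minimal sketch that asks the standard javax.imageio registry which provider classes back the readers for a format name. It uses only JDK classes. The class name ReaderSpiLookup and the method providerNames are made up for this illustration. With imageio-ext on the classpath I would expect its TIFFImageReaderSpi to show up for "tiff", but that depends on your setup.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.spi.ImageReaderSpi;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of imageio-ext.
class ReaderSpiLookup {

    /** Returns the SPI class names of all registered readers for a format name. */
    static List<String> providerNames(String formatName) {
        List<String> names = new ArrayList<>();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        while (readers.hasNext()) {
            ImageReader reader = readers.next();
            // Every reader remembers the SPI that created it.
            ImageReaderSpi spi = reader.getOriginatingProvider();
            if (spi != null) {
                names.add(spi.getClass().getName());
            }
            reader.dispose();
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("png:  " + providerNames("png"));
        System.out.println("tiff: " + providerNames("tiff"));
    }
}
```

The same link works in the other direction: ImageReaderSpi.createReaderInstance() hands the SPI to the reader's constructor, which is what the constructor shown above receives.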

Next, set the input to read, along with a few other parameters, through the following method of the class:

public void setInput(Object input,

                         boolean seekForwardOnly,

                         boolean ignoreMetadata)

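In this method, input is the source to read (typically an ImageInputStream opened on the file). Setting seekForwardOnly to true means that images and metadata may only be read from this input source in ascending order. Setting ignoreMetadata to true means that metadata may be ignored while reading.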
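The three-argument setInput is defined on javax.imageio.ImageReader, so it can be tried out without imageio-ext. The sketch below is an assumption-laden stand-in: it encodes a tiny PNG in memory (so no file or TIFF plugin is needed) and reads its width back through whichever reader the JDK selects. SetInputDemo, samplePng and readWidth are names invented for this example.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; uses the JDK PNG plugin instead of imageio-ext.
class SetInputDemo {

    /** Encodes a small blank PNG in memory so the example needs no file on disk. */
    static byte[] samplePng(int width, int height) {
        ImageIO.setUseCache(false); // keep everything in memory
        try {
            BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(img, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the width of image 0 after configuring the reader with setInput. */
    static int readWidth(byte[] data, boolean seekForwardOnly, boolean ignoreMetadata) {
        ImageIO.setUseCache(false);
        try (ImageInputStream in =
                 ImageIO.createImageInputStream(new ByteArrayInputStream(data))) {
            ImageReader reader = ImageIO.getImageReaders(in).next();
            try {
                reader.setInput(in, seekForwardOnly, ignoreMetadata);
                return reader.getWidth(0);
            } finally {
                reader.dispose();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWidth(samplePng(7, 3), true, true));
    }
}
```

With seekForwardOnly set to true, a reader is allowed to discard data it has already passed, so asking for an earlier image index afterwards may fail. For random access to the pages of a multi-page TIFF, false is the safer choice.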

Next comes reading the TIFF metadata. See the getImageMetadata(int imageIndex) method:

public IIOMetadata getImageMetadata(int imageIndex) throws IIOException {

        seekToImage(imageIndex, true);

        TIFFImageMetadata im =

            new TIFFImageMetadata(imageMetadata.getRootIFD().getTagSetList());

        Node root =

            imageMetadata.getAsTree(TIFFImageMetadata.nativeMetadataFormatName);

        im.setFromTree(TIFFImageMetadata.nativeMetadataFormatName, root);

        if (noData != null) {

            im.setNoData(new double[] {noData, noData});

        }

        if (scales != null && offsets != null) {

            im.setScales(scales);

            im.setOffsets(offsets);

        }

        return im;

    }

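The main logic lives in seekToImage(imageIndex, true). The first parameter, imageIndex, selects which page of a multi-page TIFF to use. The second parameter, optimized, controls whether information cached from an earlier parse of that page may be reused: when it is false the metadata is always read again, and when it is true the reader first tries the cached page info.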

 private void seekToImage(int imageIndex, boolean optimized) throws IIOException {

        checkIndex(imageIndex);

        // TODO we should do this initialization just once!!!

        int index = locateImage(imageIndex);

        if (index != imageIndex) {

            throw new IndexOutOfBoundsException("imageIndex out of bounds!");

        }

        final Integer i= Integer.valueOf(index);

        //optimized branch

        if(!optimized){

            readMetadata();

            initializeFromMetadata();

            return;

        }

        // in case we have cache the info for this page

        if(pagesInfo.containsKey(i)){

            // initialize from cachedinfo only if needed

            // TODO Improve

            if(imageMetadata == null || !initialized) {// this means the curindex has changed

                final PageInfo info = pagesInfo.get(i);

                final TIFFImageMetadata metadata = info.imageMetadata.get();

                if (metadata != null) {

                    initializeFromCachedInfo(info, metadata);

                    return;

                }

                pagesInfo.put(i,null);

            }

        }

        readMetadata();

        initializeFromMetadata();

    }

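In this method, the first time a TIFF page is loaded, readMetadata() and initializeFromMetadata() read the page's metadata and cache it, so that later calls can initialize from the cached entry in pagesInfo instead of parsing the file again.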
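The caching idea in seekToImage can be sketched in isolation. The following is not the imageio-ext implementation. It is a simplified, hypothetical PageMetadataCache that assumes the cached metadata is held through a SoftReference (suggested by the info.imageMetadata.get() call above) and that a loader function stands in for readMetadata() plus initializeFromMetadata().

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch of a per-page metadata cache; M is the metadata type.
class PageMetadataCache<M> {
    private final Map<Integer, SoftReference<M>> pages = new HashMap<>();
    private final IntFunction<M> loader; // stands in for the real parsing step
    private int loads = 0;

    PageMetadataCache(IntFunction<M> loader) {
        this.loader = loader;
    }

    /**
     * Returns metadata for a page. With optimized == false the page is always
     * parsed again; with optimized == true a live cached value is reused.
     */
    M get(int pageIndex, boolean optimized) {
        if (optimized) {
            SoftReference<M> ref = pages.get(pageIndex);
            M cached = (ref == null) ? null : ref.get();
            if (cached != null) {
                return cached; // cache hit: no parsing needed
            }
        }
        // Cache miss, cleared reference, or non-optimized call: parse and store.
        M fresh = loader.apply(pageIndex);
        loads++;
        pages.put(pageIndex, new SoftReference<>(fresh));
        return fresh;
    }

    /** Number of times the loader actually ran. */
    int loadCount() {
        return loads;
    }
}
```

A soft reference lets the garbage collector drop cached metadata under memory pressure, which is why a hit still has to check for null and fall back to parsing.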

The reading process

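The key is to follow the TIFF format itself. Broadly, the reader parses the TIFF header, obtains the IFD (image file directory), and then parses the contents of each directory in turn. I won't list that code here.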
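As a rough illustration of the first step, here is a minimal sketch of header parsing written directly against the TIFF layout rather than taken from imageio-ext. The first two bytes give the byte order ("II" or "MM"), the next two the magic number (42 for classic TIFF, 43 for BigTIFF), followed by the offset of the first IFD. The class TiffHeader is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical value class describing a parsed TIFF header.
class TiffHeader {
    final ByteOrder order;
    final boolean bigTiff;
    final long firstIfdOffset;

    TiffHeader(ByteOrder order, boolean bigTiff, long firstIfdOffset) {
        this.order = order;
        this.bigTiff = bigTiff;
        this.firstIfdOffset = firstIfdOffset;
    }

    static TiffHeader parse(byte[] bytes) {
        if (bytes.length < 8) {
            throw new IllegalArgumentException("too short for a TIFF header");
        }
        ByteOrder order;
        if (bytes[0] == 'I' && bytes[1] == 'I') {
            order = ByteOrder.LITTLE_ENDIAN;   // "II": Intel byte order
        } else if (bytes[0] == 'M' && bytes[1] == 'M') {
            order = ByteOrder.BIG_ENDIAN;      // "MM": Motorola byte order
        } else {
            throw new IllegalArgumentException("bad byte-order mark");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(order);
        int magic = buf.getShort(2) & 0xFFFF;
        if (magic == 42) {
            // Classic TIFF: a 4-byte unsigned offset to the first IFD at byte 4.
            return new TiffHeader(order, false, buf.getInt(4) & 0xFFFFFFFFL);
        }
        if (magic == 43) {
            // BigTIFF: offset size at byte 4, zero at byte 6, 8-byte offset at byte 8.
            if (bytes.length < 16) {
                throw new IllegalArgumentException("too short for a BigTIFF header");
            }
            return new TiffHeader(order, true, buf.getLong(8));
        }
        throw new IllegalArgumentException("not a TIFF file, magic=" + magic);
    }
}
```

The bigTiff flag corresponds to the isBTIFF parameter that appears in the directory parsing code: it decides whether counts and offsets are read as 8-byte or shorter values.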

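The point to note is that parsing the directory is how the TIFF metadata is obtained, which in practice means parsing each tag in turn. The parsing code is in the initialize(ImageInputStream stream, boolean ignoreUnknownFields, final boolean isBTIFF) method of the TIFFIFD class: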

public void initialize(ImageInputStream stream,

            boolean ignoreUnknownFields, final boolean isBTIFF) throws IOException {

        removeTIFFFields();

        List tagSetList = getTagSetList();

        final long numEntries;

        if(isBTIFF)

            numEntries= stream.readLong();

        else

            numEntries= stream.readUnsignedShort();

        for (int i = 0; i < numEntries; i++) {

            // Read tag number, value type, and value count.

            int tag = stream.readUnsignedShort();

            int type = stream.readUnsignedShort();

            int count;

            if(isBTIFF)

            {

                long count_=stream.readLong();

                count = (int)count_;

                if(count!=count_)

                    throw new IllegalArgumentException("unable to use long number of values");

            }

            else

                count = (int)stream.readUnsignedInt();

            // Get the associated TIFFTag.

            TIFFTag tiffTag = getTag(tag, tagSetList);

            // Ignore unknown fields.

            if(ignoreUnknownFields && tiffTag == null) {

                // Skip the value/offset so as to leave the stream

                // position at the start of the next IFD entry.

                if(isBTIFF)

                    stream.skipBytes(8);

                else

                    stream.skipBytes(4);

                // XXX Warning message ...

                // Continue with the next IFD entry.

                continue;

            }

            long nextTagOffset;

            if(isBTIFF){

                nextTagOffset = stream.getStreamPosition() + 8;

                int sizeOfType = TIFFTag.getSizeOfType(type);

                if (count*sizeOfType > 8) {

                    long value = stream.readLong();

                    stream.seek(value);

                 }

            }

            else{

                nextTagOffset = stream.getStreamPosition() + 4;

                int sizeOfType = TIFFTag.getSizeOfType(type);

                 if (count*sizeOfType > 4) {

                    long value = stream.readUnsignedInt();

                    stream.seek(value);

                 }

            }

            if (tag == BaselineTIFFTagSet.TAG_STRIP_BYTE_COUNTS ||

                tag == BaselineTIFFTagSet.TAG_TILE_BYTE_COUNTS ||

                tag == BaselineTIFFTagSet.TAG_JPEG_INTERCHANGE_FORMAT_LENGTH) {

                this.stripOrTileByteCountsPosition =

                    stream.getStreamPosition();

                if (LAZY_LOADING) {

                    type = type == TIFFTag.TIFF_LONG ? TIFFTag.TIFF_LAZY_LONG : TIFFTag.TIFF_LAZY_LONG8;

                }

            } else if (tag == BaselineTIFFTagSet.TAG_STRIP_OFFSETS ||

                       tag == BaselineTIFFTagSet.TAG_TILE_OFFSETS ||

                       tag == BaselineTIFFTagSet.TAG_JPEG_INTERCHANGE_FORMAT) {

                this.stripOrTileOffsetsPosition =

                    stream.getStreamPosition();

                if (LAZY_LOADING) {

                    type = type == TIFFTag.TIFF_LONG ? TIFFTag.TIFF_LAZY_LONG : TIFFTag.TIFF_LAZY_LONG8;

                }

            }

            Object obj = null;

            try {

                switch (type) {

                case TIFFTag.TIFF_BYTE:

                case TIFFTag.TIFF_SBYTE:

                case TIFFTag.TIFF_UNDEFINED:

                case TIFFTag.TIFF_ASCII:

                    byte[] bvalues = new byte[count];

                    stream.readFully(bvalues, 0, count);

                    if (type == TIFFTag.TIFF_ASCII) {

                        // Can be multiple strings

                        final List<String> v = new ArrayList<String>();

                        boolean inString = false;

                        int prevIndex = 0;

                        for (int index = 0; index <= count; index++) {

                            if (index < count && bvalues[index] != 0) {

                                if (!inString) {

                                // start of string

                                    prevIndex = index;

                                    inString = true;

                                }

                            } else { // null or special case at end of string

                                if (inString) {

                                // end of string

                                    final String s = new String(bvalues, prevIndex,index - prevIndex);

                                    v.add(s);

                                    inString = false;

                                }

                            }

                        }

                        count = v.size();

                        String[] strings;

                        if(count != 0) {

                            strings = new String[count];

                            for (int c = 0 ; c < count; c++) {

                                strings[c] = v.get(c);

                            }

                        } else {

                            // This case has been observed when the value of

                            // 'count' recorded in the field is non-zero but

                            // the value portion contains all nulls.

                            count = 1;

                            strings = new String[] {""};

                        }

                        obj = strings;

                    } else {

                        obj = bvalues;

                    }

                    break;

                case TIFFTag.TIFF_SHORT:

                    char[] cvalues = new char[count];

                    for (int j = 0; j < count; j++) {

                        cvalues[j] = (char)(stream.readUnsignedShort());

                    }

                    obj = cvalues;

                    break;

                case TIFFTag.TIFF_LONG:

                case TIFFTag.TIFF_IFD_POINTER:

                    long[] lvalues = new long[count];

                    for (int j = 0; j < count; j++) {

                        lvalues[j] = stream.readUnsignedInt();

                    }

                    obj = lvalues;

                    break;

                case TIFFTag.TIFF_RATIONAL:

                    long[][] llvalues = new long[count][2];

                    for (int j = 0; j < count; j++) {

                        llvalues[j][0] = stream.readUnsignedInt();

                        llvalues[j][1] = stream.readUnsignedInt();

                    }

                    obj = llvalues;

                    break;

                case TIFFTag.TIFF_SSHORT:

                    short[] svalues = new short[count];

                    for (int j = 0; j < count; j++) {

                        svalues[j] = stream.readShort();

                    }

                    obj = svalues;

                    break;

                case TIFFTag.TIFF_SLONG:

                    int[] ivalues = new int[count];

                    for (int j = 0; j < count; j++) {

                        ivalues[j] = stream.readInt();

                    }

                    obj = ivalues;

                    break;

                case TIFFTag.TIFF_SRATIONAL:

                    int[][] iivalues = new int[count][2];

                    for (int j = 0; j < count; j++) {

                        iivalues[j][0] = stream.readInt();

                        iivalues[j][1] = stream.readInt();

                    }

                    obj = iivalues;

                    break;

                case TIFFTag.TIFF_FLOAT:

                    float[] fvalues = new float[count];

                    for (int j = 0; j < count; j++) {

                        fvalues[j] = stream.readFloat();

                    }

                    obj = fvalues;

                    break;

                case TIFFTag.TIFF_DOUBLE:

                    double[] dvalues = new double[count];

                    for (int j = 0; j < count; j++) {

                        dvalues[j] = stream.readDouble();

                    }

                    obj = dvalues;

                    break;

                case TIFFTag.TIFF_LONG8:

                case TIFFTag.TIFF_SLONG8:

                case TIFFTag.TIFF_IFD8:

                    long[] lBvalues = new long[count];

                    for (int j = 0; j < count; j++) {

                        lBvalues[j] = stream.readLong();

                    }

                    obj = lBvalues;

                    break;

                case TIFFTag.TIFF_LAZY_LONG8:

                case TIFFTag.TIFF_LAZY_LONG:

                    obj = new TIFFLazyData(stream, type, count);

                    break;

                default:

                    // XXX Warning

                    break;

                }

            } catch(EOFException eofe) {

                // The TIFF 6.0 fields have tag numbers less than or equal

                // to 532 (ReferenceBlackWhite) or equal to 33432 (Copyright).

                // If there is an error reading a baseline tag, then re-throw

                // the exception and fail; otherwise continue with the next

                // field.

                if(BaselineTIFFTagSet.getInstance().getTag(tag) == null) {

                    throw eofe;

                }

            }

            if (tiffTag == null) {

                // XXX Warning: unknown tag

            } else if (!tiffTag.isDataTypeOK(type)) {

                // XXX Warning: bad data type

            } else if (tiffTag.isIFDPointer() && obj != null) {

                stream.mark();

                stream.seek(((long[])obj)[0]);

                List tagSets = new ArrayList(1);

                tagSets.add(tiffTag.getTagSet());

                TIFFIFD subIFD = new TIFFIFD(tagSets);

                // XXX Use same ignore policy for sub-IFD fields?

                subIFD.initialize(stream, ignoreUnknownFields);

                obj = subIFD;

                stream.reset();

            }

            if (tiffTag == null) {

                tiffTag = new TIFFTag(null, tag, 1 << type, null);

            }

            // Add the field if its contents have been initialized which

            // will not be the case if an EOF was ignored above.

            if(obj != null) {

                TIFFField f = new TIFFField(tiffTag, type, count, obj);

                addTIFFField(f);

            }

            stream.seek(nextTagOffset);

        }

        this.lastPosition = stream.getStreamPosition();

    }
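The core of the loop above is easier to see in a stripped-down form. The sketch below handles only classic (non-BigTIFF) directories and only locates each field's value instead of decoding it. It assumes the standard layout of a 2-byte entry count followed by 12-byte entries (tag, type, count, value or offset) and applies the same rule as the count*sizeOfType > 4 check: values of four bytes or fewer sit inside the entry, larger ones live at the stored offset. IfdEntryScan and Entry are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified scanner for a classic TIFF IFD.
class IfdEntryScan {
    // Size in bytes of each classic TIFF field type; index = type code 1..12.
    private static final int[] TYPE_SIZE = {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

    static final class Entry {
        final int tag;
        final int type;
        final long count;
        final boolean inline;      // true if the value fits in the entry itself
        final long valuePosition;  // absolute position of the value bytes

        Entry(int tag, int type, long count, boolean inline, long valuePosition) {
            this.tag = tag;
            this.type = type;
            this.count = count;
            this.inline = inline;
            this.valuePosition = valuePosition;
        }
    }

    /** Scans the IFD at ifdOffset; buf must already use the file's byte order. */
    static List<Entry> scan(ByteBuffer buf, int ifdOffset) {
        List<Entry> entries = new ArrayList<>();
        int numEntries = buf.getShort(ifdOffset) & 0xFFFF;
        for (int i = 0; i < numEntries; i++) {
            int p = ifdOffset + 2 + i * 12;               // start of entry i
            int tag = buf.getShort(p) & 0xFFFF;
            int type = buf.getShort(p + 2) & 0xFFFF;
            long count = buf.getInt(p + 4) & 0xFFFFFFFFL;
            int size = (type >= 1 && type <= 12) ? TYPE_SIZE[type] : 0;
            boolean inline = size != 0 && count * size <= 4;
            // Inline values start at the value field; others at the stored offset.
            long position = inline ? p + 8 : (buf.getInt(p + 8) & 0xFFFFFFFFL);
            entries.add(new Entry(tag, type, count, inline, position));
        }
        return entries;
    }
}
```

In a BigTIFF file the same idea applies with wider fields: an 8-byte entry count, 20-byte entries, and an 8-byte threshold for inline values, which matches the isBTIFF branches in the method above.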

Commonly used TIFF tag set classes include BaselineTIFFTagSet, FaxTIFFTagSet, GeoTIFFTagSet, EXIFParentTIFFTagSet, and PrivateTIFFTagSet.

GeoTIFFTagSet covers the extra information stored by GeoTIFF. For context, GeoTIFF is the TIFF format's way of storing GIS data. PrivateTIFFTagSet adds support for GDAL by defining its NODATA and METADATA tags.

As for the bandOffsets.length is wrong! error mentioned at the start of this article, the cause lies in the following part of the getImageTypes(int imageIndex) method.

ImageTypeSpecifier itsRaw =

            TIFFDecompressor.getRawImageTypeSpecifier

                (photometricInterpretation,

                 compression,

                 samplesPerPixel,

                 bitsPerSample,

                 sampleFormat,

                 extraSamples,

                 colorMap);

In the end, the problem surfaces in the constructor of ImageTypeSpecifier's nested class Interleaved, Interleaved(ColorSpace colorSpace, int[] bandOffsets, int dataType, boolean hasAlpha, boolean isAlphaPremultiplied).

public Interleaved(ColorSpace colorSpace,

                           int[] bandOffsets,

                           int dataType,

                           boolean hasAlpha,

                           boolean isAlphaPremultiplied) {

            if (colorSpace == null) {

                throw new IllegalArgumentException("colorSpace == null!");

            }

            if (bandOffsets == null) {

                throw new IllegalArgumentException("bandOffsets == null!");

            }

            int numBands = colorSpace.getNumComponents() +

                (hasAlpha ? 1 : 0);

            if (bandOffsets.length != numBands) {

                throw new IllegalArgumentException

                    ("bandOffsets.length is wrong!");

            }

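So the error is thrown whenever the number of band offsets differs from the number of bands, where the number of bands is the component count of the color space plus one if there is an alpha channel. For example, an sRGB color space has three components, so passing four band offsets with hasAlpha set to false triggers it.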
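The check can be reproduced without any TIFF file through the public factory ImageTypeSpecifier.createInterleaved, which goes through the same validation. The sketch below is only an illustration of the condition, not a diagnosis of the customer's file. BandOffsetsCheck and tryInterleaved are names invented here, and the sample counts are assumptions chosen to hit each branch.

```java
import javax.imageio.ImageTypeSpecifier;
import java.awt.color.ColorSpace;
import java.awt.image.DataBuffer;

// Hypothetical helper that exercises the bandOffsets length check.
class BandOffsetsCheck {

    /** Returns null on success, or the exception message if the specifier is rejected. */
    static String tryInterleaved(int samplesPerPixel, boolean hasAlpha) {
        ColorSpace rgb = ColorSpace.getInstance(ColorSpace.CS_sRGB); // 3 color components
        int[] bandOffsets = new int[samplesPerPixel];
        for (int i = 0; i < samplesPerPixel; i++) {
            bandOffsets[i] = i; // one offset per sample, pixel-interleaved
        }
        try {
            ImageTypeSpecifier.createInterleaved(
                rgb, bandOffsets, DataBuffer.TYPE_BYTE, hasAlpha, false);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("3 samples, no alpha: " + tryInterleaved(3, false));
        System.out.println("4 samples, alpha:    " + tryInterleaved(4, true));
        System.out.println("4 samples, no alpha: " + tryInterleaved(4, false));
    }
}
```

Only the last call is rejected. One plausible way for a TIFF to end up in that situation is a file that declares more samples per pixel than its photometric interpretation implies without marking the extra sample as alpha, though I have not confirmed that this was the case for the file in question.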

Summary

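Working through this problem gave me a basic picture of how Java reads TIFF files with ImageIO-Ext, and the reading process maps closely onto the TIFF data structure.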