Downloading remote images in PHP: a summary of methods (parsing headers manually with curl) and solving the curl redirect problem

The commonly used functions are:

file_get_contents

file_put_contents

readfile($file) // very efficient
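As a minimal sketch of the simplest approach, the first two functions can be combined as follows. The helper name saveRemoteFile and its parameters are illustrative and do not come from the original code; reading a remote URL this way assumes allow_url_fopen is enabled.

```php
<?php
/**
 * Copy the contents of a URL (or a local path) into a local file.
 * Hypothetical helper, for illustration only.
 *
 * @return int|false Number of bytes written, or false on failure.
 */
function saveRemoteFile(string $url, string $dest)
{
    // Remote URLs only work when allow_url_fopen=On in php.ini.
    $data = @file_get_contents($url);
    if ($data === false) {
        return false;
    }
    return file_put_contents($dest, $data);
}
```

This reads the whole file into memory, so it suits small images rather than large downloads.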

Typical code:

/**
 * Grab a remote image.
 *
 * @param string $url      URL of the remote image
 * @param string $filename Local file name to save to
 */
function grabImage($url, $filename = '') {
    if ($url == '') {
        return false; // return false if $url is empty
    }
    $ext_name = strrchr($url, '.'); // extension taken from the URL
    if ($ext_name != '.gif' && $ext_name != '.jpg' && $ext_name != '.bmp' && $ext_name != '.png') {
        return false; // extension is not in the allowed list
    }
    if ($filename == '') {
        $filename = time() . $ext_name; // name the file after the current timestamp
    }
    // Capture the output of readfile() in a buffer
    ob_start();
    readfile($url);
    $img_data = ob_get_contents();
    ob_end_clean();
    // 'wb' instead of the original 'a': appending to an existing file would corrupt the image
    $local_file = fopen($filename, 'wb');
    fwrite($local_file, $img_data);
    fclose($local_file);
    return $filename;
}

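Testing it with the URL http://www.njphp.cn/uc_server/avatar.php?uid=1&size=middle produces no output at all.

Why is there no output? grabImage inspects the URL, finds that its extension is not an image format, and returns false. Skipping the extension check avoids this.

Now look at the URL itself:

http://www.njphp.cn/uc_server/avatar.php?uid=1&size=middle

It does not output the image stream directly. It redirects instead.

Testing in a browser shows:

Status Code:

301 Moved Permanently

The server returns status 301, a permanent redirect to another address, which is the real location of the image:

http://www.njphp.cn/uc_server/data/avatar/000/00/00/01_avatar_middle.jpg

As written, the code above cannot handle a redirecting URL like this one.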
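For reference, the HTTP stream wrapper behind readfile and file_get_contents has its own redirect settings, which can be made explicit through a stream context. The sketch below is only an illustration; the helper name redirectContext and the chosen values are assumptions, not part of the original code.

```php
<?php
/**
 * Build a stream context that follows a limited number of redirects.
 * Hypothetical helper, for illustration only.
 */
function redirectContext(int $maxRedirects = 3)
{
    return stream_context_create([
        'http' => [
            'follow_location' => 1,             // follow Location headers
            'max_redirects'   => $maxRedirects, // 1 or less means no redirects are followed
            'timeout'         => 10,            // seconds
        ],
    ]);
}

// Usage (needs network access and allow_url_fopen=On):
// $data = file_get_contents($url, false, redirectContext());
```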

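The function above has several drawbacks:

1. It cannot determine the image extension automatically (many image URLs do not point to a static image file; the server writes the image stream straight to the client).

2. It does not work with image URLs that redirect with a 302. The only cause is the early return false from the extension check: readfile and file_get_contents read the final file, i.e. they do follow redirects.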
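One way around the first drawback is to look at the downloaded bytes rather than the URL. The sketch below checks the well-known file signatures of the four formats that grabImage allows; the function name sniffImageExtension is hypothetical, and this is a different technique from the header-based detection used later in this article.

```php
<?php
/**
 * Guess an image extension from the first bytes of the data.
 * Hypothetical helper; returns '' when no known signature matches.
 */
function sniffImageExtension(string $bytes): string
{
    $signatures = [
        'jpg' => "\xFF\xD8\xFF",
        'png' => "\x89PNG\r\n\x1A\n",
        'gif' => 'GIF8',
        'bmp' => 'BM',
    ];
    foreach ($signatures as $ext => $magic) {
        // Compare only the leading bytes against the signature.
        if (strncmp($bytes, $magic, strlen($magic)) === 0) {
            return $ext;
        }
    }
    return '';
}
```

A signature match only says what the data claims to be; it does not validate the rest of the file.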

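This function did not meet the needs of my project, so I spent some time writing my own download function. It supports:

1. Downloading static images

2. Downloading images that the server outputs directly as a stream

3. Downloading images when the server uses a 302 redirect to the real image URL (the number of redirects can be limited)

The code is as follows: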

    /**
     * Download a remote image.
     *
     * @param string $url      Absolute URL of the image
     * @param string $filepath Full path of the target file, including the directory but not the extension
     *                         (e.g. /www/images/test); the extension is determined from the URL and the HTTP headers
     * @return mixed An array describing the image on success, false on failure
     */
    function downloadImage($url, $filepath) {
        // Headers returned by the server
        $responseHeaders = array();
        // Original image file name
        $originalfilename = '';
        // Image extension
        $ext = '';
        $ch = curl_init($url);
        // Include the HTTP headers in the value returned by curl_exec
        curl_setopt($ch, CURLOPT_HEADER, 1);
        // Return the response as a string instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        // Follow redirects (HTTP 301, 302)
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        // Maximum number of HTTP redirects to follow
        curl_setopt($ch, CURLOPT_MAXREDIRS, 2);
        // Data returned by the server (HTTP headers and body)
        $html = curl_exec($ch);
        // Information about this transfer
        $httpinfo = curl_getinfo($ch);
        curl_close($ch);
        if ($html !== false) {
            // Separate the response headers from the body. The server may have redirected,
            // so the string is split into (2 + number of redirects) pieces.
            // The optional third argument of explode caps the number of array elements returned.
            $httpArr = explode("\r\n\r\n", $html, 2 + $httpinfo['redirect_count']);
            // The second-to-last piece is the header of the final response
            $header = $httpArr[count($httpArr) - 2];
            // The last piece is the body of the final response
            $body = $httpArr[count($httpArr) - 1];
            $header .= "\r\n";
            // Parse the header lines of the final response
            preg_match_all('/([a-z0-9_-]+):\s*([^\r\n]+)\r\n/i', $header, $matches);
            // The original tested $matches[1] twice; the second test was meant for $matches[2]
            if (!empty($matches) && count($matches) == 3 && !empty($matches[1]) && !empty($matches[2])) {
                for ($i = 0; $i < count($matches[1]); $i++) {
                    if (array_key_exists($i, $matches[2])) {
                        $responseHeaders[$matches[1][$i]] = $matches[2][$i];
                    }
                }
            }
            // Determine the image extension: first from the URL, then from Content-Type
            if (0 < preg_match('{(?:[^\/\\\\]+)\.(jpg|jpeg|gif|png|bmp)$}i', $url, $matches)) {
                $originalfilename = $matches[0];
                $ext = $matches[1];
            } else {
                if (array_key_exists('Content-Type', $responseHeaders)) {
                    if (0 < preg_match('{image/(\w+)}i', $responseHeaders['Content-Type'], $extmatches)) {
                        $ext = $extmatches[1];
                    }
                }
            }
            // Save the file
            if (!empty($ext)) {
                $filepath .= ".$ext";
                // Create the directory first if it does not exist.
                // CFiles::createDirectory is a helper from the author's project, not a PHP built-in.
                CFiles::createDirectory(dirname($filepath));
                $local_file = fopen($filepath, 'w');
                if (false !== $local_file) {
                    $written = fwrite($local_file, $body);
                    fclose($local_file); // close the handle whether or not the write succeeded
                    if (false !== $written) {
                        $sizeinfo = getimagesize($filepath);
                        return array('filepath' => realpath($filepath), 'width' => $sizeinfo[0], 'height' => $sizeinfo[1], 'orginalfilename' => $originalfilename, 'filename' => pathinfo($filepath, PATHINFO_BASENAME));
                    }
                }
            }
        }
        return false;
    }
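The pattern image/(\w+) returns the MIME subtype as it stands, so image/jpeg yields "jpeg" and a type such as image/x-icon yields just "x", because \w does not match the hyphen. A lookup table is one hedged alternative. The function name extensionFromContentType and the choice of MIME types below are assumptions made for illustration.

```php
<?php
/**
 * Map a Content-Type header value to a file extension.
 * Hypothetical helper; returns '' for types that are not in the table.
 */
function extensionFromContentType(string $contentType): string
{
    // Drop parameters such as "; charset=UTF-8" and normalize the case.
    $mime = strtolower(trim(explode(';', $contentType, 2)[0]));
    $map = [
        'image/jpeg'     => 'jpg',
        'image/pjpeg'    => 'jpg',
        'image/png'      => 'png',
        'image/gif'      => 'gif',
        'image/bmp'      => 'bmp',
        'image/x-ms-bmp' => 'bmp',
    ];
    return $map[$mime] ?? '';
}
```

Returning an empty string for unknown types keeps the "no extension, do not save" behaviour of downloadImage.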

The settings that solve the redirect problem:

        // Follow redirects (HTTP 301, 302)
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

        // Maximum number of HTTP redirects to follow
        curl_setopt($ch, CURLOPT_MAXREDIRS, 2);

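Note that, because CURLOPT_HEADER is enabled, the string returned by curl_exec contains the headers of every response in the redirect chain. Details about the transfer come from:
$httpinfo = curl_getinfo($ch);
To get the header of the last page, we use the explode function:
explode(separator, string, limit);

// The last argument is optional and sets the maximum number of array elements returned. Suppose the requested page redirects twice:
a->b->c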

print_r($httpinfo) then prints something like:

Array
(
    [url] => http://c.php
    [content_type] => text/html
    [http_code] => 200
    [header_size] => 602
    [request_size] => 230
    [filetime] => -1
    [ssl_verify_result] => 0
    [redirect_count] => 2
    [total_time] => 0.281
    [namelookup_time] => 0
    [connect_time] => 0
    [pretransfer_time] => 0
    [size_upload] => 0
    [size_download] => 0
    [speed_download] => 0
    [speed_upload] => 0
    [download_content_length] => -1
    [upload_content_length] => 0
    [starttransfer_time] => 0.047
    [redirect_time] => 0.234
    [certinfo] => Array
        (
        )

    [primary_ip] => ::1
    [primary_port] => 80
    [local_ip] => ::1
    [local_port] => 50768
    [redirect_url] =>
)

It contains a redirect_count field that records the number of redirects.

Using these two facts, we can extract the header of the last page from $html = curl_exec($ch);

            // Separate the response headers from the body. The server may have redirected,
            // so the string is split into (2 + number of redirects) pieces.
            $httpArr = explode("\r\n\r\n", $html, 2 + $httpinfo['redirect_count']);

            // The second-to-last piece is the header of the final response
            $header = $httpArr[count($httpArr) - 2];

            // The last piece is the body of the final response
            $body = $httpArr[count($httpArr) - 1];

            $header .= "\r\n";

            print $header;
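The curl_getinfo output also includes header_size, the total size of all headers received, which offers another way to cut the response apart without counting redirects. The sketch below is an alternative to the explode approach, not the method used in this article; the function name splitByHeaderSize is hypothetical.

```php
<?php
/**
 * Split a raw curl response (fetched with CURLOPT_HEADER enabled) using header_size.
 * Hypothetical helper, for illustration only.
 *
 * @param string $raw        Value returned by curl_exec()
 * @param int    $headerSize Value of curl_getinfo($ch, CURLINFO_HEADER_SIZE)
 * @return array ['header' => last header block, 'body' => response body]
 */
function splitByHeaderSize(string $raw, int $headerSize): array
{
    $allHeaders = substr($raw, 0, $headerSize);
    // Cast because substr() can return false on older PHP versions when the body is empty.
    $body = (string) substr($raw, $headerSize);
    // One header block per response in the redirect chain, separated by blank lines.
    $blocks = explode("\r\n\r\n", rtrim($allHeaders, "\r\n"));
    return ['header' => end($blocks), 'body' => $body];
}
```

Because the body is taken by offset, a blank line inside the image data cannot be mistaken for a header boundary.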

Every header block ends with \r\n\r\n, for example:

Response header:

HTTP/1.0 200 OK
Date: Wed, 07 Aug 2013 08:15:21 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-Powered-By: PHP/5.2.17
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
Vary: Accept-Encoding
X-Cache: MISS from tesaasst.abc.com
X-Cache-Lookup: MISS from tesaasst.abc.com:80
Via: 1.0 tesaasst.abc.com (squid/3.0.STABLE20)
Connection: close

followed by one blank line.
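The sample header contains X-Powered-By twice, and HTTP header names are case-insensitive, so a plain name-to-value array keeps only the last value and can miss a lowercase "content-type". A minimal sketch of a parser that handles both cases follows; the function name parseHeaderBlock is hypothetical.

```php
<?php
/**
 * Parse one HTTP header block into an array of lowercase name => list of values.
 * Hypothetical helper, for illustration only.
 */
function parseHeaderBlock(string $header): array
{
    $result = [];
    $lines = preg_split('/\r\n|\n/', trim($header));
    foreach ($lines as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // status line such as "HTTP/1.0 200 OK", or an empty line
        }
        $name = strtolower(trim(substr($line, 0, $pos)));
        $value = trim(substr($line, $pos + 1));
        $result[$name][] = $value; // keep every value of a repeated header
    }
    return $result;
}
```

With this layout the content type would be read as $headers['content-type'][0].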

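A chain a->b->c produces a response of the following form:

a header \r\n\r\n b header \r\n\r\n c header \r\n\r\n c body

It is split into

2 + $httpinfo['redirect_count'] pieces.
The last piece is the body and the second-to-last is the header.

Note that explode removes the separator, so the last header line no longer ends with \r\n. To print the header of the last page in its proper form (and for the header regex in downloadImage, which expects \r\n after every line, to match the final line), we also append

$header.="\r\n";
so that the header conforms to the HTTP specification.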
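The splitting can be tried on a fabricated response. The string below is made up for illustration and is not real server output; it also shows why the limit argument matters when the image data itself contains \r\n\r\n.

```php
<?php
// Fabricated raw response for a -> b -> c (two redirects).
$raw = "HTTP/1.1 302 Found\r\nLocation: http://b.example/\r\n\r\n"
     . "HTTP/1.1 302 Found\r\nLocation: http://c.example/\r\n\r\n"
     . "HTTP/1.1 200 OK\r\nContent-Type: image/gif\r\n\r\n"
     . "GIF89a\r\n\r\nmore-bytes";

$redirectCount = 2; // what curl_getinfo() would report as redirect_count

// With the limit, everything after the third separator stays in one piece.
$parts = explode("\r\n\r\n", $raw, 2 + $redirectCount);
$header = $parts[count($parts) - 2]; // header of c
$body = $parts[count($parts) - 1];   // body of c, inner blank line preserved

// Without the limit the body would be cut at its own "\r\n\r\n".
$unlimited = explode("\r\n\r\n", $raw);
```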

CURLOPT_FOLLOWLOCATION

　　TRUE to follow any "Location: " header that the server sends as part of the HTTP header (note this is recursive, PHP will follow as many "Location: " headers that it is sent, unless CURLOPT_MAXREDIRS is set). Off by default, so it must be enabled explicitly.

CURLOPT_MAXREDIRS

　　The maximum amount of HTTP redirections to follow. Use this option alongside CURLOPT_FOLLOWLOCATION.

CURLOPT_AUTOREFERER

　　TRUE to automatically set the Referer: field in requests where it follows a Location: redirect.

With CURLOPT_AUTOREFERER enabled, curl adds a Referer header to every request it makes while following the redirect chain to its end. It is also off by default.
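The three options can be collected into one array and applied with curl_setopt_array. This is a small sketch that assumes the curl extension is loaded; the function name redirectOptions and the default limit of 2 (the value used in downloadImage) are illustrative.

```php
<?php
/**
 * Build the redirect-related curl options discussed above.
 * Hypothetical helper, for illustration only.
 */
function redirectOptions(int $maxRedirects = 2): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,          // return the response instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // stop after this many redirects
        CURLOPT_AUTOREFERER    => true,          // set Referer: on each followed request
    ];
}

// Usage (needs network access):
// $ch = curl_init($url);
// curl_setopt_array($ch, redirectOptions());
// $data = curl_exec($ch);
```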

References:

http://www.cnblogs.com/helloprogram/archive/2012/03/25/2416492.html

http://www.groad.net/bbs/read.php?tid-4455.html
