Snoopy.class.php使用手册
Snoopy - the PHP net client v1.2.4
Snoopy是一个php类,用来模拟浏览器的功能,可以获取网页内容,发送表单。
Snoopy的特点:
1、抓取网页的内容 fetch
2、抓取网页的文本内容 (去除HTML标签) fetchtext
3、抓取网页的链接,表单 fetchlinks fetchform
4、支持代理主机
5、支持基本的用户名/密码验证
6、支持设置 user_agent, referer(来路), cookies 和 header content(头文件)
7、支持浏览器重定向,并能控制重定向深度
8、能把网页中的链接扩展成高质量的url(默认)
9、提交数据并且获取返回值
10、支持跟踪HTML框架
11、支持重定向的时候传递cookies
要求php4以上就可以了,由于本身是php一个类,无需扩支持,服务器不支持curl时候的最好选择。
概要方法:
include "Snoopy.class.php";
$snoopy = new Snoopy; $snoopy->fetchtext("http://www.php.net/");
print $snoopy->results; $snoopy->fetchlinks("http://www.phpbuilder.com/");
print $snoopy->results; $submit_url = "http://lnk.ispi.net/texis/scripts/msearch/netsearch.html"; $submit_vars["q"] = "amiga";
$submit_vars["submit"] = "Search!";
$submit_vars["searchhost"] = "Altavista"; $snoopy->submit($submit_url,$submit_vars);
print $snoopy->results; $snoopy->maxframes=5;
$snoopy->fetch("http://www.ispi.net/");
echo "<PRE>\n";
echo htmlentities($snoopy->results[0]);
echo htmlentities($snoopy->results[1]);
echo htmlentities($snoopy->results[2]);
echo "</PRE>\n"; $snoopy->fetchform("http://www.altavista.com");
print $snoopy->results;
类方法说明:
fetch($URI)
----------- This is the method used for fetching the contents of a web page.
$URI is the fully qualified URL of the page to fetch.
The results of the fetch are stored in $this->results.
If you are fetching frames, then $this->results
contains each frame fetched in an array. fetchtext($URI)
--------------- This behaves exactly like fetch() except that it only returns
the text from the page, stripping out html tags and other
irrelevant data. fetchform($URI)
--------------- This behaves exactly like fetch() except that it only returns
the form elements from the page, stripping out html tags and other
irrelevant data. fetchlinks($URI)
---------------- This behaves exactly like fetch() except that it only returns
the links from the page. By default, relative links are
converted to their fully qualified URL form. submit($URI,$formvars)
---------------------- This submits a form to the specified $URI. $formvars is an
array of the form variables to pass. submittext($URI,$formvars)
-------------------------- This behaves exactly like submit() except that it only returns
the text from the page, stripping out html tags and other
irrelevant data. submitlinks($URI)
---------------- This behaves exactly like submit() except that it only returns
the links from the page. By default, relative links are
converted to their fully qualified URL form.
类 VARIABLES: (default value in parenthesis)
$host the host to connect to
$port the port to connect to
$proxy_host the proxy host to use, if any
$proxy_port the proxy port to use, if any
$agent the user agent to masqerade as (Snoopy v0.1)
$referer referer information to pass, if any
$cookies cookies to pass if any
$rawheaders other header info to pass, if any
$maxredirs maximum redirects to allow. 0=none allowed. (5)
$offsiteok whether or not to allow redirects off-site. (true)
$expandlinks whether or not to expand links to fully qualified URLs (true)
$user authentication username, if any
$pass authentication password, if any
$accept http accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, )
$error where errors are sent, if any
$response_code responde code returned from server
$headers headers returned from server
$maxlength max return data length
$read_timeout timeout on read operations (requires PHP 4 Beta 4+)
set to 0 to disallow timeouts
$timed_out true if a read operation timed out (requires PHP 4 Beta 4+)
$maxframes number of frames we will follow
$status http status of fetch
$temp_dir temp directory that the webserver can write to. (/tmp)
$curl_path system path to cURL binary, set to false if none
EXample:
Example: fetch a web page and display the return headers and
the contents of the page (html-escaped): include "Snoopy.class.php";
$snoopy = new Snoopy; $snoopy->user = "joe";
$snoopy->pass = "bloe"; if($snoopy->fetch("http://www.slashdot.org/"))
{
echo "response code: ".$snoopy->response_code."<br>\n";
while(list($key,$val) = each($snoopy->headers))
echo $key.": ".$val."<br>\n";
echo "<p>\n"; echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
}
else
echo "error fetching document: ".$snoopy->error."\n"; Example: submit a form and print out the result headers
and html-escaped page: include "Snoopy.class.php";
$snoopy = new Snoopy; $submit_url = "http://lnk.ispi.net/texis/scripts/msearch/netsearch.html"; $submit_vars["q"] = "amiga";
$submit_vars["submit"] = "Search!";
$submit_vars["searchhost"] = "Altavista"; if($snoopy->submit($submit_url,$submit_vars))
{
while(list($key,$val) = each($snoopy->headers))
echo $key.": ".$val."<br>\n";
echo "<p>\n"; echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
}
else
echo "error fetching document: ".$snoopy->error."\n"; Example: showing functionality of all the variables: include "Snoopy.class.php";
$snoopy = new Snoopy; $snoopy->proxy_host = "my.proxy.host";
$snoopy->proxy_port = "8080"; $snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
$snoopy->referer = "http://www.microsnot.com/"; $snoopy->cookies["SessionID"] = 238472834723489l;
$snoopy->cookies["favoriteColor"] = "RED"; $snoopy->rawheaders["Pragma"] = "no-cache"; $snoopy->maxredirs = 2;
$snoopy->offsiteok = false;
$snoopy->expandlinks = false; $snoopy->user = "joe";
$snoopy->pass = "bloe"; if($snoopy->fetchtext("http://www.phpbuilder.com"))
{
while(list($key,$val) = each($snoopy->headers))
echo $key.": ".$val."<br>\n";
echo "<p>\n"; echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
}
else
echo "error fetching document: ".$snoopy->error."\n"; Example: fetched framed content and display the results include "Snoopy.class.php";
$snoopy = new Snoopy; $snoopy->maxframes = 5; if($snoopy->fetch("http://www.ispi.net/"))
{
echo "<PRE>".htmlspecialchars($snoopy->results[0])."</PRE>\n";
echo "<PRE>".htmlspecialchars($snoopy->results[1])."</PRE>\n";
echo "<PRE>".htmlspecialchars($snoopy->results[2])."</PRE>\n";
}
else
echo "error fetching document: ".$snoopy->error."\n";
Snoopy.class.php使用手册的更多相关文章
- Snoopy.class.php介绍
Snoopy是一个开源的模拟抓取工具,找到一个不错的介绍网页 记录一下: php开源采集类Snoopy.class.php功能使用介绍与下载地址 Snoopy.class.php使用手册 还有一个介绍 ...
- simple_html_dom配合snoopy使用
https://github.com/samacs/simple_html_dom Snoopy的特点是“大”和“全”,一个fetch什么都采到了,可以作为采集的第一步.接下来就需要用simple_h ...
- FREERTOS 手册阅读笔记
郑重声明,版权所有! 转载需说明. FREERTOS堆栈大小的单位是word,不是byte. 根据处理器架构优化系统的任务优先级不能超过32,If the architecture optimized ...
- JS魔法堂:不完全国际化&本地化手册 之 理論篇
前言 最近加入到新项目组负责前端技术预研和选型,其中涉及到一个熟悉又陌生的需求--国际化&本地化.熟悉的是之前的项目也玩过,陌生的是之前的实现仅仅停留在"有"的阶段而已. ...
- 转职成为TypeScript程序员的参考手册
写在前面 作者并没有任何可以作为背书的履历来证明自己写作这份手册的分量. 其内容大都来自于TypeScript官方资料或者搜索引擎获得,期间掺杂少量作者的私见,并会标明. 大部分内容来自于http:/ ...
- Redis学习手册(目录)
为什么自己当初要选择Redis作为数据存储解决方案中的一员呢?现在能想到的原因主要有三.其一,Redis不仅性能高效,而且完全免费.其二,是基于C/C++开发的服务器,这里应该有一定的感情因素吧.最后 ...
- JS魔法堂:不完全国际化&本地化手册 之 实战篇
前言 最近加入到新项目组负责前端技术预研和选型,其中涉及到一个熟悉又陌生的需求--国际化&本地化.熟悉的是之前的项目也玩过,陌生的是之前的实现仅仅停留在"有"的阶段而已. ...
- Windows API 函数列表 附帮助手册
所有Windows API函数列表,为了方便查询,也为了大家查找,所以整理一下贡献出来了. 帮助手册:700多个Windows API的函数手册 免费下载 API之网络函数 API之消息函数 API之 ...
- linux命令在线手册
下面几个网址有一些 Linux命令的在线手册,而且还是中文的,还可以搜索.非常方便 Linux命令手册 Linux命令大全 Linux中文man在线手册 每日一linux命令
随机推荐
- MFC消息反射机制
消息反射机制要解决什么问题呢? 消息反射机制主要是为了控件而实现的.每当控件需要某些资讯(比如,绘制自身背景的画刷,显示字体的颜色等等)时,都会频繁地向其父窗口发送通告消息(notification ...
- Mybatis源码学习之日志(五)
简述 在Java开发中常用的日志框架有Log4j.Log4j2.Apache Commons Log.java.util.logging.slf4j等,这些工具对外的接口并不相同.为了统一这些工具的接 ...
- CSS子元素在父元素中水平垂直居中的几种方法
1. 水平居中(margin: auto;)子父元素宽度固定,子元素上设置 margin: auto; 子元素不能设置浮动,否则居中失效. #div1{ width: 300px; height: 3 ...
- Leetcode题目79.单词搜索(回溯+DFS-中等)
题目描述: 给定一个二维网格和一个单词,找出该单词是否存在于网格中. 单词必须按照字母顺序,通过相邻的单元格内的字母构成,其中“相邻”单元格是那些水平相邻或垂直相邻的单元格.同一个单元格内的字母不允许 ...
- 数据库 | SQL 诊断优化套路包,套路用的对,速度升百倍
本文出自头条号老王谈运维,转载请说明出处. 前言 在DBA的日常工作中,调整个别性能较差的SQL语句是一项富有挑战性的工作.面对慢SQL,一些DBA会心烦,会沮丧,会束手无措,也会沉着冷静.斗智斗勇! ...
- druid 参数配置详解
druid 参数配置详解 */--> druid 参数配置详解 Table of Contents 1. 初始化连接 2. 参数配置及说明 3. 注意事项 3.1. 底层连接 3.2. 空闲检查 ...
- Android:JNA实践(附Demo)
一.JNA和JNI的对比 1.JNI的调用流程 Android应用开发中要实现Java和C,C++层交互时,想必首先想到的是JNI,但是JNI的使用过程十分繁琐,需要自己再封装一层JNI接口进行转 ...
- selenium操作cookie
1,登录网页,使用webdriver的get_cookies获取cookie,并保存json文件 2,读取json文件,遍历添加网站使用的每一个cookies的name,value. 使用add_co ...
- Laravel的Nginx重写规则--让路由支持末尾加斜线
默认laravel路由末尾不能加/,如果加了斜线会报404 要想支持url末尾的斜线需要在public/index.php加入如下代码: $_SERVER['REQUEST_URI'] = trim( ...
- 安装python3.6并使用virtualenvwrapper管理虚环境
1.安装python3.6.5依赖环境 注:python3.7.4需要安装:yum install libffi-devel -y yum install gcc patch libffi-devel ...