1:Basic usage

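import mechanize

# Shortcut: response = mechanize.urlopen("http://www.hao123.com/")
# Build an explicit Request object, then open it.
request = mechanize.Request("http://www.hao123.com/")
response = mechanize.urlopen(request)
print(response.geturl())  # URL of the retrieved page
print(response.info())    # response headers
# print(response.read())  # response body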

2:mechanize.urlretrieve

>>> import mechanize
>>> help(mechanize.urlretrieve)
Help on function urlretrieve in module mechanize._opener:

urlretrieve(url, filename=None, reporthook=None, data=None, timeout=<object object>)
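  • filename: the local path to save the data to. (If it is omitted, mechanize saves the data to a temporary file.)
  • reporthook: a callback function that is invoked once the connection to the server is established and again after each block of data has been transferred. It can be used to display the current download progress.
  • data: the data to POST to the server.
  • timeout: the timeout for the request (the <object object> default in the signature is a placeholder meaning "not specified").

The function returns a two-element tuple (filename, headers): filename is the local path the data was saved to, and headers holds the server's response headers.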

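The callback has the form reporthook(block_read, block_size, total_size): block_read is the number of blocks transferred so far, block_size is the size of each block in bytes, and total_size is the total size of the data being downloaded in bytes, as reported by the server. A reporthook function can use these values to display download progress.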

A simple example

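import mechanize

def cbk(block_read, block_size, total_size):
    # Called by urlretrieve as the download progresses.
    print("%s %s %s" % (block_read, block_size, total_size))

url = 'http://www.hao123.com/'
local = 'd:/hao.html'  # local path to save the page to
mechanize.urlretrieve(url, local, cbk)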
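
The callback above only prints the three raw numbers. As a rough sketch of how they can be turned into a progress display, the helper below computes a percentage from them. The names format_progress and show_progress are made up for this illustration, and the code assumes that total_size is zero or negative when the server does not report a size (as with urllib's urlretrieve).

```python
def format_progress(block_read, block_size, total_size):
    """Return a human-readable progress string for one reporthook call."""
    received = block_read * block_size
    if total_size <= 0:
        # Assumption: a non-positive total means the size is unknown.
        return "%d bytes received" % received
    # The last block is usually partial, so cap the count at the total.
    received = min(received, total_size)
    percent = received * 100.0 / total_size
    return "%5.1f%% (%d of %d bytes)" % (percent, received, total_size)


def show_progress(block_read, block_size, total_size):
    """Hypothetical reporthook: print the formatted progress line."""
    print(format_progress(block_read, block_size, total_size))


if __name__ == "__main__":
    # Simulate the calls made for a 20000-byte download in 8192-byte blocks.
    for n in range(4):
        show_progress(n, 8192, 20000)
```

To try it with a real download, show_progress would be passed in place of cbk: mechanize.urlretrieve(url, local, show_progress).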

3:Logging in through a form

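import os
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # do not observe robots.txt rules
br.open("http://www.zhaopin.com/")
br.select_form(nr=0)  # select the first form on the page
br['loginname'] = '**'  # register an account of your own and use its credentials here
br['password'] = '**'
r = br.submit()
# Save the returned page as login.html next to this script.
path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'login.html')
print(path)
with open(path, "wb") as h:
    h.write(r.read())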
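
For intuition about what a submitted login form carries, the sketch below encodes two fields with the standard library alone. It is not how mechanize is implemented. It assumes the form uses the default application/x-www-form-urlencoded encoding, and the field values are placeholders.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# Field names taken from the login example; the values are placeholders.
fields = [("loginname", "alice"), ("password", "s3cret pass")]

# Assumption: the form is submitted with the default
# application/x-www-form-urlencoded encoding, where spaces become "+".
body = urlencode(fields)
print(body)
```

Setting br['loginname'] and br['password'] fills in these same fields on the selected form before br.submit() sends it.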

4:Browser

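Once you have read through the help() documentation below, you will know most of what there is to know about Browser.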

Help on class Browser in module mechanize._mechanize:

class Browser(mechanize._useragent.UserAgentBase)
| Browser-like class with support for history, forms and links.
|
| BrowserStateError is raised whenever the browser is in the wrong state to
| complete the requested operation - e.g., when .back() is called when the
| browser history is empty, or when .follow_link() is called when the current
| response does not contain HTML data.
|
| Public attributes:
|
| request: current request (mechanize.Request)
| form: currently selected form (see .select_form())
|
| Method resolution order:
| Browser
| mechanize._useragent.UserAgentBase
| mechanize._opener.OpenerDirector
| mechanize._urllib2_fork.OpenerDirector
|
| Methods defined here:
|
| __getattr__(self, name)
|
| __init__(self, factory=None, history=None, request_class=None)
| Only named arguments should be passed to this constructor.
|
| factory: object implementing the mechanize.Factory interface.
| history: object implementing the mechanize.History interface. Note
| this interface is still experimental and may change in future.
| request_class: Request class to use. Defaults to mechanize.Request
|
| The Factory and History objects passed in are 'owned' by the Browser,
| so they should not be shared across Browsers. In particular,
| factory.set_response() should not be called except by the owning
| Browser itself.
|
| Note that the supplied factory's request_class is overridden by this
| constructor, to ensure only one Request class is used.
|
| __str__(self)
|
| back(self, n=1)
| Go back n steps in history, and return response object.
|
| n: go back this number of steps (default 1 step)
|
| clear_history(self)
|
| click(self, *args, **kwds)
| See mechanize.HTMLForm.click for documentation.
|
| click_link(self, link=None, **kwds)
| Find a link and return a Request object for it.
|
| Arguments are as for .find_link(), except that a link may be supplied
| as the first argument.
|
| close(self)
|
| encoding(self)
|
| find_link(self, **kwds)
| Find a link in current page.
|
| Links are returned as mechanize.Link objects.
|
| # Return third link that .search()-matches the regexp "python"
| # (by ".search()-matches", I mean that the regular expression method
| # .search() is used, rather than .match()).
| find_link(text_regex=re.compile("python"), nr=2)
|
| # Return first http link in the current page that points to somewhere
| # on python.org whose link text (after tags have been removed) is
| # exactly "monty python".
| find_link(text="monty python",
| url_regex=re.compile("http.*python.org"))
|
| # Return first link with exactly three HTML attributes.
| find_link(predicate=lambda link: len(link.attrs) == 3)
|
| Links include anchors (<a>), image maps (<area>), and frames (<frame>,
| <iframe>).
|
| All arguments must be passed by keyword, not position. Zero or more
| arguments may be supplied. In order to find a link, all arguments
| supplied must match.
|
| If a matching link is not found, mechanize.LinkNotFoundError is raised.
|
| text: link text between link tags: e.g. <a href="blah">this bit</a> (as
| returned by pullparser.get_compressed_text(), ie. without tags but
| with opening tags "textified" as per the pullparser docs) must compare
| equal to this argument, if supplied
| text_regex: link text between tag (as defined above) must match the
| regular expression object or regular expression string passed as this
| argument, if supplied
| name, name_regex: as for text and text_regex, but matched against the
| name HTML attribute of the link tag
| url, url_regex: as for text and text_regex, but matched against the
| URL of the link tag (note this matches against Link.url, which is a
| relative or absolute URL according to how it was written in the HTML)
| tag: element name of opening tag, e.g. "a"
| predicate: a function taking a Link object as its single argument,
| returning a boolean result, indicating whether the links
| nr: matches the nth link that matches all other criteria (default 0)
|
| follow_link(self, link=None, **kwds)
| Find a link and .open() it.
|
| Arguments are as for .click_link().
|
| Return value is same as for Browser.open().
|
| forms(self)
| Return iterable over forms.
|
| The returned form objects implement the mechanize.HTMLForm interface.
|
| geturl(self)
| Get URL of current document.
|
| global_form(self)
| Return the global form object, or None if the factory implementation
| did not supply one.
|
| The "global" form object contains all controls that are not descendants
| of any FORM element.
|
| The returned form object implements the mechanize.HTMLForm interface.
|
| This is a separate method since the global form is not regarded as part
| of the sequence of forms in the document -- mostly for
| backwards-compatibility.
|
| links(self, **kwds)
| Return iterable over links (mechanize.Link objects).
|
| open(self, url, data=None, timeout=<object object>)
|
| open_local_file(self, filename)
|
| open_novisit(self, url, data=None, timeout=<object object>)
| Open a URL without visiting it.
|
| Browser state (including request, response, history, forms and links)
| is left unchanged by calling this function.
|
| The interface is the same as for .open().
|
| This is useful for things like fetching images.
|
| See also .retrieve().
|
| reload(self)
| Reload current document, and return response object.
|
| response(self)
| Return a copy of the current response.
|
| The returned object has the same interface as the object returned by
| .open() (or mechanize.urlopen()).
|
| select_form(self, name=None, predicate=None, nr=None)
| Select an HTML form for input.
|
| This is a bit like giving a form the "input focus" in a browser.
|
| If a form is selected, the Browser object supports the HTMLForm
| interface, so you can call methods like .set_value(), .set(), and
| .click().
|
| Another way to select a form is to assign to the .form attribute. The
| form assigned should be one of the objects returned by the .forms()
| method.
|
| At least one of the name, predicate and nr arguments must be supplied.
| If no matching form is found, mechanize.FormNotFoundError is raised.
|
| If name is specified, then the form must have the indicated name.
|
| If predicate is specified, then the form must match that function. The
| predicate function is passed the HTMLForm as its single argument, and
| should return a boolean value indicating whether the form matched.
|
| nr, if supplied, is the sequence number of the form (where 0 is the
| first). Note that control 0 is the first form matching all the other
| arguments (if supplied); it is not necessarily the first control in the
| form. The "global form" (consisting of all form controls not contained
| in any FORM element) is considered not to be part of this sequence and
| to have no name, so will not be matched unless both name and nr are
| None.
|
| set_cookie(self, cookie_string)
| Request to set a cookie.
|
| Note that it is NOT necessary to call this method under ordinary
| circumstances: cookie handling is normally entirely automatic. The
| intended use case is rather to simulate the setting of a cookie by
| client script in a web page (e.g. JavaScript). In that case, use of
| this method is necessary because mechanize currently does not support
| JavaScript, VBScript, etc.
|
| The cookie is added in the same way as if it had arrived with the
| current response, as a result of the current request. This means that,
| for example, if it is not appropriate to set the cookie based on the
| current request, no cookie will be set.
|
| The cookie will be returned automatically with subsequent responses
| made by the Browser instance whenever that's appropriate.
|
| cookie_string should be a valid value of the Set-Cookie header.
|
| For example:
|
| browser.set_cookie(
| "sid=abcdef; expires=Wednesday, 09-Nov-06 23:12:40 GMT")
|
| Currently, this method does not allow for adding RFC 2986 cookies.
| This limitation will be lifted if anybody requests it.
|
| set_handle_referer(self, handle)
| Set whether to add Referer header to each request.
|
| set_response(self, response)
| Replace current response with (a copy of) response.
|
| response may be None.
|
| This is intended mostly for HTML-preprocessing.
|
| submit(self, *args, **kwds)
| Submit current form.
|
| Arguments are as for mechanize.HTMLForm.click().
|
| Return value is same as for Browser.open().
|
| title(self)
| Return title, or None if there is no title element in the document.
|
| Treatment of any tag children of the title element attempts to follow
| Firefox and IE (currently, tags are preserved).
|
| viewing_html(self)
| Return whether the current response contains HTML data.
|
| visit_response(self, response, request=None)
| Visit the response, as if it had been .open()ed.
|
| Unlike .set_response(), this updates history rather than replacing the
| current response.
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| default_features = ['_redirect', '_cookies', '_refresh', '_equiv', '_b...
|
| handler_classes = {'_basicauth': <class mechanize._urllib2_fork.HTTPBa...
|
| ----------------------------------------------------------------------
| Methods inherited from mechanize._useragent.UserAgentBase:
|
| add_client_certificate(self, url, key_file, cert_file)
| Add an SSL client certificate, for HTTPS client auth.
|
| key_file and cert_file must be filenames of the key and certificate
| files, in PEM format. You can use e.g. OpenSSL to convert a p12 (PKCS
| 12) file to PEM format:
|
| openssl pkcs12 -clcerts -nokeys -in cert.p12 -out cert.pem
| openssl pkcs12 -nocerts -in cert.p12 -out key.pem
|
|
| Note that client certificate password input is very inflexible ATM. At
| the moment this seems to be console only, which is presumably the
| default behaviour of libopenssl. In future mechanize may support
| third-party libraries that (I assume) allow more options here.
|
| add_password(self, url, user, password, realm=None)
|
| add_proxy_password(self, user, password, hostport=None, realm=None)
|
| set_client_cert_manager(self, cert_manager)
| Set a mechanize.HTTPClientCertMgr, or None.
|
| set_cookiejar(self, cookiejar)
| Set a mechanize.CookieJar, or None.
|
| set_debug_http(self, handle)
| Print HTTP headers to sys.stdout.
|
| set_debug_redirects(self, handle)
| Log information about HTTP redirects (including refreshes).
|
| Logging is performed using module logging. The logger name is
| "mechanize.http_redirects". To actually print some debug output,
| eg:
|
| import sys, logging
| logger = logging.getLogger("mechanize.http_redirects")
| logger.addHandler(logging.StreamHandler(sys.stdout))
| logger.setLevel(logging.INFO)
|
| Other logger names relevant to this module:
|
| "mechanize.http_responses"
| "mechanize.cookies"
|
| To turn on everything:
|
| import sys, logging
| logger = logging.getLogger("mechanize")
| logger.addHandler(logging.StreamHandler(sys.stdout))
| logger.setLevel(logging.INFO)
|
| set_debug_responses(self, handle)
| Log HTTP response bodies.
|
| See docstring for .set_debug_redirects() for details of logging.
|
| Response objects may be .seek()able if this is set (currently returned
| responses are, raised HTTPError exception responses are not).
|
| set_handle_equiv(self, handle, head_parser_class=None)
| Set whether to treat HTML http-equiv headers like HTTP headers.
|
| Response objects may be .seek()able if this is set (currently returned
| responses are, raised HTTPError exception responses are not).
|
| set_handle_gzip(self, handle)
| Handle gzip transfer encoding.
|
| set_handle_redirect(self, handle)
| Set whether to handle HTTP 30x redirections.
|
| set_handle_refresh(self, handle, max_time=None, honor_time=True)
| Set whether to handle HTTP Refresh headers.
|
| set_handle_robots(self, handle)
| Set whether to observe rules from robots.txt.
|
| set_handled_schemes(self, schemes)
| Set sequence of URL scheme (protocol) strings.
|
| For example: ua.set_handled_schemes(["http", "ftp"])
|
| If this fails (with ValueError) because you've passed an unknown
| scheme, the set of handled schemes will not be changed.
|
| set_password_manager(self, password_manager)
| Set a mechanize.HTTPPasswordMgrWithDefaultRealm, or None.
|
| set_proxies(self, proxies=None, proxy_bypass=None)
| Configure proxy settings.
|
| proxies: dictionary mapping URL scheme to proxy specification. None
| means use the default system-specific settings.
| proxy_bypass: function taking hostname, returning whether proxy should
| be used. None means use the default system-specific settings.
|
| The default is to try to obtain proxy settings from the system (see the
| documentation for urllib.urlopen for information about the
| system-specific methods used -- note that's urllib, not urllib2).
|
| To avoid all use of proxies, pass an empty proxies dict.
|
| >>> ua = UserAgentBase()
| >>> def proxy_bypass(hostname):
| ... return hostname == "noproxy.com"
| >>> ua.set_proxies(
| ... {"http": "joe:password@myproxy.example.com:3128",
| ... "ftp": "proxy.example.com"},
| ... proxy_bypass)
|
| set_proxy_password_manager(self, password_manager)
| Set a mechanize.HTTPProxyPasswordMgr, or None.
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from mechanize._useragent.UserAgentBase:
|
| default_others = ['_unknown', '_http_error', '_http_default_error']
|
| default_schemes = ['http', 'ftp', 'file', 'https']
|
| ----------------------------------------------------------------------
| Methods inherited from mechanize._opener.OpenerDirector:
|
| add_handler(self, handler)
|
| error(self, proto, *args)
|
| retrieve(self, fullurl, filename=None, reporthook=None, data=None, timeout=<object object>, open=<built-in function open>)
| Returns (filename, headers).
|
| For remote objects, the default filename will refer to a temporary
| file. Temporary files are removed when the OpenerDirector.close()
| method is called.
|
| For file: URLs, at present the returned filename is None. This may
| change in future.
|
| If the actual number of bytes read is less than indicated by the
| Content-Length header, raises ContentTooShortError (a URLError
| subclass). The exception's .result attribute contains the (filename,
| headers) that would have been returned.
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from mechanize._opener.OpenerDirector:
|
| BLOCK_SIZE = 8192
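
The find_link() help above points out that regular expressions are applied with .search() rather than .match(). A small standard-library sketch of the difference, using made-up link text:

```python
import re

pattern = re.compile("python")
link_text = "monty python"  # made-up link text for illustration

# .match() only succeeds if the pattern matches at the start of the string.
matched_at_start = pattern.match(link_text) is not None

# .search() succeeds if the pattern matches anywhere in the string.
found_anywhere = pattern.search(link_text) is not None

print(matched_at_start, found_anywhere)
```

Going by the help text, find_link(text_regex=re.compile("python")) would therefore accept a link whose text is "monty python", even though the pattern does not match at the start of the text.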
