1:Basic usage

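import mechanize

# Shortcut: pass the URL string directly
# response = mechanize.urlopen("http://www.hao123.com/")

# Or build a Request object first (handy if you later need headers or POST data)
request = mechanize.Request("http://www.hao123.com/")
response = mechanize.urlopen(request)
print(response.geturl())  # final URL, after any redirects
print(response.info())    # response headers
# print(response.read())  # response body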

2:mechanize.urlretrieve

>>> import mechanize
>>> help(mechanize.urlretrieve)
Help on function urlretrieve in module mechanize._opener:

urlretrieve(url, filename=None, reporthook=None, data=None, timeout=<object object>)
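  • filename: the local path to save the data to. If it is omitted, mechanize saves the data to a temporary file.
  • reporthook: a callback that is invoked once when the connection to the server is established, and again after each data block has been transferred. You can use it to display download progress.
  • data: data to POST to the server.
  • timeout: the timeout in seconds. The default shown as <object object> is a sentinel meaning "use the global default socket timeout".
  • Return value: a two-element tuple (filename, headers). filename is the local path the data was saved to, and headers holds the server's response headers.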

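The callback is called as reporthook(block_read, block_size, total_size):
  • block_read is the number of blocks transferred so far.
  • block_size is the size of each block in bytes.
  • total_size is the total size of the file in bytes, taken from the Content-Length header. It is -1 if the server did not send that header.

These three values are enough to display download progress.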
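
As a sketch of the callback described above, here is a reporthook that turns the three arguments into a readable progress line. It assumes every block counted so far was full-sized, so it clamps the estimate to total_size because the last block is usually short. The name progress is hypothetical. The loop at the end only simulates the calls urlretrieve would make, using the 8192-byte BLOCK_SIZE listed in the Browser help output.

```python
def progress(block_count, block_size, total_size):
    """Hypothetical reporthook: print and return a one-line progress message."""
    # Approximate bytes received so far (assumes all blocks so far were full).
    received = block_count * block_size
    if total_size > 0:
        # The final block is usually partial, so never report more than the total.
        received = min(received, total_size)
        msg = "%d / %d bytes (%.1f%%)" % (received, total_size,
                                         100.0 * received / total_size)
    else:
        # total_size is -1 when the server sent no Content-Length header.
        msg = "%d bytes (total size unknown)" % received
    print(msg)
    return msg

# Simulate urlretrieve's calls for a 20000-byte file with 8192-byte blocks:
# one call with block_count 0 on connect, then one after each block.
for n in range(4):
    progress(n, 8192, 20000)
```

For a real download, pass it as the reporthook argument, e.g. mechanize.urlretrieve(url, filename, progress). urlretrieve ignores the return value; the function returns the message only so it is easy to check.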

A simple example

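import mechanize

def cbk(block_read, block_size, total_size):
    # Called by urlretrieve on connect and after each block is transferred
    print("%d %d %d" % (block_read, block_size, total_size))

url = 'http://www.hao123.com/'
local = 'd:/hao.html'  # local file to save to (drive D: on Windows)
mechanize.urlretrieve(url, local, cbk)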

3:Logging in with a form

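import os
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # do not fetch or obey robots.txt
br.open("http://www.zhaopin.com/")
br.select_form(nr=0)  # select the first form on the page
# Register an account on the site and fill in your own credentials
br['loginname'] = '**'
br['password'] = '**'
r = br.submit()

# Save the page returned after login next to this script
out_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'login.html')
print(out_path)
with open(out_path, "wb") as h:
    h.write(r.read())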
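
select_form(nr=0) simply takes the first form on the page. That breaks if the site ever puts another form, such as a search box, ahead of the login form. The Browser help output documents that select_form also accepts a predicate, which receives each HTMLForm. The sketch below picks the form that has both a loginname and a password control. It assumes HTMLForm.controls is the list of a form's controls, each with a .name attribute. The helper name has_login_fields is hypothetical, and the field names are the ones used in the example above, so other sites may use different names.

```python
def has_login_fields(form, names=("loginname", "password")):
    """Hypothetical select_form predicate: True if the form has all the named controls."""
    # Collect the names of the form's controls (unnamed controls have name None).
    control_names = set(c.name for c in form.controls)
    # The form qualifies only if every required field name is present.
    return all(name in control_names for name in names)
```

Usage: call br.select_form(predicate=has_login_fields) in place of br.select_form(nr=0), then fill in br['loginname'] and br['password'] as before. If no form matches, mechanize raises FormNotFoundError.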

4:Browser

Read through the help output below and you will know nearly everything Browser can do.

Help on class Browser in module mechanize._mechanize:

class Browser(mechanize._useragent.UserAgentBase)
| Browser-like class with support for history, forms and links.
|
| BrowserStateError is raised whenever the browser is in the wrong state to
| complete the requested operation - e.g., when .back() is called when the
| browser history is empty, or when .follow_link() is called when the current
| response does not contain HTML data.
|
| Public attributes:
|
| request: current request (mechanize.Request)
| form: currently selected form (see .select_form())
|
| Method resolution order:
| Browser
| mechanize._useragent.UserAgentBase
| mechanize._opener.OpenerDirector
| mechanize._urllib2_fork.OpenerDirector
|
| Methods defined here:
|
| __getattr__(self, name)
|
| __init__(self, factory=None, history=None, request_class=None)
| Only named arguments should be passed to this constructor.
|
| factory: object implementing the mechanize.Factory interface.
| history: object implementing the mechanize.History interface. Note
| this interface is still experimental and may change in future.
| request_class: Request class to use. Defaults to mechanize.Request
|
| The Factory and History objects passed in are 'owned' by the Browser,
| so they should not be shared across Browsers. In particular,
| factory.set_response() should not be called except by the owning
| Browser itself.
|
| Note that the supplied factory's request_class is overridden by this
| constructor, to ensure only one Request class is used.
|
| __str__(self)
|
| back(self, n=1)
| Go back n steps in history, and return response object.
|
| n: go back this number of steps (default 1 step)
|
| clear_history(self)
|
| click(self, *args, **kwds)
| See mechanize.HTMLForm.click for documentation.
|
| click_link(self, link=None, **kwds)
| Find a link and return a Request object for it.
|
| Arguments are as for .find_link(), except that a link may be supplied
| as the first argument.
|
| close(self)
|
| encoding(self)
|
| find_link(self, **kwds)
| Find a link in current page.
|
| Links are returned as mechanize.Link objects.
|
| # Return third link that .search()-matches the regexp "python"
| # (by ".search()-matches", I mean that the regular expression method
| # .search() is used, rather than .match()).
| find_link(text_regex=re.compile("python"), nr=2)
|
| # Return first http link in the current page that points to somewhere
| # on python.org whose link text (after tags have been removed) is
| # exactly "monty python".
| find_link(text="monty python",
| url_regex=re.compile("http.*python.org"))
|
| # Return first link with exactly three HTML attributes.
| find_link(predicate=lambda link: len(link.attrs) == 3)
|
| Links include anchors (<a>), image maps (<area>), and frames (<frame>,
| <iframe>).
|
| All arguments must be passed by keyword, not position. Zero or more
| arguments may be supplied. In order to find a link, all arguments
| supplied must match.
|
| If a matching link is not found, mechanize.LinkNotFoundError is raised.
|
| text: link text between link tags: e.g. <a href="blah">this bit</a> (as
| returned by pullparser.get_compressed_text(), ie. without tags but
| with opening tags "textified" as per the pullparser docs) must compare
| equal to this argument, if supplied
| text_regex: link text between tag (as defined above) must match the
| regular expression object or regular expression string passed as this
| argument, if supplied
| name, name_regex: as for text and text_regex, but matched against the
| name HTML attribute of the link tag
| url, url_regex: as for text and text_regex, but matched against the
| URL of the link tag (note this matches against Link.url, which is a
| relative or absolute URL according to how it was written in the HTML)
| tag: element name of opening tag, e.g. "a"
| predicate: a function taking a Link object as its single argument,
| returning a boolean result, indicating whether the links
| nr: matches the nth link that matches all other criteria (default 0)
|
| follow_link(self, link=None, **kwds)
| Find a link and .open() it.
|
| Arguments are as for .click_link().
|
| Return value is same as for Browser.open().
|
| forms(self)
| Return iterable over forms.
|
| The returned form objects implement the mechanize.HTMLForm interface.
|
| geturl(self)
| Get URL of current document.
|
| global_form(self)
| Return the global form object, or None if the factory implementation
| did not supply one.
|
| The "global" form object contains all controls that are not descendants
| of any FORM element.
|
| The returned form object implements the mechanize.HTMLForm interface.
|
| This is a separate method since the global form is not regarded as part
| of the sequence of forms in the document -- mostly for
| backwards-compatibility.
|
| links(self, **kwds)
| Return iterable over links (mechanize.Link objects).
|
| open(self, url, data=None, timeout=<object object>)
|
| open_local_file(self, filename)
|
| open_novisit(self, url, data=None, timeout=<object object>)
| Open a URL without visiting it.
|
| Browser state (including request, response, history, forms and links)
| is left unchanged by calling this function.
|
| The interface is the same as for .open().
|
| This is useful for things like fetching images.
|
| See also .retrieve().
|
| reload(self)
| Reload current document, and return response object.
|
| response(self)
| Return a copy of the current response.
|
| The returned object has the same interface as the object returned by
| .open() (or mechanize.urlopen()).
|
| select_form(self, name=None, predicate=None, nr=None)
| Select an HTML form for input.
|
| This is a bit like giving a form the "input focus" in a browser.
|
| If a form is selected, the Browser object supports the HTMLForm
| interface, so you can call methods like .set_value(), .set(), and
| .click().
|
| Another way to select a form is to assign to the .form attribute. The
| form assigned should be one of the objects returned by the .forms()
| method.
|
| At least one of the name, predicate and nr arguments must be supplied.
| If no matching form is found, mechanize.FormNotFoundError is raised.
|
| If name is specified, then the form must have the indicated name.
|
| If predicate is specified, then the form must match that function. The
| predicate function is passed the HTMLForm as its single argument, and
| should return a boolean value indicating whether the form matched.
|
| nr, if supplied, is the sequence number of the form (where 0 is the
| first). Note that control 0 is the first form matching all the other
| arguments (if supplied); it is not necessarily the first control in the
| form. The "global form" (consisting of all form controls not contained
| in any FORM element) is considered not to be part of this sequence and
| to have no name, so will not be matched unless both name and nr are
| None.
|
| set_cookie(self, cookie_string)
| Request to set a cookie.
|
| Note that it is NOT necessary to call this method under ordinary
| circumstances: cookie handling is normally entirely automatic. The
| intended use case is rather to simulate the setting of a cookie by
| client script in a web page (e.g. JavaScript). In that case, use of
| this method is necessary because mechanize currently does not support
| JavaScript, VBScript, etc.
|
| The cookie is added in the same way as if it had arrived with the
| current response, as a result of the current request. This means that,
| for example, if it is not appropriate to set the cookie based on the
| current request, no cookie will be set.
|
| The cookie will be returned automatically with subsequent responses
| made by the Browser instance whenever that's appropriate.
|
| cookie_string should be a valid value of the Set-Cookie header.
|
| For example:
|
| browser.set_cookie(
| "sid=abcdef; expires=Wednesday, 09-Nov-06 23:12:40 GMT")
|
| Currently, this method does not allow for adding RFC 2986 cookies.
| This limitation will be lifted if anybody requests it.
|
| set_handle_referer(self, handle)
| Set whether to add Referer header to each request.
|
| set_response(self, response)
| Replace current response with (a copy of) response.
|
| response may be None.
|
| This is intended mostly for HTML-preprocessing.
|
| submit(self, *args, **kwds)
| Submit current form.
|
| Arguments are as for mechanize.HTMLForm.click().
|
| Return value is same as for Browser.open().
|
| title(self)
| Return title, or None if there is no title element in the document.
|
| Treatment of any tag children of attempts to follow Firefox and IE
| (currently, tags are preserved).
|
| viewing_html(self)
| Return whether the current response contains HTML data.
|
| visit_response(self, response, request=None)
| Visit the response, as if it had been .open()ed.
|
| Unlike .set_response(), this updates history rather than replacing the
| current response.
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| default_features = ['_redirect', '_cookies', '_refresh', '_equiv', '_b...
|
| handler_classes = {'_basicauth': <class mechanize._urllib2_fork.HTTPBa...
|
| ----------------------------------------------------------------------
| Methods inherited from mechanize._useragent.UserAgentBase:
|
| add_client_certificate(self, url, key_file, cert_file)
| Add an SSL client certificate, for HTTPS client auth.
|
| key_file and cert_file must be filenames of the key and certificate
| files, in PEM format. You can use e.g. OpenSSL to convert a p12 (PKCS
| 12) file to PEM format:
|
| openssl pkcs12 -clcerts -nokeys -in cert.p12 -out cert.pem
| openssl pkcs12 -nocerts -in cert.p12 -out key.pem
|
|
| Note that client certificate password input is very inflexible ATM. At
| the moment this seems to be console only, which is presumably the
| default behaviour of libopenssl. In future mechanize may support
| third-party libraries that (I assume) allow more options here.
|
| add_password(self, url, user, password, realm=None)
|
| add_proxy_password(self, user, password, hostport=None, realm=None)
|
| set_client_cert_manager(self, cert_manager)
| Set a mechanize.HTTPClientCertMgr, or None.
|
| set_cookiejar(self, cookiejar)
| Set a mechanize.CookieJar, or None.
|
| set_debug_http(self, handle)
| Print HTTP headers to sys.stdout.
|
| set_debug_redirects(self, handle)
| Log information about HTTP redirects (including refreshes).
|
| Logging is performed using module logging. The logger name is
| "mechanize.http_redirects". To actually print some debug output,
| eg:
|
| import sys, logging
| logger = logging.getLogger("mechanize.http_redirects")
| logger.addHandler(logging.StreamHandler(sys.stdout))
| logger.setLevel(logging.INFO)
|
| Other logger names relevant to this module:
|
| "mechanize.http_responses"
| "mechanize.cookies"
|
| To turn on everything:
|
| import sys, logging
| logger = logging.getLogger("mechanize")
| logger.addHandler(logging.StreamHandler(sys.stdout))
| logger.setLevel(logging.INFO)
|
| set_debug_responses(self, handle)
| Log HTTP response bodies.
|
| See docstring for .set_debug_redirects() for details of logging.
|
| Response objects may be .seek()able if this is set (currently returned
| responses are, raised HTTPError exception responses are not).
|
| set_handle_equiv(self, handle, head_parser_class=None)
| Set whether to treat HTML http-equiv headers like HTTP headers.
|
| Response objects may be .seek()able if this is set (currently returned
| responses are, raised HTTPError exception responses are not).
|
| set_handle_gzip(self, handle)
| Handle gzip transfer encoding.
|
| set_handle_redirect(self, handle)
| Set whether to handle HTTP 30x redirections.
|
| set_handle_refresh(self, handle, max_time=None, honor_time=True)
| Set whether to handle HTTP Refresh headers.
|
| set_handle_robots(self, handle)
| Set whether to observe rules from robots.txt.
|
| set_handled_schemes(self, schemes)
| Set sequence of URL scheme (protocol) strings.
|
| For example: ua.set_handled_schemes(["http", "ftp"])
|
| If this fails (with ValueError) because you've passed an unknown
| scheme, the set of handled schemes will not be changed.
|
| set_password_manager(self, password_manager)
| Set a mechanize.HTTPPasswordMgrWithDefaultRealm, or None.
|
| set_proxies(self, proxies=None, proxy_bypass=None)
| Configure proxy settings.
|
| proxies: dictionary mapping URL scheme to proxy specification. None
| means use the default system-specific settings.
| proxy_bypass: function taking hostname, returning whether proxy should
| be used. None means use the default system-specific settings.
|
| The default is to try to obtain proxy settings from the system (see the
| documentation for urllib.urlopen for information about the
| system-specific methods used -- note that's urllib, not urllib2).
|
| To avoid all use of proxies, pass an empty proxies dict.
|
| >>> ua = UserAgentBase()
| >>> def proxy_bypass(hostname):
| ...     return hostname == "noproxy.com"
| >>> ua.set_proxies(
| ... {"http": "joe:password@myproxy.example.com:3128",
| ... "ftp": "proxy.example.com"},
| ... proxy_bypass)
|
| set_proxy_password_manager(self, password_manager)
| Set a mechanize.HTTPProxyPasswordMgr, or None.
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from mechanize._useragent.UserAgentBase:
|
| default_others = ['_unknown', '_http_error', '_http_default_error']
|
| default_schemes = ['http', 'ftp', 'file', 'https']
|
| ----------------------------------------------------------------------
| Methods inherited from mechanize._opener.OpenerDirector:
|
| add_handler(self, handler)
|
| error(self, proto, *args)
|
| retrieve(self, fullurl, filename=None, reporthook=None, data=None, timeout=<object object>, open=<built-in function open>)
| Returns (filename, headers).
|
| For remote objects, the default filename will refer to a temporary
| file. Temporary files are removed when the OpenerDirector.close()
| method is called.
|
| For file: URLs, at present the returned filename is None. This may
| change in future.
|
| If the actual number of bytes read is less than indicated by the
| Content-Length header, raises ContentTooShortError (a URLError
| subclass). The exception's .result attribute contains the (filename,
| headers) that would have been returned.
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from mechanize._opener.OpenerDirector:
|
| BLOCK_SIZE = 8192
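
The find_link docstring above states three matching rules:
  • Every supplied criterion must match.
  • The *_regex arguments use .search() rather than .match().
  • nr picks the nth match, counting from 0.

The pure-Python sketch below illustrates that matching logic. It is not mechanize's implementation. Link, LinkNotFound and find_link_sketch are hypothetical stand-ins, and the sketch covers only the text, text_regex, url_regex, predicate and nr arguments.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for mechanize.Link, keeping only the fields used here.
Link = namedtuple("Link", ["url", "text", "attrs"])

class LinkNotFound(Exception):
    """Hypothetical stand-in for mechanize.LinkNotFoundError."""

def find_link_sketch(links, text=None, text_regex=None, url_regex=None,
                     predicate=None, nr=0):
    """Return the nr-th (0-based) link that satisfies every supplied criterion."""
    # Regex arguments may be given as strings or as compiled patterns.
    if isinstance(text_regex, str):
        text_regex = re.compile(text_regex)
    if isinstance(url_regex, str):
        url_regex = re.compile(url_regex)
    found = 0
    for link in links:
        if text is not None and link.text != text:
            continue
        # .search() means the pattern may match anywhere in the string.
        if text_regex is not None and not text_regex.search(link.text):
            continue
        if url_regex is not None and not url_regex.search(link.url):
            continue
        if predicate is not None and not predicate(link):
            continue
        # This link matches everything; return it if it is the nr-th match.
        if found == nr:
            return link
        found += 1
    raise LinkNotFound()
```

On a real Browser, the equivalent calls are br.find_link(text_regex="python", nr=1), which returns a mechanize.Link, and br.follow_link(text_regex="python", nr=1), which finds the link and opens it in one step.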
