1: Basic usage

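import mechanize

# Shortcut: mechanize.urlopen() also accepts a URL string directly.
# response = mechanize.urlopen("http://www.hao123.com/")

# Build an explicit Request object and open it.
request = mechanize.Request("http://www.hao123.com/")
response = mechanize.urlopen(request)

print(response.geturl())  # URL of the document that was fetched
print(response.info())    # response headers
# print(response.read())  # response body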

2: mechanize.urlretrieve

>>> import mechanize
>>> help(mechanize.urlretrieve)
Help on function urlretrieve in module mechanize._opener:

urlretrieve(url, filename=None, reporthook=None, data=None, timeout=<object object>)

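The filename parameter is the local path to save to. If it is omitted, mechanize writes the data to a temporary file.
The reporthook parameter is a callback that is invoked once when the connection to the server is established and again after each data block has been transferred. It can be used to display download progress.
The data parameter is the data to POST to the server.
The timeout parameter is the timeout for the request. The default, shown as <object object>, is a sentinel meaning that the global default socket timeout applies.

The function returns a (filename, headers) tuple: filename is the local path the data was saved to, and headers holds the server's response headers.

The callback has the signature reporthook(block_read, block_size, total_size): block_read is the number of blocks transferred so far, block_size is the size of each block in bytes, and total_size is the total size of the download in bytes (-1 when the server sends no Content-Length header). Together these are enough to display read progress.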

A simple example:

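import mechanize

def cbk(a, b, c):
    # a: blocks transferred so far, b: block size in bytes, c: total size in bytes
    print("%d %d %d" % (a, b, c))

url = 'http://www.hao123.com/'
local = 'd:/hao.html'  # Windows path from the original notes; change as needed

mechanize.urlretrieve(url, local, cbk)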
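The three callback arguments can be turned into a percentage instead of being printed raw. The following is a minimal sketch, not part of mechanize: the names progress_percent and show_progress are hypothetical helpers, and the code assumes total_size is -1 (or 0) when the size is unknown.

```python
import sys


def progress_percent(block_count, block_size, total_size):
    """Return the download progress as a percentage, or None if unknown."""
    if total_size <= 0:
        # Assumption: a non-positive total means the size is unknown or empty.
        return None
    # The last block is usually partial, so cap the byte count at the total.
    done = min(block_count * block_size, total_size)
    return 100.0 * done / total_size


def show_progress(block_count, block_size, total_size):
    """A reporthook-compatible callback that writes one line per call."""
    percent = progress_percent(block_count, block_size, total_size)
    if percent is None:
        sys.stdout.write("%d bytes read (total size unknown)\n"
                         % (block_count * block_size))
    else:
        sys.stdout.write("%5.1f%%\n" % percent)
```

To use it, pass show_progress in place of cbk: mechanize.urlretrieve(url, local, show_progress).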

3: Logging in through a form

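import os
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # do not observe robots.txt rules
br.open("http://www.zhaopin.com/")
br.select_form(nr=0)  # select the first form on the page

# Register an account on the site and fill in your own credentials.
br['loginname'] = '**'
br['password'] = '**'
r = br.submit()

# Save the response body as login.html next to this script.
out_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'login.html')
print(out_path)
with open(out_path, "wb") as h:
    h.write(r.read())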
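Selecting the form with nr=0 breaks as soon as the page puts another form first. select_form() also accepts a predicate, so the login form can be picked by the fields it contains. The sketch below is an illustration under stated assumptions: is_login_form is a hypothetical helper, it assumes each form exposes a controls list whose items have a name attribute, and the two SimpleNamespace objects are stand-ins so the logic can be run without a network connection.

```python
from types import SimpleNamespace

# Field names taken from the login example above.
REQUIRED_FIELDS = ("loginname", "password")


def is_login_form(form):
    """Return True if the form has a control for every required field."""
    # Assumption: form.controls is a list of objects with a .name attribute.
    names = {getattr(control, "name", None) for control in form.controls}
    return all(field in names for field in REQUIRED_FIELDS)


# Hypothetical stand-in forms, for illustration only.
search_form = SimpleNamespace(controls=[SimpleNamespace(name="q")])
login_form = SimpleNamespace(controls=[
    SimpleNamespace(name="loginname"),
    SimpleNamespace(name="password"),
    SimpleNamespace(name=None),  # e.g. an unnamed submit button
])

print(is_login_form(search_form))  # False
print(is_login_form(login_form))   # True
```

With a real Browser, the call would be br.select_form(predicate=is_login_form) in place of br.select_form(nr=0).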

4: Browser

The help() output below covers nearly everything Browser can do, so it is worth reading in full.

Help on class Browser in module mechanize._mechanize:

class Browser(mechanize._useragent.UserAgentBase)

 |  Browser-like class with support for history, forms and links.

 |

 |  BrowserStateError is raised whenever the browser is in the wrong state to

 |  complete the requested operation - e.g., when .back() is called when the

 |  browser history is empty, or when .follow_link() is called when the current

 |  response does not contain HTML data.

 |

 |  Public attributes:

 |

 |  request: current request (mechanize.Request)

 |  form: currently selected form (see .select_form())

 |

 |  Method resolution order:

 |      Browser

 |      mechanize._useragent.UserAgentBase

 |      mechanize._opener.OpenerDirector

 |      mechanize._urllib2_fork.OpenerDirector

 |

 |  Methods defined here:

 |

 |  __getattr__(self, name)

 |

 |  __init__(self, factory=None, history=None, request_class=None)

 |      Only named arguments should be passed to this constructor.

 |

 |      factory: object implementing the mechanize.Factory interface.

 |      history: object implementing the mechanize.History interface.  Note

 |       this interface is still experimental and may change in future.

 |      request_class: Request class to use.  Defaults to mechanize.Request

 |

 |      The Factory and History objects passed in are 'owned' by the Browser,

 |      so they should not be shared across Browsers.  In particular,

 |      factory.set_response() should not be called except by the owning

 |      Browser itself.

 |

 |      Note that the supplied factory's request_class is overridden by this

 |      constructor, to ensure only one Request class is used.

 |

 |  __str__(self)

 |

 |  back(self, n=1)

 |      Go back n steps in history, and return response object.

 |

 |      n: go back this number of steps (default 1 step)

 |

 |  clear_history(self)

 |

 |  click(self, *args, **kwds)

 |      See mechanize.HTMLForm.click for documentation.

 |

 |  click_link(self, link=None, **kwds)

 |      Find a link and return a Request object for it.

 |

 |      Arguments are as for .find_link(), except that a link may be supplied

 |      as the first argument.

 |

 |  close(self)

 |

 |  encoding(self)

 |

 |  find_link(self, **kwds)

 |      Find a link in current page.

 |

 |      Links are returned as mechanize.Link objects.

 |

 |      # Return third link that .search()-matches the regexp "python"

 |      # (by ".search()-matches", I mean that the regular expression method

 |      # .search() is used, rather than .match()).

 |      find_link(text_regex=re.compile("python"), nr=2)

 |

 |      # Return first http link in the current page that points to somewhere

 |      # on python.org whose link text (after tags have been removed) is

 |      # exactly "monty python".

 |      find_link(text="monty python",

 |                url_regex=re.compile("http.*python.org"))

 |

 |      # Return first link with exactly three HTML attributes.

 |      find_link(predicate=lambda link: len(link.attrs) == 3)

 |

 |      Links include anchors (<a>), image maps (<area>), and frames (<frame>,

 |      <iframe>).

 |

 |      All arguments must be passed by keyword, not position.  Zero or more

 |      arguments may be supplied.  In order to find a link, all arguments

 |      supplied must match.

 |

 |      If a matching link is not found, mechanize.LinkNotFoundError is raised.

 |

 |      text: link text between link tags: e.g. <a href="blah">this bit</a> (as

 |       returned by pullparser.get_compressed_text(), ie. without tags but

 |       with opening tags "textified" as per the pullparser docs) must compare

 |       equal to this argument, if supplied

 |      text_regex: link text between tag (as defined above) must match the

 |       regular expression object or regular expression string passed as this

 |       argument, if supplied

 |      name, name_regex: as for text and text_regex, but matched against the

 |       name HTML attribute of the link tag

 |      url, url_regex: as for text and text_regex, but matched against the

 |       URL of the link tag (note this matches against Link.url, which is a

 |       relative or absolute URL according to how it was written in the HTML)

 |      tag: element name of opening tag, e.g. "a"

 |      predicate: a function taking a Link object as its single argument,

 |       returning a boolean result, indicating whether the links

 |      nr: matches the nth link that matches all other criteria (default 0)

 |

 |  follow_link(self, link=None, **kwds)

 |      Find a link and .open() it.

 |

 |      Arguments are as for .click_link().

 |

 |      Return value is same as for Browser.open().

 |

 |  forms(self)

 |      Return iterable over forms.

 |

 |      The returned form objects implement the mechanize.HTMLForm interface.

 |

 |  geturl(self)

 |      Get URL of current document.

 |

 |  global_form(self)

 |      Return the global form object, or None if the factory implementation

 |      did not supply one.

 |

 |      The "global" form object contains all controls that are not descendants

 |      of any FORM element.

 |

 |      The returned form object implements the mechanize.HTMLForm interface.

 |

 |      This is a separate method since the global form is not regarded as part

 |      of the sequence of forms in the document -- mostly for

 |      backwards-compatibility.

 |

 |  links(self, **kwds)

 |      Return iterable over links (mechanize.Link objects).

 |

 |  open(self, url, data=None, timeout=<object object>)

 |

 |  open_local_file(self, filename)

 |

 |  open_novisit(self, url, data=None, timeout=<object object>)

 |      Open a URL without visiting it.

 |

 |      Browser state (including request, response, history, forms and links)

 |      is left unchanged by calling this function.

 |

 |      The interface is the same as for .open().

 |

 |      This is useful for things like fetching images.

 |

 |      See also .retrieve().

 |

 |  reload(self)

 |      Reload current document, and return response object.

 |

 |  response(self)

 |      Return a copy of the current response.

 |

 |      The returned object has the same interface as the object returned by

 |      .open() (or mechanize.urlopen()).

 |

 |  select_form(self, name=None, predicate=None, nr=None)

 |      Select an HTML form for input.

 |

 |      This is a bit like giving a form the "input focus" in a browser.

 |

 |      If a form is selected, the Browser object supports the HTMLForm

 |      interface, so you can call methods like .set_value(), .set(), and

 |      .click().

 |

 |      Another way to select a form is to assign to the .form attribute.  The

 |      form assigned should be one of the objects returned by the .forms()

 |      method.

 |

 |      At least one of the name, predicate and nr arguments must be supplied.

 |      If no matching form is found, mechanize.FormNotFoundError is raised.

 |

 |      If name is specified, then the form must have the indicated name.

 |

 |      If predicate is specified, then the form must match that function.  The

 |      predicate function is passed the HTMLForm as its single argument, and

 |      should return a boolean value indicating whether the form matched.

 |

 |      nr, if supplied, is the sequence number of the form (where 0 is the

 |      first).  Note that control 0 is the first form matching all the other

 |      arguments (if supplied); it is not necessarily the first control in the

 |      form.  The "global form" (consisting of all form controls not contained

 |      in any FORM element) is considered not to be part of this sequence and

 |      to have no name, so will not be matched unless both name and nr are

 |      None.

 |

 |  set_cookie(self, cookie_string)

 |      Request to set a cookie.

 |

 |      Note that it is NOT necessary to call this method under ordinary

 |      circumstances: cookie handling is normally entirely automatic.  The

 |      intended use case is rather to simulate the setting of a cookie by

 |      client script in a web page (e.g. JavaScript).  In that case, use of

 |      this method is necessary because mechanize currently does not support

 |      JavaScript, VBScript, etc.

 |

 |      The cookie is added in the same way as if it had arrived with the

 |      current response, as a result of the current request.  This means that,

 |      for example, if it is not appropriate to set the cookie based on the

 |      current request, no cookie will be set.

 |

 |      The cookie will be returned automatically with subsequent responses

 |      made by the Browser instance whenever that's appropriate.

 |

 |      cookie_string should be a valid value of the Set-Cookie header.

 |

 |      For example:

 |

 |      browser.set_cookie(

 |          "sid=abcdef; expires=Wednesday, 09-Nov-06 23:12:40 GMT")

 |

 |      Currently, this method does not allow for adding RFC 2986 cookies.

 |      This limitation will be lifted if anybody requests it.

 |

 |  set_handle_referer(self, handle)

 |      Set whether to add Referer header to each request.

 |

 |  set_response(self, response)

 |      Replace current response with (a copy of) response.

 |

 |      response may be None.

 |

 |      This is intended mostly for HTML-preprocessing.

 |

 |  submit(self, *args, **kwds)

 |      Submit current form.

 |

 |      Arguments are as for mechanize.HTMLForm.click().

 |

 |      Return value is same as for Browser.open().

 |

 |  title(self)

 |      Return title, or None if there is no title element in the document.

 |

 |      Treatment of any tag children of <title> attempts to follow Firefox and IE

 |      (currently, tags are preserved).

 |

 |  viewing_html(self)

 |      Return whether the current response contains HTML data.

 |

 |  visit_response(self, response, request=None)

 |      Visit the response, as if it had been .open()ed.

 |

 |      Unlike .set_response(), this updates history rather than replacing the

 |      current response.

 |

 |  ----------------------------------------------------------------------

 |  Data and other attributes defined here:

 |

 |  default_features = ['_redirect', '_cookies', '_refresh', '_equiv', '_b...

 |

 |  handler_classes = {'_basicauth': <class mechanize._urllib2_fork.HTTPBa...

 |

 |  ----------------------------------------------------------------------

 |  Methods inherited from mechanize._useragent.UserAgentBase:

 |

 |  add_client_certificate(self, url, key_file, cert_file)

 |      Add an SSL client certificate, for HTTPS client auth.

 |

 |      key_file and cert_file must be filenames of the key and certificate

 |      files, in PEM format.  You can use e.g. OpenSSL to convert a p12 (PKCS

 |      12) file to PEM format:

 |

 |      openssl pkcs12 -clcerts -nokeys -in cert.p12 -out cert.pem

 |      openssl pkcs12 -nocerts -in cert.p12 -out key.pem

 |

 |

 |      Note that client certificate password input is very inflexible ATM.  At

 |      the moment this seems to be console only, which is presumably the

 |      default behaviour of libopenssl.  In future mechanize may support

 |      third-party libraries that (I assume) allow more options here.

 |

 |  add_password(self, url, user, password, realm=None)

 |

 |  add_proxy_password(self, user, password, hostport=None, realm=None)

 |

 |  set_client_cert_manager(self, cert_manager)

 |      Set a mechanize.HTTPClientCertMgr, or None.

 |

 |  set_cookiejar(self, cookiejar)

 |      Set a mechanize.CookieJar, or None.

 |

 |  set_debug_http(self, handle)

 |      Print HTTP headers to sys.stdout.

 |

 |  set_debug_redirects(self, handle)

 |      Log information about HTTP redirects (including refreshes).

 |

 |      Logging is performed using module logging.  The logger name is

 |      "mechanize.http_redirects".  To actually print some debug output,

 |      eg:

 |

 |      import sys, logging

 |      logger = logging.getLogger("mechanize.http_redirects")

 |      logger.addHandler(logging.StreamHandler(sys.stdout))

 |      logger.setLevel(logging.INFO)

 |

 |      Other logger names relevant to this module:

 |

 |      "mechanize.http_responses"

 |      "mechanize.cookies"

 |

 |      To turn on everything:

 |

 |      import sys, logging

 |      logger = logging.getLogger("mechanize")

 |      logger.addHandler(logging.StreamHandler(sys.stdout))

 |      logger.setLevel(logging.INFO)

 |

 |  set_debug_responses(self, handle)

 |      Log HTTP response bodies.

 |

 |      See docstring for .set_debug_redirects() for details of logging.

 |

 |      Response objects may be .seek()able if this is set (currently returned

 |      responses are, raised HTTPError exception responses are not).

 |

 |  set_handle_equiv(self, handle, head_parser_class=None)

 |      Set whether to treat HTML http-equiv headers like HTTP headers.

 |

 |      Response objects may be .seek()able if this is set (currently returned

 |      responses are, raised HTTPError exception responses are not).

 |

 |  set_handle_gzip(self, handle)

 |      Handle gzip transfer encoding.

 |

 |  set_handle_redirect(self, handle)

 |      Set whether to handle HTTP 30x redirections.

 |

 |  set_handle_refresh(self, handle, max_time=None, honor_time=True)

 |      Set whether to handle HTTP Refresh headers.

 |

 |  set_handle_robots(self, handle)

 |      Set whether to observe rules from robots.txt.

 |

 |  set_handled_schemes(self, schemes)

 |      Set sequence of URL scheme (protocol) strings.

 |

 |      For example: ua.set_handled_schemes(["http", "ftp"])

 |

 |      If this fails (with ValueError) because you've passed an unknown

 |      scheme, the set of handled schemes will not be changed.

 |

 |  set_password_manager(self, password_manager)

 |      Set a mechanize.HTTPPasswordMgrWithDefaultRealm, or None.

 |

 |  set_proxies(self, proxies=None, proxy_bypass=None)

 |      Configure proxy settings.

 |

 |      proxies: dictionary mapping URL scheme to proxy specification.  None

 |        means use the default system-specific settings.

 |      proxy_bypass: function taking hostname, returning whether proxy should

 |        be used.  None means use the default system-specific settings.

 |

 |      The default is to try to obtain proxy settings from the system (see the

 |      documentation for urllib.urlopen for information about the

 |      system-specific methods used -- note that's urllib, not urllib2).

 |

 |      To avoid all use of proxies, pass an empty proxies dict.

 |

 |      >>> ua = UserAgentBase()

 |      >>> def proxy_bypass(hostname):

 |      ...     return hostname == "noproxy.com"

 |      >>> ua.set_proxies(

 |      ...     {"http": "joe:password@myproxy.example.com:3128",

 |      ...      "ftp": "proxy.example.com"},

 |      ...     proxy_bypass)

 |

 |  set_proxy_password_manager(self, password_manager)

 |      Set a mechanize.HTTPProxyPasswordMgr, or None.

 |

 |  ----------------------------------------------------------------------

 |  Data and other attributes inherited from mechanize._useragent.UserAgentBase:

 |

 |  default_others = ['_unknown', '_http_error', '_http_default_error']

 |

 |  default_schemes = ['http', 'ftp', 'file', 'https']

 |

 |  ----------------------------------------------------------------------

 |  Methods inherited from mechanize._opener.OpenerDirector:

 |

 |  add_handler(self, handler)

 |

 |  error(self, proto, *args)

 |

 |  retrieve(self, fullurl, filename=None, reporthook=None, data=None, timeout=<object object>, open=<built-in function open>)

 |      Returns (filename, headers).

 |

 |      For remote objects, the default filename will refer to a temporary

 |      file.  Temporary files are removed when the OpenerDirector.close()

 |      method is called.

 |

 |      For file: URLs, at present the returned filename is None.  This may

 |      change in future.

 |

 |      If the actual number of bytes read is less than indicated by the

 |      Content-Length header, raises ContentTooShortError (a URLError

 |      subclass).  The exception's .result attribute contains the (filename,

 |      headers) that would have been returned.

 |

 |  ----------------------------------------------------------------------

 |  Data and other attributes inherited from mechanize._opener.OpenerDirector:

 |

 |  BLOCK_SIZE = 8192
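
The find_link() entry in the help output above accepts a predicate, a function that takes a Link object and returns a boolean. The sketch below shows one way to write such a predicate. It is an illustration, not part of mechanize: links_to_host is a hypothetical helper, and FakeLink is a stand-in that carries only the url and text attributes so the example runs without fetching a page. Because Link.url may be relative, as the help text notes, relative links have no host and never match.

```python
from collections import namedtuple
from urllib.parse import urlparse


def links_to_host(host):
    """Build a predicate that matches links whose URL points at `host`."""
    def predicate(link):
        # urlparse(...).hostname is None for relative URLs, so they never match.
        return urlparse(link.url).hostname == host
    return predicate


# Hypothetical stand-in for mechanize.Link, for illustration only.
FakeLink = namedtuple("FakeLink", ["url", "text"])

on_python_org = links_to_host("www.python.org")
print(on_python_org(FakeLink("http://www.python.org/doc/", "docs")))  # True
print(on_python_org(FakeLink("/doc/", "docs")))                       # False
```

With a real Browser that has a page open, the call would be br.find_link(predicate=links_to_host("www.python.org")). follow_link() takes the same keyword arguments.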
