Topic 1: PyQuery in detail and its basic usage

  • Initialization

  1. Initializing from a string

    html = '''
    <div>
    <ul>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    '''
    from pyquery import PyQuery as pq
    doc = pq(html)

    # Selectors are ordinary CSS selectors: prefix an id with "#" and a class with "."
    print(doc('li'))

    Output:

    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>

  2. Initializing from a URL

    from pyquery import PyQuery as pq
    doc1 = pq(url="http://www.baidu.com")

    print(doc1("head"))

    Output:

    <head><meta http-equiv="content-type" content="text/html;charset=utf-8"/><meta http-equiv="X-UA-Compatible" content="IE=Edge"/><meta content="always" name="referrer"/><link rel="stylesheet" type="text/css" href="http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css"/><title>百度一下,你就知道</title></head>

  3. Initializing from a file

    from pyquery import PyQuery as pq
    doc2 = pq(filename="demo.html")  # demo.html is any HTML file saved locally

    print(doc2('li'))
  • Basic CSS selectors

  1. Example

    html = '''
    <div id="container">
    <ul class="list">
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    '''
    from pyquery import PyQuery as pq
    doc = pq(html)

    # Note the spaces: a space means the selector on the right is nested inside the one on the left
    print(doc("#container .list li"))

    Output:

    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>

  2. Finding elements

    1. Child elements

      html = '''
      <div id="container">
      <ul class="list">
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>
      </ul>
      </div>
      '''
      from pyquery import PyQuery as pq
      doc = pq(html)
      items = doc(".list")  # first select the ul element

      print(type(items))
      print(items)

      # find() also takes a CSS selector and matches every li nested inside, at any depth
      lis = items.find('li')
      print(type(lis))
      print(lis)

      # children() returns direct children only
      lis2 = items.children()
      print(type(lis2))
      print(lis2)

      # children() can filter the direct children with a selector
      lis3 = items.children('.active')
      print(lis3)

      Output:

      <class 'pyquery.pyquery.PyQuery'>
      <ul class="list">
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>
      </ul>

      <class 'pyquery.pyquery.PyQuery'>
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>

      <class 'pyquery.pyquery.PyQuery'>
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>

      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>

    2. Parent elements

      html = '''
      <div id="container">
      <ul class="list">
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>
      </ul>
      </div>
      '''
      from pyquery import PyQuery as pq
      doc = pq(html)

      items = doc(".list")  # first select the ul element
      # Every element has exactly one direct parent
      container = items.parent()

      print(type(container))
      print(container)

      Output:

      <class 'pyquery.pyquery.PyQuery'>
      <div id="container">
      <ul class="list">
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>
      </ul>
      </div>

      Another approach: parents() returns every ancestor, not only the direct parent.

      html = '''
      <div class="wrap">
      <div id="container">
      <ul class="list">
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>
      </ul>
      </div>
      </div>
      '''

      from pyquery import PyQuery as pq
      doc = pq(html)
      items = doc(".list")  # first select the ul element
      # Return all ancestor nodes
      parents = items.parents()

      print(parents)  # prints the two div ancestors
      print(type(parents))

      Output:

      <div class="wrap">
      <div id="container">
      <ul class="list">
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>
      </ul>
      </div>
      </div><div id="container">
      <ul class="list">
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>
      </ul>
      </div>

      <class 'pyquery.pyquery.PyQuery'>

      parents() also accepts a selector, which narrows the ancestors down to the ones that match:

      html = '''
      <div class="wrap">
      <div id="container">
      <ul class="list">
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>
      </ul>
      </div>
      </div>
      '''

      from pyquery import PyQuery as pq
      doc = pq(html)
      items = doc(".list")  # first select the ul element

      # Search among the ancestors
      parents1 = items.parents(".wrap")

      print(parents1)  # after filtering, only one div is left

      Output:

      <div class="wrap">
      <div id="container">
      <ul class="list">
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>
      </ul>
      </div>
      </div>

    3. Sibling elements

      html = '''
      <div class="wrap">
      <div id="container">
      <ul class="list">
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>
      </ul>
      </div>
      </div>
      '''
      from pyquery import PyQuery as pq
      doc = pq(html)
      # ".list", then a space to descend into it, then the element carrying both
      # the item-0 and active classes (no space between them: same element)
      li = doc('.list .item-0.active')
      print(li)
      print(li.siblings())  # all sibling elements

      Output:

      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>

      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0">first item</li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>

      siblings() also accepts a selector to filter the result:

      html = '''
      <div class="wrap">
      <div id="container">
      <ul class="list">
      <li class="item-0">first item</li>
      <li class="item-1"><a href="link2.html">second item</a></li>
      <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
      <li class="item-1 active"><a href="link4.html">fourth item</a></li>
      <li class="item-0"><a href="link5.html">fifth item</a></li>
      </ul>
      </div>
      </div>
      '''
      from pyquery import PyQuery as pq
      doc = pq(html)

      li = doc('.list .item-0.active')  # the li carrying both item-0 and active inside .list
      # Keep only the siblings that match the selector
      print(li.siblings('.active'))

      Output:

      <li class="item-1 active"><a href="link4.html">fourth item</a></li>

  • Traversal

  1. Single element

    html = '''
    <div class="wrap">
    <div id="container">
    <ul class="list">
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    '''
    from pyquery import PyQuery as pq
    doc = pq(html)

    li = doc(".item-0.active")
    print(li)

    Output:

    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>

    Multiple elements: call items() to get a generator and loop over it.

    html = '''
    <div class="wrap">
    <div id="container">
    <ul class="list">
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    '''
    from pyquery import PyQuery as pq
    doc = pq(html)

    lis = doc('li').items()  # items() returns a generator over the matched elements

    print(type(lis))
    for li in lis:
        print(li)

    Output:

    <class 'generator'>
    <li class="item-0">first item</li>

    <li class="item-1"><a href="link2.html">second item</a></li>

    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>

    <li class="item-1 active"><a href="link4.html">fourth item</a></li>

    <li class="item-0"><a href="link5.html">fifth item</a></li>

  • Extracting information

  1. Getting attributes

    html = '''
    <div class="wrap">
    <div id="container">
    <ul class="list">
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    '''

    from pyquery import PyQuery as pq
    doc = pq(html)
    # Select the element whose class includes both item-0 and active,
    # then the a tag inside it; note the space before "a"
    a = doc(".item-0.active a")
    print(a)
    print(a.attr("href"))
    print(a.attr.href)  # same result as the line above

    Output:

    <a href="link3.html"><span class="boid">third item</span></a>
    link3.html
    link3.html

  2. Getting text

    html = '''
    <div class="wrap">
    <div id="container">
    <ul class="list">
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    '''
    from pyquery import PyQuery as pq
    doc = pq(html)
    a = doc(".item-0.active a")

    print(a)
    print(a.text())  # the text enclosed by the selected element, with tags stripped

    Output:

    <a href="link3.html"><span class="boid">third item</span></a>
    third item

  3. Getting HTML

    html = '''
    <div class="wrap">
    <div id="container">
    <ul class="list">
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    '''

    from pyquery import PyQuery as pq
    doc = pq(html)
    a = doc(".item-0.active")

    print(a)
    print(a.html())  # the inner HTML of the selected element

    Output:

    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>

    <a href="link3.html"><span class="boid">third item</span></a>

  • DOM manipulation

  1. addClass, removeClass

    html = '''
    <div class="wrap">
    <div id="container">
    <ul class="list">
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    '''
    from pyquery import PyQuery as pq
    doc = pq(html)

    li = doc(".item-0.active")
    print(li)

    li.removeClass("active")  # remove the active class
    print(li)

    li.addClass("active")  # add the active class back
    print(li)

    Output:

    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>

    <li class="item-0"><a href="link3.html"><span class="boid">third item</span></a></li>

    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>

  2. attr, css

    html = '''
    <div class="wrap">
    <div id="container">
    <ul class="list">
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    '''
    from pyquery import PyQuery as pq
    doc = pq(html)

    li = doc(".item-0.active")
    print(li)

    li.attr("name", "link")  # sets the attribute; an existing value is overwritten
    print(li)

    li.css("font-size", "14px")  # adds a style attribute
    print(li)

    Output:

    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>

    <li class="item-0 active" name="link"><a href="link3.html"><span class="boid">third item</span></a></li>

    <li class="item-0 active" name="link" style="font-size: 14px"><a href="link3.html"><span class="boid">third item</span></a></li>

  3. remove

    html1 = '''
    <div class="wrap">
    Hello,World
    <p>This is a paragraph.</p>
    </div>
    '''
    from pyquery import PyQuery as pq
    doc = pq(html1)

    wrap = doc(".wrap")
    print(wrap.text())

    wrap.find('p').remove()  # drop the p element from the tree

    print(wrap.text())

    Output:

    Hello,World
    This is a paragraph.
    Hello,World

  4. Other DOM operations

    Other DOM methods: http://pythonhosted.org/pyquery/
  • Pseudo-class selectors

    html = '''
    <div class="wrap">
    <div id="container">
    <ul class="list">
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="boid">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    '''
    from pyquery import PyQuery as pq
    doc = pq(html)

    li = doc("li:first-child")  # the first child
    print(li)

    li1 = doc('li:last-child')  # the last child
    print(li1)

    li2 = doc('li:nth-child(2)')  # the child at a given position, here the second
    print(li2)

    li3 = doc("li:gt(2)")  # index greater than 2 (indexes start at 0)
    print(li3)

    li4 = doc("li:nth-child(2n)")  # children at even positions
    print(li4)

    li5 = doc("li:contains(second)")  # elements whose text contains "second"
    print(li5)

    Output:

    <li class="item-0">first item</li>

    <li class="item-0"><a href="link5.html">fifth item</a></li>

    <li class="item-1"><a href="link2.html">second item</a></li>

    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>

    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>

    <li class="item-1"><a href="link2.html">second item</a></li>

  • Official documentation
