0. Installation: pip3 install pyquery

1. Initialization

1. Initializing from a string

    from pyquery import PyQuery as pq

    # Initialize from an HTML string
    html = """
    <div>
    <ul>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    """
    doc = pq(html)
    # Select every li element with a CSS selector
    print(doc('li'))

2. Initializing from a URL

    from pyquery import PyQuery as pq

    # Fetch the page at the given URL and parse the response
    doc = pq(url='http://www.baidu.com')
    print(doc('head'))

3. Initializing from a file

    from pyquery import PyQuery as pq

    # Parse a local HTML file (demo.html must exist in the working directory)
    doc = pq(filename='demo.html')
    print(doc('li'))

2. Basic CSS selectors

    from pyquery import PyQuery as pq

    html = """
    <div id="container">
    <ul class='list'>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    """
    doc = pq(html)
    # li elements inside class "list" inside the element with id "container"
    print(doc('#container .list li'))

3. Finding elements

1. Child elements

    from pyquery import PyQuery as pq

    html = """
    <div id="container">
    <ul class='list'>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    """
    doc = pq(html)
    items = doc('.list')
    print(items)
    # Find all li descendants
    lis = items.find('li')
    print(lis)
    # Get the direct children
    lis = items.children()
    print(type(lis))
    print(lis)
    # Get only the children that have the class 'active'
    lis = items.children('.active')
    print(lis)

2. Parent elements

    from pyquery import PyQuery as pq

    html = """
    <div class='wrap'>
    <div id="container">
    <ul class='list'>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    """
    doc = pq(html)
    items = doc('.list')
    print(items)
    # Get every ancestor of the selection
    parents = items.parents()
    print(parents)
    # Get only the ancestors that match the selector
    parent = items.parents('.wrap')
    print(parent)

3. Sibling elements

    from pyquery import PyQuery as pq

    html = """
    <div class='wrap'>
    <div id="container">
    <ul class='list'>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    """
    doc = pq(html)
    # The li that has both classes "item-0" and "active"
    items = doc('.list .item-0.active')
    print(items)
    # All siblings of that li
    print(items.siblings())
    # Only the siblings that have the class "active"
    print(items.siblings('.active'))

4. Traversal

    from pyquery import PyQuery as pq

    html = """
    <div class='wrap'>
    <div id="container">
    <ul class='list'>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    """
    doc = pq(html)
    # A single element
    items = doc('.item-0.active')
    print(items)
    # Multiple elements: items() returns a generator of PyQuery objects
    lis = doc('li').items()
    print(lis)
    # e.g. <generator object PyQuery.items at 0x0000000003A84468>
    for item in lis:
        print(item)

5. Getting information

1. Getting attributes

    from pyquery import PyQuery as pq

    html = """
    <div class='wrap'>
    <div id="container">
    <ul class='list'>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    """
    doc = pq(html)
    a = doc('.item-0.active a')
    print(a)
    # Two equivalent ways to read the href attribute
    print(a.attr('href'))
    print(a.attr.href)

2. Getting text

    from pyquery import PyQuery as pq

    html = """
    <div class='wrap'>
    <div id="container">
    <ul class='list'>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    """
    doc = pq(html)
    a = doc('.item-0.active a')
    print(a)
    # text() returns the text content with all tags stripped
    print(a.text())

3. Getting HTML

    from pyquery import PyQuery as pq

    html = """
    <div class='wrap'>
    <div id="container">
    <ul class='list'>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    """
    doc = pq(html)
    li = doc('.item-0.active')
    print(li)
    # html() returns the inner HTML of the element
    print(li.html())

6. DOM manipulation

1. addClass, removeClass

    from pyquery import PyQuery as pq

    html = """
    <div class='wrap'>
    <div id="container">
    <ul class='list'>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    """
    doc = pq(html)
    li = doc('.item-0.active')
    print(li)
    # Remove the class "active" from the element
    li.removeClass('active')
    print(li)
    # Add the class "active" back
    li.addClass('active')
    print(li)

2. attr, css

    from pyquery import PyQuery as pq

    html = """
    <div class='wrap'>
    <div id="container">
    <ul class='list'>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    """
    doc = pq(html)
    li = doc('.item-0.active')
    print(li)
    # Set the attribute name="apollo"
    li.attr('name', 'apollo')
    print(li)
    # Set an inline style; it is written to the style attribute
    li.css('font-size', '14px')
    print(li)

3. remove

    from pyquery import PyQuery as pq

    html = """
    <div class='wrap'>
    Hello World
    <p>This is paragraph</p>
    </div>
    """
    doc = pq(html)
    wrap = doc('.wrap')
    print('Before removal:', wrap.text())
    # Remove the p element so that only the div's own text remains
    wrap.find('p').remove()
    print('After removal:', wrap.text())

4. Other DOM methods

    See the API reference: https://pyquery.readthedocs.io/en/latest/api.html

7. Pseudo-class selectors

    from pyquery import PyQuery as pq

    html = """
    <div class='wrap'>
    <div id="container">
    <ul class='list'>
    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
    </div>
    </div>
    """
    doc = pq(html)
    # The li that is the first child of its parent
    li = doc('li:first-child')
    print(li)
    # The li that is the last child of its parent
    li = doc('li:last-child')
    print(li)
    # The li that is the second child (nth-child counts from 1)
    li = doc('li:nth-child(2)')
    print(li)
    # The li elements whose zero-based index is greater than 2
    li = doc('li:gt(2)')
    print(li)
    # The li elements at even positions
    li = doc('li:nth-child(2n)')
    print(li)
    # The li elements whose text contains "second"
    li = doc('li:contains(second)')
    print(li)

Official site

https://pyquery.readthedocs.io
