BeautifulSoup select方法

 1 html = """

 2 <html><head><title>The Dormouse's story</title></head>

 3 <body>

 4 <p class="title" name="dromouse"><b>The Dormouse's story</b></p>

 5 <p class="story">Once upon a time there were three little sisters; and their names were

 6 <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,

 7 <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and

 8 <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;

 9 and they lived at the bottom of a well.</p>

10 <p class="story">...</p>

11 """

我们在写 CSS 时，标签名不加任何修饰，类名前加点，id名前加 #，在这里我们也可以利用类似的方法来筛选元素，用到的方法是 soup.select()，返回类型是 list
（1）通过标签名查找

print soup.select('title')

#[<title>The Dormouse's story</title>]

print soup.select('a')

#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

print soup.select('b')

#[<b>The Dormouse's story</b>]

（2）通过类名查找

print soup.select('.sister')

#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

（3）通过 id 名查找

print soup.select('#link1')

#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]

（4）组合查找

组合查找即和写 class 文件时，标签名与类名、id名进行的组合原理是一样的，例如查找 p 标签中，id 等于 link1的内容，二者需要用空格分开

print soup.select('p #link1')

#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]

直接子标签查找

print soup.select("head > title")

#[<title>The Dormouse's story</title>]

（5）属性查找

查找时还可以加入属性元素，属性需要用中括号括起来，注意属性和标签属于同一节点，所以中间不能加空格，否则会无法匹配到。

print soup.select("head > title")

#[<title>The Dormouse's story</title>]

print soup.select('a[href="http://example.com/elsie"]')

#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]

同样，属性仍然可以与上述查找方式组合，不在同一节点的空格隔开，同一节点的不加空格

print soup.select('p a[href="http://example.com/elsie"]')

#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]

示例代码：

from bs4 import BeautifulSoup

import requests

#定义58同城上杭州区域的起始页面

start_url = 'http://hz.58.com/sale.shtml'

url_host = 'http://hz.58.com'

def get_index_url(url):

    wb_data = requests.get(start_url)

    soup = BeautifulSoup(wb_data.text,'lxml')

    links = soup.select('ul.ym-mainmnu > li > span > a["href"]')

    print(links)

    for link in links:

        page_url = url_host + str(link.get('href'))

        print(page_url)

get_index_url(start_url)

运行结果：

C:\Users\licl11092\AppData\Local\Programs\Python\Python35\python.exe D:/Spider/58spider/channel_extact.py

[<a href="/shouji/">手机</a>, <a href="/tongxunyw/">通讯</a>, <a href="/danche/">摩托车</a>, <a href="/diandongche/">电动车</a>, <a href="/diannao/">电脑</a>, <a href="/shuma/">数码</a>, <a href="/jiadian/">家电</a>, <a href="/ershoujiaju/">家具</a>, <a href="/yingyou/">母婴玩具</a>, <a href="/fushi/">服装箱包</a>, <a href="/meirong/">美容保健</a>, <a href="/yishu/">艺术收藏</a>, <a href="/tushu/">图书音像</a>, <a href="/wenti/">文体户外</a>, <a href="/bangong/">办公设备</a>, <a href="/shebei.shtml">二手设备</a>, <a href="/chengren/" onclick="clickLog('from=pc_index_loucengdb_ershoujiaoyi_gongcheng')">成人用品</a>]

http://hz.58.com/shouji/

http://hz.58.com/tongxunyw/

http://hz.58.com/danche/

http://hz.58.com/diandongche/

http://hz.58.com/diannao/

http://hz.58.com/shuma/

http://hz.58.com/jiadian/

http://hz.58.com/ershoujiaju/

http://hz.58.com/yingyou/

http://hz.58.com/fushi/

http://hz.58.com/meirong/

http://hz.58.com/yishu/

http://hz.58.com/tushu/

http://hz.58.com/wenti/

http://hz.58.com/bangong/

http://hz.58.com/shebei.shtml

http://hz.58.com/chengren/

Process finished with exit code 0

BeautifulSoup select方法的更多相关文章

第14.12节 Python中使用BeautifulSoup解析http报文：使用select方法快速定位内容
一. 引言在<第14.10节 Python中使用BeautifulSoup解析http报文:html标签相关属性的访问>和<第14.11节 Python中使用BeautifulSo ...
C# DataTable的Select()方法不支持 != 判断
异常描述: 用户代码未处理 System.Data.SyntaxErrorException HResult=-2146232032 Message=无法解释位置 23 的标记“!”. Source= ...
[c#基础]DataTable的Select方法
引言可以说DataTable存放数据的一个离线数据库,将数据一下加载到内存,而DataReader是在线查询,而且只进形式的查询,如果后退一步,就不可能了,DataTable操作非常方便,但也有缺点 ...
HTML DOM select() 方法
定义和用法 select() 方法用于选择该元素中的文本. 语法 textareaObject.select() 实例下面的例子可选择文本框中的文本: <html> <head&g ...
Thinkphp中的volist标签（查询数据集（select方法）的结果输出）用法简介
参考网址:http://camnpr.com/archives/1515.html 通常volist标签多用于查询数据集(select方法)的结果输出,通常模型的select方法返回的结果是一个二维数 ...
getField()和select()方法的区别
在ThinkPHP中,查询数据库是必不可少的操作. 那么,getField()方法和select()方法都是查询的方法,到底有什么不同呢? 案例来说明: A.select()方法例子1 $acces ...
input和textarea标签的select()方法----选中文本框中的所有文本
JavaScript select()方法选中文本框中的所有文本 <input>和<textarea>两种文本框都支持select()方法,这个方法用于选择文本框中的所有文本 ...
【jQuery源码】select方法
/** * select方法是Sizzle选择器包的核心方法之一,其主要完成下列任务: * 1.调用tokenize方法完成对选择器的解析 * 2.对于没有初始集合(即seed没有赋值)且是单一块选择 ...
关于DataTable.Select方法偶尔无法正确查到数据的处理方法
项目中经常用DataTable在内存中存储并操作数据,在进行报表开发的时候,报表的各种过滤功能用这个内存表可以大现身手,但最近在使用过程中却遇到一个奇怪的现象,现将该问题及处理方法记录一下.这是在做护 ...

随机推荐

图像分割论文 | DRN膨胀残差网络 | CVPR2017
文章转自:同作者个人微信公众号[机器学习炼丹术].欢迎交流沟通,共同进步,作者微信:cyx645016617 论文名称:'Dilated Residual Networks' 论文链接:https:/ ...
wmic 查看主板信息
查看主板信息的一个命令:wmic baseboard get 当然在命令提示符里查看,真的很费劲,所以我们将命令格式化一下:wmic baseboard get /format:HFORM >c ...
使用gui_upload的总结
今天使用gui_upload函数将文本文件的内容读取到内表.出现了一个问题,总是程序宕掉,出项的提示是 Type conflict when calling a function module. 原来 ...
MySQL库和表的操作
MySQL库和表的操作库操作创建库 1.1 语法 CREATE DATABASE 数据库名 charset utf8; 1.2 数据库命名规则可以由字母.数字.下划线.@.#.＄区分大小写唯 ...
uni-app开发经验分享二： uni-app生命周期记录
应用生命周期(仅可在App.vue中监听) 页面生命周期(在页面中添加) 当页面中需要用到下拉刷新功能时,打开pages.json,在"globalStyle"里设置"e ...
从JAVA内存到垃圾回收，带你深入理解JVM
摘要:学过Java的程序员对JVM应该并不陌生,如果你没有听过,没关系今天我带你走进JVM的世界.程序员为什么要学习JVM呢,其实不懂JVM也可以照样写出优质的代码,但是不懂JVM有可能别被面试官虐得 ...
【Android初级】如何动态添加菜单项（附源码+避坑）
我们平时在开发过程中,为了灵活多变,除了使用静态的菜单,还有动态添加菜单的需求.今天要分享的功能如下: 在界面的右上角有个更多选项,点开后,有两个子菜单:关于和退出点击"关于", ...
为什么MySQL索引使用B+树
为什么MySQL索引使用B+树聚簇索引与非聚簇索引不同的存储引擎,数据文件和索引文件位置是不同的,但是都是在磁盘上而不是内存上,根据索引文件.数据文件是否放在一起而有了分类: 聚簇索引:数据文件和 ...
可视化Go内存管理
小结: 1. Go不需要VM,Go应用程序二进制文件中嵌入了一个小型运行时(Go runtime),可以处理诸如垃圾收集(GC),调度和并发之类的语言功能 Go does not need a VM ...
MFA
什么是 MFA?_启用和解绑MFA_账号常见问题_账号管理-阿里云 https://help.aliyun.com/knowledge_detail/37215.html

BeautifulSoup select方法

BeautifulSoup select方法的更多相关文章

随机推荐

热门专题