# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# Pretty-print the contents of the soup object
print(soup.prettify())
The Dormouse's story
<p class="title" name="dromouse">
The Dormouse's story
<p class="story">
Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">
<!-- Elsie -->
<a class="sister" href="http://example.com/lacie" id="link2">
<a class="sister" href="http://example.com/tillie" id="link3">
and they lived at the bottom of a well.
<p class="story">
- Tag
- NavigableString
- BeautifulSoup
- Comment
Put simply, a Tag is an individual HTML tag, for example:
<head><title>The Dormouse's story</title></head>
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
The title, head, a, p and other HTML tags above, together with the content they contain, are Tags. Let's try using BeautifulSoup to get them:
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# Print the title tag
print(soup.title)
# Print the head tag
print(soup.head)
# Print the a tag
print(soup.a)
# Print the p tag
print(soup.p)
# Print the type of soup.p
print(type(soup.p))
<title>The Dormouse's story</title>
<head><title>The Dormouse's story</title></head>
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<class 'bs4.element.Tag'>
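As a quick check of how attribute-style access behaves, here is a minimal sketch. It assumes a made-up two-paragraph snippet and uses the built-in "html.parser" instead of lxml so that it runs without extra dependencies. It illustrates that soup.p returns only the first matching tag, while a tag that does not exist comes back as None.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not the document used elsewhere in this post
html = '<p class="title"><b>first</b></p><p class="story"><b>second</b></p>'
soup = BeautifulSoup(html, "html.parser")

first_p = soup.p             # attribute access returns only the first <p>
all_p = soup.find_all("p")   # find_all returns every <p>
missing = soup.table         # there is no <table>, so this is None

print(first_p["class"])      # ['title']
print(len(all_p))            # 2
print(missing)               # None
```

Because a missing tag yields None, chained access such as soup.table.tr would raise AttributeError, so it is worth checking for None first.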
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# The soup object is special: its name is [document]
print(soup.name)
# For any other tag, name is the tag's own name
print(soup.head.name)
# Print all attributes of the p tag; the result is a dictionary
print(soup.p.attrs)
# Print the class attribute of the p tag
print(soup.p['class'])
# get() with the attribute name is equivalent to the lookup above
print(soup.p.get('class'))
print(soup.p)
# Modify an attribute
soup.p['class'] = "newClass"
print(soup.p)
# Delete an attribute
del soup.p['class']
print(soup.p)

[document]
head
{'class': ['title'], 'name': 'dromouse'}
['title']
['title']
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="newClass" name="dromouse"><b>The Dormouse's story</b></p>
<p name="dromouse"><b>The Dormouse's story</b></p>
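The attribute operations above can be explored a little further. The following is a minimal sketch on a made-up one-line snippet, parsed with the built-in "html.parser" rather than lxml. It shows that class is returned as a list because it can hold several values, and that get() and has_attr() are safe ways to probe an attribute that may be absent.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with a two-valued class attribute
soup = BeautifulSoup('<p class="title big" id="intro">text</p>', "html.parser")
p = soup.p

classes = p["class"]            # multi-valued attribute -> list of strings
p_id = p["id"]                  # single-valued attribute -> plain string
missing = p.get("name")         # get() returns None when the attribute is absent
has_name = p.has_attr("name")   # explicit membership test

print(classes)    # ['title', 'big']
print(p_id)       # intro
print(missing)    # None
print(has_name)   # False
```

Indexing with p["name"] would raise KeyError here, which is why get() is usually the better choice for optional attributes.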
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# Print the text content of the p tag
print(soup.p.string)
# Print the type of soup.p.string
print(type(soup.p.string))
The Dormouse's story
<class 'bs4.element.NavigableString'>
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# Type of the name
print(type(soup.name))
# Name
print(soup.name)
# Attributes
print(soup.attrs)

<class 'str'>
[document]
{}
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.a)
print(soup.a.string)
print(type(soup.a.string))

<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
 Elsie 
<class 'bs4.element.Comment'>
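Since .string prints a comment without its <!-- --> markers, comment text can be mistaken for ordinary text. One way to guard against that is to check the type before using the value. The sketch below assumes a shortened, made-up version of the sample links and uses the built-in "html.parser".

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical two-link snippet: the first link contains only a comment
html = '<a id="link1"><!-- Elsie --></a><a id="link2">Lacie</a>'
soup = BeautifulSoup(html, "html.parser")

visible = []
for a in soup.find_all("a"):
    s = a.string
    # Comment is a subclass of NavigableString, so an isinstance check tells them apart
    if isinstance(s, Comment):
        continue
    visible.append(str(s))

print(visible)   # ['Lacie']
```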
1. Direct children: the .contents and .children attributes
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# .contents returns the direct children as a list
print(soup.head.contents)
print(soup.head.contents[0])
[<title>The Dormouse's story</title>]
<title>The Dormouse's story</title>
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
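and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# .children returns an iterator over the direct children, not a list
print(soup.head.children)
# Iterate to get every direct child
for child in soup.head.children:
    print(child)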
<list_iterator object at 0x008FF950>
<title>The Dormouse's story</title>
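One detail that is easy to miss with .contents and .children is that text between tags counts as a child too. The following minimal sketch uses a made-up paragraph and the built-in "html.parser" to show this, and to show one way of keeping only the Tag children.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical paragraph mixing text and tags
soup = BeautifulSoup("<p>Names: <a>Lacie</a> and <a>Tillie</a></p>", "html.parser")
p = soup.p

contents = p.contents   # list: text nodes and tags, in document order
tag_names = [c.name for c in p.children if isinstance(c, Tag)]

print(len(contents))    # 4
print(tag_names)        # ['a', 'a']
```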
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# .descendants returns a generator
print(soup.head.descendants)
# Iterate to get every descendant (children, grandchildren, and so on)
for child in soup.head.descendants:
    print(child)
<generator object descendants at 0x00519AB0>
<title>The Dormouse's story</title>
The Dormouse's story
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.head.string)
print(soup.head.title.string)
The Dormouse's story
The Dormouse's story
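The example above works because head has a single child, which in turn has a single string. When a tag has more than one child, .string is None, and get_text() is the usual alternative. A minimal sketch, assuming a made-up snippet and the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: <p> has two children, a text node and a <b> tag
soup = BeautifulSoup("<p>Hello <b>world</b></p>", "html.parser")

single = soup.b.string      # <b> has exactly one string child
ambiguous = soup.p.string   # <p> has two children, so .string is None
joined = soup.p.get_text()  # concatenates all text inside <p>

print(single)      # world
print(ambiguous)   # None
print(joined)      # Hello world
```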
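1. find_all(name, attrs, recursive, text, **kwargs)
The simplest filter is a string. Pass a string to a search method and Beautiful Soup looks for tags whose name matches that string exactly, returning the results as a list.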
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.find_all("b"))
print(soup.find_all("a"))
[<b>The Dormouse's story</b>]
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
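If you pass a compiled regular expression, Beautiful Soup filters tag names with the pattern's search() method, so a pattern must be anchored with ^ to match only at the start of the name.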
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup
import re

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
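and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# Print the name of every tag whose name starts with "b"
for tag in soup.find_all(re.compile("^b")):
    print(tag.name)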
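To see how a regular-expression filter is applied to tag names, here is a minimal sketch on a made-up document, parsed with the built-in "html.parser". It contrasts an anchored pattern with an unanchored one. The unanchored pattern matches anywhere in the tag name, which is search-style behavior.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical small document
html = "<html><head><title>T</title></head><body><p><b>bold</b></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Anchored: only names that start with "b"
starts_with_b = [t.name for t in soup.find_all(re.compile("^b"))]
# Unanchored: any name that contains "t"
contains_t = [t.name for t in soup.find_all(re.compile("t"))]

print(starts_with_b)   # ['body', 'b']
print(contains_t)      # ['html', 'title']
```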
If you pass a list, Beautiful Soup returns everything that matches any item in the list, again as a list.
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.find_all(['a', 'b']))
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.find_all(id="link1"))
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
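Keyword filters like id="link1" have a few common variations. The sketch below uses a made-up set of links (the data-kind attribute is invented for illustration) and the built-in "html.parser". It shows class_ for the class attribute, an attrs dictionary for names that are not valid Python keywords, limit, and find() for a single result.

```python
from bs4 import BeautifulSoup

# Hypothetical links; data-kind is an invented attribute
html = (
    '<a class="sister" id="link1" data-kind="x">A</a>'
    '<a class="sister" id="link2">B</a>'
    '<a class="brother" id="link3">C</a>'
)
soup = BeautifulSoup(html, "html.parser")

# "class" is a reserved word in Python, so the keyword is spelled class_
by_class = [a["id"] for a in soup.find_all("a", class_="sister")]
# Attribute names containing "-" must go through the attrs dictionary
by_attrs = [a["id"] for a in soup.find_all(attrs={"data-kind": "x"})]
# limit stops the search after the given number of matches
limited = soup.find_all("a", limit=1)
# find() returns the first match directly, or None
first = soup.find("a", id="link3")

print(by_class)           # ['link1', 'link2']
print(by_attrs)           # ['link1']
print(len(limited))       # 1
print(first.get_text())   # C
```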
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup
import re

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

# String
print(soup.find_all(text=" Elsie "))
# List
print(soup.find_all(text=["Tillie", " Elsie ", "Lacie"]))
# Regular expression
print(soup.find_all(text=re.compile("Dormouse")))
[' Elsie ']
[' Elsie ', 'Lacie', 'Tillie']
["The Dormouse's story", "The Dormouse's story"]
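In Beautiful Soup 4.4.0 and later the same filter is also available under the name string, which the official documentation now prefers over text. A minimal sketch, assuming a recent bs4, a made-up paragraph, and the built-in "html.parser":

```python
import re
from bs4 import BeautifulSoup

# Hypothetical paragraph
soup = BeautifulSoup("<p>Once <a>Lacie</a> and <a>Tillie</a></p>", "html.parser")

# Exact match on a text node
exact = soup.find_all(string="Lacie")
# Regular expression applied to each text node
pattern = soup.find_all(string=re.compile("ie$"))
# Combined with a tag name, the result is the matching tags rather than the strings
tags = soup.find_all("a", string="Tillie")

print(exact)            # ['Lacie']
print(pattern)          # ['Lacie', 'Tillie']
print(tags[0].name)     # a
```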
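- In CSS, a tag name is written with no prefix, a class name is prefixed with ., and an id is prefixed with #
- Elements can be filtered here in a similar way with soup.select(), which returns a list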
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.select("title"))
print(soup.select("b"))
print(soup.select("a"))
[<title>The Dormouse's story</title>]
[<b>The Dormouse's story</b>]
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.select(".title"))
[<p class="title" name="dromouse"><b>The Dormouse's story</b></p>]
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.select("#link1"))
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.select("p #link1"))
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.select("a[class='sister']"))
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.select("p a[class='sister']"))
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
# -*- coding:utf-8 -*-
__author__ = 'mayi'

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")

print(soup.select("p a[class='sister']"))

# Print the text of each selected tag
for item in soup.select("p a[class='sister']"):
    print(item.get_text())

[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

Lacie
Tillie
Note: <!-- Elsie --> is a comment, so get_text() does not output it and the first link prints as an empty line.
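The selector examples above can be combined into a small extraction routine. The following is a minimal sketch, assuming a shortened copy of the sample links, the built-in "html.parser", and a bs4 version that ships CSS selector support through soupsieve (4.7 or later). It uses the child combinator >, reads an attribute from each match, and uses select_one() when only the first match is needed.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical copy of the sample paragraph
html = (
    '<p class="story">'
    '<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>'
    '<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

# "p.story > a.sister": <a class="sister"> elements that are direct children of <p class="story">
hrefs = [a["href"] for a in soup.select("p.story > a.sister")]
# get_text() skips comments, so the first link yields an empty string
texts = [a.get_text() for a in soup.select("a.sister")]
# select_one returns the first match, or None when nothing matches
second = soup.select_one("#link2")
nothing = soup.select_one("#nope")

print(hrefs)        # ['http://example.com/elsie', 'http://example.com/lacie']
print(texts)        # ['', 'Lacie']
print(second["id"]) # link2
print(nothing)      # None
```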