Python -- Learning to Use BeautifulSoup
Using BeautifulSoup 4.3
Download and installation
# Download: http://www.crummy.com/software/BeautifulSoup/bs4/download/
# After unpacking, run as root:
# python setup.py install
# Finally, check in Python that the installation succeeded:
>>> import bs4
Basic usage:
HTML document for practice
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html_doc)

# Point 1: pretty-print the HTML with soup.prettify()
>>> print(soup.prettify())
<html>
 <head>
  <title>
   The Dormouse's story
  </title>
...
  </p>
 </body>
</html>

# Point 2: parse the document and access its tags
>>> soup.title
<title>The Dormouse's story</title>
>>> soup.title.name
'title'
>>> soup.title.string
u"The Dormouse's story"
>>> soup.title.parent.name
'head'
>>> soup.p
<p class="title"><b>The Dormouse's story</b></p>
>>> soup.p['class']
['title']
>>> soup.a
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
>>> soup.find_all('a')
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
>>> soup.find(id='link3')
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

Summary:
>>> soup.title              # the first <title> tag
>>> soup.title.name         # the tag's name --> title
>>> soup.title.string       # the text inside the tag
>>> soup.title.parent.name  # the name of the <title> tag's parent
>>> soup.p['class']         # the value of the class attribute of the first <p> tag
>>> soup.find_all('a')      # all <a> tags
>>> soup.find(id='link3')   # the first tag whose id is link3

# Point 3: get every hyperlink
>>> for link in soup.find_all('a'):
...     print(link.get('href'))
...
http://example.com/elsie
http://example.com/lacie
http://example.com/tillie

# Point 4: get all the text
>>> print(soup.get_text())
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie,
Lacie and
Tillie;
and they lived at the bottom of a well.
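The lookups above can be combined into a short script. The following is a minimal sketch rather than part of the original notes: it names the parser explicitly as "html.parser" (an assumption made here so the example runs without lxml installed), and the `links` variable is purely illustrative.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

# Assumption: the parser bundled with Python is sufficient for this document.
soup = BeautifulSoup(html_doc, "html.parser")

# Map the text of each link to its href attribute.
links = {a.get_text(): a.get("href") for a in soup.find_all("a")}

print(soup.title.string)
for text, href in links.items():
    print(text, href)
```

Naming the parser keeps the result the same from one machine to another; different parsers can build slightly different trees from the same markup.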
The four kinds of BeautifulSoup objects: Tag, NavigableString, BeautifulSoup, and Comment.
Tag
>>> soup = BeautifulSoup('<b class="boldest">Extremely bold</b>')
>>> tag = soup.b
>>> type(tag)
<class 'bs4.element.Tag'>
>>> tag.name
'b'
>>> tag.name = 'blockquote'
>>> tag
<blockquote class="boldest">Extremely bold</blockquote>
>>> tag['class']
['boldest']
>>> tag.attrs
{'class': ['boldest']}
>>> tag['class'] = 'verybold'
>>> tag['id'] = 1
>>> tag
<blockquote class="verybold" id="1">Extremely bold</blockquote>
>>> del tag['class']
>>> del tag['id']
>>> tag
<blockquote>Extremely bold</blockquote>
>>> print(tag.get('class'))
None
>>> css_soup = BeautifulSoup('<p class="body strikeout"></p>')
>>> css_soup.p['class']
['body', 'strikeout']
>>> css_soup = BeautifulSoup('<p class="body"></p>')
>>> css_soup.p['class']
['body']
>>> id_soup = BeautifulSoup('<p id="my id"></p>')
>>> id_soup.p['id']
'my id'
>>> rel_soup = BeautifulSoup('<p>Back to the <a rel="index">homepage</a>')
>>> rel_soup.a['rel']
['index']
>>> rel_soup.a['rel'] = ['index', 'contents']
>>> print(rel_soup.p)
<p>Back to the <a rel="index contents">homepage</a></p>

Summary:
tag['class']      # get the value of the tag's class attribute
tag.attrs         # get all of the tag's attributes as a dict
del tag['class']  # delete the tag's class attribute

On multi-valued attributes:
As the css_soup and id_soup examples above show, BeautifulSoup returns a list for attributes that may hold several values (such as class and rel), and a single str for attributes that may not (such as id), even when the value contains a space.
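As a compact recap of the attribute rules above, here is a small sketch. It assumes the built-in "html.parser" and uses illustrative variable names; it is not taken from the original notes.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="body strikeout" id="my id">text</p>', "html.parser")
p = soup.p

classes = p["class"]     # multi-valued attribute: a list of strings
element_id = p["id"]     # single-valued attribute: one string, space preserved
title = p.get("title")   # missing attribute: .get() returns None instead of raising KeyError

print(classes, element_id, title)
```

Using `.get()` for attributes that may be absent avoids the `KeyError` that `p["title"]` would raise.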
NavigableString -- much like an ordinary string
>>> tag.string
u'Extremely bold'
>>> type(tag.string)
<class 'bs4.element.NavigableString'>
>>> unicode_string = unicode(tag.string)
>>> unicode_string
u'Extremely bold'
>>> type(unicode_string)
<type 'unicode'>
>>> tag.string.replace_with('No longer bold')
u'Extremely bold'
>>> tag
<blockquote>No longer bold</blockquote>

Summary:
1. A NavigableString can be converted to a plain Unicode string with unicode() (str() in Python 3).
2. To replace the value of a NavigableString, use the replace_with() method.
The BeautifulSoup object -- the whole HTML document
>>> soup.name
u'[document]'
>>> soup
<html><body><blockquote>No longer bold</blockquote></body></html>
>>> type(soup)
<class 'bs4.BeautifulSoup'>
Comments and other special strings
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup)
comment = soup.b.string
type(comment)
# <class 'bs4.element.Comment'>
print(soup.b.prettify())
# <b>
#  <!--Hey, buddy. Want to buy a used parser?-->
# </b>
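Because Comment is a subclass of NavigableString, a comment can be mistaken for ordinary text when walking the tree. The sketch below shows one way to tell them apart with isinstance(); the extra `<i>` element and the variable names are made up for illustration, and "html.parser" is assumed.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b><i>plain text</i>"
soup = BeautifulSoup(markup, "html.parser")

# Walk every node and keep only the comments.
comments = [str(node) for node in soup.descendants if isinstance(node, Comment)]

# Keep the strings that are not comments, i.e. the text a reader would see.
visible = [
    str(node)
    for node in soup.descendants
    if isinstance(node, NavigableString) and not isinstance(node, Comment)
]

print(comments)
print(visible)
```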
Parsing HTML
Setup
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc)
The simplest way -- using tag names
>>> soup.head
<head><title>The Dormouse's story</title></head>
>>> soup.body.b
<b>The Dormouse's story</b>
>>> soup.find_all('a')
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

Summary:
1. soup.head           # the first <head> tag
2. soup.body.b         # the first <b> tag inside the first <body> tag
3. soup.find_all('a')  # all <a> tags
.contents and .children
# No whitespace between the tags, so the children of <body> are exactly the two <a> tags
html_doc = '<html><body><a>href1</a><a>href2</a></body></html>'

# Method 1 for getting child nodes -- .contents is a list, accessed as contents[0], contents[1]
>>> soup2 = BeautifulSoup(html_doc)
>>> contents = soup2.body.contents
>>> contents[0]
<a>href1</a>
>>> contents[1]
<a>href2</a>

# Method 2 -- .children is an iterator, meant for looping
>>> for child in soup2.body.children:
...     print(child)
...
<a>href1</a>
<a>href2</a>
.descendants
.children and .contents only reach a tag's direct children, while .descendants reaches every descendant.
>>> head_tag = soup.head
>>> head_tag
<head><title>The Dormouse's story</title></head>
>>> head_tag.contents
[<title>The Dormouse's story</title>]
>>> for child in head_tag.descendants:
...     print(child)
...
<title>The Dormouse's story</title>
The Dormouse's story
>>> len(list(soup.children))
1
>>> len(list(soup.descendants))
25
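The difference between direct children and all descendants can be checked on a very small tree. This is a sketch under the assumption that "html.parser" is used; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<head><title>The Dormouse's story</title></head>", "html.parser")
head_tag = soup.head

# Direct children only: the <title> tag.
direct = list(head_tag.children)

# All descendants: the <title> tag, then the string inside it.
everything = list(head_tag.descendants)

print(len(direct), len(everything))
```

The string counts as a descendant of `<head>` but not as a child, because its parent is `<title>`.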
.string
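>>> title_tag = soup.title
>>> title_tag
<title>The Dormouse's story</title>
>>> title_tag.string
u"The Dormouse's story"
>>> print(soup.html.string)
None

Summary:
1. If a tag contains only a single string and no other tags, .string is that string.
2. If a tag's only child is another tag, .string is that child's .string.
3. If a tag contains more than one child, .string is None, because it is ambiguous which string is meant.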
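The rules for .string are easiest to see side by side. The markup and ids below are invented for this sketch, and "html.parser" is assumed.

```python
from bs4 import BeautifulSoup

markup = '<p id="one"><b>only child</b></p><p id="two">text <b>and tag</b></p>'
soup = BeautifulSoup(markup, "html.parser")

one = soup.find(id="one")   # its only child is a <b> tag
two = soup.find(id="two")   # it has two children: a string and a <b> tag

single = one.string         # falls through to the <b> tag's string
ambiguous = two.string      # more than one child, so None
all_text = two.get_text()   # get_text() concatenates every string instead

print(repr(single), repr(ambiguous), repr(all_text))
```

When a tag may hold mixed content, `get_text()` is usually the safer choice than `.string`.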
.strings and .stripped_strings
>>> for string in soup.strings:
...     print(repr(string))
...
u"The Dormouse's story"
u'\n'
u"The Dormouse's story"
u'\n'
u'Once upon a time there were three little sisters; and their names were\n'
u'Elsie'
u',\n'
u'Lacie'
u' and\n'
u'Tillie'
u';\nand they lived at the bottom of a well.'
u'\n'
u'...'
u'\n'
>>> for string in soup.stripped_strings:
...     print(repr(string))
...
u"The Dormouse's story"
u"The Dormouse's story"
u'Once upon a time there were three little sisters; and their names were'
u'Elsie'
u','
u'Lacie'
u'and'
u'Tillie'
u';\nand they lived at the bottom of a well.'
u'...'

Summary:
1. .strings yields every string under a tag.
2. .stripped_strings does the same, but strips leading and trailing whitespace from each string and skips strings that consist only of whitespace (such as '\n').
3. repr() converts an object into its printable string representation, which makes the whitespace visible here.
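A common use of .stripped_strings is to collapse a block of markup into clean text. The snippet below is a sketch with invented markup, assuming "html.parser".

```python
from bs4 import BeautifulSoup

html = "<div>\n  <p>  Hello  </p>\n  <p>\n</p>\n  <p>world</p>\n</div>"
soup = BeautifulSoup(html, "html.parser")

raw = list(soup.strings)             # includes whitespace-only strings
clean = list(soup.stripped_strings)  # trimmed, whitespace-only strings dropped
sentence = " ".join(clean)

print(raw)
print(clean)
print(sentence)
```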
.parent
# Example 1
>>> title_tag = soup.title
>>> title_tag
<title>The Dormouse's story</title>
>>> title_tag.parent
<head><title>The Dormouse's story</title></head>

# Example 2
>>> title_tag.string.parent
<title>The Dormouse's story</title>

# Example 3
>>> html_tag = soup.html
>>> type(html_tag.parent)
<class 'bs4.BeautifulSoup'>

# Example 4
>>> print(soup.parent)
None

Summary:
1. The parent of the <html> tag is the BeautifulSoup object.
2. The BeautifulSoup object has no parent (it is the root node).
.parents
>>> link = soup.a
>>> link
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
>>> for parent in link.parents:
...     if parent is None:
...         print(parent)
...     else:
...         print(parent.name)
...
p
body
html
[document]
Sibling nodes
HTML used for the sibling examples
>>> sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></b></a>")
>>> print(sibling_soup.prettify())
<html>
 <body>
  <a>
   <b>
    text1
   </b>
   <c>
    text2
   </c>
  </a>
 </body>
</html>
.next_sibling and .previous_sibling
# Example 1. Compare with the prettify() output above
>>> sibling_soup.b.next_sibling
<c>text2</c>
>>> sibling_soup.c.previous_sibling
<b>text1</b>

# Example 2. In the prettify() output, <b> has no sibling before it and <c> has no sibling after it, so both results are None
>>> print(sibling_soup.b.previous_sibling)
None
>>> print(sibling_soup.c.next_sibling)
None

# Example 3. Note: the string text1 has no siblings, because it and text2 do not share the same parent
>>> sibling_soup.b.string
u'text1'
>>> print(sibling_soup.b.string.next_sibling)
None

# Example 4. Note: the next sibling of the first <a> tag is the string ',\n', not the next <a> tag (this would not happen if there were no text or line breaks between the tags)
First, recall the <a> tags in the document:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
>>> link = soup.a
>>> link
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
>>> link.next_sibling
u',\n'
>>> link.next_sibling.next_sibling
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>

# Example 5. Checking the claim above: with nothing between the tags, the next sibling of an <a> tag is the next <a> tag, not a string
>>> html_doc = '<a href="link1"></a><a href="link2"></a>'
>>> soup2 = BeautifulSoup(html_doc)
>>> link = soup2.a
>>> link
<a href="link1"></a>
>>> link.next_sibling
<a href="link2"></a>
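Since text nodes count as siblings, code that wants "the next tag" has to skip over them. The helper below, `next_tag_sibling`, is a hypothetical name introduced for this sketch (it is not part of Beautiful Soup); the markup is a shortened version of the story paragraph and "html.parser" is assumed.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def next_tag_sibling(node):
    """Return the next sibling that is a Tag, skipping strings; None if there is none."""
    for sibling in node.next_siblings:
        if isinstance(sibling, Tag):
            return sibling
    return None


html = '<p><a id="link1">Elsie</a>,\n<a id="link2">Lacie</a></p>'
soup = BeautifulSoup(html, "html.parser")
first = soup.a

raw_next = first.next_sibling        # the string between the two tags
tag_next = next_tag_sibling(first)   # the second <a> tag
after_last = next_tag_sibling(tag_next)

print(repr(raw_next), tag_next["id"], after_last)
```

The built-in `find_next_sibling()` method covered later in these notes does the same job and also accepts filters such as a tag name.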
.next_siblings and .previous_siblings
# Example 1
>>> for sibling in soup.a.next_siblings:
...     print(repr(sibling))
...
u',\n'
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
u' and\n'
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
u';\nand they lived at the bottom of a well.'

# Example 2
>>> for sibling in soup.find(id='link3').previous_siblings:
...     print(repr(sibling))
...
u' and\n'
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
u',\n'
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
u'Once upon a time there were three little sisters; and their names were\n'
.next_element and .previous_element
Background:
<html><head><title>The Dormouse's story</title></head></html>
How does an HTML parser process this? It opens the <html> tag, opens the <head> tag, opens the <title> tag, stores the string "The Dormouse's story", closes the <title> tag, closes the <head> tag, and closes the <html> tag.

<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

# Example
>>> second_a_tag = soup.find('a', id='link2')
>>> second_a_tag
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
>>> second_a_tag.next_sibling
u' and\n'
>>> second_a_tag.next_element
u'Lacie'

Summary:
.next_element is whatever the parser encountered immediately after this element. Right after opening <a id="link2"> the parser read the string Lacie, so that is the next element; the element after that is u' and\n'. (Closing tags are not counted as elements.)
.next_elements and .previous_elements
>>> last_a_tag = soup.find('a', id='link3')
>>> for element in last_a_tag.next_elements:
...     print(repr(element))
...
u'Tillie'
u';\nand they lived at the bottom of a well.'
u'\n'
<p class="story">...</p>
u'...'
u'\n'
find() and find_all()
Basic use of find_all()
Signature: find_all(name, attrs, recursive, text, limit, **kwargs)
# Example 1. Find all <title> tags
>>> soup.find_all('title')
[<title>The Dormouse's story</title>]

# Example 2. Find all <p> tags whose class is title
>>> soup.find_all('p', 'title')
[<p class="title"><b>The Dormouse's story</b></p>]

# Example 3. Find all <a> tags
>>> soup.find_all('a')
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

# Example 4. Find all tags whose id is link2
>>> soup.find_all(id='link2')
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

# Example 5. Find the first string that contains "sisters" (find() with text= returns the string itself, not the tag around it)
>>> import re
>>> soup.find(text=re.compile('sisters'))
u'Once upon a time there were three little sisters; and their names were\n'
Using a function as the argument
>>> def has_class_but_no_id(tag):
...     return tag.has_attr('class') and not tag.has_attr('id')
...
>>> soup.find_all(has_class_but_no_id)
[<p class="title"><b>The Dormouse's story</b></p>, <p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>, <p class="story">...</p>]
More advanced use of find_all()
# Example 1. Find all tags that have an id attribute
>>> soup.find_all(id=True)
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

# Example 2. Find tags whose href contains "elsie" and whose id is link1
>>> soup.find_all(href=re.compile('elsie'), id='link1')
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

# Example 3. Attribute names that are not valid Python keyword names
>>> data_soup = BeautifulSoup('<div data-foo="value">foo!</div>')
>>> data_soup.find_all(data-foo='value')
  File "<stdin>", line 1
SyntaxError: keyword can't be an expression

This does not work, because data-foo is not a valid keyword argument name. Pass a dict through attrs instead:
>>> data_soup.find_all(attrs={'data-foo': 'value'})
[<div data-foo="value">foo!</div>]
Searching by CSS class
Note: class is a reserved word in Python, so use class_ instead (class followed by an underscore).

# Example 1. Find all <a> tags whose class is sister
>>> soup.find_all('a', class_='sister')
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

# Example 2. Find tags whose class contains "itl"
>>> soup.find_all(class_=re.compile('itl'))
[<p class="title"><b>The Dormouse's story</b></p>]

# Example 3. Use a function as the argument; a tag matches if the function returns True for its class
>>> def has_six_characters(css_class):
...     return css_class is not None and len(css_class) == 6
...
>>> soup.find_all(class_=has_six_characters)
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

# Example 4. An attribute such as class can hold several values
>>> css_soup = BeautifulSoup('<p class="body strikeout"></p>')
>>> css_soup.find_all('p', class_='strikeout')
[<p class="body strikeout"></p>]
>>> css_soup.find_all('p', class_='body')
[<p class="body strikeout"></p>]
Note: a tag with a multi-valued attribute can be found by searching for any one of the values.

# Example 5. When searching for several classes as one string, the string must match the attribute value exactly; the order cannot be changed
>>> css_soup.find_all('p', class_='strikeout body')
[]

# Example 6. A CSS selector matches the classes in any order
>>> css_soup.select('p.strikeout.body')
[<p class="body strikeout"></p>]

# Example 7. In early versions that do not support class_, use attrs={}
>>> soup.find_all('a', attrs={'class': 'sister'})
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
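The class-matching rules can be put next to each other in one short script. This is a sketch assuming "html.parser" and an installation where select() is available; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="body strikeout"></p>', "html.parser")

by_one_class = soup.find_all("p", class_="body")               # any single class matches
exact_string = soup.find_all("p", class_="body strikeout")     # exact attribute string matches
reversed_string = soup.find_all("p", class_="strikeout body")  # different order: no match
by_selector = soup.select("p.strikeout.body")                  # CSS selector ignores order

print(len(by_one_class), len(exact_string), len(reversed_string), len(by_selector))
```

When the order of classes in the markup is not known in advance, the CSS selector form is the more robust of the two.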
The text argument
With text you can search for strings instead of tags. As with name and the keyword arguments, you can pass in a string, a regular expression, a list, a function, or the value True. (In Beautiful Soup 4.4.0 and later this argument is called string; text is still accepted as the older name.) Here are some examples:

# Example 1. A string as the argument
>>> soup.find_all(text='Elsie')
[u'Elsie']

# Example 2. A list as the argument
>>> soup.find_all(text=['Tillie', 'Elsie', 'Lacie'])
[u'Elsie', u'Lacie', u'Tillie']

# Example 3. A regular expression as the argument
>>> soup.find_all(text=re.compile('Dormouse'))
[u"The Dormouse's story", u"The Dormouse's story"]

# Example 4. A function as the argument
>>> def is_the_only_string_within_a_tag(s):
...     return (s == s.parent.string)
...
>>> soup.find_all(text=is_the_only_string_within_a_tag)
[u"The Dormouse's story", u"The Dormouse's story", u'Elsie', u'Lacie', u'Tillie', u'...']

# Example 5. Combined with other arguments: the result is the matching tags, not the strings
>>> soup.find_all('a', text='Elsie')
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
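The limit argument
# find_all() normally scans the whole document, which can be slow when the document is large. limit tells BeautifulSoup to stop searching once it has found that many results.
# Example: return only the first two matching results
>>> soup.find_all('a', limit=2)
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]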
The recursive argument
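HTML being parsed:
<html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>

# Example: <title> sits under <head>; it is not a direct child of <html>. With recursion on (the default), find_all() searches children, grandchildren, and so on. With recursive=False it searches direct children only, so <title> is not found.
>>> soup.html.find_all('title')
[<title>The Dormouse's story</title>]
>>> soup.html.find_all('title', recursive=False)
[]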
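A typical use of recursive=False is listing only the top-level children of a tag. The sketch below assumes "html.parser" and uses illustrative variable names.

```python
from bs4 import BeautifulSoup

markup = "<html><head><title>The Dormouse's story</title></head></html>"
soup = BeautifulSoup(markup, "html.parser")

deep = soup.html.find_all("title")                      # searches all descendants
shallow = soup.html.find_all("title", recursive=False)  # searches direct children only

# With no name filter, recursive=False returns every direct child tag.
child_names = [tag.name for tag in soup.html.find_all(recursive=False)]

print(len(deep), len(shallow), child_names)
```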
Calling a tag is like calling find_all()
Calling a Tag object (or the BeautifulSoup object) as if it were a function is the same as calling find_all() on it.
# These two lines are equivalent
soup.title.find_all(text=True)
soup.title(text=True)
find()
Signature: find(name, attrs, recursive, text, **kwargs)
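Basic use of find()
# Example 1. These two calls find the same tag. find_all('title', limit=1) returns a list containing the single result, while find() returns the result itself. A plain find_all('title') would scan the whole document, which is slower when only one result is needed.
>>> soup.find_all('title', limit=1)
[<title>The Dormouse's story</title>]
>>> soup.find('title')
<title>The Dormouse's story</title>

# Example 2. When nothing matches, find() returns None, while find_all() returns an empty list
>>> print(soup.find('nosuchtag'))
None
>>> print(soup.find_all('nosuchtag'))
[]

# Example 3. These two are equivalent
>>> soup.head.title
<title>The Dormouse's story</title>
>>> soup.find('head').find('title')
<title>The Dormouse's story</title>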
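Because find() returns None when nothing matches, chaining attribute access onto its result can raise AttributeError. The helper `text_of_first` below is a hypothetical convenience function written for this sketch, not part of Beautiful Soup; "html.parser" is assumed.

```python
from bs4 import BeautifulSoup


def text_of_first(soup, name, default=""):
    """Return the text of the first tag called `name`, or `default` if there is none."""
    tag = soup.find(name)
    return tag.get_text() if tag is not None else default


soup = BeautifulSoup("<title>The Dormouse's story</title>", "html.parser")

found = text_of_first(soup, "title")
missing = text_of_first(soup, "nosuchtag", default="(none)")

print(found)
print(missing)
```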
find_parents() and find_parent()
Signature: find_parents(name, attrs, text, limit, **kwargs)
Signature: find_parent(name, attrs, text, **kwargs)
>>> a_string = soup.find(text='Lacie')
>>> a_string
u'Lacie'
>>> a_string.find_parents('a')
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
>>> a_string.find_parent('p')
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
>>> a_string.find_parents('p', class_='title')
[]
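A practical use of find_parent() is going from a piece of text back to the link that contains it. The sketch below assumes "html.parser" and uses the string= keyword, the current name of the argument these notes call text (renamed in Beautiful Soup 4.4.0); the markup is a shortened version of the story paragraph.

```python
from bs4 import BeautifulSoup

markup = '<p class="story"><a href="http://example.com/lacie" id="link2">Lacie</a></p>'
soup = BeautifulSoup(markup, "html.parser")

# Start from the string, then climb to the enclosing <a> tag.
a_string = soup.find(string="Lacie")
link = a_string.find_parent("a")
href = link["href"]

# find_parent() returns None when no ancestor matches.
no_div = a_string.find_parent("div")

print(href, no_div)
```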
find_next_siblings() and find_next_sibling()
>>> first_link = soup.a
>>> first_link
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
>>> first_link.find_next_siblings('a')
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
>>> first_story_paragraph = soup.find('p', 'story')
>>> first_story_paragraph.find_next_sibling('p')
<p class="story">...</p>
find_previous_siblings() and find_previous_sibling()
Signature: find_previous_siblings(name, attrs, text, limit, **kwargs)
Signature: find_previous_sibling(name, attrs, text, **kwargs)
>>> last_link = soup.find('a', id='link3')
>>> last_link
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
>>> last_link.find_previous_siblings('a')
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
>>> first_story_paragraph = soup.find('p', 'story')
>>> first_story_paragraph.find_previous_sibling('p')
<p class="title"><b>The Dormouse's story</b></p>
find_all_next() and find_next()
Signature: find_all_next(name, attrs, text, limit, **kwargs)
Signature: find_next(name, attrs, text, **kwargs)
>>> first_link = soup.a
>>> first_link
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
>>> first_link.find_all_next(text=True)
[u'Elsie', u',\n', u'Lacie', u' and\n', u'Tillie', u';\nand they lived at the bottom of a well.', u'\n', u'...', u'\n']
>>> first_link.find_next('p')
<p class="story">...</p>
find_all_previous() and find_previous()
>>> first_link = soup.a
>>> first_link
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
>>> first_link.find_all_previous('p')
[<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>, <p class="title"><b>The Dormouse's story</b></p>]
>>> first_link.find_previous('title')
<title>The Dormouse's story</title>
CSS selectors (omitted)
Modifying, deleting from, and adding to the HTML (omitted)
Beautiful Soup Documentation
http://www.crummy.com/software/BeautifulSoup/bs4/doc