Finding an Element with the select() Method

You can retrieve a web page element from a BeautifulSoup object by calling the select() method and passing a string containing a CSS selector for the element you are looking for. Selectors are like regular expressions: they specify a pattern to look for, in this case in HTML pages instead of general text strings.

A full discussion of CSS selector syntax is beyond the scope of this book (there’s a good selector tutorial in the resources at http://nostarch.com/automatestuff/), but here’s a short introduction to selectors. Table 11-2 shows examples of the most common CSS selector patterns.

Table 11-2. Examples of CSS Selectors

Selector passed to the select() method    Will match...
soup.select('div')                        All elements named <div>
soup.select('#author')                    The element with an id attribute of author
soup.select('.notice')                    All elements that use a CSS class attribute named notice
soup.select('div span')                   All elements named <span> that are within an element named <div>
soup.select('div > span')                 All elements named <span> that are directly within an element named <div>, with no other element in between
soup.select('input[name]')                All elements named <input> that have a name attribute with any value
soup.select('input[type="button"]')       All elements named <input> that have an attribute named type with value button
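
To see the table's patterns in action, here is a minimal sketch that runs each selector against a small HTML string. The HTML is invented for this illustration (it is not example.html), and the names HTML and counts are hypothetical; the sketch assumes the bs4 package is installed and uses Python's built-in 'html.parser'.

```python
import bs4

# Hypothetical HTML, invented for this illustration (not the book's example.html).
HTML = """
<div>
  <span id="author">Al Sweigart</span>
  <p class="notice"><span>Nested span</span></p>
</div>
<form>
  <input name="q" type="text">
  <input type="button" value="Go">
</form>
"""

soup = bs4.BeautifulSoup(HTML, 'html.parser')

selectors = [
    'div',                   # by element name
    '#author',               # by id attribute
    '.notice',               # by CSS class
    'div span',              # <span> anywhere inside a <div>
    'div > span',            # <span> that is a direct child of a <div>
    'input[name]',           # <input> that has a name attribute
    'input[type="button"]',  # <input> whose type attribute is "button"
]

# Map each selector to the number of elements it matches.
counts = {sel: len(soup.select(sel)) for sel in selectors}

for sel, n in counts.items():
    print(sel, '->', n)
```

The difference between 'div span' and 'div > span' is the interesting case here: the nested <span> sits inside a <p>, so the descendant selector finds two elements while the child selector finds only one.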

The various selector patterns can be combined to make sophisticated matches. For example, soup.select('p #author') will match any element that has an id attribute of author, as long as it is also inside a <p> element.
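
A short sketch of combined selectors follows. The HTML string is a stand-in assumed to resemble part of example.html, with a <div> added only for contrast; the variable names are hypothetical. It assumes bs4 is installed and uses the built-in 'html.parser'.

```python
import bs4

# Stand-in HTML: two paragraphs assumed to resemble example.html,
# plus a hypothetical <div> added for contrast.
HTML = """
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
<div><span id="editor">Someone Else</span></div>
"""

soup = bs4.BeautifulSoup(HTML, 'html.parser')

# id="author" AND somewhere inside a <p>: one match.
inside_p = soup.select('p #author')

# id="author" AND somewhere inside a <div>: no match, so an empty list.
inside_div = soup.select('div #author')

# Element name and class written together: a <p> whose class is slogan.
slogan = soup.select('p.slogan')

print(inside_p[0].getText())
print(len(inside_div))
print(slogan[0].getText())
```

Note the space in 'p #author': with the space, the selector means "an #author element inside a <p>"; without it, 'p#author' would mean "a <p> element whose own id is author".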

The select() method will return a list of Tag objects, which is how Beautiful Soup represents an HTML element. The list will contain one Tag object for every match in the BeautifulSoup object’s HTML. Tag values can be passed to the str() function to show the HTML tags they represent. Tag values also have an attrs attribute that shows all the HTML attributes of the tag as a dictionary. Using the example.html file from earlier, enter the following into the interactive shell:

>>> import bs4
>>> exampleFile = open('example.html')
>>> # read() returns the whole file as a single string
>>> exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html.parser')
>>> elems = exampleSoup.select('#author')
>>> type(elems)
<class 'list'>
>>> len(elems)
1
>>> type(elems[0])
<class 'bs4.element.Tag'>
>>> elems[0].getText()
'Al Sweigart'
>>> str(elems[0])
'<span id="author">Al Sweigart</span>'
>>> elems[0].attrs
{'id': 'author'}

This code pulls the element with id="author" out of our example HTML. We use select('#author') to return a list of all the elements with id="author". We store this list of Tag objects in the variable elems, and len(elems) tells us there is one Tag object in the list, so there was one match. Calling getText() on the element returns the element’s text: the content between the opening and closing tags, in this case 'Al Sweigart'.

Passing the element to str() returns a string with the opening and closing tags and the element’s text. Finally, attrs gives us a dictionary with the element’s attribute, 'id', and the value of the id attribute, 'author'.

You can also pull all the <p> elements from the BeautifulSoup object. Enter this into the interactive shell:

>>> pElems = exampleSoup.select('p')
>>> str(pElems[0])
'<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>'
>>> pElems[0].getText()
'Download my Python book from my website.'
>>> str(pElems[1])
'<p class="slogan">Learn Python the easy way!</p>'
>>> pElems[1].getText()
'Learn Python the easy way!'
>>> str(pElems[2])
'<p>By <span id="author">Al Sweigart</span></p>'
>>> pElems[2].getText()
'By Al Sweigart'

This time, select() gives us a list of three matches, which we store in pElems. Using str() on pElems[0], pElems[1], and pElems[2] shows you each element as a string, and using getText() on each element shows you its text.
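
Because select() returns a list, the matches can also be processed in a loop rather than indexed one at a time. The sketch below assumes an inline HTML string that mirrors the three <p> elements shown above (the real example.html may contain more markup); the names HTML, texts, missing, and first_missing are hypothetical.

```python
import bs4

# Inline stand-in assumed to mirror the three <p> elements of example.html.
HTML = """
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
"""

soup = bs4.BeautifulSoup(HTML, 'html.parser')

# Collect the text of every <p> element in document order.
texts = [p.getText() for p in soup.select('p')]
for text in texts:
    print(text)

# A selector with no matches gives an empty list, so indexing it with [0]
# would raise IndexError. Check the list before indexing.
missing = soup.select('#missing')
first_missing = missing[0] if missing else None

# select_one() returns the first match directly, or None when nothing matches.
author = soup.select_one('#author')
nothing = soup.select_one('#missing')
```

Checking for an empty result before indexing matters in practice, since a page whose layout changes will often make a selector silently stop matching.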

Getting Data from an Element’s Attributes

The get() method for Tag objects makes it simple to access attribute values from an element. The method is passed a string of an attribute name and returns that attribute’s value. Using example.html, enter the following into the interactive shell:

>>> import bs4
>>> soup = bs4.BeautifulSoup(open('example.html'), 'html.parser')
>>> spanElem = soup.select('span')[0]
>>> str(spanElem)
'<span id="author">Al Sweigart</span>'
>>> spanElem.get('id')
'author'
>>> spanElem.get('some_nonexistent_addr') is None
True
>>> spanElem.attrs
{'id': 'author'}

Here we use select() to find any <span> elements and then store the first matched element in spanElem. Passing the attribute name 'id' to get() returns the attribute’s value, 'author'. Asking get() for an attribute the element does not have returns None.
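
A common use of get() is collecting link targets. The following sketch assumes an inline HTML string modeled on example.html; the variable names and the 'title' attribute used to show a fallback value are hypothetical.

```python
import bs4

# Inline stand-in assumed to mirror the <p> elements of example.html.
HTML = """
<p>Download my <strong>Python</strong> book from <a href="http://inventwithpython.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>
"""

soup = bs4.BeautifulSoup(HTML, 'html.parser')

# Select only <a> elements that have an href attribute, then read its value.
hrefs = [a.get('href') for a in soup.select('a[href]')]
print(hrefs)

span = soup.select('span')[0]
span_id = span.get('id')

# get() accepts an optional second argument that is returned
# when the attribute is absent (the default is None).
span_title = span.get('title', 'no title')

# class can hold several values, so Beautiful Soup returns it as a list.
slogan_classes = soup.select('p.slogan')[0].get('class')
```

Using get() rather than indexing the tag like a dictionary avoids a KeyError when an attribute is missing, which is convenient when scraping pages whose markup is not fully consistent.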
