BeautifulSoup_python3
1.错误排除
bsObj = BeautifulSoup(html.read())
报错:
UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
解决办法:
bsObj = BeautifulSoup(html.read(),"html.parser")
BeautifulSoup
简介:通过定位HTML标签来格式化和组织复杂的网络信息,用简单的python对象来展现XML结构信息。
python3 安装 版本4 BeautifulSoup4 (BS4)
运行实例:
#!/usr/bin/env python
# encoding: utf-8
"""
@author: 侠之大者kamil
@file: beautifulsoup.py
@time: 2016/4/19 16:36
"""
from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen('http://www.cnblogs.com/kamil/')
print(type(html))
bsObj = BeautifulSoup(html.read(),"html.parser") #html.read() 获取网页内容,并且传输到BeautifulSoup 对象。
print(type(bsObj))
print(bsObj.h1)
第12 行注意,需要加上 "html.parser"
结果:
ssh://kamil@xzdz.hk:22/usr/bin/python3 -u /home/kamil/windows_python3/python3/Day11/day12/beautifulsoup.py
<class 'http.client.HTTPResponse'>
<class 'bs4.BeautifulSoup'>
<h1><a class="headermaintitle" href="http://www.cnblogs.com/kamil/" id="Header1_HeaderTitle">侠之大者kamil</a></h1> Process finished with exit code 0
BeautifulSoup_python3的更多相关文章
随机推荐
- Linux—C内存管理
程序(可执行文件)存储结构与进程存储结构: 查看文件基本情况:file fileName.查看文件存储情况:size fileName(代码区text segment.全局初始化/静态数据区data ...
- 内裤:DataTable转Model
public class ConvertHelper<T> where T : new() { /// <summary> /// 利用反射和泛型 /// </summa ...
- 用django实现一个微信图灵机器人
微信的post请求格式是xml,所以django需要做的就是将xml请求解析出来,把content发送到图灵机器人接口, 接口返回的json数据把主要内容给解析出来,然后重新封装成xml返回给微信客户 ...
- 10 Things Every Java Programmer Should Know about String
String in Java is very special class and most frequently used class as well. There are lot many thin ...
- Broadmann分区
来源: http://blog.sina.com.cn/s/blog_60a751620100k2hj.html Brodmann areas Name 中文名 Function 1 Somatose ...
- NOIP2016提高组解题报告
NOIP2016提高组解题报告 更正:NOIP day1 T2天天爱跑步 解题思路见代码. NOIP2016代码整合
- 解决Ehcache缓存警告问题
警告: Creating a new instance of CacheManager using the diskStorePath "D:\Apache Tomcat 6.0.18\te ...
- struts2: config-browser-plugin 与 convention-plugin 学习
struts2被很多新手诟病的一个地方在于“配置过于复杂”,相信不少初学者因为这个直接改投Spring-MVC了.convention-plugin. config-browser-plugin这二个 ...
- jboss:跟踪所有sql语句及sql参数
默认情况下,hibernate/JPA 在server.log中记录的SQL语句,参数都是用?代替的,这样不太方便. 网上留传的p6spy在最新的jboss上(EAP 6.0+版本)貌似已经不起作用了 ...
- python 图
class Graph(object): def __init__(self,*args,**kwargs): self.node_neighbors = {} self.visited = {} d ...