Web Scraping using Python Scrapy_BS4 - using BeautifulSoup and Python
Use BeautifulSoup and Python to scrap a website
Lib:
- urllib
- Parsing HTML Data
Web scraping script
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup quotes_page = "https://bluelimelearning.github.io/my-fav-quotes/"
uClient = uReq(quotes_page)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
quotes = page_soup.findAll("div", {"class":"quotes"}) for quote in quotes:
fav_quote = quote.findAll("p", {"class":"aquote"})
aquote = fav_quote[0].text.strip() fav_authors = quote.findAll("p",{"class":"author"})
author = fav_authors[0].text.strip() print(aquote)
print(author)
Run this script successfully

Following is the whole result of this scraping.
I hear and i forget. I see and i remember. I do and i understand.
Confucious
Feeling gratitude and not expressing it is like wrapping a present and not giving it.
William Arthur Ward
Our greatest glory is not in never falling but in rising every time we fall.
Confucious
The secret of getting aheadis getting started.
Mark Twain
Believe you can and you're halfway there.
Theodore Roosevelt
Resentment is like drinking Poison and waiting for your enemies to die.
Nelson Mandela
Silence is a true friend who never betrays.
Confucius
The best way to find yourself is to lose yourself in the service of others.
Mahatma Gandhi
Never succumb to the temptation of bitterness.
Martin Luther King Jnr
The journey of a thousand miles begins with one step.
Lao Tzu
It is health that is real wealth and not pieces of gold and silver.
Mahatma Gandhi
Yesterday is not ours to recover but tomorrow is ours to win or lose.
Lyndon B Johnson
It's not what happens to you but how you react to it that matters .
Epictetus
Beware of what you become in pursuit of what you want.
Jim Rohn
The best revenge is massive success.
Frank Sinatra
Do not take life too seriously You will never get out of it alive.
Elbert Hubbard
Don't judge each day by the harvest you reap but by the seeds that yiu plant.
Robert Loius Stevenson
Your attitude and not your aptitude will determine your altitude
Zig Ziglar
Imagination is more important than knowledge.
Albert Einstein
.
Web Scraping using Python Scrapy_BS4 - using BeautifulSoup and Python的更多相关文章
- Web Scraping using Python Scrapy_BS4 - using Scrapy and Python(2)
Scrapy Architecture Creating a Spider. Spiders are classes that you define that Scrapy uses to scrap ...
- Web Scraping using Python Scrapy_BS4 - using Scrapy and Python(1)
Create a new Scrapy project first. scrapy startproject projectName . Open this project in Visual Stu ...
- Web Scraping using Python Scrapy_BS4 - Software
Install the following software before web scraping. Visual Studio Code Python and Pip pip install vi ...
- Web Scraping using Python Scrapy_BS4 - Introduction
What is Web Scraping This is also referred to as web harvesting and web data extraction. This is the ...
- Web Scraping with Python读书笔记及思考
Web Scraping with Python读书笔记 标签(空格分隔): web scraping ,python 做数据抓取一定一定要明确:抓取\解析数据不是目的,目的是对数据的利用 一般的数据 ...
- <Web Scraping with Python>:Chapter 1 & 2
<Web Scraping with Python> Chapter 1 & 2: Your First Web Scraper & Advanced HTML Parsi ...
- 《Web Scraping With Python》Chapter 2的学习笔记
You Don't Always Need a Hammer When Michelangelo was asked how he could sculpt a work of art as mast ...
- 阅读OReilly.Web.Scraping.with.Python.2015.6笔记---Crawl
阅读OReilly.Web.Scraping.with.Python.2015.6笔记---Crawl 1.函数调用它自身,这样就形成了一个循环,一环套一环: from urllib.request ...
- 阅读OReilly.Web.Scraping.with.Python.2015.6笔记---找出网页中所有的href
阅读OReilly.Web.Scraping.with.Python.2015.6笔记---找出网页中所有的href 1.查找以<a>开头的所有文本,然后判断href是否在<a> ...
随机推荐
- TiDB初探
TiDB是一个开源的分布式NewSQL数据库,设计的目标是满足100%的OLTP和80%的OLAP,支持SQL.水平弹性扩展.分布式事务.跨数据中心数据强一致性保证.故障自恢复的高可用.海量数据高并发 ...
- SpringBoot中注入ApplicationContext对象的三种方式
[本文版权归微信公众号"代码艺术"(ID:onblog)所有,若是转载请务必保留本段原创声明,违者必究.若是文章有不足之处,欢迎关注微信公众号私信与我进行交流!] 在项目中,我们可 ...
- TCP实战二(半连接队列、全连接队列)
TCP实验一我们利用了tcpdump以及Wireshark对TCP三次握手.四次挥手.流量控制做了深入的分析,今天就让我们一同深入理解TCP三次握手中两个重要的结构:半连接队列.全连接队列. 参考文献 ...
- Navicat15安装激活版教程
navicat15安装 一键式安装,安装包如下 链接:https://pan.baidu.com/s/1VTJmJ7ulUySWoWBu-fugiw 提取码:fz5u 先安装软件包点击安装,一直下一步 ...
- 吐血推荐,想进BAT必看
不必太纠结于当下,也不必太忧虑未来,人生没有无用的经历,当你经历过一些事情后,眼前的风景已经和从前不一样了.--村上春树 一.包含如下内容 ActiveMQ消息中间件面试专题 BAT80道面试题 BA ...
- 第二部分用户交互程序开发,通过paramiko记录ssh会话记录
需求及任务:实现一个给用户登录的界面(通过ssh登到堡垒机上,然后给它展现一个命令行的页面,然后他选择登哪台机器,一选择就连上去且把日志也记录下来). 先在admin创建几条组数据并与用户关联如下图: ...
- 数据解析_bs进行数据解析
1.bs4进行数据解析 数据解析的原理 1.标签定位 2.提取标签,标签属性中存储的数据值 bs4数据解析的原理 1.实例化一个BeautifulSoup对象,并且将页面源码数据加载到该对象中 2.通 ...
- Qt-绘图
1 简介 参考视频:https://www.bilibili.com/video/BV1XW411x7NU?p=37 参考文档:<Qt教程.docx> 本文简单介绍Qt的绘图与绘图设备. ...
- css modules是什么?
什么是CSS Modules? 官方的介绍是: 所有的 class 的名称和动画的名称默认属于本地作用域的 CSS 文件.所以 CSS Modules 不是一个官方的规范,也不是浏览器的一种机制,它是 ...
- css3动画添加间隔
因项目需要,需要在元素上实现动画效果,并且需要有动画间隔.坑爹的是animation-delay只有在第一次动画开始的时候才起效. 在网上找了很多方法,最终的方法基本都是改动画规则,比如 @keyfr ...