Web Scraping using Python Scrapy_BS4 - using BeautifulSoup and Python
Use BeautifulSoup and Python to scrape a website
Libraries:
- urllib (downloading the page)
- BeautifulSoup (parsing HTML data)
Web scraping script
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

quotes_page = "https://bluelimelearning.github.io/my-fav-quotes/"

# Download the raw HTML of the page
uClient = uReq(quotes_page)
page_html = uClient.read()
uClient.close()

# Parse the HTML and collect every quote container
page_soup = soup(page_html, "html.parser")
quotes = page_soup.find_all("div", {"class": "quotes"})

# Print each quote followed by its author
for quote in quotes:
    fav_quote = quote.find_all("p", {"class": "aquote"})
    aquote = fav_quote[0].text.strip()
    fav_authors = quote.find_all("p", {"class": "author"})
    author = fav_authors[0].text.strip()
    print(aquote)
    print(author)
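The fetch step in the script does not handle network failures, and the connection is left open if read() raises. A minimal sketch of a more defensive fetch follows. The helper name fetch_html and the 10-second timeout are assumptions made for illustration; they are not part of the original script.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    fetch_html and the timeout default are illustrative assumptions.
    """
    try:
        # The with-block closes the connection even if read() raises.
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except URLError as err:
        # HTTPError is a subclass of URLError, so HTTP errors land here too.
        print(f"Could not fetch {url}: {err}")
        return None
```

Calling fetch_html(quotes_page) could then stand in for the uReq, read and close lines, with a check for None before the result is handed to BeautifulSoup.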
Running the script prints each quote followed by its author.
The complete output of the scrape is:
I hear and i forget. I see and i remember. I do and i understand.
Confucious
Feeling gratitude and not expressing it is like wrapping a present and not giving it.
William Arthur Ward
Our greatest glory is not in never falling but in rising every time we fall.
Confucious
The secret of getting aheadis getting started.
Mark Twain
Believe you can and you're halfway there.
Theodore Roosevelt
Resentment is like drinking Poison and waiting for your enemies to die.
Nelson Mandela
Silence is a true friend who never betrays.
Confucius
The best way to find yourself is to lose yourself in the service of others.
Mahatma Gandhi
Never succumb to the temptation of bitterness.
Martin Luther King Jnr
The journey of a thousand miles begins with one step.
Lao Tzu
It is health that is real wealth and not pieces of gold and silver.
Mahatma Gandhi
Yesterday is not ours to recover but tomorrow is ours to win or lose.
Lyndon B Johnson
It's not what happens to you but how you react to it that matters .
Epictetus
Beware of what you become in pursuit of what you want.
Jim Rohn
The best revenge is massive success.
Frank Sinatra
Do not take life too seriously You will never get out of it alive.
Elbert Hubbard
Don't judge each day by the harvest you reap but by the seeds that yiu plant.
Robert Loius Stevenson
Your attitude and not your aptitude will determine your altitude
Zig Ziglar
Imagination is more important than knowledge.
Albert Einstein
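To reuse the results instead of only printing them, the same lookups can collect (quote, author) pairs in a list. The sketch below assumes the page structure used by the script above (a div with class "quotes" containing p tags with classes "aquote" and "author"). The sample HTML and the function name extract_quotes are hypothetical and only keep the example runnable offline.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mirrors the structure the script assumes:
# div.quotes containing p.aquote and p.author. Not copied from the real page.
SAMPLE_HTML = """
<div class="quotes">
  <p class="aquote"> First sample quote. </p>
  <p class="author">Author One</p>
</div>
<div class="quotes">
  <p class="aquote">Quote without an author.</p>
</div>
"""


def extract_quotes(html):
    """Return a list of (quote, author) tuples found in html."""
    page_soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in page_soup.find_all("div", class_="quotes"):
        quote_tag = block.find("p", class_="aquote")
        author_tag = block.find("p", class_="author")
        if quote_tag is None:
            continue  # skip blocks that have no quote text
        # find() returns None when nothing matches, so check before .text.
        if author_tag is not None:
            author = author_tag.text.strip()
        else:
            author = "Unknown"
        results.append((quote_tag.text.strip(), author))
    return results


for quote, author in extract_quotes(SAMPLE_HTML):
    print(f"{quote} ({author})")
```

Using find() with a None check, rather than indexing find_all()[0], avoids an IndexError when a block is missing one of the two paragraphs.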