Use BeautifulSoup and Python to scrap a website

Lib:

  • urllib
  • Parsing HTML Data

Web scraping script

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup quotes_page = "https://bluelimelearning.github.io/my-fav-quotes/"
uClient = uReq(quotes_page)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
quotes = page_soup.findAll("div", {"class":"quotes"}) for quote in quotes:
fav_quote = quote.findAll("p", {"class":"aquote"})
aquote = fav_quote[0].text.strip() fav_authors = quote.findAll("p",{"class":"author"})
author = fav_authors[0].text.strip() print(aquote)
print(author)

Run this script successfully

Following is the whole result of this scraping.

I hear and i forget. I see and i remember. I do and i understand.
Confucious
Feeling gratitude and not expressing it is like wrapping a present and not giving it.
William Arthur Ward
Our greatest glory is not in never falling but in rising every time we fall.
Confucious
The secret of getting aheadis getting started.
Mark Twain
Believe you can and you're halfway there.
Theodore Roosevelt
Resentment is like drinking Poison and waiting for your enemies to die.
Nelson Mandela
Silence is a true friend who never betrays.
Confucius
The best way to find yourself is to lose yourself in the service of others.
Mahatma Gandhi
Never succumb to the temptation of bitterness.
Martin Luther King Jnr
The journey of a thousand miles begins with one step.
Lao Tzu
It is health that is real wealth and not pieces of gold and silver.
Mahatma Gandhi
Yesterday is not ours to recover but tomorrow is ours to win or lose.
Lyndon B Johnson
It's not what happens to you but how you react to it that matters .
Epictetus
Beware of what you become in pursuit of what you want.
Jim Rohn
The best revenge is massive success.
Frank Sinatra
Do not take life too seriously You will never get out of it alive.
Elbert Hubbard
Don't judge each day by the harvest you reap but by the seeds that yiu plant.
Robert Loius Stevenson
Your attitude and not your aptitude will determine your altitude
Zig Ziglar
Imagination is more important than knowledge.
Albert Einstein

.

Web Scraping using Python Scrapy_BS4 - using BeautifulSoup and Python的更多相关文章

  1. Web Scraping using Python Scrapy_BS4 - using Scrapy and Python(2)

    Scrapy Architecture Creating a Spider. Spiders are classes that you define that Scrapy uses to scrap ...

  2. Web Scraping using Python Scrapy_BS4 - using Scrapy and Python(1)

    Create a new Scrapy project first. scrapy startproject projectName . Open this project in Visual Stu ...

  3. Web Scraping using Python Scrapy_BS4 - Software

    Install the following software before web scraping. Visual Studio Code Python and Pip pip install vi ...

  4. Web Scraping using Python Scrapy_BS4 - Introduction

    What is Web Scraping This is also referred to as web harvesting and web data extraction. This is the ...

  5. Web Scraping with Python读书笔记及思考

    Web Scraping with Python读书笔记 标签(空格分隔): web scraping ,python 做数据抓取一定一定要明确:抓取\解析数据不是目的,目的是对数据的利用 一般的数据 ...

  6. <Web Scraping with Python>:Chapter 1 & 2

    <Web Scraping with Python> Chapter 1 & 2: Your First Web Scraper & Advanced HTML Parsi ...

  7. 《Web Scraping With Python》Chapter 2的学习笔记

    You Don't Always Need a Hammer When Michelangelo was asked how he could sculpt a work of art as mast ...

  8. 阅读OReilly.Web.Scraping.with.Python.2015.6笔记---Crawl

    阅读OReilly.Web.Scraping.with.Python.2015.6笔记---Crawl 1.函数调用它自身,这样就形成了一个循环,一环套一环: from urllib.request ...

  9. 阅读OReilly.Web.Scraping.with.Python.2015.6笔记---找出网页中所有的href

    阅读OReilly.Web.Scraping.with.Python.2015.6笔记---找出网页中所有的href 1.查找以<a>开头的所有文本,然后判断href是否在<a> ...

随机推荐

  1. 深入浅出PyTorch(算子篇)

    Tensor 自从张量(Tensor)计算这个概念出现后,神经网络的算法就可以看作是一系列的张量计算.所谓的张量,它原本是个数学概念,表示各种向量或者数值之间的关系.PyTorch的张量(torch. ...

  2. robot framework使用小结(一)

    项目组要用到robot framework验收web,因此花了两天时间了解了一下这个框架.我把网上各位大侠分享的内容整理成一个小小demo,参考的出处没有列出来,在此一并感谢各位. demo仍旧是打开 ...

  3. windows下 react-native环境搭建

    跟着慕课网做案例,搭建rn环境遇到很大问题. 下面说一下: 首先看一下文档:http://reactnative.cn/docs/0.44/getting-started.html#content 注 ...

  4. EntityFramework Core 迁移忽略主外键关系

    前言 本文来源于一位公众号童鞋私信我的问题,在我若加思索后给出了其中一种方案,在此之前我也思考过这个问题,借此机会我稍微看了下,目前能够想到的也只是本文所述方案. 为何要忽略主外键关系 我们不仅疑惑为 ...

  5. Syntax error, insert "}" to complete MethodBody

    jsp中代码在Eclipse中打开正常,导入项目导入MyEclipse后显示如下异常: Syntax error, insert "}" to complete MethodBod ...

  6. Mybatis 动态insert语句

    mybatis的一个比较先进的思想是把Sql语句写在了配置xml文件(也支持注解),通过配置文件的方式,免去了一般软件开发的硬编码,当业务需求改变的时候,只需要更改sql语句即可! 下面是个人在学习m ...

  7. 每天一个LINUX命令(pwd)

    每天一个LINUX命令(pwd) 基本信息 pwd: /bin/pwd,显示当前路径的绝对路径         语法:pwd 应用程序位置     which pwd PWD作用 pwd --help ...

  8. 02 . 分布式存储之FastDFS 高可用集群部署

    单节点部署和原理请看上一篇文章 https://www.cnblogs.com/you-men/p/12863555.html 环境 [Fastdfs-Server] 系统 = CentOS7.3 软 ...

  9. 常见的H5移动端Web页面Bug问题解决方案总汇

    解决jquery ajax调用远程接口的跨域问题 首先,接口必须允许远程调用.这是后端或者运维的事情.你必须保证你得到的一个接口是允许远程调用的.否则,就没啥了. $.ajax({ type:'get ...

  10. 常用API - Scanner、Random、ArrayList

    API 概述 API(Application Programming Interface),应用程序编程接口. Java API是一本程序员的 字典 ,是JDK中提供给我们使用的类的说明文档. 这些类 ...