Install Python 2.7: see the post "Upgrading Python 2.6 to 2.7 on CentOS". Install pip: see the post "Installing Python setuptools and pip on CentOS". Dependencies (listed at https://docs.scrapy.org/en/latest/intro/install.html): lxml, an efficient XML and HTML parser; parsel, an HTML/XML data extraction library written on top of lxml; w3lib,…
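One way to confirm that the dependencies are in place after installation is to try importing them. The snippet below is only a rough sketch: the helper name `missing_modules` is made up for illustration, and it checks only the three packages named in the excerpt above, not the full dependency list from the Scrapy docs.

```python
from __future__ import print_function  # keeps print() working on Python 2.7


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Only the packages named above; the full list is in the Scrapy install guide.
    print(missing_modules(["lxml", "parsel", "w3lib"]))
```

An empty list means all three modules were importable in the current interpreter.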
Scraped items are sent to the Item Pipeline for processing. Typical uses of an Item Pipeline: cleansing HTML data; validating scraped data (checking that the items contain certain fields); checking for duplicates (and dropping them); storing the scraped item in a database. Contents: 1 Writing your own item pip…
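As a minimal sketch of two of the uses listed above (validation and duplicate dropping), a pipeline is a plain class with a `process_item(self, item, spider)` method that either returns the item or raises `DropItem`. The class names and the `title` and `url` fields are assumptions made for illustration, not taken from the original post. The `try`/`except` fallback exists only so the snippet can be run without Scrapy installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    class DropItem(Exception):
        """Stand-in used only when Scrapy is not installed."""


class RequiredFieldsPipeline(object):
    """Drop items that lack any of the required fields."""

    # Hypothetical field names; adjust to the fields of your own items.
    required_fields = ("title", "url")

    def process_item(self, item, spider):
        for field in self.required_fields:
            if not item.get(field):
                raise DropItem("missing field: %s" % field)
        return item


class DuplicatesPipeline(object):
    """Drop items whose 'url' value has already been seen."""

    def __init__(self):
        self.seen_urls = set()

    def process_item(self, item, spider):
        url = item["url"]
        if url in self.seen_urls:
            raise DropItem("duplicate item: %s" % url)
        self.seen_urls.add(url)
        return item
```

In a real project, pipelines like these are enabled through the `ITEM_PIPELINES` setting in `settings.py`, which maps each class path to an integer that determines the order in which they run.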