深入爬虫书scrapy 之json内容没有写入文本

settings.py设置

ITEM_PIPELINES = {

   'tets.pipelines.TetsPipeline': 300,

}

spider代码

xpath后缀添加.extract() parse()返回return item

import scrapy

from tets.items import TetsItem

class KugouSpider(scrapy.Spider):

    name = 'kugou'

    allowed_domains = ['www.kugou.com']

    start_urls = ['http://www.kugou.com/']

    def parse(self, response):

        item = TetsItem()

        item['title'] = response.xpath("/html/head/title/text()").extract()

        print(item['title'])

        return item

piplines代码

# -*- coding: utf-8 -*-

# Define your item pipelines here

#

# Don't forget to add your pipeline to the ITEM_PIPELINES setting

# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html

import codecs

import json

class TetsPipeline(object):

    def __init__(self):

        # self.file = codecs.open("D:/git/learn_scray/day11/mydata2.txt", "wb", encoding="utf-8")

        self.file = codecs.open("D:/git/learn_scray/day11/1.json", "wb", encoding="utf-8")

    # 处理文本(xx.txt)

    # def process_item(self, item, spider):

    #     l = str(item) + "\n"

    #     print(l)

    #     self.file.write(l)

    #     return item

    def process_item(self, item, spider):

        print("进入")

        # print(item)

        i = json.dumps(dict(item), ensure_ascii=False)

        # print("进入json")

        # print(i)

        l = i + "\n"

        print(l)

        self.file.write(l)

        return item

    def close_spider(self, spider):

        slef.file.close()

items.py

# -*- coding: utf-8 -*-

# Define here the models for your scraped items

#

# See documentation in:

# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy

class TetsItem(scrapy.Item):

    # define the fields for your item here like:

    # name = scrapy.Field()

    title = scrapy.Field()

结果如图下

深入爬虫书scrapy 之json内容没有写入文本的更多相关文章

服务端JSON内容中有富文本时
问题背景由于数据中存在复杂的富文本,包含各种引号和特殊字符,导致后端和前端通过JSON格式进行数据交互引发前端JSON解析出错. 解决方案后端将富文本内容 ConvertToBase64Strin ...
python根据索引删除内容并写入文本
在python中,有个好用的模块linecache,该模块允许从任何文件里得到任何的行,并且使用缓存进行优化,常见的情况是从单个文件读取多行.linecache.getline(filename,li ...
scrapy(四): 爬取二级页面的内容
scrapy爬取二级页面的内容 1.定义数据结构item.py文件 # -*- coding: utf-8 -*- ''' field: item.py ''' # Define here the m ...
Scrapy 框架使用 selenium 爬取动态加载内容
使用 selenium 爬取动态加载内容开启中间件 DOWNLOADER_MIDDLEWARES = { 'wangyiPro.middlewares.WangyiproDownloaderMidd ...
Learning Scrapy笔记（六）- Scrapy处理JSON API和AJAX页面
摘要:介绍了使用Scrapy处理JSON API和AJAX页面的方法有时候,你会发现你要爬取的页面并不存在HTML源码,譬如,在浏览器打开http://localhost:9312/static/, ...
爬虫框架scrapy的基本内容
Scrapy介绍 Scrapy是一个为了爬取网站数据,提取结构性数据而编写的应用框架. 可以帮助用户简单快速的部署一个专业的网络爬虫.如果说前面我们写的定制bs4爬虫是”手动挡“,那Scrapy就相当 ...
【C#】菜单功能，将剪贴板JSON内容或者xml内容直接粘贴为类
VS 2015菜单功能,将剪贴板JSON内容或者xml内容直接粘贴为类
.net mvc web api 返回 json 内容，过滤值为null的属性
原文:http://blog.csdn.net/xxj_jing/article/details/49508557 版权声明:本文为博主原创文章,未经博主允许不得转载. .net mvc web ap ...
使用jsonpath解析json内容
JsonPath提供的json解析非常强大,它提供了类似正则表达式的语法,基本上可以满足所有你想要获得的json内容.下面我把官网介绍的每个表达式用代码实现,可以更直观的知道该怎么用它. 一.首先需要 ...

随机推荐

JAVA遍历map元素
第一种: Map map = new HashMap(); Iterator iter = map.entrySet().iterator(); while (iter.hasNext()) { Ma ...
vue+element UI如何导出excel表
导出excel表应按如下规则首先要先安装如下依赖 npm install --save xlsx npm install --save file-saver 接下在在你的代码中去引用这两个 impo ...
javascipt的forEach
1.Array let arr = [1, 2, 3]; arr.forEach(function (element, index, array) { console.log('数组中每个元素:', ...
MySql中引擎
1. InnoDB 引擎 MySQL 5.5 及以后版本中的默认存储引擎,它的优点如下:灾难恢复性好,支持事务,使用行级锁,支持外键关联,支持热备份. InnoDB引擎中的表,其数据的物理组织形式是簇 ...
GIMP中的新建Layer与更改Layer大小
这边可以直接New Layer,新建一个Layer,还可以New from Visible,第二种是将当前的状态下图像复制出来. 改变Layer的大小,一般的方法两种: Crop to Selecti ...
Linux下关于/tmp目录的清理规则
本文将介绍Linux下/tmp目录的清理规则,rhel6和rhel7将以完全不同的两种方式进行清理. RHEL6 tmpwatch命令 tmpwatch 是专门用于解决“删除 xxx天没有被访问/修改 ...
docker系列之基础命令-2
一.查看本地镜像 docker images 二.需要基础的镜像两种方式 1.docker pull centos 可以直接拉起镜像 2.直接用xshell导入就行,docker load -i 加 ...
浏览器中如何获取想要的offsetwidth、、、clientwidth、、offsetheight、、、clientheight。。。
clientWidth是对象看到的宽度(不含边线,即border)scrollWidth是对象实际内容的宽度(若无padding,那就是边框之间距离,如有padding,就是左padding和右pad ...
CSS3-盒模型-resize属性
作用:用来改变元素尺寸大小. 1.resize:none|both|horizontal|vertical|inherit none:不能拖动修改尺寸大小 both:可以拖动元素,修改元素宽高 hor ...
node.js----服务器http
请求网址过程: 1.用户通过浏览器发送一个http的请求到指定的主机 2.服务器接收到该请求,对该请求进行分析和处理 3.服务器处理完成以后,返回对应的数据到用户机器 4.浏览器接收服务器返回的数据, ...

深入爬虫书scrapy 之json内容没有写入文本

settings.py设置

spider代码

piplines代码

items.py

结果如图下

深入爬虫书scrapy 之json内容没有写入文本的更多相关文章

随机推荐

热门专题