Scrapy——settings配置文件

# -*- coding: utf-8 -*-

# Scrapy settings for tencent project

#

# For simplicity, this file contains only settings considered important or

# commonly used. You can find more settings consulting the documentation:

#

#     https://doc.scrapy.org/en/latest/topics/settings.html

#     https://doc.scrapy.org/en/latest/topics/downloader-middleware.html

#     https://doc.scrapy.org/en/latest/topics/spider-middleware.html

# 项目名

BOT_NAME = 'tencent'

# 爬虫位置

SPIDER_MODULES = ['tencent.spiders']

NEWSPIDER_MODULE = 'tencent.spiders'

# 日志输出

LOG_LEVEL = "INFO"

# Crawl responsibly by identifying yourself (and your website) on the user-agent

USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'

# Robot协议

# Obey robots.txt rules

ROBOTSTXT_OBEY = True

# 设置最大并发请求

# Configure maximum concurrent requests performed by Scrapy (default: 16)

#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)

# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay

# See also autothrottle settings and docs

# 每次请求之前的下载延迟

#DOWNLOAD_DELAY = 3

# The download delay setting will honor only one of:

#CONCURRENT_REQUESTS_PER_DOMAIN = 16   #每个域名的最大并发请求数

#CONCURRENT_REQUESTS_PER_IP = 16       #每个ip的最大并发请求数

# 下次请求之前携带之前请求的cookie

# Disable cookies (enabled by default)

#COOKIES_ENABLED = False

# 禁用Telnet控制台（默认启用）

# Disable Telnet Console (enabled by default)

#TELNETCONSOLE_ENABLED = False

# 覆盖默认请求头，USER_AGENT在之前，放此处将不起作用

# Override the default request headers:

#DEFAULT_REQUEST_HEADERS = {

#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',

#   'Accept-Language': 'en',

#}

# 启用或禁用spider中间件

# Enable or disable spider middlewares

# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html

#SPIDER_MIDDLEWARES = {

#    'tencent.middlewares.TencentSpiderMiddleware': 543,

#}

# 启用或禁用下载器中间件

# Enable or disable downloader middlewares

# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html

#DOWNLOADER_MIDDLEWARES = {

#    'tencent.middlewares.TencentDownloaderMiddleware': 543,

#}

#启用或禁用扩展程序

# Enable or disable extensions

# See https://doc.scrapy.org/en/latest/topics/extensions.html

#EXTENSIONS = {

#    'scrapy.extensions.telnet.TelnetConsole': None,

#}

# 配置项目管道

# Configure item pipelines

# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html

ITEM_PIPELINES = {

   'tencent.pipelines.TencentPipeline': 300,

}

# 启用和配置AutoThrottle扩展（默认情况下禁用）

# Enable and configure the AutoThrottle extension (disabled by default)

# See https://doc.scrapy.org/en/latest/topics/autothrottle.html

#AUTOTHROTTLE_ENABLED = True

#初始下载延迟

# The initial download delay

#AUTOTHROTTLE_START_DELAY = 5

#在高延迟的情况下设置的最大下载延迟

# The maximum download delay to be set in case of high latencies

#AUTOTHROTTLE_MAX_DELAY = 60

# The average number of requests Scrapy should be sending in parallel to

# each remote server

#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

#启用显示所收到的每个响应的调节统计信息：

# Enable showing throttling stats for every response received:

#AUTOTHROTTLE_DEBUG = False

#启用和配置HTTP缓存（默认情况下禁用）

# Enable and configure HTTP caching (disabled by default)

# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings

#HTTPCACHE_ENABLED = True

#HTTPCACHE_EXPIRATION_SECS = 0

#HTTPCACHE_DIR = 'httpcache'

#HTTPCACHE_IGNORE_HTTP_CODES = []

#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

Scrapy——settings配置文件的更多相关文章

maven settings 配置文件
maven settings 配置文件 <?xml version="1.0" encoding="UTF-8"?> <settings xm ...
Scrapy 框架配置文件
配置文件基本配置 #1.项目名称,默认的USER_AGENT由它来构成,也作为日志记录的日志名 BOT_NAME = 'Amazon' #2.爬虫应用路径 SPIDER_MODULES = ['Am ...
高仿 django插拔式及 settings配置文件
目录基于django中间件实现插拔式仿django settings 基于django中间件实现插拔式 start.py from notify import * send_all('好嗨哦') ...
【UE4 C++】 Config Settings配置文件(.ini)
简介常见存储路径 \Engine\Config\ \Engine\Saved\Config\ (运行后生成) [ProjectName]\Config\ [ProjectName]\Saved\Co ...
Django settings配置文件
由来:为什么我在用django配置的时候导入的不是我项目名下的那个settings 但是我配置了之后依然能够起作用,这是为什么? from django.conf import settings # ...
django settings配置文件ALLOWED_HOSTS
ALLOWED_HOSTS列表为了防止黑客入侵,只允许列表中的ip地址访问填写上“*”可以使所有的网址都能访问Django项目了,项目测试的时候,可以这么做.这样就失去了保护
Django的settings配置文件
一.邮件配置 EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend' EMAIL_HOST = 'smtp.qq.com' EMAI ...
Scrapy 框架安装五大核心组件 settings 配置管道存储
scrapy 框架的使用博客: https://www.cnblogs.com/bobo-zhang/p/10561617.html 安装: pip install wheel 下载 Twisted ...
第三百四十九节，Python分布式爬虫打造搜索引擎Scrapy精讲—cookie禁用、自动限速、自定义spider的settings，对抗反爬机制
第三百四十九节,Python分布式爬虫打造搜索引擎Scrapy精讲—cookie禁用.自动限速.自定义spider的settings,对抗反爬机制 cookie禁用就是在Scrapy的配置文件set ...

随机推荐

汉诺塔（hanoi）
汉诺塔代码: def hanoi(n,x,y,z): if n == 1: print(x,'-->',z) else: hanoi(n-1,x,z,y) print(x,'-->',z) ...
项目中遇到的死锁问题： Lock wait timeout exceeded; try restarting transaction
最近项目中频繁出现 Lock wait timeout exceeded; try restarting transaction这个错误,把我们弄得痛苦不堪啊,为了解决问题,上网上找好多资料,终于把 ...
SpringMVC错误集中营
1.eclipse里的错误提示为The import javax.servlet.http.HttpServletRequest cannot be resolved 1.这是因为工程里面web-in ...
用word发CSDN blog
目前大部分的博客作者在用Word写博客这件事情上都会遇到以下3个痛点: 1.所有博客平台关闭了文档发布接口,用户无法使用Word,Windows Live Writer等工具来发布博客.使用Word写 ...
七）oracle 2 mysql
/* Navicat MySQL Data Transfer Source Server : localhost Source Server Version : 50527 Source Host : ...
团体程序设计天梯赛L1-024 后天 2017-03-22 17:59 68人阅读评论(0) 收藏
L1-024. 后天时间限制 400 ms 内存限制 65536 kB 代码长度限制 8000 B 判题程序 Standard 作者陈越如果今天是星期三,后天就是星期五:如果今天是星期六,后天就 ...
log4j.properties加入内容
log4j.rootLogger=INFO, stdout log4j.appender.stdout=org.apache.log4j.ConsoleAppender log4j.appender. ...
docker swarm 命令
初始化swarm manager并制定网卡地址 docker swarm init --advertise-addr 192.168.10.117 强制删除集群,如果是manager,需要加–forc ...
C# Redis辅助类封装与简单聊天室的实现思路说明
虽然redis api的功能比较齐全,但个人觉得为了更好的方便学习和使用,还是很有必有做一个类似DBHelper的帮助类辅助类主要功能(代码会在最后放出来) 1. 事件监听: 重新配置广播时(主从同 ...
layer模态窗简单使用
layer.open({ type: 1,//模态窗种类 skin: "layui-layer-rim", title: "编辑信息", area: [&quo ...

Scrapy——settings配置文件

Scrapy——settings配置文件的更多相关文章

随机推荐

热门专题