Scrapy框架: settings.py设置

# -*- coding: utf-8 -*-

# Scrapy settings for maitian project

#

# For simplicity, this file contains only settings considered important or

# commonly used. You can find more settings consulting the documentation:

#

#     https://doc.scrapy.org/en/latest/topics/settings.html

#     https://doc.scrapy.org/en/latest/topics/downloader-middleware.html

#     https://doc.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'maitian'

SPIDER_MODULES = ['maitian.spiders']

NEWSPIDER_MODULE = 'maitian.spiders'

#不能批量设置

# Crawl responsibly by identifying yourself (and your website) on the user-agent

USER_AGENT = 'maitian (+http://www.yourdomain.com)'

#默认遵守robots协议

# Obey robots.txt rules

ROBOTSTXT_OBEY = False

#设置日志文件

LOG_FILE="maitian.log"

#日志等级分为5种：1.DEBUG 2.INFO 3.Warning 4.ERROR 5.CRITICAL

#等级越高 输出的日志越少

# LOG_LEVEL="INFO"

#scrapy设置最大并发数 默认16

# Configure maximum concurrent requests performed by Scrapy (default: 16)

#CONCURRENT_REQUESTS = 32

#设置批量延迟请求16 等待3秒再发16 秒

# Configure a delay for requests for the same website (default: 0)

# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay

# See also autothrottle settings and docs

#DOWNLOAD_DELAY = 3

# The download delay setting will honor only one of:

#CONCURRENT_REQUESTS_PER_DOMAIN = 16

#CONCURRENT_REQUESTS_PER_IP = 16

#cookie 不生效 默认是True

# Disable cookies (enabled by default)

#COOKIES_ENABLED = False

#远程

# Disable Telnet Console (enabled by default)

#TELNETCONSOLE_ENABLED = False

#加载默认的请求头

# Override the default request headers:

#DEFAULT_REQUEST_HEADERS = {

#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',

#   'Accept-Language': 'en',

#}

#爬虫中间件

# Enable or disable spider middlewares

# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html

#SPIDER_MIDDLEWARES = {

#    'maitian.middlewares.MaitianSpiderMiddleware': 543,

#}

#下载中间件

# Enable or disable downloader middlewares

# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html

#DOWNLOADER_MIDDLEWARES = {

#    'maitian.middlewares.MaitianDownloaderMiddleware': 543,

#}

# Enable or disable extensions

# See https://doc.scrapy.org/en/latest/topics/extensions.html

#EXTENSIONS = {

#    'scrapy.extensions.telnet.TelnetConsole': None,

#}

#在配置文件 开启管道

#优先级的范围 0--1000；值越小 优先级越高

# Configure item pipelines

# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html

#ITEM_PIPELINES = {

#    'maitian.pipelines.MaitianPipeline': 300,

#}

# Enable and configure the AutoThrottle extension (disabled by default)

# See https://doc.scrapy.org/en/latest/topics/autothrottle.html

#AUTOTHROTTLE_ENABLED = True

# The initial download delay

#AUTOTHROTTLE_START_DELAY = 5

# The maximum download delay to be set in case of high latencies

#AUTOTHROTTLE_MAX_DELAY = 60

# The average number of requests Scrapy should be sending in parallel to

# each remote server

#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Enable showing throttling stats for every response received:

#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)

# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings

#HTTPCACHE_ENABLED = True

#HTTPCACHE_EXPIRATION_SECS = 0

#HTTPCACHE_DIR = 'httpcache'

#HTTPCACHE_IGNORE_HTTP_CODES = []

#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

Scrapy框架: settings.py设置的更多相关文章

Scrapy框架: middlewares.py设置
# -*- coding: utf-8 -*- # Define here the models for your spider middleware # # See documentation in ...
Scrapy框架: pipelines.py设置
保存数据到json文件 # -*- coding: utf-8 -*- # Define your item pipelines here # # Don't forget to add your p ...
Django settings.py设置 DEBUG=False后静态文件无法加载解决
解决办法: settings.py 文件 DEBUG = False STATIC_ROOT = os.path.join(BASE_DIR,'static') #新增 urls.py文件(项目的) ...
django的settings.py设置static
DEBUG = True ################ STATICFILES ################ # A list of locations of additional stati ...
django的settings.py设置session
############ # SESSIONS # ############ SESSION_CACHE_ALIAS = 'default' # Cache to store session data ...
django 配置文件settings.py 设置模板
INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'dj ...
Django settings.py 的media路径设置
转载自:http://www.xuebuyuan.com/676599.html 在一个 models 中使用 FileField 或 ImageField 需要以下步骤: 1. 在你的 settin ...
Django之settings.py 的media路径设置
在一个 models 中使用 FileField 或 ImageField 需要以下步骤: 1. 在你的 settings.py文件中, 定义一个完整路径给MEDIA_ROOT 以便让 Django在 ...
Scrapy框架实战-妹子图爬虫
Scrapy这个成熟的爬虫框架,用起来之后发现并没有想象中的那么难.即便是在一些小型的项目上,用scrapy甚至比用requests.urllib.urllib2更方便,简单,效率也更高.废话不多说, ...

随机推荐

python 装饰器第七步：带有参数的装饰器
#第七步:带有参数的装饰器 #两个基本函数用同一个装饰器装饰 def outer(arg): print(arg) #这是装饰器的代码 def kuozhan(func): print(func) # ...
Vue页面跳转动画效果实现
Vue页面跳转动画效果实现:https://juejin.im/post/5ba358a56fb9a05d2068401d
Spark Streaming + Kafka 整合向导之createDirectStream
启动zk: zkServer.sh start 启动kafka:kafka-server-start.sh $KAFKA_HOME/config/server.properties 创建一个topic ...
【题解】小X的AK计划
题目描述虽然在小X的家乡,有机房一条街,街上有很多机房.每个机房里都有一万个人在切题.小X刚刷完CodeChef,准备出来逛逛.机房一条街有n个机房,第i个机房的坐标为xi,小X的家坐标为0.小X在 ...
Java-技术专区-异步编程指南
通过本文你可以了解到下面这些知识点: Future 模式介绍以及核心思想核心线程数.最大线程数的区别,队列容量代表什么: ThreadPoolTaskExecutor 饱和策略: SpringBoo ...
Codeforces The Child and Toy
The Child and Toy time limit per test1 second On Children's Day, the child got a toy from Delayyy as ...
HTML学习笔记（基础部分）
一.基本概念 1.HTML:超文本标记语言(HyperText Markup Language)是一种用于创建网页的标准标记语言. 2.HTML文档的后缀名:.html 或 .htm 3.标签:由尖括 ...
开源的android客户端，ghost网站
https://github.com/TryGhost/Ghost-Android http://docs.ghostchina.com/zh/
CentOS 安装 docker-compose
1.sudo curl -L "https://get.daocloud.io/docker/compose/releases/download/1.24.1/docker-compose- ...
Java中的多表&事务&DCL&一个多表操作例子
准备sql: 创建部门表 CREATE TABLE dept( id INT PRIMARY KEY AUTO_INCREMENT, NAME VARCHAR(20) ); INSERT INTO d ...

Scrapy框架: settings.py设置

Scrapy框架: settings.py设置的更多相关文章

随机推荐

热门专题