Scrapy框架: settings.py设置
# -*- coding: utf-8 -*-
# Scrapy settings for maitian project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://doc.scrapy.org/en/latest/topics/settings.html
# https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
# https://doc.scrapy.org/en/latest/topics/spider-middleware.html
BOT_NAME = 'maitian'
SPIDER_MODULES = ['maitian.spiders']
NEWSPIDER_MODULE = 'maitian.spiders'
#不能批量设置
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'maitian (+http://www.yourdomain.com)'
#默认遵守robots协议
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
#设置日志文件
LOG_FILE="maitian.log"
#日志等级分为5种:1.DEBUG 2.INFO 3.Warning 4.ERROR 5.CRITICAL
#等级越高 输出的日志越少
# LOG_LEVEL="INFO"
#scrapy设置最大并发数 默认16
# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32
#设置批量延迟请求16 等待3秒再发16 秒
# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16
#cookie 不生效 默认是True
# Disable cookies (enabled by default)
#COOKIES_ENABLED = False
#远程
# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False
#加载默认的请求头
# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
# 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
# 'Accept-Language': 'en',
#}
#爬虫中间件
# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
# 'maitian.middlewares.MaitianSpiderMiddleware': 543,
#}
#下载中间件
# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
# 'maitian.middlewares.MaitianDownloaderMiddleware': 543,
#}
# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
# 'scrapy.extensions.telnet.TelnetConsole': None,
#}
#在配置文件 开启管道
#优先级的范围 0--1000;值越小 优先级越高
# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
#ITEM_PIPELINES = {
# 'maitian.pipelines.MaitianPipeline': 300,
#}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False
# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
Scrapy框架: settings.py设置的更多相关文章
- Scrapy框架: middlewares.py设置
# -*- coding: utf-8 -*- # Define here the models for your spider middleware # # See documentation in ...
- Scrapy框架: pipelines.py设置
保存数据到json文件 # -*- coding: utf-8 -*- # Define your item pipelines here # # Don't forget to add your p ...
- Django settings.py设置 DEBUG=False后静态文件无法加载解决
解决办法: settings.py 文件 DEBUG = False STATIC_ROOT = os.path.join(BASE_DIR,'static') #新增 urls.py文件(项目的) ...
- django的settings.py设置static
DEBUG = True ################ STATICFILES ################ # A list of locations of additional stati ...
- django的settings.py设置session
############ # SESSIONS # ############ SESSION_CACHE_ALIAS = 'default' # Cache to store session data ...
- django 配置文件settings.py 设置模板
INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'dj ...
- Django settings.py 的media路径设置
转载自:http://www.xuebuyuan.com/676599.html 在一个 models 中使用 FileField 或 ImageField 需要以下步骤: 1. 在你的 settin ...
- Django之settings.py 的media路径设置
在一个 models 中使用 FileField 或 ImageField 需要以下步骤: 1. 在你的 settings.py文件中, 定义一个完整路径给MEDIA_ROOT 以便让 Django在 ...
- Scrapy框架实战-妹子图爬虫
Scrapy这个成熟的爬虫框架,用起来之后发现并没有想象中的那么难.即便是在一些小型的项目上,用scrapy甚至比用requests.urllib.urllib2更方便,简单,效率也更高.废话不多说, ...
随机推荐
- python 装饰器 第七步:带有参数的装饰器
#第七步:带有参数的装饰器 #两个基本函数用同一个装饰器装饰 def outer(arg): print(arg) #这是装饰器的代码 def kuozhan(func): print(func) # ...
- Vue页面跳转动画效果实现
Vue页面跳转动画效果实现:https://juejin.im/post/5ba358a56fb9a05d2068401d
- Spark Streaming + Kafka 整合向导之createDirectStream
启动zk: zkServer.sh start 启动kafka:kafka-server-start.sh $KAFKA_HOME/config/server.properties 创建一个topic ...
- 【题解】小X的AK计划
题目描述 虽然在小X的家乡,有机房一条街,街上有很多机房.每个机房里都有一万个人在切题.小X刚刷完CodeChef,准备出来逛逛.机房一条街有n个机房,第i个机房的坐标为xi,小X的家坐标为0.小X在 ...
- Java-技术专区-异步编程指南
通过本文你可以了解到下面这些知识点: Future 模式介绍以及核心思想 核心线程数.最大线程数的区别,队列容量代表什么: ThreadPoolTaskExecutor 饱和策略: SpringBoo ...
- Codeforces The Child and Toy
The Child and Toy time limit per test1 second On Children's Day, the child got a toy from Delayyy as ...
- HTML学习笔记(基础部分)
一.基本概念 1.HTML:超文本标记语言(HyperText Markup Language)是一种用于创建网页的标准标记语言. 2.HTML文档的后缀名:.html 或 .htm 3.标签:由尖括 ...
- 开源的android客户端,ghost网站
https://github.com/TryGhost/Ghost-Android http://docs.ghostchina.com/zh/
- CentOS 安装 docker-compose
1.sudo curl -L "https://get.daocloud.io/docker/compose/releases/download/1.24.1/docker-compose- ...
- Java中的多表&事务&DCL&一个多表操作例子
准备sql: 创建部门表 CREATE TABLE dept( id INT PRIMARY KEY AUTO_INCREMENT, NAME VARCHAR(20) ); INSERT INTO d ...