Scraping Douban movie reviews: notes on fixing maximum recursion depth exceeded in cmp

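#coding=utf8
# Scraped mainly so that someone else can run natural-language analysis on the reviews; no other purpose.
import re
import sys

import requests
from lxml import etree

reload(sys)  # Python 2 only: makes setdefaultencoding available again
sys.setdefaultencoding('utf8')
sys.setrecursionlimit(100000)  # works around "maximum recursion depth exceeded in cmp"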

def craw(url):
    headerx={
    'Cookie':'bid=OIBtzThxxA; ct=y; _pk_ref.100001.4cf6=%5B%22%22%2C%22%22%2C1502186407%2C%22http%3A%2F%2Fqianxun.baidu.com%2Fmovie%2Fcard_2162.html%22%5D; __utmt=1; ps=y; dbcl2="165xxx93:UV/wbzXasBQ"; ck=d-ep; _pk_id.100001.4cf6=7bff167cf6dxxxxxx.10.1502186411; __utmc=30149280; __utmz=30149280.1501649654.7.7.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; __utma=22369xxxxx59553.1502186411.2; __utmb=223695111.48.10.1502186411; __utmc=223695111; __utmz=223695111.1500959553.1.1.utmcsr=qianxun.baidu.com|utmccn=(referral)|utmcxxxxx.html; push_noty_num=0; push_doumail_num=0; ap=1'
    }
    # Keep retrying until the page comes back with HTTP 200.
    while 1:
        try:
            resp=requests.get(url,headers=headerx)
            if resp.status_code==200:
                break
        except Exception,e:
            print e
    html=resp.content.decode('utf8')
    selector=etree.HTML(html)
    #print html
    all_comment=selector.xpath('//div[@class="comment-item"]')
    for comment in all_comment:
        #print  etree.tounicode(comment),'************************'
        # The rating is encoded in a class name such as "allstar50"; capture the digit before the trailing 0.
        star_class=comment.xpath('.//span[contains(@class,"allstar")]/@class')
        if star_class:
            starx=re.findall('tar(.*?)0',star_class[0])[0]
        else:
            starx=0   # some reviews carry no star rating
        textx=comment.xpath('.//div[@class="comment"]/p/text()')[0]
        print starx,'星  ',textx   # 星 = stars
        f.write('%s星 %s\r\n'%(starx,textx))
    # The next offset is not predictable, so read it from the link that sits between
    # the previous-page (前页) and next-page (后页) markers.
    next_start=re.search(u'前页[\s\S]*?<a href="\?start=(.*?)&amp[\s\S]*?后页',html).group(1)
    next_url='https://movie.douban.com/subject/25662329/comments?start=%s&limit=20&sort=new_score&status=P'%next_start
    print 'Crawling next page:',next_url
    craw(next_url)   # the function calls itself here, adding one stack frame per page

if __name__=="__main__":
    f = open('pinlun.txt', 'a')
    craw('https://movie.douban.com/subject/25662329/comments?start=71726&limit=20&sort=new_score&status=P')

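The page URLs for Douban reviews cannot be predicted in advance. Ideally the start parameter in the URL would be 0 for the first page, 20 for the second, 40 for the third, and so on, but in practice it does not behave that way, so the link to the next page has to be extracted from the page itself. The script is single-threaded, and the function calls itself once per page. After running for a few dozen minutes it failed with maximum recursion depth exceeded in cmp. I assumed it was a fluke, but stopping and restarting the script several times gave the same result, so I went looking for an answer. The cause is the line in the function above that calls the function itself: every page adds one more level of recursion, and Python caps the recursion depth (the default limit is 1000, so the script dies a little before the 1000th nested call). After calling sys.setrecursionlimit(100000) the problem went away.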
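Raising the limit works, but each page still leaves a frame on the call stack, and a limit set far too high can crash the interpreter with a real stack overflow instead of a Python exception. An alternative, which is not what the script above does, is to follow the next-page link in a loop. The sketch below is written for Python 3, where the error is a RecursionError rather than the RuntimeError of Python 2. The names crawl_recursive, crawl_iterative and make_fake_site are made up for illustration, and the fake site only stands in for Douban so the example runs without network access.

```python
import sys


def crawl_recursive(start, fetch_next):
    """Same shape as the post's craw(): handle one page, then call itself for the next."""
    next_start = fetch_next(start)
    if next_start is None:
        return 1
    return 1 + crawl_recursive(next_start, fetch_next)


def crawl_iterative(start, fetch_next):
    """Same traversal as a loop: the call stack stays flat however many pages there are."""
    pages = 0
    while start is not None:
        pages += 1
        start = fetch_next(start)
    return pages


def make_fake_site(total_pages, page_size=20):
    """Hypothetical stand-in for Douban: maps a start offset to the next one, or None on the last page."""
    last_start = (total_pages - 1) * page_size

    def fetch_next(start):
        return start + page_size if start < last_start else None

    return fetch_next


if __name__ == "__main__":
    fetch_next = make_fake_site(total_pages=5000)
    print("recursion limit:", sys.getrecursionlimit())
    try:
        crawl_recursive(0, fetch_next)
    except RecursionError as exc:
        print("recursive version failed:", exc)
    print("iterative version visited", crawl_iterative(0, fetch_next), "pages")
```

In a real crawler, fetch_next would download the page, write out the reviews, and return the start value found in the next-page link, or None when there is no such link.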
