Writing a Renren photo album crawler with the Python requests library

I'm worried that Renren might shut down, so I wrote a crawler to download all the photos in my albums. The code is as follows:

# -*- coding: utf-8 -*-
import requests
import json
import os
 
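def mkdir(path):
    # Create the directory if it does not exist yet.
    # Returns "yes" if it was created and "no" if it already existed.
    path = path.strip()
    path = path.rstrip("\\")
    if not os.path.exists(path):
        os.makedirs(path)
        # Report success only after the directory has actually been created
        print(path + u' created')
        return "yes"
    else:
        print(path + u' already exists')
        return "no"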
 
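def login_renren(s):
    # Log in through the AJAX login endpoint; the session keeps the cookies
    login_data = {
        'email': 'your username',
        'domain': 'renren.com',
        'origURL': 'http://www.renren.com/home',
        'key_id': '',
        'captcha_type': 'web_login',
        'password': 'password value, obtained by packet capture',
        'rkey': 'rkey value, obtained by packet capture'
    }
    r = s.post("http://www.renren.com/ajaxLogin/login?1=1&uniqueTimestamp=2016742045262", data=login_data)
    # r.text is decoded text, so this check works on both Python 2 and Python 3
    if 'true' in r.text:
        print(u'Logged in to Renren successfully')
    return s
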
def get_albums(s):
    r = s.get('http://photo.renren.com/photo/278382090/albumlist/v7?showAll=1#')
    content = r.text
    # The album list is embedded in the page as a JavaScript assignment
    # that starts with "nx.data.photo = " and ends before "nx.data.hasHiddenAlbum ="
    marker = 'nx.data.photo = '
    index1 = content.find(marker)
    index2 = content.find('nx.data.hasHiddenAlbum =')
    target_json = content[index1 + len(marker):index2].strip()
    # Drop the last character (presumably the ';' that ends the statement)
    target_json = target_json[:-1]
    # The page uses single quotes; JSON requires double quotes
    data = json.loads(target_json.replace("'", '"'))
    album_list = data['albumList']
    album_count = album_list['albumCount']
    print(u'Found {} albums in total'.format(album_count))
    album_ids = []
    for album in album_list['albumList']:
        album_ids.append(album['albumId'])
    return album_ids, s
 
def download_albums(album_ids, s):
    # Visit each album page
    for album_id in album_ids:
        album_url = 'http://photo.renren.com/photo/278382090/album-{}/v7'.format(album_id)
        r = s.get(album_url)
        content = r.text
        if "photoId" in content:
            print(u'Opened album successfully')
            marker = 'nx.data.photo = '
            index1 = content.find(marker)
            index2 = content.find('; define.config')
            target_json = content[index1 + len(marker):index2].strip()
            # Skip the first 13 characters (presumably the outer {'photoList': wrapper)
            # and the last 2 characters, leaving the inner object
            target_json = target_json[13:-2]
            # The page uses single quotes; JSON requires double quotes
            data = json.loads(target_json.replace("'", '"'))
            photos = data['photoList']
            album_name = data['albumName']
            # Define and create the target directory
            album_path = 'd:\\' + album_name
            # An album whose directory already exists is treated as downloaded
            if mkdir(album_path) == 'yes':
                for photo in photos:
                    image_name = photo['photoId']
                    photo_url = photo['url']
                    r = requests.get(photo_url)
                    image_path = album_path + '/' + image_name + '.jpg'
                    # The with block closes the file even if the write fails
                    with open(image_path, 'wb') as f:
                        f.write(r.content)
                    print(u'Photo ' + image_name + u' downloaded')
            else:
                print(u'Album already downloaded')
 
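# Main entry point when this file is run as a script
if __name__ == '__main__':
    # Create a requests session
    s = requests.Session()
    # Log in to Renren
    s = login_renren(s)
    # Fetch the album list
    album_ids, s = get_albums(s)
    # Download the albums
    download_albums(album_ids, s)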
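
The script above cuts the embedded data out of the page with fixed offsets, which breaks as soon as the surrounding JavaScript changes. A possible alternative, as a minimal sketch: let `json.JSONDecoder().raw_decode` find where the object ends. The helper name `extract_assigned_object` and the `sample` page text are made up for illustration; only the `nx.data.photo = ` marker and the field names come from the script. It keeps the script's assumption that single quotes appear only as string delimiters.

```python
import json


def extract_assigned_object(page_text, marker):
    """Return the object literal that follows `marker` in page_text (hypothetical helper)."""
    start = page_text.find(marker)
    if start == -1:
        raise ValueError('marker not found: ' + marker)
    # Same assumption as the crawler: single quotes only delimit strings
    tail = page_text[start + len(marker):].replace("'", '"').lstrip()
    # raw_decode parses one JSON value and ignores whatever follows it
    obj, _end = json.JSONDecoder().raw_decode(tail)
    return obj


# Made-up page fragment shaped like the album list the script expects
sample = ("var a = 1; nx.data.photo = {'albumList': {'albumCount': 2, "
          "'albumList': [{'albumId': '11'}, {'albumId': '22'}]}}; "
          "nx.data.hasHiddenAlbum = false;")
data = extract_assigned_object(sample, 'nx.data.photo = ')
album_ids = [album['albumId'] for album in data['albumList']['albumList']]
print(album_ids)
```

With this approach there is no need to search for a second marker or to trim trailing characters by hand, because the decoder stops at the end of the object.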

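Two things in the download step can go wrong in practice: an album name may contain characters that Windows does not allow in a path, and a failed request would be saved to disk as if it were a photo. The sketch below shows one way to handle both. `safe_dirname` and `save_photo` are hypothetical helpers, not part of the script above; `save_photo` expects a `requests.Session` (such as `s`) and uses its `get`, `raise_for_status` and `iter_content` calls.

```python
import re


def safe_dirname(name, fallback='album'):
    """Replace characters that Windows forbids in file names with '_'."""
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', name).strip().rstrip('.')
    # Fall back to a fixed name if nothing usable is left
    return cleaned or fallback


def save_photo(session, url, path, chunk_size=8192):
    """Stream one photo to disk; raise on HTTP errors instead of saving an error page."""
    r = session.get(url, stream=True, timeout=30)
    r.raise_for_status()
    with open(path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=chunk_size):
            f.write(chunk)


print(safe_dirname('Trip: 2016/07?'))
```

In the crawler this would mean building the directory as `'d:\\' + safe_dirname(album_name)` and calling `save_photo(s, photo_url, image_path)` in place of the `requests.get` and `open` lines.
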
Done! The original post showed a screenshot of the run at this point.
