1. Notes on my first Python crawler: downloading a video from a web page

$KAMISAMALZ 2024-09-08 03:08:05

This is my first time publishing notes on this platform. I just want to improve myself by recording something I learn every day.

To be honest, I don't know much about web crawlers, so my understanding is still superficial.

But after only three days of learning, I found a way to crawl some videos from the web.

I am very excited: I now know how to crawl something from the internet onto my computer's hard disk. It is only a start, the first step, and I just have to keep moving.

Step 1: Find a video on a web page and start playing it online. Press F12 to open the browser's developer tools (the element inspection page), as shown in the following pictures:

Click one of the .ts files in the list of requests and you will see its URL. That URL is the key point.
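The fragment URLs usually differ only in a running number, so once one URL is known the others can be generated. A minimal sketch of that idea follows; the template URL and the names `URL_TEMPLATE` and `fragment_urls` are made up for illustration and should be replaced with the URL copied from the developer tools.

```python
# Hypothetical template: paste the fragment URL copied from the developer tools
# and put {index} where the running number appears.
URL_TEMPLATE = "https://example.com/video/playlist{index}.ts"


def fragment_urls(template, count):
    """Return the URLs of fragments 0 .. count-1 built from the template."""
    return [template.format(index=i) for i in range(count)]


if __name__ == "__main__":
    # Print the first three fragment URLs to check the pattern by eye
    for url in fragment_urls(URL_TEMPLATE, 3):
        print(url)
```

Opening one or two of the printed URLs in the browser is a quick way to confirm the pattern before downloading everything.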

Step 2: Write the Python code, as follows:

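 import os
 from multiprocessing import Pool

 import requests

 # request headers that simulate a browser
 HEADERS = {
     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
     'Referer': 'http://91.com',
     'Content-Type': 'multipart/form-data; session_language=cn_CN',
 }

 def demo(i):
     # URL of the i-th .ts fragment, following the pattern found in Step 1
     url = "https://vip.holyshitdo.com/2019/5/8/c2417/playlist%d.ts" % i
     print(url)
     try:
         r = requests.get(url, headers=HEADERS, timeout=30)
         r.raise_for_status()          # do not save an error page as a fragment
     except requests.RequestException as e:
         print('failed:', url, e)      # report the failure instead of hiding it
         return
     # save the fragment in binary format; the zero-padded number keeps
     # the files in playback order when they are sorted by name
     with open('./mp4/playlist{:03d}.ts'.format(i), 'wb') as f:
         f.write(r.content)

 if __name__ == '__main__':                # program entry
     os.makedirs('./mp4', exist_ok=True)   # make sure the output directory exists
     pool = Pool(10)                       # create a pool of 10 processes
     for i in range(193):                  # fragments 0 to 192
         pool.apply_async(demo, (i,))      # download fragment i in a worker process
     pool.close()
     pool.join()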
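A fragment that fails to download leaves a hole in the video that is easy to miss. One possible way to find out which fragments need to be fetched again is sketched below. It is not the code above: it swaps the process pool for a thread pool from `concurrent.futures`, and `download_all` and `fake_fetch` are hypothetical names. `fake_fetch` only simulates a download, so the sketch runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(fetch, indices, workers=10):
    """Run fetch(i) for every index and return the sorted indices that raised."""
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every fragment first, then wait for each result in turn
        futures = {i: pool.submit(fetch, i) for i in indices}
        for i, future in futures.items():
            try:
                future.result()
            except Exception as exc:
                print("fragment", i, "failed:", exc)
                failed.append(i)
    return sorted(failed)


def fake_fetch(i):
    """Stand-in for a real download: every fifth fragment fails."""
    if i % 5 == 4:
        raise IOError("simulated network error")


if __name__ == "__main__":
    # Indices that should be retried
    print(download_all(fake_fetch, range(10)))
```

In real use, `fetch` would be a function that downloads and saves one fragment and raises an exception on any error, and the returned list could be passed to `download_all` again as a retry.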

Step 3: Run the code.

Step 4: Last but not least, merge the .ts fragments into a single video file.

Open a terminal (the Windows Command Prompt) in the directory where the fragments were saved and run the command "copy /b *.ts newfile.mp4".
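The wildcard in `copy /b` joins the files in whatever order the system lists them, which may not be the playback order when the numbers in the names have different lengths. As an alternative, the merge can be sketched in Python by sorting on the number in each file name. The helper names `fragment_number` and `merge_fragments` are made up for this sketch. Like `copy /b`, it only concatenates bytes, so the result is still transport-stream data under an .mp4 name.

```python
import re
from pathlib import Path


def fragment_number(path):
    """Return the last run of digits in the file name, e.g. 12 for 'ylist12.ts'."""
    digits = re.findall(r"\d+", path.stem)
    return int(digits[-1]) if digits else -1


def merge_fragments(folder, output):
    """Concatenate every .ts file in folder into output, in numeric order."""
    parts = sorted(Path(folder).glob("*.ts"), key=fragment_number)
    with open(output, "wb") as out:
        for part in parts:
            out.write(part.read_bytes())
    # Return the names in the order they were written, for checking
    return [p.name for p in parts]


if __name__ == "__main__":
    print(merge_fragments("./mp4", "newfile.mp4"))
```

The printed list shows the order in which the fragments were written, which makes a missing or misplaced fragment easy to spot.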

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

THAT IS ALL FOR NOW, TO BE CONTINUED～(￣▽￣～)~
