0.

splash: mermaid; to spatter, to splash

1. References

First experience using Splash

Installing Docker on Windows

https://blog.scrapinghub.com/2015/03/02/handling-javascript-in-scrapy-with-splash/

Splash is our in-house solution for JavaScript rendering, implemented in Python using Twisted and QT.  According to the official blog, Splash is Scrapinghub's in-house solution (?)

https://scrapinghub.com/

We're the creators and the main maintainers of Scrapy.  The founders and maintainers: the big names behind Scrapy.

github: scrapinghub/splash

Splash is a javascript rendering service with an HTTP API. It's a lightweight browser with an HTTP API, implemented in Python 3 using Twisted and QT5.

It's fast, lightweight and state-less which makes it easy to distribute.  Used for rendering JavaScript pages.

http://splash.readthedocs.io/en/latest/index.html

Official Splash documentation

github: scrapy-plugins/scrapy-splash

This library provides Scrapy and JavaScript integration using Splash.  How to use Splash from Scrapy.

http://splash.readthedocs.io/en/stable/api.html#request-filters  

Splash supports filtering requests based on Adblock Plus rules.  I have not gotten this working yet.
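For reference, a sketch of how request filters appear to be selected over the HTTP API, going by the Splash docs: Splash is started with --filters-path pointing at a directory of Adblock Plus rule files, and a render request then names the filters to apply through the filters argument. The Splash address and the filter name easylist (that is, a file easylist.txt in that directory) are assumptions, and this is untested here.

```python
from urllib.parse import urlencode


def filtered_render_url(splash, page_url, filters):
    """Build a render.html URL that asks Splash to apply the named filters."""
    # Splash expects the filter names as one comma-separated string.
    query = urlencode({"url": page_url, "filters": ",".join(filters)})
    return "{}/render.html?{}".format(splash, query)


# Assumed values: a local Splash instance and a filter file named easylist.txt.
print(filtered_render_url("http://localhost:8050", "http://example.com", ["easylist"]))
```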

2. Installation and usage

https://stackoverflow.com/questions/30345623/scraping-dynamic-content-using-python-scrapy

The answer mentions ScrapyJS, but its link now redirects to https://github.com/scrapy-plugins/scrapy-splash#installation

https://pypi.python.org/pypi/scrapyjs

https://pypi.python.org/pypi/scrapy-splash

2.1 Installing scrapy-splash

C:\Users\win7>pip install scrapy-splash
Collecting scrapy-splash
Downloading scrapy_splash-0.7.2-py2.py3-none-any.whl
Installing collected packages: scrapy-splash
Successfully installed scrapy-splash-0.7.2
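Installing the package is only half of the job: scrapy-splash also has to be enabled in the Scrapy project's settings.py. The fragment below follows the configuration section of the scrapy-plugins/scrapy-splash README as I understand it. SPLASH_URL is an assumption and has to point at wherever the Splash service actually runs.

```python
# settings.py (fragment) -- a sketch based on the scrapy-splash README.

# Assumed address of the Splash service; adjust to your own host and port.
SPLASH_URL = "http://192.168.99.100:8050"

# Route requests through Splash and keep cookies consistent across renders.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Avoid storing duplicate Splash arguments for every request.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make duplicate filtering and HTTP caching aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```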

2.2 Installing the scrapinghub/splash image via Docker

Find the download link on the official site:

https://store.docker.com/editions/community/docker-ce-desktop-windows

Get Docker Community Edition for Windows

Docker for Windows is available for free.

Requires Microsoft Windows 10 Professional or Enterprise 64-bit. For previous versions get Docker Toolbox.

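Right-click the installer and run it as administrator; it is probably best to tick the optional components as well (?)

Right-click Docker Quickstart Terminal and run it as administrator; it reports that bash.exe was not found.

Output: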

Creating CA: C:\Users\win7\.docker\machine\certs\ca.pem
Creating client certificate: C:\Users\win7\.docker\machine\certs\cert.pem
Running pre-create checks...
(default) Image cache directory does not exist, creating it at C:\Users\win7\.docker\machine\cache...
(default) No default Boot2Docker ISO found locally, downloading the latest release...
(default) Latest release for github.com/boot2docker/boot2docker is v17.09.0-ce
(default) Downloading C:\Users\win7\.docker\machine\cache\boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v17.09.0-ce/boot2docker.iso...
(default) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Creating machine...
(default) Copying C:\Users\win7\.docker\machine\cache\boot2docker.iso to C:\Users\win7\.docker\machine\machines\default\boot2docker.iso...
(default) Creating VirtualBox VM...
(default) Creating SSH key...
(default) Starting the VM...
(default) Check network to re-create if needed...
(default) Windows might ask for the permission to create a network adapter. Sometimes, such confirmation window is minimized in the taskbar.
(default) Found a new host-only adapter: "VirtualBox Host-Only Ethernet Adapter #2"
(default) Windows might ask for the permission to configure a network adapter. Sometimes, such confirmation window is minimized in the taskbar.
(default) Windows might ask for the permission to configure a dhcp server. Sometimes, such confirmation window is minimized in the taskbar.
(default) Waiting for an IP...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: D:\Program Files\Docker Toolbox\docker-machine.exe env default
                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""\___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
           \______ o           __/
             \    \         __/
              \____\_______/
docker is configured to use the default machine with IP 192.168.99.100
For help getting started, check out the docs at https://docs.docker.com
Start interactive shell
win7@win7-PC MINGW64 ~
$ docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 17.09.0-ce
Storage Driver: aufs
Root Dir: /mnt/sda1/var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 0
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.4.89-boot2docker
Operating System: Boot2Docker 17.09.0-ce (TCL 7.2); HEAD : 06d5c35 - Wed Sep 27 23:22:43 UTC 2017
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 995.8MiB
Name: default
ID: O33J:6GDF:AQ6P:RBM7:6KLF:OZHY:2N3J:QZKV:YIJT:G3AI:XCPD:NZ3G
Docker Root Dir: /mnt/sda1/var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 17
Goroutines: 26
System Time: 2017-10-18T09:58:42.414047781Z
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
provider=virtualbox
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
win7@win7-PC MINGW64 ~
$ ipconfig
Windows IP Configuration
Ethernet adapter lan:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::f950:bf55:726b:b7a6%14
   IPv4 Address. . . . . . . . . . . : 192.168.144.100
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.144.254
Ethernet adapter VirtualBox Host-Only Network #2:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::1c18:13ad:7ed2:c0ff%29
   IPv4 Address. . . . . . . . . . . : 192.168.99.1
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :
Tunnel adapter isatap.{CE007B04-2C7A-4A52-8BBF-1BCB4682EEB9}:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
Tunnel adapter Teredo Tunneling Pseudo-Interface:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
Tunnel adapter isatap.{93C68FD9-301C-484C-AFCB-5549CA24453B}:
   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
win7@win7-PC MINGW64 ~
$

Key information from the output:

(default) Copying C:\Users\win7\.docker\machine\cache\boot2docker.iso to C:\Users\win7\.docker\machine\machines\default\boot2docker.iso...
(default) Creating VirtualBox VM...
docker is configured to use the default machine with IP 192.168.99.100
For help getting started, check out the docs at https://docs.docker.com
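The IP address in that banner is what every later step depends on, since the Docker engine runs inside the VirtualBox VM rather than on localhost. A minimal sketch of pulling it out of the banner text, assuming the text is available as a string; the helper name machine_ip is hypothetical.

```python
import re

# Matches the address in the Docker Quickstart Terminal banner line.
_BANNER_IP = re.compile(r"default machine with IP (\d{1,3}(?:\.\d{1,3}){3})")


def machine_ip(banner):
    """Return the VM address reported in the Quickstart Terminal banner."""
    match = _BANNER_IP.search(banner)
    if match is None:
        raise ValueError("no machine IP found in banner text")
    return match.group(1)


banner = "docker is configured to use the default machine with IP 192.168.99.100"
# 8050 is the HTTP port Splash listens on.
splash_url = "http://{}:8050".format(machine_ip(banner))
print(splash_url)
```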

Connect with PuTTY:

Host: 192.168.99.100
Port: 22
User: docker
Password: tcuser
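If PuTTY cannot connect, it helps to first check whether the VM's port is reachable at all. A small sketch using only the standard library; the function name port_open and the default timeout are my own choices.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Calling port_open("192.168.99.100", 22) checks the SSH port used above; the same call with port 8050 checks whether Splash is listening once its container has been started.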

The first time, the image has to be pulled from Docker Hub:

sudo docker pull scrapinghub/splash

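After that, start the Splash service each time it is needed; it serves over HTTP, HTTPS, and telnet:

# Usually only HTTP mode is used, so publishing just port 8050 is enough.
# Splash will listen on 0.0.0.0 at ports 8050 (http), 8051 (https) and 5023 (telnet).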
sudo docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash

Open in a browser:

http://192.168.99.100:8050
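The same service can be called from code. A minimal sketch that asks Splash's render.html endpoint for the JavaScript-rendered HTML of a page, using only the standard library; the helper names, the wait value, and the timeout are assumptions, and the Splash address is the one from the Docker output.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Address of the Splash service (the Docker Toolbox VM, HTTP port 8050).
SPLASH = "http://192.168.99.100:8050"


def render_url(page_url, wait=0.5, splash=SPLASH):
    """Build a render.html request URL for the given page."""
    # "wait" is the time in seconds Splash waits after load so scripts can run.
    query = urlencode({"url": page_url, "wait": wait})
    return "{}/render.html?{}".format(splash, query)


def fetch_rendered(page_url, wait=0.5, timeout=30.0):
    """Ask Splash to load the page, run its JavaScript, and return the HTML."""
    with urlopen(render_url(page_url, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With the container running, fetch_rendered("http://example.com") should return the page's HTML after its scripts have executed.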
