Using Web Crawling to Collect and Output Java Dependency Library Information

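Contents

1       Introduction

1.1      Background

1.2      Comparison with the existing approach

2       Implementation

2.1      Listing the dependencies to query in a configuration file

2.2      Getting the latest version number

2.3      Version number parsing algorithm

2.4      Getting dependency information

2.5      Dependency information parsing algorithm

2.6      Outputting the dependency information

3       Operating steps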

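1         Introduction

1.1   Background

Java has a large number of dependency libraries, and their versions are updated constantly. When product development adopts a new library, the version information of the related libraries has to be updated as well, and one change often ripples through the whole dependency set. At present the company does this by hand: for each artifact, someone looks it up on https://mvnrepository.com/ and copies the result into the Java configuration file. This is slow, laborious, and error-prone, and because new versions keep appearing, the dependency configuration file has to be revised frequently, which wastes a great deal of time. To solve this problem, the tool described here uses web crawling to retrieve the version information from the web pages, extracts the dependency information, and automatically outputs the dependency relationships both in the XML format required by the pom file and in the format required by the ReadMe.

1.2   Comparison with the existing approach

Manual lookup is error-prone and time-consuming. Querying with a tool is fast and accurate, and the results are written directly in the format of the Java configuration file, so no manual collation is needed. When versions are updated, the tool needs only a few seconds for work that would take days by hand.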

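2         Implementation

2.1   Listing the dependencies to query in a configuration file

The format is the same as in a Java pom file. Each entry consists of groupId, artifactId, and an optional version: if a version is specified, that version of the library is queried; if not, the latest version is queried. The configuration file looks like this: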

<dependencies>

<dependency>

<groupId>org.springframework.boot</groupId>

<artifactId>spring-boot-starter-web</artifactId>

<version>1.5.19.RELEASE</version>

</dependency>

<dependency>

<groupId>org.springframework.cloud</groupId>

<artifactId>spring-cloud-starter-sleuth</artifactId>

</dependency>

<dependency>

<groupId>com.github.pagehelper</groupId>

<artifactId>pagehelper-spring-boot-starter</artifactId>

</dependency>

</dependencies>
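The rule above (an entry without a version falls back to the latest one) could be driven by a small reader along the following lines. This is only a sketch: `ConfigEntry`, `TagText`, and `ReadConfig` are hypothetical names that do not appear in the original tool, and the plain tag search assumes the flat, well-formed layout shown in the example. A real implementation would more likely use an XML parser.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one <dependency> entry in the config file.
struct ConfigEntry {
    std::string groupId;
    std::string artifactId;
    std::string version;  // empty means "look up the latest version"
};

// Returns the text between <tag> and </tag> inside block, or "" if absent.
static std::string TagText(const std::string& block, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::size_t begin = block.find(open);
    if (begin == std::string::npos) return "";
    begin += open.size();
    const std::size_t end = block.find(close, begin);
    if (end == std::string::npos) return "";
    return block.substr(begin, end - begin);
}

// Splits the config text into <dependency> blocks and reads each one.
std::vector<ConfigEntry> ReadConfig(const std::string& xml) {
    std::vector<ConfigEntry> entries;
    const std::string open = "<dependency>";
    const std::string close = "</dependency>";
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::size_t end = xml.find(close, pos);
        if (end == std::string::npos) break;  // unterminated entry: stop
        const std::string block = xml.substr(pos, end - pos);
        entries.push_back({TagText(block, "groupId"),
                           TagText(block, "artifactId"),
                           TagText(block, "version")});
        pos = end + close.size();
    }
    return entries;
}
```

An entry whose `version` comes back empty is one for which the latest version still has to be looked up.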

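2.2   Getting the latest version number

The tool reads the dependencies to query from the configuration file together with their version information. For entries without a version number, it requests the artifact page on https://mvnrepository.com/ to find the latest version.

The request URL has the form https://mvnrepository.com/artifact/ + groupId + / + artifactId, for example https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-web. The response carries the library's version information.

[Screenshot of the corresponding artifact page]

The response is returned as a string containing the page's HTML. By examining the node names in that string and the way they are organized, a parsing algorithm can be designed to extract the latest version. For this example the latest version is 2.1.5.RELEASE.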
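As a minimal sketch of how the request URL can be assembled from the configured fields, consider the helper below. `BuildArtifactUrl` is a hypothetical name; only the URL pattern itself comes from the text. The optional version segment corresponds to the per-version page used later for dependency information.

```cpp
#include <string>

// Builds the artifact page URL. The version segment is appended only when
// a version is given; without it the URL points at the artifact overview.
std::string BuildArtifactUrl(const std::string& groupId,
                             const std::string& artifactId,
                             const std::string& version = "") {
    std::string url =
        "https://mvnrepository.com/artifact/" + groupId + "/" + artifactId;
    if (!version.empty()) {
        url += "/" + version;
    }
    return url;
}
```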

2.3   Version number parsing algorithm

int MvnRepository::ParseNewestVersion(string strResponse, Dependence& dep)
{
    // Licenses: the text of each "b lic" span that follows the License header.
    size_t pos = strResponse.find("License</th><td><span class=");
    string strTemp = "";
    if (pos != string::npos)
    {
        strResponse = strResponse.substr(pos);
        // Search only a short window so that a "b lic" span further down
        // the page is not mistaken for part of this table cell.
        strTemp = strResponse.substr(0, 60);
        pos = strTemp.find("b lic");
        while (pos != string::npos)
        {
            strResponse = strResponse.substr(pos + 7);   // 7 = length of: b lic">
            pos = strResponse.find("<");
            if (pos == string::npos)
            {
                break;
            }
            strTemp = strResponse.substr(0, pos);
            dep.vecLicense.push_back(strTemp);
            strTemp = strResponse.substr(0, 60);
            pos = strTemp.find("b lic");
        }
    }
    // Categories: the text of each "b c" span after the Categories header.
    // They are appended to vecLicense as well.
    pos = strResponse.find("Categories</th><td>");
    if (pos != string::npos)
    {
        strResponse = strResponse.substr(pos);
        strTemp = strResponse.substr(0, 120);
        pos = strTemp.find("b c");
        while (pos != string::npos)
        {
            strResponse = strResponse.substr(pos + 5);   // 5 = length of: b c">
            pos = strResponse.find("<");
            if (pos == string::npos)
            {
                break;
            }
            strTemp = strResponse.substr(0, pos);
            dep.vecLicense.push_back(strTemp);
            strTemp = strResponse.substr(0, 60);
            pos = strTemp.find("b c");
        }
    }
    // The first "vbtn release" link on the page holds the newest version.
    pos = strResponse.find("vbtn release");
    if (pos == string::npos)
    {
        LOGIC_ERROR("find vbtn release failed");
        return HPR_ERROR;
    }
    strResponse = strResponse.substr(pos + 14);          // 14 = length of: vbtn release">
    pos = strResponse.find("<");
    if (pos == string::npos)
    {
        LOGIC_ERROR("find < failed");
        return HPR_ERROR;
    }
    dep.strNewestVersion = strResponse.substr(0, pos);
    // No version configured: use the newest one as the current version.
    if (dep.strCurrentVersion == "")
    {
        dep.strCurrentVersion = dep.strNewestVersion;
    }
    return HPR_OK;
}
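The fixed offsets used in this function (7, 5, 14) are each the length of a marker plus the two characters `">` that close the tag. A variant that avoids hand-counted offsets could look for the marker, then the next `>`, then the next `<`. The following is a sketch of that idea only: `TextAfterMarker` is a hypothetical helper, not part of the original source, and it assumes the wanted text sits directly inside the tag that contains the marker.

```cpp
#include <string>

// Finds `marker` at or after `from`, then extracts the text between the next
// '>' and the following '<'. On success, `from` is moved to that '<' so the
// caller can continue scanning. Returns false if any piece is missing.
bool TextAfterMarker(const std::string& html, const std::string& marker,
                     std::size_t& from, std::string& out) {
    const std::size_t m = html.find(marker, from);
    if (m == std::string::npos) return false;
    const std::size_t gt = html.find('>', m + marker.size());
    if (gt == std::string::npos) return false;
    const std::size_t lt = html.find('<', gt + 1);
    if (lt == std::string::npos) return false;
    out = html.substr(gt + 1, lt - gt - 1);
    from = lt;
    return true;
}
```

Because the helper works on positions and never copies the remaining page, it also avoids the repeated `substr` calls on the whole response.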

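2.4   Getting dependency information

Once the version number is known, a second request fetches the dependency information for that version, for example https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-web/2.1.5.RELEASE. The library's Compile Dependencies are read from this page: the response string is parsed in the same way as before, and the Compile Dependencies entries are extracted according to the page format and saved.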
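The first step of that extraction, narrowing the page down to the Compile Dependencies part, can be sketched as below. `CompileSection` is a hypothetical helper name. The two heading strings are the ones the parsing function searches for, and the sketch assumes they occur in that order on the page.

```cpp
#include <string>

// Returns the part of the page from "Compile Dependencies" up to (but not
// including) "Test Dependencies", or up to the end of the page if the second
// heading is absent. Returns "" if there is no "Compile Dependencies" heading.
std::string CompileSection(const std::string& page) {
    const std::size_t begin = page.find("Compile Dependencies");
    if (begin == std::string::npos) return "";
    const std::size_t end = page.find("Test Dependencies", begin);
    if (end == std::string::npos) return page.substr(begin);
    return page.substr(begin, end - begin);
}
```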

2.5   Dependency information parsing algorithm

int MvnRepository::ParseDependences(string strResponse, map<string, Dependence>& mapDependence)
{
    int iReval = HPR_ERROR;
    size_t pos = string::npos;
    do
    {
        if (strResponse == "")
        {
            break;
        }
        // Keep only the text between "Compile Dependencies" and "Test Dependencies".
        pos = strResponse.find("Compile Dependencies");
        if (pos == string::npos)
        {
            break;
        }
        strResponse = strResponse.substr(pos);
        pos = strResponse.find("Test Dependencies");
        if (pos != string::npos)
        {
            strResponse = strResponse.substr(0, pos);
        }
        // An entry starts either with category/license tags (class="b ...")
        // or directly with the version link (class="vbtn release").
        pos = strResponse.find(" class=\"b ");
        if (pos == string::npos)
        {
            pos = strResponse.find("vbtn release");
        }
        string strtemp = "";
        while (pos != string::npos)
        {
            Dependence dep;
            // Collect the run of category/license tags; each next tag must
            // start within 60 characters of the previous one.
            pos = strResponse.find(" class=\"b ");
            if (pos != string::npos)
            {
                strResponse = strResponse.substr(pos);
                strtemp = strResponse.substr(0, 60);
                while (strtemp.find(" class=\"b ") != string::npos)
                {
                    pos = strtemp.find(" class=\"b ");
                    strResponse = strResponse.substr(pos);
                    pos = strResponse.find(">");
                    if (pos == string::npos)
                    {
                        LOGIC_ERROR("strResponse.find(>) failed");
                        break;
                    }
                    strResponse = strResponse.substr(pos + 1);
                    pos = strResponse.find("<");
                    if (pos == string::npos)
                    {
                        LOGIC_ERROR("strResponse.find(<) failed");
                        break;
                    }
                    strtemp = strResponse.substr(0, pos);
                    dep.vecLicense.push_back(strtemp);
                    LOGIC_TRACE("vecLicense:%s", strtemp.c_str());
                    strtemp = strResponse.substr(0, 60);
                }
            }
            // The version link's href holds groupId/artifactId/version.
            pos = strResponse.find("vbtn release");
            if (pos == string::npos)
            {
                LOGIC_ERROR("strResponse.find vbtn release failed!");
                break;
            }
            // 30 is presumably the length of: vbtn release" href="/artifact/
            strResponse = strResponse.substr(pos + 30);
            pos = strResponse.find("\">");
            if (pos == string::npos)
            {
                LOGIC_ERROR("strResponse.find \"> failed!");
                break;
            }
            strtemp = strResponse.substr(0, pos);
            strResponse = strResponse.substr(pos);
            pos = strtemp.find("/");
            if (pos == string::npos)
            {
                LOGIC_ERROR("strResponse.find / failed!");
                break;
            }
            dep.strGroupid = strtemp.substr(0, pos);
            LOGIC_TRACE("strGroupid:%s", dep.strGroupid.c_str());
            strtemp = strtemp.substr(pos + 1);
            pos = strtemp.find("/");
            if (pos == string::npos)
            {
                LOGIC_ERROR("strResponse.find / failed!");
                break;
            }
            dep.strArtifact = strtemp.substr(0, pos);
            LOGIC_TRACE("strArtifact:%s", dep.strArtifact.c_str());
            strtemp = strtemp.substr(pos + 1);
            dep.strCurrentVersion = strtemp;
            LOGIC_TRACE("strCurrentVersion:%s", dep.strCurrentVersion.c_str());
            // A second version link within the next 120 characters, if any,
            // is taken as the newest available version.
            strtemp = strResponse.substr(0, 120);
            pos = strtemp.find("vbtn release");
            if (pos != string::npos)
            {
                strResponse = strResponse.substr(pos);
                pos = strResponse.find("\">");
                if (pos == string::npos)
                {
                    LOGIC_ERROR("strResponse.find \"> failed!");
                    break;
                }
                strResponse = strResponse.substr(pos + 2);
                pos = strResponse.find("<");
                if (pos == string::npos)
                {
                    LOGIC_ERROR("strResponse.find < failed!");
                    break;
                }
                strtemp = strResponse.substr(0, pos);
                dep.strNewestVersion = strtemp;
                LOGIC_TRACE("strNewestVersion:%s", dep.strNewestVersion.c_str());
            }
            // Key: artifactId followed by the current version.
            mapDependence[dep.strArtifact + dep.strCurrentVersion] = dep;
            pos = strResponse.find(" class=\"b ");
            if (pos == string::npos)
            {
                pos = strResponse.find("vbtn release");
            }
        }
        iReval = HPR_OK;
    } while (0);
    return iReval;
}
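The middle part of this function splits a path of the form groupId/artifactId/version with two successive `find("/")` calls. In isolation that step can be sketched as follows. `Coordinates` and `SplitCoordinates` are hypothetical names used only for illustration.

```cpp
#include <string>

// Hypothetical holder for the three parts of a dependency path.
struct Coordinates {
    std::string groupId;
    std::string artifactId;
    std::string version;
};

// Splits "groupId/artifactId/version" at the first two '/' characters.
// Returns false if fewer than two separators are present.
bool SplitCoordinates(const std::string& path, Coordinates& out) {
    const std::size_t first = path.find('/');
    if (first == std::string::npos) return false;
    const std::size_t second = path.find('/', first + 1);
    if (second == std::string::npos) return false;
    out.groupId = path.substr(0, first);
    out.artifactId = path.substr(first + 1, second - first - 1);
    out.version = path.substr(second + 1);
    return true;
}
```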

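2.6   Outputting the dependency information

After the dependency information has been parsed, it is written to files in the format of the Java configuration file.

1) The pom.xml output has the following format: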

<?xml version="1.0" encoding="UTF-8" ?><output>

<properties>

<cdi-api.version>1.0</cdi-api.version>

<ejb-api.version>3.0</ejb-api.version>

<guava.version>19.0</guava.version>

<javaslang.version>2.0.6</javaslang.version>

<javax.annotation-api.version>1.3</javax.annotation-api.version>

<javax.servlet-api.version>3.0.1</javax.servlet-api.version>

<joda-time.version>2.10.1</joda-time.version>

<json-path.version>2.4.0</json-path.version>

<kotlin-reflect.version>1.2.71</kotlin-reflect.version>

<kotlin-stdlib.version>1.2.71</kotlin-stdlib.version>

<mybatis-spring-boot-starter.version>1.3.2</mybatis-spring-boot-starter.version>

<pagehelper-spring-boot-autoconfigure.version>1.2.10</pagehelper-spring-boot-autoconfigure.version>

<pagehelper-spring-boot-starter.version>1.2.10</pagehelper-spring-boot-starter.version>

<pagehelper.version>5.1.8</pagehelper.version>

<querydsl-apt.version>4.2.1</querydsl-apt.version>

<querydsl-collections.version>4.2.1</querydsl-collections.version>

<querydsl-core.version>4.2.1</querydsl-core.version>

<reactor-core.version>3.2.6.RELEASE</reactor-core.version>

<rxjava-reactive-streams.version>1.2.1</rxjava-reactive-streams.version>

<rxjava.version>1.3.8</rxjava.version>

<rxjava.version>2.2.6</rxjava.version>

<scala-library.version>2.11.7</scala-library.version>

<spring-boot-starter.version>2.0.1.RELEASE</spring-boot-starter.version>

<spring-data-commons.version>2.1.5.RELEASE</spring-data-commons.version>

<spring-hateoas.version>0.25.1.RELEASE</spring-hateoas.version>

<threetenbp.version>1.3.8</threetenbp.version>

<vavr.version>0.9.3</vavr.version>

<xmlprojector.version>1.4.15</xmlprojector.version>

</properties>

<dependencies>

<dependency>

<groupId>javax.enterprise</groupId>

<artifactId>cdi-api</artifactId>

<version>${cdi-api.version}</version>

</dependency>

<dependency>

<groupId>javax.ejb</groupId>

<artifactId>ejb-api</artifactId>

<version>${ejb-api.version}</version>

</dependency>

<dependency>

<groupId>com.google.guava</groupId>

<artifactId>guava</artifactId>

<version>${guava.version}</version>

</dependency>

<dependency>

<groupId>io.javaslang</groupId>

<artifactId>javaslang</artifactId>

<version>${javaslang.version}</version>

</dependency>

<dependency>

<groupId>javax.annotation</groupId>

<artifactId>javax.annotation-api</artifactId>

<version>${javax.annotation-api.version}</version>

</dependency>

<dependency>

<groupId>javax.servlet</groupId>

<artifactId>javax.servlet-api</artifactId>

<version>${javax.servlet-api.version}</version>

</dependency>

<dependency>

<groupId>joda-time</groupId>

<artifactId>joda-time</artifactId>

<version>${joda-time.version}</version>

</dependency>

<dependency>

<groupId>com.jayway.jsonpath</groupId>

<artifactId>json-path</artifactId>

<version>${json-path.version}</version>

</dependency>

<dependency>

<groupId>org.jetbrains.kotlin</groupId>

<artifactId>kotlin-reflect</artifactId>

<version>${kotlin-reflect.version}</version>

</dependency>

<dependency>

<groupId>org.jetbrains.kotlin</groupId>

<artifactId>kotlin-stdlib</artifactId>

<version>${kotlin-stdlib.version}</version>

</dependency>

<dependency>

<groupId>org.mybatis.spring.boot</groupId>

<artifactId>mybatis-spring-boot-starter</artifactId>

<version>${mybatis-spring-boot-starter.version}</version>

</dependency>

<dependency>

<groupId>com.github.pagehelper</groupId>

<artifactId>pagehelper-spring-boot-autoconfigure</artifactId>

<version>${pagehelper-spring-boot-autoconfigure.version}</version>

</dependency>

<dependency>

<groupId>com.github.pagehelper</groupId>

<artifactId>pagehelper-spring-boot-starter</artifactId>

<version>${pagehelper-spring-boot-starter.version}</version>

</dependency>

<dependency>

<groupId>com.github.pagehelper</groupId>

<artifactId>pagehelper</artifactId>

<version>${pagehelper.version}</version>

</dependency>

<dependency>

<groupId>com.querydsl</groupId>

<artifactId>querydsl-apt</artifactId>

<version>${querydsl-apt.version}</version>

</dependency>

<dependency>

<groupId>com.querydsl</groupId>

<artifactId>querydsl-collections</artifactId>

<version>${querydsl-collections.version}</version>

</dependency>

<dependency>

<groupId>com.querydsl</groupId>

<artifactId>querydsl-core</artifactId>

<version>${querydsl-core.version}</version>

</dependency>

<dependency>

<groupId>io.projectreactor</groupId>

<artifactId>reactor-core</artifactId>

<version>${reactor-core.version}</version>

</dependency>

<dependency>

<groupId>io.reactivex</groupId>

<artifactId>rxjava-reactive-streams</artifactId>

<version>${rxjava-reactive-streams.version}</version>

</dependency>

<dependency>

<groupId>io.reactivex</groupId>

<artifactId>rxjava</artifactId>

<version>${rxjava.version}</version>

</dependency>

<dependency>

<groupId>io.reactivex.rxjava2</groupId>

<artifactId>rxjava</artifactId>

<version>${rxjava.version}</version>

</dependency>

<dependency>

<groupId>org.scala-lang</groupId>

<artifactId>scala-library</artifactId>

<version>${scala-library.version}</version>

</dependency>

<dependency>

<groupId>org.springframework.boot</groupId>

<artifactId>spring-boot-starter</artifactId>

<version>${spring-boot-starter.version}</version>

</dependency>

<dependency>

<groupId>org.springframework.data</groupId>

<artifactId>spring-data-commons</artifactId>

<version>${spring-data-commons.version}</version>

</dependency>

<dependency>

<groupId>org.springframework.hateoas</groupId>

<artifactId>spring-hateoas</artifactId>

<version>${spring-hateoas.version}</version>

</dependency>

<dependency>

<groupId>org.threeten</groupId>

<artifactId>threetenbp</artifactId>

<version>${threetenbp.version}</version>

</dependency>

<dependency>

<groupId>io.vavr</groupId>

<artifactId>vavr</artifactId>

<version>${vavr.version}</version>

</dependency>

<dependency>

<groupId>org.xmlbeam</groupId>

<artifactId>xmlprojector</artifactId>

<version>${xmlprojector.version}</version>

</dependency>

</dependencies>

</output>
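Output of this shape can be produced from the parsed entries with a few lines of string formatting. The sketch below is an illustration under assumptions, not the tool's actual writer: `PomEntry` and `RenderPom` are hypothetical names, the indentation is chosen freely, and entries are sorted by artifactId alone.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical input record: one resolved dependency.
struct PomEntry {
    std::string groupId;
    std::string artifactId;
    std::string version;
};

// Renders a <properties> block with one "<artifactId>.version" property per
// entry, followed by a <dependencies> block that refers to those properties.
std::string RenderPom(std::vector<PomEntry> entries) {
    std::sort(entries.begin(), entries.end(),
              [](const PomEntry& a, const PomEntry& b) {
                  return a.artifactId < b.artifactId;
              });
    std::ostringstream out;
    out << "<properties>\n";
    for (const PomEntry& e : entries) {
        out << "  <" << e.artifactId << ".version>" << e.version
            << "</" << e.artifactId << ".version>\n";
    }
    out << "</properties>\n<dependencies>\n";
    for (const PomEntry& e : entries) {
        out << "  <dependency>\n"
            << "    <groupId>" << e.groupId << "</groupId>\n"
            << "    <artifactId>" << e.artifactId << "</artifactId>\n"
            << "    <version>${" << e.artifactId << ".version}</version>\n"
            << "  </dependency>\n";
    }
    out << "</dependencies>\n";
    return out.str();
}
```

Since the property name is derived from the artifactId only, two artifacts with the same artifactId but different groupIds end up with the same property name, as happens with rxjava in the sample output above.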

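2) The Readme output has the following format. The tool writes its labels in Chinese: 引入 means "brings in" and 最新版 means "latest version". The 最新版 field is left empty when no newer-version link was found on the page.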

## com.github.pagehelper/pagehelper-spring-boot-starter/1.2.10(1.2.10)/MIT

-引入: mybatis-spring-boot-starter (org.mybatis.spring.boot)/ 1.3.2(最新版 2.0.0)/ Apache 2.0

-引入: pagehelper-spring-boot-autoconfigure (com.github.pagehelper)/ 1.2.10(最新版 )/ MIT

-引入: pagehelper (com.github.pagehelper)/ 5.1.8(最新版 )/ MIT

-引入: spring-boot-starter (org.springframework.boot)/ 2.0.1.RELEASE(最新版 2.1.3.RELEASE)/ Apache 2.0

## org.springframework.data/spring-data-commons/2.1.5.RELEASE(2.1.5.RELEASE)/Apache 2.0

-引入: cdi-api (javax.enterprise)/ 1.0(最新版 )/ Dep Injection,Apache 2.0

-引入: ejb-api (javax.ejb)/ 3.0(最新版 )/ Java Spec,CDDL 1.1

-引入: guava (com.google.guava)/ 19.0(最新版 27.1-jre)/ JSON Lib,Apache 2.0

-引入: javaslang (io.javaslang)/ 2.0.6(最新版 0.10.0)/ Functional Programming,Apache 2.0

-引入: javax.annotation-api (javax.annotation)/ 1.3(最新版 1.3.2)/ Java Spec,CDDL,GPL 2.0

-引入: javax.servlet-api (javax.servlet)/ 3.0.1(最新版 4.0.1)/ Java Spec,CDDL,GPL 2.0

-引入: joda-time (joda-time)/ 2.10.1(最新版 )/ Date/Time,Apache 2.0

-引入: json-path (com.jayway.jsonpath)/ 2.4.0(最新版 )/ JSON Lib,Apache 2.0

-引入: kotlin-reflect (org.jetbrains.kotlin)/ 1.2.71(最新版 1.3.21)/ Reflection,Apache 2.0

-引入: kotlin-stdlib (org.jetbrains.kotlin)/ 1.2.71(最新版 1.3.21)/ JVM Languages,Apache 2.0

-引入: querydsl-apt (com.querydsl)/ 4.2.1(最新版 )/ Apache 2.0

-引入: querydsl-collections (com.querydsl)/ 4.2.1(最新版 )/ Apache 2.0

-引入: querydsl-core (com.querydsl)/ 4.2.1(最新版 )/ Apache 2.0

-引入: reactor-core (io.projectreactor)/ 3.2.6.RELEASE(最新版 )/ Apache 2.0

-引入: rxjava-reactive-streams (io.reactivex)/ 1.2.1(最新版 )/ Apache 2.0

-引入: rxjava (io.reactivex)/ 1.3.8(最新版 2.2.7)/ Apache 2.0

-引入: rxjava (io.reactivex.rxjava2)/ 2.2.6(最新版 2.2.7)/ Apache 2.0

-引入: scala-library (org.scala-lang)/ 2.11.7(最新版 2.12.8)/ JVM Languages,Apache 2.0

-引入: spring-hateoas (org.springframework.hateoas)/ 0.25.1.RELEASE(最新版 )/ Core Utils,Apache 2.0

-引入: threetenbp (org.threeten)/ 1.3.8(最新版 )/ BSD 3-clause

-引入: vavr (io.vavr)/ 0.9.3(最新版 0.10.0)/ Functional Programming,Apache 2.0

-引入: xmlprojector (org.xmlbeam)/ 1.4.15(最新版 1.4.16)/ Apache 2.0
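One line of this listing can be assembled as in the sketch below. `Join` and `ReadmeLine` are hypothetical helpers, and the layout is copied from the sample lines above: artifactId, groupId in parentheses, current version, newest version, then the category and license labels separated by commas.

```cpp
#include <string>
#include <vector>

// Joins items with a separator, e.g. {"CDDL", "GPL 2.0"} -> "CDDL,GPL 2.0".
std::string Join(const std::vector<std::string>& items, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) out += sep;
        out += items[i];
    }
    return out;
}

// Formats one "-引入:" line in the layout of the sample output.
// `newest` may be empty, which yields "(最新版 )" as in the samples.
std::string ReadmeLine(const std::string& groupId, const std::string& artifactId,
                       const std::string& current, const std::string& newest,
                       const std::vector<std::string>& labels) {
    return "-引入: " + artifactId + " (" + groupId + ")/ " + current +
           "(最新版 " + newest + ")/ " + Join(labels, ",");
}
```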

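3         Operating steps

(1)    List the dependencies to query, in the format shown below, in the pom.xml file in the root directory. Each entry has three fields: groupId, artifactId, and version. If version is specified, that version is looked up; for entries without it, the latest version is taken from the website.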

<dependencies>

<dependency>

<groupId>com.github.pagehelper</groupId>

<artifactId>pagehelper-spring-boot-starter</artifactId>

<version>1.2.10</version>

</dependency>

<dependency>

<groupId>org.springframework.data</groupId>

<artifactId>spring-data-commons</artifactId>

</dependency>

</dependencies>

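(2)    Double-click JavaDependence.exe to open it, then click the Read button to load the libraries to query from the configuration file. The dialog shows how many were read.

(3)    Click the Query button to start querying. Each library takes roughly 3 seconds, so allow some time. When the run finishes, the dialog shows the number of successes and failures. If some queries failed, refresh the web page and click Query again; only the failed entries are retried. Repeat until every query has succeeded.

(4)    Once every query has succeeded, click the Output button. The results are written to files in the required formats, and the dialog reports that the output succeeded. In pom.xml the entries are ordered alphabetically by artifact. Two files are then present in the root directory.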
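The retry behavior in step (3), where clicking Query again repeats only the failed entries, amounts to keeping a list of pending items and removing those that succeed. A minimal sketch follows, assuming a caller-supplied `query` callback that returns true on success. `QueryPass` is a hypothetical name, and the actual tool's bookkeeping is not shown in the source.

```cpp
#include <functional>
#include <string>
#include <vector>

// Runs one query pass over `pending`. Items for which `query` returns true
// are dropped; failed items remain in `pending` for the next pass.
// Returns the number of successful queries in this pass.
std::size_t QueryPass(std::vector<std::string>& pending,
                      const std::function<bool(const std::string&)>& query) {
    std::vector<std::string> failed;
    std::size_t succeeded = 0;
    for (const std::string& item : pending) {
        if (query(item)) {
            ++succeeded;
        } else {
            failed.push_back(item);
        }
    }
    pending.swap(failed);
    return succeeded;
}
```

Calling `QueryPass` repeatedly until `pending` is empty mirrors clicking the Query button until all entries have succeeded.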

Parsing functions: header file MvnRepository.h

#pragma once
#include "HPR_Singleton.h"
#include <string>
#include <vector>
#include <map>
using namespace std;

struct Dependence
{
    string strGroupid;
    string strArtifact;
    string strCurrentVersion;
    string strNewestVersion;
    vector<string> vecLicense;
    Dependence()
    {
        strGroupid = "";
        strArtifact = "";
        strCurrentVersion = "";
        strNewestVersion = "";
    }
};

class MvnRepository : public singleton<MvnRepository>
{
public:
    MvnRepository();
    ~MvnRepository();

public:
    int GetNewestVersion(string artifactid, Dependence& dep);
    int GetDependences(string strArtifactid, map<string, Dependence>& mapDependence);
    int ParseNewestVersion(string strResponse, Dependence& dep);
    int ParseDependences(string strResponse, map<string, Dependence>& mapDependence);
};

Source file MvnRepository.cpp

#include "stdafx.h"
#include "MvnRepository.h"
#include "SimpleHttpClient.h"
#include "hlog1.h"
#include "RestClient.h"

MvnRepository::MvnRepository()
{
}

MvnRepository::~MvnRepository()
{
}

// Fetches the artifact page and parses the newest version and the licenses
// into dep. artifactid is appended to the base URL as is, so it is expected
// to have the form groupId/artifactId.
int MvnRepository::GetNewestVersion(string artifactid, Dependence& dep)
{
    string strUrl = "https://mvnrepository.com/artifact/";
    strUrl = strUrl + artifactid;
    string strResponsefindResByAuth = "";
    if (CHttpClient::instance()->Gets(strUrl, strResponsefindResByAuth) == HPR_ERROR)
    {
        LOGIC_ERROR("Gets failed");
        return HPR_ERROR;
    }
    return ParseNewestVersion(strResponsefindResByAuth, dep);
}

// Fetches the page of one specific version and parses its compile
// dependencies. strArtifactid is expected to have the form
// groupId/artifactId/version.
int MvnRepository::GetDependences(string strArtifactid, map<string, Dependence>& mapDependence)
{
    int iReval = HPR_ERROR;
    do
    {
        string strUrl = "https://mvnrepository.com/artifact/";
        strUrl = strUrl + strArtifactid;
        CSimpleHttpClient findresByAuthclient("GET", strUrl.c_str(), 5);
        findresByAuthclient.setHttpHeader("Content-Type", "application/json");
        if (!findresByAuthclient.sendHttpRequest())
        {
            LOGIC_ERROR("send findResourcesByAuth request error,url %s,return %s", strUrl.c_str(), findresByAuthclient.getHttpResponseBody().c_str());
            break;
        }
        std::string strResponsefindResByAuth = findresByAuthclient.getHttpResponseBody();
        if (ParseDependences(strResponsefindResByAuth, mapDependence) == HPR_ERROR)
        {
            LOGIC_ERROR("ParseDependences failed");
            break;
        }
        iReval = HPR_OK;
    } while (0);
    return iReval;
}

int MvnRepository::ParseNewestVersion(string strResponse, Dependence& dep)
{
    // Licenses: the text of each "b lic" span that follows the License header.
    size_t pos = strResponse.find("License</th><td><span class=");
    string strTemp = "";
    if (pos != string::npos)
    {
        strResponse = strResponse.substr(pos);
        strTemp = strResponse.substr(0, 60);
        pos = strTemp.find("b lic");
        while (pos != string::npos)
        {
            strResponse = strResponse.substr(pos + 7);   // 7 = length of: b lic">
            pos = strResponse.find("<");
            if (pos == string::npos)
            {
                break;
            }
            strTemp = strResponse.substr(0, pos);
            dep.vecLicense.push_back(strTemp);
            strTemp = strResponse.substr(0, 60);
            pos = strTemp.find("b lic");
        }
    }
    // Categories: appended to vecLicense as well.
    pos = strResponse.find("Categories</th><td>");
    if (pos != string::npos)
    {
        strResponse = strResponse.substr(pos);
        strTemp = strResponse.substr(0, 120);
        pos = strTemp.find("b c");
        while (pos != string::npos)
        {
            strResponse = strResponse.substr(pos + 5);   // 5 = length of: b c">
            pos = strResponse.find("<");
            if (pos == string::npos)
            {
                break;
            }
            strTemp = strResponse.substr(0, pos);
            dep.vecLicense.push_back(strTemp);
            strTemp = strResponse.substr(0, 60);
            pos = strTemp.find("b c");
        }
    }
    // The first "vbtn release" link on the page holds the newest version.
    pos = strResponse.find("vbtn release");
    if (pos == string::npos)
    {
        LOGIC_ERROR("find vbtn release failed");
        return HPR_ERROR;
    }
    strResponse = strResponse.substr(pos + 14);          // 14 = length of: vbtn release">
    pos = strResponse.find("<");
    if (pos == string::npos)
    {
        LOGIC_ERROR("find < failed");
        return HPR_ERROR;
    }
    dep.strNewestVersion = strResponse.substr(0, pos);
    if (dep.strCurrentVersion == "")
    {
        dep.strCurrentVersion = dep.strNewestVersion;
    }
    return HPR_OK;
}

int MvnRepository::ParseDependences(string strResponse, map<string, Dependence>& mapDependence)
{
    int iReval = HPR_ERROR;
    size_t pos = string::npos;
    do
    {
        if (strResponse == "")
        {
            break;
        }
        // Keep only the text between "Compile Dependencies" and "Test Dependencies".
        pos = strResponse.find("Compile Dependencies");
        if (pos == string::npos)
        {
            break;
        }
        strResponse = strResponse.substr(pos);
        pos = strResponse.find("Test Dependencies");
        if (pos != string::npos)
        {
            strResponse = strResponse.substr(0, pos);
        }
        pos = strResponse.find(" class=\"b ");
        if (pos == string::npos)
        {
            pos = strResponse.find("vbtn release");
        }
        string strtemp = "";
        while (pos != string::npos)
        {
            Dependence dep;
            // Collect the run of category/license tags of this entry.
            pos = strResponse.find(" class=\"b ");
            if (pos != string::npos)
            {
                strResponse = strResponse.substr(pos);
                strtemp = strResponse.substr(0, 60);
                while (strtemp.find(" class=\"b ") != string::npos)
                {
                    pos = strtemp.find(" class=\"b ");
                    strResponse = strResponse.substr(pos);
                    pos = strResponse.find(">");
                    if (pos == string::npos)
                    {
                        LOGIC_ERROR("strResponse.find(>) failed");
                        break;
                    }
                    strResponse = strResponse.substr(pos + 1);
                    pos = strResponse.find("<");
                    if (pos == string::npos)
                    {
                        LOGIC_ERROR("strResponse.find(<) failed");
                        break;
                    }
                    strtemp = strResponse.substr(0, pos);
                    dep.vecLicense.push_back(strtemp);
                    LOGIC_TRACE("vecLicense:%s", strtemp.c_str());
                    strtemp = strResponse.substr(0, 60);
                }
            }
            // The version link's href holds groupId/artifactId/version.
            pos = strResponse.find("vbtn release");
            if (pos == string::npos)
            {
                LOGIC_ERROR("strResponse.find vbtn release failed!");
                break;
            }
            strResponse = strResponse.substr(pos + 30);
            pos = strResponse.find("\">");
            if (pos == string::npos)
            {
                LOGIC_ERROR("strResponse.find \"> failed!");
                break;
            }
            strtemp = strResponse.substr(0, pos);
            strResponse = strResponse.substr(pos);
            pos = strtemp.find("/");
            if (pos == string::npos)
            {
                LOGIC_ERROR("strResponse.find / failed!");
                break;
            }
            dep.strGroupid = strtemp.substr(0, pos);
            LOGIC_TRACE("strGroupid:%s", dep.strGroupid.c_str());
            strtemp = strtemp.substr(pos + 1);
            pos = strtemp.find("/");
            if (pos == string::npos)
            {
                LOGIC_ERROR("strResponse.find / failed!");
                break;
            }
            dep.strArtifact = strtemp.substr(0, pos);
            LOGIC_TRACE("strArtifact:%s", dep.strArtifact.c_str());
            strtemp = strtemp.substr(pos + 1);
            dep.strCurrentVersion = strtemp;
            LOGIC_TRACE("strCurrentVersion:%s", dep.strCurrentVersion.c_str());
            // A second version link within the next 120 characters, if any,
            // is taken as the newest available version.
            strtemp = strResponse.substr(0, 120);
            pos = strtemp.find("vbtn release");
            if (pos != string::npos)
            {
                strResponse = strResponse.substr(pos);
                pos = strResponse.find("\">");
                if (pos == string::npos)
                {
                    LOGIC_ERROR("strResponse.find \"> failed!");
                    break;
                }
                strResponse = strResponse.substr(pos + 2);
                pos = strResponse.find("<");
                if (pos == string::npos)
                {
                    LOGIC_ERROR("strResponse.find < failed!");
                    break;
                }
                strtemp = strResponse.substr(0, pos);
                dep.strNewestVersion = strtemp;
                LOGIC_TRACE("strNewestVersion:%s", dep.strNewestVersion.c_str());
            }
            mapDependence[dep.strArtifact + dep.strCurrentVersion] = dep;
            pos = strResponse.find(" class=\"b ");
            if (pos == string::npos)
            {
                pos = strResponse.find("vbtn release");
            }
        }
        iReval = HPR_OK;
    } while (0);
    return iReval;
}
