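The project required getting MVPtree, a fairly obscure data structure, into Java, but there was no mature Java implementation online. I am used to reading C++, yet the C++ implementations of a structure this complex still left me confused. Luckily the client pointed me to a VPtree written in Java by a GitHub user called "job"-something, and its overall structure can be adapted to an MVPtree.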

  For other background on MVPtree, please look it up on Baidu; this post only covers implementing the algorithm.

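  A point-search tree has two main problems to solve: how to avoid searching points that cannot match, and how to reduce the number of distance computations. The first is the easier one to attack: split the point set into two rectangular halves, left and right, or, more imaginatively, split it by distance into a ball and a surrounding shell. The distance split is very efficient because every query is itself based on point distances. The second problem has less general solutions; the usual optimization is to precompute each point's distance to the vantage points and use those values to filter candidates at query time.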

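  VPtree is an example of partitioning a point set by distance. Each node owns a point set. Pick an arbitrary point as the vantage point, split the set by distance to that point into two subsets of equal size, and hand each subset to one child node. Looking up points with a query point walks the tree in exactly the same way. VPtree does nothing about the second problem, so it suits cases where the distance function is cheap, such as Manhattan or Euclidean distance.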
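  To make the split concrete, here is a minimal sketch of one VPtree partition step. It assumes points are plain double[] coordinates under Euclidean distance, and it finds the median by sorting on the distance to the vantage point. VpSplitSketch and its methods are hypothetical names for illustration; they are not taken from the VPtree library mentioned above, which may well choose its threshold differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper for illustration; not part of the original code base. */
class VpSplitSketch {

    /** Euclidean distance between two points of equal dimension (an assumption). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Sorts the points by distance to the vantage point and returns the index
     * where the far half starts: [0, index) is the inner ball,
     * [index, size) is the outer shell.
     */
    static int splitAtMedian(List<double[]> points, double[] vantage) {
        points.sort(Comparator.comparingDouble(p -> distance(p, vantage)));
        return points.size() / 2;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {10, 0});
        points.add(new double[] {1, 0});
        points.add(new double[] {5, 0});
        points.add(new double[] {2, 0});
        int split = splitAtMedian(points, new double[] {0, 0});
        System.out.println("inner ball holds " + split + " points");
    }
}
```

  Sorting costs O(n log n) per node; a selection algorithm would find the median faster, but the sort keeps the sketch short.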

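  MVPtree inherits VPtree's distance-based partitioning, but each node uses two vantage points to split its set into four subsets, and a per-point path array of precomputed distances limits how often the distance function has to run. Four subsets instead of two give a finer partition and leave fewer irrelevant points in each region that is visited. Filtering against a fixed number of vantage points cuts the number of distance computations when queries are frequent, and because those vantage points usually end up far apart, large irrelevant regions are excluded in one go, which works very well. This structure suits distance functions that are expensive to evaluate.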
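  As a rough illustration of the four-way split, the sketch below maps a point's two vantage-point distances to one of the four children. The class and method names are hypothetical, and sending a point that lies exactly on a threshold to the inner side is an assumption of this sketch, not something the post specifies.

```java
/** Hypothetical helper for illustration; not part of the original code base. */
class MvpChildIndexSketch {

    /**
     * @param d1              distance from the point to the first vantage point
     * @param d2              distance from the point to the second vantage point
     * @param firstThreshold  split radius around the first vantage point
     * @param secondThreshold split radii around the second vantage point,
     *                        one for each side of the first split
     * @return {i, j}, the indices of the child that holds the point
     */
    static int[] childIndex(double d1, double d2,
            double firstThreshold, double[] secondThreshold) {
        // Assumption: a point exactly on a threshold counts as "inside".
        int i = d1 <= firstThreshold ? 0 : 1;
        int j = d2 <= secondThreshold[i] ? 0 : 1;
        return new int[] {i, j};
    }
}
```

  Note that the second threshold depends on which side of the first split the point fell on, which is why it is an array of two values.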

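  Next comes the code.

  MVPtree points differ from VPtree points in one respect: each MVPtree point also carries an array of its distances to the vantage points. That calls for a dedicated point data structure, the MVPtree point.

  The core code is as follows: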

import java.util.ArrayList;

public class MVPTreePoint<P> {

    // Distances from this point to the vantage points along its path in the tree.
    private ArrayList<Double> path;

    private P point;

    // Upper bound on the number of distances kept in the path.
    private final int maxLevel;

    public MVPTreePoint(final P point, final int maxLevel) {
        this.point = point;
        this.maxLevel = maxLevel;
        this.path = new ArrayList<>();
    }

    public void addDistanceToSelf(final MVPTreePoint<P> vantagePointElement,
            final DistanceFunction<P> distanceFunction) {
        if (this.path.size() < this.maxLevel) {
            this.path.add(distanceFunction.getDistance(this.point, vantagePointElement.point));
        }
    }

    public void addDistanceToSelf(final P vantagePoint,
            final DistanceFunction<P> distanceFunction) {
        if (this.path.size() < this.maxLevel) {
            this.path.add(distanceFunction.getDistance(this.point, vantagePoint));
        }
    }

    public void addDistanceToSelf(final double distance) {
        if (this.path.size() < this.maxLevel) {
            this.path.add(distance);
        }
    }

    public void removeDistanceToSelf(final int position) {
        if (position < this.path.size()) {
            this.path.remove(position);
        }
    }

    public double getDistanceToSelf(int i) {
        return this.path.get(i);
    }

    public int size() {
        return this.path.size();
    }

    public void clearPath() {
        this.path.clear();
    }

    public P getPoint() {
        return this.point;
    }

    // Two tree points are equal when they wrap equal points; the path is ignored.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof MVPTreePoint)) {
            return false;
        }
        return this.point.equals(((MVPTreePoint<?>) o).point);
    }

    // Kept consistent with equals().
    @Override
    public int hashCode() {
        return this.point.hashCode();
    }
}


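  Storing the distance array on the point class rather than folding it into the tree node class keeps the structure clearer, and it makes reading a distance straight from a point convenient.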

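  MVPtree differs from VPtree in many places, but most of those differences are just class renames and changing P and E to MVPTreePoint<P> and MVPTreePoint<E>. This post concentrates on the core algorithms: building the tree and querying points.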

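  Building an MVPtree means choosing one more vantage point and splitting the array two more times, and also storing the distance from each vantage point to every point.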

  capacity is the capacity of a leaf node. Pick a moderate value, depending on the size of the data set.

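  The original paper takes the vantage points out of the point set and stores them separately. In this implementation a vantage point serves only as a reference: it stays in the point set and is built into the tree like any other point. The data structure ends up holding only about quantityOfPoint/capacity extra points, and the code becomes much easier to write.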

public MVPTreeNode(
        final Collection<MVPTreePoint<E>> pointNodes,
        final DistanceFunction<P> distanceFunction,
        final MVPThresholdSelectionStrategy<P, E> thresholdSelectionStrategy,
        final int capacity, final int maxLevel) {

    if (capacity < 1) {
        throw new IllegalArgumentException("Capacity must be positive.");
    }

    if (pointNodes.isEmpty()) {
        throw new IllegalArgumentException(
                "Cannot create a MVPTreeNode with an empty list of points.");
    }

    this.capacity = capacity;
    this.maxLevel = maxLevel;
    this.distanceFunction = distanceFunction;
    this.thresholdSelectionStrategy = thresholdSelectionStrategy;
    this.pointNodes = new ArrayList<>(pointNodes);
    this.children = new MVPTreeNode[2][2];
    this.vantagePoint = (E[]) new Object[2];
    this.secondThreshold = new double[2];

    this.anneal();
}

protected void anneal() {
    if (this.pointNodes == null) {
        int[][] childrenSize = new int[2][2];
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                childrenSize[i][j] = this.children[i][j].size();
            }
        }

        if (childrenSize[0][0] == 0 || childrenSize[0][1] == 0
                || childrenSize[1][0] == 0 || childrenSize[1][1] == 0) {
            // One of the child nodes has become empty, and needs to be
            // pruned.
            this.pointNodes = new ArrayList<>(childrenSize[0][0]
                    + childrenSize[0][1] + childrenSize[1][0]
                    + childrenSize[1][1]);
            this.addAllPointsToCollection(this.pointNodes);
            for (MVPTreePoint<E> pointNode : this.pointNodes) {
                pointNode.clearPath();
            }
            for (int i = 0; i < 2; i++) {
                for (int j = 0; j < 2; j++) {
                    this.children[i][j] = null;
                }
            }
            this.anneal();
        } else {
            for (int i = 0; i < 2; i++) {
                for (int j = 0; j < 2; j++) {
                    this.children[i][j].anneal();
                }
            }
        }
    } else {
        int firstVantagePointIndex = new Random().nextInt(this.pointNodes.size());
        this.vantagePoint[0] = this.pointNodes.get(firstVantagePointIndex).getPoint();
        this.firstThreshold = this.thresholdSelectionStrategy
                .selectThreshold(this.pointNodes, this.vantagePoint[0],
                        this.distanceFunction);
        int firstIndexPastThreshold;
        try {
            firstIndexPastThreshold = MVPTreeNode.partitionPoints(
                    this.pointNodes, this.vantagePoint[0],
                    this.firstThreshold, this.distanceFunction);
        } catch (final PartitionException e) {
            this.storeInOneNode();
            return;
        }

        if (this.pointNodes.size() > this.capacity) {
            List<MVPTreePoint<E>>[] subTreeList = new List[2];

            subTreeList[0] = this.pointNodes.subList(0, firstIndexPastThreshold);
            subTreeList[1] = this.pointNodes.subList(
                    firstIndexPastThreshold, this.pointNodes.size());

            // if points can be divided into 2 parts, find second vantage
            // point and try to split point array
            int secondVantagePointIndex = new Random().nextInt(subTreeList[1].size());
            this.vantagePoint[1] = subTreeList[1].get(secondVantagePointIndex).getPoint();
            int[] splitPosition = new int[2];
            for (int i = 0; i < 2; i++) {
                this.secondThreshold[i] = this.thresholdSelectionStrategy
                        .selectThreshold(subTreeList[i],
                                this.vantagePoint[1], this.distanceFunction);
                try {
                    splitPosition[i] = MVPTreeNode.partitionPoints(
                            subTreeList[i], this.vantagePoint[1],
                            this.secondThreshold[i], this.distanceFunction);
                } catch (final PartitionException e) {
                    this.storeInOneNode();
                    return;
                }
            }
            for (MVPTreePoint<E> pointNode : this.pointNodes) {
                pointNode.addDistanceToSelf(this.distanceFunction
                        .getDistance(pointNode.getPoint(), this.vantagePoint[0]));
                pointNode.addDistanceToSelf(this.distanceFunction
                        .getDistance(pointNode.getPoint(), this.vantagePoint[1]));
            }
            for (int i = 0; i < 2; i++) {
                this.children[i][0] = new MVPTreeNode<>(
                        subTreeList[i].subList(0, splitPosition[i]),
                        this.distanceFunction,
                        this.thresholdSelectionStrategy, this.capacity,
                        this.maxLevel);
                this.children[i][1] = new MVPTreeNode<>(
                        subTreeList[i].subList(splitPosition[i], subTreeList[i].size()),
                        this.distanceFunction,
                        this.thresholdSelectionStrategy, this.capacity,
                        this.maxLevel);
            }
            this.pointNodes = null;
        } else {
            this.storeInOneNode();
        }
    }
}

private void storeInOneNode() {
    int maxIndex = 0;
    double maxDistance = this.distanceFunction.getDistance(
            this.pointNodes.get(0).getPoint(), this.vantagePoint[0]);
    for (int i = 1; i < this.pointNodes.size(); i++) {
        double curDistance = this.distanceFunction.getDistance(
                this.pointNodes.get(i).getPoint(), this.vantagePoint[0]);
        if (maxDistance < curDistance) {
            maxDistance = curDistance;
            maxIndex = i;
        }
    }
    this.vantagePoint[1] = this.pointNodes.get(maxIndex).getPoint();

    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            this.children[i][j] = null;
        }
    }
}


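  The original author provides two kinds of query: finding the k points nearest to the query point, and finding all points no farther than a distance u from the query point.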

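  The k-nearest query can reuse the approach used for VPtree: search the child node that contains the query point first, then the other children. Mind the order of the checks. First test whether the collector is full (while it is not, any point is simply added). Only then compare the collector's farthest distance from the query point (a fixed distance in the second kind of query) with the smallest possible distance between the query point and the candidate point or point set; if the former is smaller, the candidate is skipped. This check is useful at both internal nodes and leaf nodes.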

public void collectNearestNeighbors(
        final NearestNeighborCollector<P, E> collector, int depth) {
    if (this.pointNodes == null) {
        // O1-Q
        final double distanceFromFirstVantagePointToQueryPoint = this.distanceFunction
                .getDistance(this.vantagePoint[0],
                        collector.getQueryPoint().getPoint());
        // O2-Q
        final double distanceFromSecondVantagePointToQueryPoint = this.distanceFunction
                .getDistance(this.vantagePoint[1],
                        collector.getQueryPoint().getPoint());

        collector.getQueryPoint().addDistanceToSelf(
                distanceFromFirstVantagePointToQueryPoint);
        collector.getQueryPoint().addDistanceToSelf(
                distanceFromSecondVantagePointToQueryPoint);

        final MVPTreeNode<P, E> index = this
                .getChildNodeForPoint(collector.getQueryPoint().getPoint());
        index.collectNearestNeighbors(collector, depth + 1);

        // O1-Q - O1-S1
        double basicDistance = distanceFromFirstVantagePointToQueryPoint
                - this.firstThreshold;

        for (int i = 0; i < 2; i++) {
            if (!collector.isFull() || basicDistance <= collector.getRadius()) {
                // O2-Q - O2-S2
                double touchDistance = distanceFromSecondVantagePointToQueryPoint
                        - this.secondThreshold[i];

                for (int j = 0; j < 2; j++) {
                    if (index != this.children[i][j]
                            && (!collector.isFull() || touchDistance <= collector.getRadius())) {
                        this.children[i][j].collectNearestNeighbors(collector, depth + 1);
                    }
                    touchDistance *= -1;
                }
            }
            basicDistance *= -1;
        }
        collector.getQueryPoint().removeDistanceToSelf(depth + depth + 1);
        collector.getQueryPoint().removeDistanceToSelf(depth + depth);
    } else {
        for (final MVPTreePoint<E> pointNode : this.pointNodes) {
            if (!collector.isFull() || this.isAbleToInsert(collector.getRadius(),
                    collector.getQueryPoint(), pointNode)) {
                collector.offerPoint(pointNode.getPoint());
            }
        }
    }
}
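
  The collector used by the k-nearest query is not shown in this post. The sketch below is one way such a bounded collector could behave, reduced to bare distances: a max-heap keeps the k smallest distances seen so far, and its top is the current search radius. KnnCollectorSketch and mustVisit are hypothetical names; only the roles of isFull and getRadius mirror the calls in the code above, and the real NearestNeighborCollector may be implemented differently.

```java
import java.util.Collections;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a k-nearest collector, tracking distances only. */
class KnnCollectorSketch {

    private final int capacity;

    // Max-heap: the largest of the kept distances sits on top.
    private final PriorityQueue<Double> heap =
            new PriorityQueue<>(Collections.reverseOrder());

    KnnCollectorSketch(int capacity) {
        this.capacity = capacity;
    }

    boolean isFull() {
        return heap.size() >= capacity;
    }

    /** Distance of the farthest kept candidate, or infinity while empty. */
    double getRadius() {
        return heap.isEmpty() ? Double.POSITIVE_INFINITY : heap.peek();
    }

    /** Keeps the candidate if there is room or if it beats the farthest one. */
    boolean offer(double distance) {
        if (!isFull()) {
            heap.add(distance);
            return true;
        }
        if (distance < getRadius()) {
            heap.poll();
            heap.add(distance);
            return true;
        }
        return false;
    }

    /**
     * A subtree whose points are all at least lowerBound away from the query
     * can be skipped once the collector is full and lowerBound exceeds the radius.
     */
    static boolean mustVisit(KnnCollectorSketch collector, double lowerBound) {
        return !collector.isFull() || lowerBound <= collector.getRadius();
    }
}
```

  Checking isFull before the radius matters: until k candidates have been collected there is no meaningful radius to prune with.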


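  The within-distance-u query is the algorithm described in the paper. It proceeds much like the k-nearest collection; the differences are that the limiting distance is a fixed value, the distance check must be made every time, and the result set has no size limit.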

public void collectAllWithinDistance(final MVPTreePoint<P> queryPoint,
        final double maxDistance, final Collection<E> collection, int depth) {
    if (this.pointNodes == null) {
        final double distanceFromFirstVantagePointToQueryPoint = this.distanceFunction
                .getDistance(this.vantagePoint[0], queryPoint.getPoint());
        final double distanceFromSecondVantagePointToQueryPoint = this.distanceFunction
                .getDistance(this.vantagePoint[1], queryPoint.getPoint());

        queryPoint.addDistanceToSelf(distanceFromFirstVantagePointToQueryPoint);
        queryPoint.addDistanceToSelf(distanceFromSecondVantagePointToQueryPoint);

        // We want to search any of this node's children that intersect with
        // the query region
        if (distanceFromFirstVantagePointToQueryPoint <= this.firstThreshold
                + maxDistance) {
            if (distanceFromSecondVantagePointToQueryPoint <= this.secondThreshold[0]
                    + maxDistance) {
                this.children[0][0].collectAllWithinDistance(queryPoint,
                        maxDistance, collection, depth + 1);
            }

            if (distanceFromSecondVantagePointToQueryPoint + maxDistance >= this.secondThreshold[0]) {
                this.children[0][1].collectAllWithinDistance(queryPoint,
                        maxDistance, collection, depth + 1);
            }
        }

        if (distanceFromFirstVantagePointToQueryPoint + maxDistance >= this.firstThreshold) {
            if (distanceFromSecondVantagePointToQueryPoint <= this.secondThreshold[1]
                    + maxDistance) {
                this.children[1][0].collectAllWithinDistance(queryPoint,
                        maxDistance, collection, depth + 1);
            }

            if (distanceFromSecondVantagePointToQueryPoint + maxDistance >= this.secondThreshold[1]) {
                this.children[1][1].collectAllWithinDistance(queryPoint,
                        maxDistance, collection, depth + 1);
            }
        }
        queryPoint.removeDistanceToSelf(depth + depth + 1);
        queryPoint.removeDistanceToSelf(depth + depth);
    } else {
        for (MVPTreePoint<E> pointNode : pointNodes) {
            if (this.isAbleToInsert(maxDistance, queryPoint, pointNode)) {
                collection.add(pointNode.getPoint());
            }
        }
    }
}
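
  The four nested tests in the range query can be pulled out into a small pure function, which makes them easier to check by hand. This is only a sketch with hypothetical names (RangePruneSketch, childrenToVisit); it takes the two vantage-point distances of the query point as plain numbers instead of calling a distance function.

```java
/** Hypothetical helper for illustration; not part of the original code base. */
class RangePruneSketch {

    /**
     * @param d1 distance from the query point to the first vantage point
     * @param d2 distance from the query point to the second vantage point
     * @param t1 first threshold
     * @param t2 second thresholds, one per side of the first split
     * @param r  query radius
     * @return visit[i][j] is true when child [i][j] may hold a point within r
     */
    static boolean[][] childrenToVisit(double d1, double d2,
            double t1, double[] t2, double r) {
        boolean[][] visit = new boolean[2][2];
        // Inner side of the first split, then outer side.
        boolean[] firstSide = {d1 <= t1 + r, d1 + r >= t1};
        for (int i = 0; i < 2; i++) {
            if (!firstSide[i]) {
                continue; // the whole side lies outside the query ball
            }
            visit[i][0] = d2 <= t2[i] + r;
            visit[i][1] = d2 + r >= t2[i];
        }
        return visit;
    }
}
```

  With d1 = 1, t1 = 5 and r = 1, for instance, the outer side of the first split is skipped entirely, so at most two of the four children are searched.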


  Both queries need to compare against the precomputed distances, so that comparison is factored into one function:

public boolean isAbleToInsert(double limitDistance,
        MVPTreePoint<P> queryPoint, MVPTreePoint<E> pointNode) {

    for (int i = 0; i < queryPoint.size(); i++) {
        double disOffset = queryPoint.getDistanceToSelf(i)
                - pointNode.getDistanceToSelf(i);

        if (Math.abs(disOffset) > limitDistance) {
            return false;
        }
    }

    return this.distanceFunction.getDistance(pointNode.getPoint(),
            queryPoint.getPoint()) <= limitDistance;
}
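
  The early exit in the function above rests on the triangle inequality: for any vantage point v, |d(q, v) - d(p, v)| <= d(q, p), so if that difference already exceeds the limit, the real distance must exceed it too and never has to be computed. The sketch below isolates that filter on plain arrays. PathFilterSketch is a hypothetical name, and comparing only the entries both paths share is a defensive assumption of this sketch rather than something the original code does.

```java
/** Hypothetical helper for illustration; not part of the original code base. */
class PathFilterSketch {

    /**
     * Returns false when the stored vantage-point distances already prove
     * that the point is farther than limit from the query point.
     * Returns true when the expensive distance still has to be computed.
     */
    static boolean mayBeWithin(double[] queryPath, double[] pointPath, double limit) {
        // Assumption: only entries present in both paths are compared.
        int shared = Math.min(queryPath.length, pointPath.length);
        for (int i = 0; i < shared; i++) {
            if (Math.abs(queryPath[i] - pointPath[i]) > limit) {
                return false; // triangle inequality rules this point out
            }
        }
        return true;
    }
}
```

  A true result is only a "maybe": the filter never produces false negatives, but the actual distance still has to be evaluated for every point that passes it.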


  Other functions need changes as well, but none of them required structural changes as large as these three.

-------------------------------------------------------------------------

Code repository: https://coding.net/u/funcfans/p/MVPtree-for-Java/git
