Find the largest K numbers from an array
Recently I was studying some algorithms. A classic problem is to find the K largest (or smallest) numbers in an array. I mainly studied two methods. The first is the direct method, an extension of selection sort: each pass selects the largest remaining number and moves it to the front, and the loop stops after K passes. The pseudocode is below. The complexity is O(kn).
function select(list[1..n], k)
    for i from 1 to k
        maxIndex = i
        maxValue = list[i]
        for j from i+1 to n
            if list[j] > maxValue
                maxIndex = j
                maxValue = list[j]
        swap list[i] and list[maxIndex]
    return list[1..k]
The C++ implementation is:
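#include <algorithm>
#include <numeric>
#include <vector>

// Partial selection sort. Returns the K largest values in descending order;
// vecIndex receives the position of each returned value in vecInput.
template<typename T>
std::vector<T> SelectLargestKItem(const std::vector<T> &vecInput, size_t K, std::vector<int> &vecIndex)
{
    // Asking for more items than exist is treated as asking for all of them.
    K = std::min(K, vecInput.size());

    std::vector<T> vecLocal(vecInput);
    // vecOrigin[i] is the original position of the value now held in vecLocal[i].
    std::vector<int> vecOrigin(vecInput.size());
    std::iota(vecOrigin.begin(), vecOrigin.end(), 0);

    std::vector<T> vecResult;
    vecIndex.clear();
    for (size_t k = 0; k < K; ++k)
    {
        // Find the largest value in vecLocal[k..end).
        size_t maxIndex = k;
        for (size_t i = k + 1; i < vecLocal.size(); ++i) {
            if (vecLocal[i] > vecLocal[maxIndex])
                maxIndex = i;
        }
        // Move it to position k, keeping the original positions in step.
        std::swap(vecLocal[maxIndex], vecLocal[k]);
        std::swap(vecOrigin[maxIndex], vecOrigin[k]);
        vecResult.push_back(vecLocal[k]);
        vecIndex.push_back(vecOrigin[k]);
    }
    return vecResult;
}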
When N is very large, such as N > 200,000, and K is larger than 20, the above algorithm becomes time consuming. After some research, I chose another algorithm for the job. This method is an extension of heap sort. The steps are as follows:
1) Build a min heap MH from the first k elements (arr[0] to arr[k-1]) of the given array. O(k)
2) For each element after the kth (arr[k] to arr[n-1]), compare it with the root of MH.
   a) If the element is greater than the root, make it the root and call heapify on MH.
   b) Otherwise ignore it.
   Step 2 is O((n-k) log k).
3) Finally, MH holds the k largest elements, and the root of MH is the kth largest element.
Time complexity: O(k + (n-k) log k) without sorted output. If sorted output is needed, O(k + (n-k) log k + k log k).
The C++ implementation of the method is as below:
#include <algorithm>
#include <vector>

// Sift down in a min heap: restore the heap property for the subtree rooted
// at index i of vecInput. n is the size of the heap. vecIndex is permuted in
// step with vecInput so it keeps tracking the original positions.
template<typename T>
void heapifyMinToRoot(std::vector<T> &vecInput, const int n, const int i, std::vector<int> &vecIndex)
{
    int smallestIndex = i;  // Start by assuming the root is the smallest
    int l = 2 * i + 1;      // left child
    int r = 2 * i + 2;      // right child

    // If the left child is smaller than the root
    if (l < n && vecInput[l] < vecInput[smallestIndex])
        smallestIndex = l;

    // If the right child is smaller than the smallest so far
    if (r < n && vecInput[r] < vecInput[smallestIndex])
        smallestIndex = r;

    // If the smallest is not the root
    if (smallestIndex != i)
    {
        std::swap(vecInput[i], vecInput[smallestIndex]);
        std::swap(vecIndex[i], vecIndex[smallestIndex]);

        // Recursively heapify the affected sub-tree
        heapifyMinToRoot(vecInput, n, smallestIndex, vecIndex);
    }
}

// Returns the K largest values of vecInput in descending order;
// vecIndex receives the position of each returned value in vecInput.
template<typename T>
std::vector<T> SelectLargestKItemHeap(const std::vector<T> &vecInput, const size_t K, std::vector<int> &vecIndex)
{
    vecIndex.clear();
    // Asking for more items than exist is treated as asking for all of them.
    const int heapSize = static_cast<int>(std::min(K, vecInput.size()));
    if (heapSize == 0)
        return std::vector<T>();

    // Step 1: build a min heap from the first K elements.
    std::vector<T> vecResult(vecInput.begin(), vecInput.begin() + heapSize);
    for (int i = 0; i < heapSize; ++i)
        vecIndex.push_back(i);
    for (int K1 = heapSize / 2 - 1; K1 >= 0; --K1)
        heapifyMinToRoot(vecResult, heapSize, K1, vecIndex);

    // Step 2: a larger element replaces the root (the smallest of the current K).
    // Only the root changed, so one sift-down from index 0 restores the heap.
    for (size_t i = heapSize; i < vecInput.size(); ++i) {
        if (vecInput[i] > vecResult[0]) {
            vecResult[0] = vecInput[i];
            vecIndex[0] = static_cast<int>(i);
            heapifyMinToRoot(vecResult, heapSize, 0, vecIndex);
        }
    }

    // Step 3: heap sort on a min heap leaves the values in descending order.
    for (int k = heapSize - 1; k > 0; --k)
    {
        std::swap(vecResult[k], vecResult[0]);
        std::swap(vecIndex[k], vecIndex[0]);
        heapifyMinToRoot(vecResult, k, 0, vecIndex);
    }
    return vecResult;
}
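For comparison, the same three steps can be sketched with the standard library's std::priority_queue, turned into a min heap through std::greater. This is only an illustration: the name TopKWithPriorityQueue is made up here and is not part of the code above. It returns (value, index) pairs, which carry the same information as vecResult and vecIndex. One difference is that the first K elements are pushed one at a time, so building the heap costs O(k log k) rather than O(k).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): returns the K largest
// values together with their positions in the input, largest first.
template<typename T>
std::vector<std::pair<T, std::size_t>> TopKWithPriorityQueue(const std::vector<T> &input, std::size_t K)
{
    using Item = std::pair<T, std::size_t>;   // (value, index in input)

    // std::greater turns the default max heap into a min heap, so top() is
    // the smallest of the items kept so far.
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    for (std::size_t i = 0; i < input.size(); ++i) {
        if (heap.size() < K) {
            heap.push(Item(input[i], i));      // still filling the first K slots
        } else if (K > 0 && input[i] > heap.top().first) {
            heap.pop();                        // drop the current smallest
            heap.push(Item(input[i], i));      // O(log K)
        }
    }

    // Popping a min heap yields ascending order, so reverse at the end.
    std::vector<Item> result;
    result.reserve(heap.size());
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```

For the input {5, 1, 9, 3, 7} and K = 3 this returns the pairs (9, 2), (7, 4), (5, 0).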
Here is the code to test these two methods.
#include <iostream>
#include <random>
#include <vector>

void SelectionAlgorithmBenchMark()
{
    int N = 200000;
    std::vector<int> vecInput;

    std::minstd_rand0 generator;
    for (int i = 0; i < N; ++i)
    {
        int nValue = generator();
        vecInput.push_back(nValue);
    }
    std::vector<int> vecResult, vecIndex;
    int K = 20;
    CStopWatch stopWatch;
    vecResult = SelectLargestKItem<int>(vecInput, K, vecIndex);
    std::cout << "Standard algorithm SelectLargestKItem takes " << stopWatch.Now() << " ms" << std::endl;
    for (int k = 0; k < K; ++k)
    {
        std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
    }
    std::cout << std::endl;

    stopWatch.Start();
    vecResult = SelectLargestKItemHeap<int>(vecInput, K, vecIndex);
    std::cout << "Heap algorithm SelectLargestKItemHeap takes " << stopWatch.Now() << " ms" << std::endl;
    for (int k = 0; k < K; ++k)
    {
        std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
    }
}
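CStopWatch is not defined in this post. To run the benchmark, a stand-in along the following lines could be used. Its behavior (timing starts at construction, Start() restarts it, Now() returns the elapsed milliseconds) is an assumption inferred from how the class is called above, not the original implementation.

```cpp
#include <chrono>

// Hypothetical stand-in for the CStopWatch used in the benchmark. The
// original class is not shown, so this interface is an assumption.
class CStopWatch
{
    using Clock = std::chrono::steady_clock;

public:
    // Timing starts when the object is created.
    CStopWatch() { Start(); }

    // Restart the measurement from the current moment.
    void Start() { m_start = Clock::now(); }

    // Milliseconds elapsed since construction or the last Start().
    long long Now() const
    {
        return std::chrono::duration_cast<std::chrono::milliseconds>(
                   Clock::now() - m_start).count();
    }

private:
    Clock::time_point m_start;
};
```

std::chrono::steady_clock is used because it never jumps backwards, which makes it a reasonable choice for measuring intervals.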
With N = 200000 and K = 20, the first method takes 353 ms and the second takes 31 ms, so the heap method is more than 10 times faster.
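A rough operation count points the same way as the timing. The sketch below evaluates the two complexity formulas from above for given N and K. The function names are made up for this illustration, and the results are worst-case counts, not predictions of running time.

```cpp
#include <cmath>
#include <cstddef>

// Rough comparison count for the selection method: K passes over about N items.
double SelectionCostEstimate(std::size_t N, std::size_t K)
{
    return static_cast<double>(K) * static_cast<double>(N);
}

// Rough worst-case count for the heap method, assuming 1 <= K <= N:
// K to build the heap, then at most log2(K) per remaining element.
double HeapCostEstimate(std::size_t N, std::size_t K)
{
    const double k = static_cast<double>(K);
    const double n = static_cast<double>(N);
    return k + (n - k) * std::log2(k);
}
```

For N = 200000 and K = 20 this gives about 4.0 million for the selection method against about 0.86 million for the heap method. The measured gap is larger than that ratio, most likely because on random input the majority of elements fail the comparison with the root and are skipped after a single comparison.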