Recently i was doing some study on algorithms. A classic problem is to find the K largest(smallest) numbers from an array. I mainly studyed two methods, one is directly methold. It is an extension of select sort, always select the largest number from the array. The pseudo code is as below. The algorithm complexity is O(kn).

 function select(list[1..n], k)
for i from 1 to k
minIndex = i
minValue = list[i]
for j from i+1 to n
if list[j] < minValue
minIndex = j
minValue = list[j]
swap list[i] and list[minIndex]
return list[k] The C++ implementation is
template<typename T>
std::vector<T> SelectLargestKItem(const std::vector<T> &vecInput, size_t K, std::vector<int> &vecIndex)
{
if (K > vecInput.size())
return vecInput; std::vector<T> vecLocal(vecInput);
std::vector<T> vecResult;
for (size_t k = ; k < K; ++ k)
{
T maxValue = vecLocal[k];
int maxIndex = k;
for (size_t i = k + ; i < vecLocal.size(); ++i) {
if (vecLocal[i] > maxValue) {
maxValue = vecLocal[i];
maxIndex = i;
}
}
if (maxIndex != k)
std::swap(vecLocal[maxIndex], vecLocal[k]);
vecResult.push_back( maxValue );
vecIndex.push_back( maxIndex );
}
return vecResult;
}

When the total number of N is very large, such as N > 200,000. And the numbers need to select K is larger than 20, then the above algorithm will become time consuming. After do some research, i choose another algorithm to do the job. This method is a extension of heap sort. The steps work as below:

1) Build a Min Heap MH of the first k elements (arr[0] to arr[k-1]) of the given array. O(k)

2) For each element, after the kth element (arr[k] to arr[n-1]), compare it with root of MH.
……a) If the element is greater than the root then make it root and call heapifyfor MH
……b) Else ignore it.
// The step 2 is O((n-k)*logk)

3) Finally, MH has k largest elements and root of the MH is the kth largest element.

Time Complexity: O(k + (n-k)Logk) without sorted output. If sorted output is needed then O(k + (n-k)Logk + kLogk).

The C++ implementation of the method is as below:

// To heapify a subtree rooted with node i which is
// an index in arr[]. n is size of heap
template<typename T>
void heapifyMinToRoot(std::vector<T> &vecInput, const int n, const int i, std::vector<int> &vecIndex)
{
int smallestIndex = i; // Initialize largest as root
int l = * i + ; // left = 2*i + 1
int r = * i + ; // right = 2*i + 2 // If left child is larger than root
if (l < n && vecInput[l] < vecInput[smallestIndex])
smallestIndex = l; // If right child is larger than largest so far
if (r < n && vecInput[r] < vecInput[smallestIndex])
smallestIndex = r; // If largest is not root
if (smallestIndex != i)
{
std::swap(vecInput[i], vecInput[smallestIndex]);
std::swap(vecIndex[i], vecIndex[smallestIndex]); // Recursively heapify the affected sub-tree
heapifyMinToRoot(vecInput, n, smallestIndex, vecIndex);
}
} template<typename T>
std::vector<T> SelectLargestKItemHeap(const std::vector<T> &vecInput, const size_t K, std::vector<int> &vecIndex)
{
if (K > vecInput.size()) {
std::vector<T> vecResult(vecInput);
std::sort(vecResult.begin(), vecResult.end());
std::reverse(vecResult.begin(), vecResult.end());
for (size_t i = ; i < vecInput.size(); ++i)
vecIndex.push_back(i);
return vecResult;
} std::vector<T> vecLocal(vecInput);
std::vector<T> vecResult(vecInput.begin(), vecInput.begin() + K);
vecIndex.clear();
for (size_t i = ; i < K; ++ i) vecIndex.push_back(i); for (int K1 = K / - ; K1 >= ; -- K1)
heapifyMinToRoot(vecResult, K, K1, vecIndex); for (size_t i = K; i < vecLocal.size(); ++ i) {
if (vecLocal[i] > vecResult[]) {
vecResult[] = vecLocal[i];
vecIndex[] = i; for (int K1 = K / - ; K1 >= ; -- K1)
heapifyMinToRoot(vecResult, K, K1, vecIndex);
}
}
for (int k = K - ; k >= ; -- k )
{
std::swap(vecResult[k], vecResult[]);
std::swap(vecIndex[k], vecIndex[]); heapifyMinToRoot(vecResult, k, , vecIndex);
} return vecResult;
}

Here is the code to test these two methods.

void SelectionAlgorithmBenchMark()
{
int N = ;
std::vector<int> vecInput; std::minstd_rand0 generator();
for (int i = ; i < N; ++i)
{
int nValue = generator();
vecInput.push_back(nValue );
}
std::vector<int> vecResult, vecIndex;
int K = ;
CStopWatch stopWatch;
vecResult = SelectLargestKItem<int>(vecInput, K, vecIndex);
std::cout << "Standard algorithm SelectLargestKItem takes " << stopWatch.Now() << " ms" << std::endl;
for (int k = ; k < K; ++k)
{
std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
}
std::cout << std::endl; stopWatch.Start();
vecResult = SelectLargestKItemHeap<int>(vecInput, K, vecIndex);
std::cout << "Heap algorithm SelectLargestKItem takes " << stopWatch.Now() << " ms" << std::endl;
for (int k = ; k < K; ++k)
{
std::cout << "Index " << vecIndex[k] << ", value " << vecResult[k] << std::endl;
}
}

When N is 200000, K is 20, the first method takes 353ms, the second method takes 31ms. The difference is more than 10 times.

Find the largest K numbers from array (找出数组中最大的K个值)的更多相关文章

  1. 215. Kth Largest Element in an Array找出数组中第k大的值

    堆排序做的,没有全部排序,找到第k个就结束 public int findKthLargest(int[] nums, int k) { int num = 0; if (nums.length &l ...

  2. [LeetCode] Find All Numbers Disappeared in an Array 找出数组中所有消失的数字

    Given an array of integers where 1 ≤ a[i] ≤ n (n = size of array), some elements appear twice and ot ...

  3. [LeetCode] Find All Duplicates in an Array 找出数组中所有重复项

    Given an array of integers, 1 ≤ a[i] ≤ n (n = size of array), some elements appear twice and others ...

  4. 442. Find All Duplicates in an Array找出数组中所有重复了两次的元素

    [抄题]: Given an array of integers, 1 ≤ a[i] ≤ n (n = size of array), some elements appear twice and o ...

  5. 前端算法题:找出数组中第k大的数字出现多少次

    题目:给定一个一维数组,如[1,2,4,4,3,5],找出数组中第k大的数字出现多少次. 例如:第2大的数是4,出现2次,最后输出 4,2 function getNum(arr, k){ // 数组 ...

  6. 【Java】 剑指offer(1) 找出数组中重复的数字

    本文参考自<剑指offer>一书,代码采用Java语言. 更多:<剑指Offer>Java实现合集 题目 在一个长度为n的数组里的所有数字都在0到n-1的范围内.数组中某些数字 ...

  7. 《剑指offer》第三_一题(找出数组中重复的数字,可改变数组)

    // 面试题3(一):找出数组中重复的数字 // 题目:在一个长度为n的数组里的所有数字都在0到n-1的范围内.数组中某些数字是重复的,但不知道有几个数字重复了, // 也不知道每个数字重复了几次.请 ...

  8. 1. 找出数组中的单身狗OddOccurrencesInArray Find value that occurs in odd number of elements.

    找出数组中的单身狗: 1. OddOccurrencesInArray Find value that occurs in odd number of elements. A non-empty ze ...

  9. 【Offer】[3-1] 【找出数组中重复的数字】

    题目描述 思路 Java代码 代码链接 题目描述 在一个长度为n的数组里的所有数字都在0~n-1的范围内.数组中某些数字是重复的,但不知道有几个数字重复了,也不知道每个数字重复了几次. 请找出数组中任 ...

随机推荐

  1. mybatis 与 xml

    mybatis的两大重要组件:配置和映射文件,都是可以通过xml配置的(新版本新增了注解的方式配置Mapper),下面来解析下mybatis是怎么做的 其中,关于配置文件解析的主要是在这个类XMLCo ...

  2. AlwaysOn添加高可用性自定义登陆用户的方法

    1.在主服务器添加自定义登陆用户,比如TestUser 2.在主服务器执行如下SQL,在master数据库创建存储过程sp_hexadecimal,sp_help_revlogin USE maste ...

  3. 关于redis的主从复制

    redis主从复制需要注意的一个问题 这两天我朋友在使用redis偶尔会遇见一个问题,就是所有的缓存莫名其妙会不见,找了好久都没找到,他一直以为 有人错误执行了什么命令 他跟我说的时候我估计是主从复制 ...

  4. CSS的一些简单概念

    行内元素与块级元素 在标准文档流里面,块级元素具有以下特点: ①总是在新行上开始,占据一整行:②高度,行高以及外边距和内边距都可控制:③宽带始终是与浏览器宽度一样,与内容无关:④它可以容纳内联元素和其 ...

  5. 资金归集率比率sql

    基础资料 select bd_glorgbook.glorgbookcode, nvl(replace(bd_glorgbook.glorgbookname,'集团基准账薄',''),'小计')公司名 ...

  6. expr命令的一些用法

    expr是evaluate expressions的缩写,我的理解它的作用就是用来输出表达式的值. 看下面的几个例子. (1)进行数值运算 $:expr 1 + 2     //'+' 左右两边必须有 ...

  7. [转]mysql在windows下支持表名大小写,lower_case_table_names

    windows下mysql默认是不支表名大小写的,也就是表名大小写不敏感.用phpmyadmin创建的驼峰式表名,全部被强制成小写.mysql表名大小写敏感的参数: lower_case_table_ ...

  8. ylbtech-Miscellaneos

    ylbtech-Miscellaneos: A,返回顶部 1, 2, B返回顶部 1, 2 作者:ylbtech出处:http://ylbtech.cnblogs.com/本文版权归作者和博客园共有, ...

  9. CentOS7 修改主机名(转)

    转载出处:http://www.centoscn.com/CentOS/config/2014/1031/4039.html CentOS7 时间同步:http://www.cnblogs.com/r ...

  10. HttpWebRequest-header设置

    http://www.cnblogs.com/yczz/archive/2012/06/01/2530484.html http://blog.csdn.net/htsnoopy/article/de ...