Traversing all files under a folder with C code and bash

Recursive implementation:

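#define _XOPEN_SOURCE 700   /* exposes lstat() under strict C modes */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Path buffer size; 4096 is an assumed value (MAX_PATH_LENGTH is not defined in the snippet). */
#define MAX_PATH_LENGTH 4096

/* Counters for the directories and files visited. */
static unsigned long visit_dirs = 0;
static unsigned long visit_files = 0;

void listdir(const char *path)
{
    DIR           *ptr_dir;
    struct dirent *dir_entry;
    char          *child_path;
    struct stat    sb_sub;
    struct stat    sb;

    /* Only an existing directory can be listed. */
    if (0 != stat(path, &sb) || !S_ISDIR(sb.st_mode))
        return;

    /* The path buffer lives on the heap, so each recursion level uses less stack. */
    child_path = (char *)malloc(sizeof(char) * MAX_PATH_LENGTH);
    if (child_path == NULL)
    {
        printf("allocate memory for path failed.\n");
        return;
    }

    ptr_dir = opendir(path);
    if (ptr_dir == NULL)
    {
        printf("open dir failed: %s\n", path);
        free(child_path);
        return;
    }

    while ((dir_entry = readdir(ptr_dir)) != NULL)
    {
        /* Skip the "." and ".." entries. */
        if (strcmp(dir_entry->d_name, ".") == 0 || strcmp(dir_entry->d_name, "..") == 0)
            continue;

        /* Build "<path>/<name>"; skip names that do not fit in the buffer. */
        int n = snprintf(child_path, MAX_PATH_LENGTH, "%s/%s", path, dir_entry->d_name);
        if (n < 0 || n >= MAX_PATH_LENGTH)
            continue;

        /* lstat() does not follow symbolic links, so a link cycle cannot recurse forever. */
        if (0 != lstat(child_path, &sb_sub))
            continue;

        if (S_ISDIR(sb_sub.st_mode))
        {
            printf("[DIR]%s\n", child_path);
            visit_dirs++;
            listdir(child_path);    /* recurse into the subdirectory */
        }
        else
        {
            printf("[FILE]%s\n", child_path);
            visit_files++;
        }
    }

    closedir(ptr_dir);    /* release the directory stream */
    free(child_path);
}

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("usage: %s <dir>\n", argv[0]);
        return 1;
    }
    listdir(argv[1]);
    printf("dirs: %lu  files: %lu\n", visit_dirs, visit_files);
    return 0;
}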
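POSIX also provides nftw(), which performs the same kind of depth-first walk as listdir() and can be told not to follow symbolic links (FTW_PHYS). The following is a minimal sketch, assuming a POSIX system; list_dir_nftw, visit_entry and the two counters are illustrative names, and allowing 16 simultaneously open directory descriptors is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700   /* exposes nftw() and struct FTW */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* nftw() passes no user-data pointer to the callback, so the counters are globals. */
static unsigned long walk_dirs = 0;
static unsigned long walk_files = 0;

/* Called once for every entry nftw() visits. */
static int visit_entry(const char *fpath, const struct stat *sb,
                       int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    if (ftwbuf->level == 0)              /* the top directory itself */
        return 0;
    if (typeflag == FTW_D) {             /* directory, reported before its contents */
        printf("[DIR]%s\n", fpath);
        walk_dirs++;
    } else if (typeflag == FTW_F || typeflag == FTW_SL) {
        printf("[FILE]%s\n", fpath);     /* regular file or symbolic link */
        walk_files++;
    }                                    /* FTW_DNR / FTW_NS entries are ignored */
    return 0;                            /* 0 means: keep walking */
}

/* Walk `path` without following symbolic links; returns nftw()'s result. */
int list_dir_nftw(const char *path)
{
    assert(path != NULL);
    walk_dirs = walk_files = 0;
    return nftw(path, visit_entry, 16, FTW_PHYS);
}
```

Calling list_dir_nftw("/tmp") prints the same [DIR]/[FILE] lines as listdir(). The price of the API's convenience is that per-walk state has to live in globals (or thread-local storage).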

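The recursive version is easy to read, but every call uses stack space to hold its local variables (as well as its arguments, return address, and so on), so a deep enough directory tree can cause a stack overflow.

Part of the memory the system gives a program is used as the stack. The stack starts at the highest address, and "allocating" stack space means moving the stack-top pointer (esp on 32-bit x86) to a lower address, so the stack grows "downward". When the stack grows so large that it runs into the range of the heap (which grows "upward"), you get what is called a stack overflow/collision. You can imagine the consequences: either the stack corrupts data stored on the heap, or the program "returns" to an invalid address and executes instructions there, which is frightening indeed. Nowadays, however, the stack and heap seem to be placed on separate memory pages, and the operating system does its best to protect them, so it is not really that scary: at worst the operating system kills the whole process.

Even so, nobody wants their program to die like that; how embarrassing would that be! A stack overflow cannot be handled with try/catch either: C++ exception handling cannot catch it, and on Windows you have to use Windows' own mechanism (structured exception handling). Recursion also adds function-call overhead, and if the code runs in a multithreaded environment, it also faces the stack size the system allocates to each thread: on Windows, each thread gets 1 MB by default.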
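On Windows the stack size of a new thread can be chosen through the dwStackSize argument of CreateThread; on POSIX systems the counterpart is pthread_attr_setstacksize(). The sketch below runs a deliberately deep recursion, keeping 580 bytes alive per level, on a thread whose stack size the caller chooses. The names run_with_stack, recurse, deep_worker and struct job are hypothetical, and the 8 MiB used in the usage line is an arbitrary example value (the requested size must be at least PTHREAD_STACK_MIN).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a deep recursive traversal. Returns depth + 1. */
static unsigned long recurse(unsigned long depth)
{
    volatile char frame[580];           /* ~580 bytes of locals per level */
    frame[0] = 1;
    if (depth == 0)
        return frame[0];
    unsigned long below = recurse(depth - 1);
    return below + frame[0];            /* frame is read after the call returns,
                                           so every level's buffer stays on the stack */
}

/* Hypothetical job description handed to the worker thread. */
struct job {
    unsigned long depth;                /* how deep to recurse */
    unsigned long result;               /* filled in by the worker */
};

static void *deep_worker(void *arg)
{
    struct job *j = arg;
    j->result = recurse(j->depth);
    return NULL;
}

/* Run fn(arg) on a new thread with a stack of stack_bytes and wait for it.
   Returns 0 on success, otherwise the pthread error number. */
int run_with_stack(void *(*fn)(void *), void *arg, size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    assert(fn != NULL);
    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    /* Fails with EINVAL if stack_bytes is below PTHREAD_STACK_MIN. */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);        /* does not affect the running thread */
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    return rc;
}
```

For example, with `struct job j = { 3000, 0 };`, the call `run_with_stack(deep_worker, &j, (size_t)8 << 20)` returns 0 and leaves 3001 in j.result. Giving the traversal thread a larger stack only moves the limit, though; it does not remove it.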

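Now let's see how much local space each call of a recursive version needs. For a Windows version of the function built on FindFirstFile, WIN32_FIND_DATA is 320 bytes and a MAX_PATH buffer is 260 bytes; ignoring the remaining variables and parameters, one call needs 580 bytes (ANSI build). So what is the maximum recursion depth?

The answer is roughly 1,800 levels: 1 MB / 580 bytes = 1,048,576 / 580 ≈ 1,807.9, i.e. 1,807 complete frames.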
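The estimate is plain integer division of the thread's stack size by the per-call frame size. A small sketch using the figures from the text; like the text, it ignores arguments, saved registers and the return address, so the real limit is somewhat lower. max_recursion_depth and the enum names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Figures from the text for an ANSI build. */
enum {
    FIND_DATA_BYTES   = 320,   /* sizeof(WIN32_FIND_DATAA) */
    PATH_BUFFER_BYTES = 260    /* a char buffer of MAX_PATH */
};

/* How many complete frames of frame_bytes fit into stack_bytes. */
size_t max_recursion_depth(size_t stack_bytes, size_t frame_bytes)
{
    assert(frame_bytes > 0);
    return stack_bytes / frame_bytes;   /* integer division rounds down */
}
```

With the default 1 MB stack, `max_recursion_depth((size_t)1 << 20, FIND_DATA_BYTES + PATH_BUFFER_BYTES)` yields 1807.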

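In other words, starting from the root directory, the traversal could reach folder structures at most about 1,800 levels deep; files nested more deeply could not be visited. In practice, though, we cannot even create a directory tree that deep: try it, and Windows reports that the limit has been exceeded.

A look at the value of MAX_PATH explains why: it is only 260, so how could you possibly get more than 1,800 levels?

Isn't NTFS supposed to be very advanced? Can it really be this limited? In a multibyte-character (ANSI) build, this limit means we can create folder structures only a little over a hundred levels deep, because even a one-character folder name costs two characters per level (the name plus a backslash).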
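To see where "a little over a hundred" comes from, count how many `name\` components fit into MAX_PATH. The sketch assumes the path starts with a 3-character drive prefix such as C:\, that the 260-character MAX_PATH includes the terminating NUL, and that every folder name has the same length; max_nesting_depth is a hypothetical helper, and other limits can be modeled by passing a different max_path.

```c
#include <assert.h>
#include <stddef.h>

/* Deepest chain like "C:\a\a\...\a" that fits in max_path characters.
   A chain of n levels has length prefix_len + n * (name_len + 1) - 1,
   which must be at most max_path - 1 (one slot is kept for the NUL),
   so n <= (max_path - prefix_len) / (name_len + 1). */
size_t max_nesting_depth(size_t max_path, size_t prefix_len, size_t name_len)
{
    assert(name_len > 0);
    if (max_path <= prefix_len)
        return 0;
    return (max_path - prefix_len) / (name_len + 1);
}
```

`max_nesting_depth(260, 3, 1)` gives 128 levels for one-character names, and `max_nesting_depth(260, 3, 8)` gives 28 for eight-character names.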

Browsing MSDN's documentation for FindFirstFile shows that Microsoft kept something in reserve:

In the ANSI version of this function, the name is limited to MAX_PATH characters. To extend this limit to 32,767 wide characters, call the Unicode version of the function and prepend "\\?\" to the path. For more information, see Naming a File.

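Put simply, to raise the limit to 32,767 wide characters (wide characters, so this naturally requires a Unicode build), prepend \\?\ to the path. This approach has one drawback: it cannot be used to access the root directory.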
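Prepending the prefix is simple string work, with two details worth knowing from Microsoft's path-naming documentation: the prefix works only with absolute paths (it also turns off normalization, so forward slashes are not converted), and UNC paths take the form \\?\UNC\server\share. A minimal sketch using narrow char strings so it compiles anywhere; real code would build a wchar_t path and pass it to FindFirstFileW. make_extended_path is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Turn an absolute Windows path (backslashes only) into its extended-length form:
   "C:\dir" -> "\\?\C:\dir", "\\server\share" -> "\\?\UNC\server\share",
   and an already-prefixed path is copied unchanged.
   Returns 0 on success, -1 if `out` is too small. */
int make_extended_path(const char *path, char *out, size_t out_size)
{
    int n;

    assert(path != NULL && out != NULL && out_size > 0);
    if (strncmp(path, "\\\\?\\", 4) == 0)           /* already \\?\ */
        n = snprintf(out, out_size, "%s", path);
    else if (strncmp(path, "\\\\", 2) == 0)         /* UNC: \\server\share */
        n = snprintf(out, out_size, "\\\\?\\UNC\\%s", path + 2);
    else                                            /* drive path: C:\... */
        n = snprintf(out, out_size, "\\\\?\\%s", path);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, the C string "C:\\data\\logs" becomes "\\\\?\\C:\\data\\logs", i.e. `\\?\C:\data\logs` once printed.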

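With that, hitting the 1 MB thread stack limit becomes a real possibility. I don't quite understand why we still cannot create paths that long when Microsoft has anticipated the problem and provided a way to extend path lengths, but at least this gives us a reason to write a non-recursive version of the file traversal function.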

Non-recursive implementation

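Idea: use a stack. While reading a directory, push each subdirectory onto the stack and process each file immediately; then keep popping directories off the stack and processing them the same way until the stack is empty. (The code below keeps the pending directories in a deque used as a FIFO queue, push_back plus pop_front, so it visits directories breadth-first; either order reaches every file.)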
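The same idea in plain C, as a minimal single-threaded sketch: pending directories are kept as heap-allocated path strings on a growable array used as a LIFO stack, so the depth of the tree no longer affects the call stack. It assumes a POSIX system; list_dir_iterative and its counter parameters are illustrative, and lstat() is used so that symbolic links to directories are reported rather than followed.

```c
#define _XOPEN_SOURCE 700   /* lstat(), strdup() */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Walk `top` without recursion. Counts exclude `top` itself.
   Returns 0 on success, -1 if some allocation failed (the walk continues). */
int list_dir_iterative(const char *top, unsigned long *n_dirs,
                       unsigned long *n_files)
{
    size_t cap = 16, len = 0;
    char **stack;
    int rc = 0;

    assert(top != NULL && n_dirs != NULL && n_files != NULL);
    *n_dirs = *n_files = 0;
    stack = malloc(cap * sizeof *stack);
    if (stack == NULL)
        return -1;
    if ((stack[len] = strdup(top)) == NULL) {
        free(stack);
        return -1;
    }
    len++;

    while (len > 0) {
        char *dir_path = stack[--len];              /* pop */
        DIR *dir = opendir(dir_path);
        struct dirent *entry;

        if (dir == NULL) {                          /* unreadable: skip it */
            free(dir_path);
            continue;
        }
        while ((entry = readdir(dir)) != NULL) {
            size_t need;
            char *child;
            struct stat sb;

            if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
                continue;
            need = strlen(dir_path) + strlen(entry->d_name) + 2;
            child = malloc(need);
            if (child == NULL) { rc = -1; continue; }
            snprintf(child, need, "%s/%s", dir_path, entry->d_name);

            if (lstat(child, &sb) == 0 && S_ISDIR(sb.st_mode)) {
                if (len == cap) {                   /* grow the stack */
                    char **bigger = realloc(stack, 2 * cap * sizeof *stack);
                    if (bigger == NULL) { free(child); rc = -1; continue; }
                    stack = bigger;
                    cap *= 2;
                }
                printf("[DIR]%s\n", child);
                stack[len++] = child;               /* push: read it later */
                (*n_dirs)++;
            } else {                                /* everything else counts as a file */
                printf("[FILE]%s\n", child);
                (*n_files)++;
                free(child);
            }
        }
        closedir(dir);
        free(dir_path);
    }
    free(stack);
    return rc;
}
```

Taking directories from the front of a queue instead of the end of this array gives the breadth-first order that the code below uses.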

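Code quoted from http://blog.csdn.net/yufei_email/article/details/42624551. The main thread walks the directories, while a pool of worker threads takes file names off a shared list and processes them: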

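#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <dirent.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <iostream>
#include <string>
#include <deque>
#include <list>
#include <pthread.h>

using namespace std;

// Set to true once the directory walk has finished; guarded by g_mutex.
bool g_bIsListDirEnd = false;
pthread_mutex_t g_mutex;

// Tuning constants (assumed values; adjust as needed).
const int nThreadNum = 4;               // number of worker threads
const size_t nMaxQueued = 1000;         // the walker waits while this many files are queued
const int nPrintEvery = 1000;           // a worker prints every N-th file it handles
const unsigned int nWorkUs = 1000;      // simulated work per file, in microseconds
const unsigned int nIdleUs = 1000;      // back-off when there is nothing to do

// Append a '/' unless the path already ends with a separator.
void FormatDir(string& strDir)
{
    if (strDir.at(strDir.length() - 1) != '\\' &&
        strDir.at(strDir.length() - 1) != '/')
    {
        strDir += '/';
    }
}

// Pseudo file systems that are not worth walking (exact match, not substring).
bool IsExcludedDir(const string& strPath)
{
    return strPath == "/proc" || strPath == "/sys" || strPath == "/dev";
}

// Worker thread: takes file names off the shared list and "processes" them.
void* thrd_func(void *arg)
{
    if (!arg)
    {
        cout << "arg is null" << endl;
        pthread_exit((void *)0);
    }
    list<string>* pList = (list<string>*)arg;
    int nCount = 0;
    while (true)
    {
        string strFileName;
        pthread_mutex_lock(&g_mutex);
        if (!pList->empty())
        {
            strFileName = pList->front();
            pList->pop_front();
            ++nCount;
        }
        pthread_mutex_unlock(&g_mutex);
        if (!strFileName.empty())
        {
            usleep(nWorkUs);            // stands in for real per-file work
            if (nCount % nPrintEvery == 0)
            {
                cout << strFileName << " " << nCount << endl;
            }
        }
        else
        {
            usleep(nIdleUs);            // list is empty: wait for the walker
        }
        // Check the end flag and the list size under the same lock.
        pthread_mutex_lock(&g_mutex);
        bool bDone = g_bIsListDirEnd && pList->empty();
        pthread_mutex_unlock(&g_mutex);
        if (bDone)
        {
            break;
        }
    }
    cout << "thread processed files:" << nCount << endl;
    return NULL;
}

void ListDir(const string& strTopDir)
{
    deque<string> deqDirs;              // directories still to be read (FIFO)
    deqDirs.push_back(strTopDir);
    list<string> listFiles;             // files waiting for the workers
    pthread_t tid[nThreadNum];
    for (int i = 0; i < nThreadNum; ++i)
    {
        if (pthread_create(&tid[i], NULL, thrd_func, &listFiles) != 0)
        {
            cout << "create thread failed:" << i << endl;
            // Stop the threads already running before listFiles goes out of scope.
            pthread_mutex_lock(&g_mutex);
            g_bIsListDirEnd = true;
            pthread_mutex_unlock(&g_mutex);
            for (int j = 0; j < i; ++j)
            {
                pthread_join(tid[j], NULL);
            }
            return;
        }
    }
    int nCountDirs = 0;
    int nCountFiles = 0;
    while (!deqDirs.empty())
    {
        string strDir = deqDirs.front();
        deqDirs.pop_front();
        if (strDir.empty())
        {
            continue;
        }
        DIR *dir = opendir(strDir.c_str());
        if (!dir)
        {
            cout << "open dir failed:" << strDir << endl;
            continue;
        }
        FormatDir(strDir);
        struct dirent *file;
        while ((file = readdir(dir)) != NULL)
        {
            // Skip only "." and ".."; hidden files are still listed.
            if (strcmp(file->d_name, ".") == 0 || strcmp(file->d_name, "..") == 0)
            {
                continue;
            }
            string strFullPath = strDir + file->d_name;
            struct stat stFile;
            // lstat() does not follow symbolic links, so link cycles cannot loop forever.
            if (lstat(strFullPath.c_str(), &stFile) == 0 && S_ISDIR(stFile.st_mode))
            {
                if (!IsExcludedDir(strFullPath))
                {
                    deqDirs.push_back(strFullPath);
                    ++nCountDirs;
                }
            }
            else
            {
                // Wait while the workers are too far behind.
                while (true)
                {
                    pthread_mutex_lock(&g_mutex);
                    size_t nSize = listFiles.size();
                    pthread_mutex_unlock(&g_mutex);
                    if (nSize < nMaxQueued)
                    {
                        break;
                    }
                    cout << "waiting for queued files to be processed" << endl;
                    usleep(nIdleUs);
                }
                pthread_mutex_lock(&g_mutex);
                listFiles.push_back(strFullPath);
                pthread_mutex_unlock(&g_mutex);
                ++nCountFiles;
            }
        }
        closedir(dir);
    }
    cout << "list dir end" << endl;
    pthread_mutex_lock(&g_mutex);
    g_bIsListDirEnd = true;
    pthread_mutex_unlock(&g_mutex);
    for (int i = 0; i < nThreadNum; ++i)
    {
        void *tret;
        pthread_join(tid[i], &tret);
    }
    cout << "dirs:" << nCountDirs << "  files:" << nCountFiles << "  total:" << nCountDirs + nCountFiles << endl;
}

int main(int argc, char* argv[])
{
    if (argc != 2)
    {
        printf("usage: %s <dir>\n", argv[0]);
        return -1;
    }
    pthread_mutex_init(&g_mutex, NULL);
    struct timeval tv1;
    gettimeofday(&tv1, NULL);
    ListDir(argv[1]);
    struct timeval tv2;
    gettimeofday(&tv2, NULL);
    // Elapsed wall-clock time in milliseconds.
    cout << "cost time:" << (tv2.tv_sec - tv1.tv_sec) * 1000 + (tv2.tv_usec - tv1.tv_usec) / 1000 << "ms" << endl;
    pthread_mutex_destroy(&g_mutex);
    return 0;
}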
