Splitting unsorted data into a training set and a test set in Lua

require 'torch'
require 'image'

-- Root directory that holds one sub-directory per class.
local setting = {parent_root = '/home/pxu/image'}

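-- Return the names of the entries directly under `path` (one per class),
-- skipping the '.' and '..' entries that `ls -a` also prints.
function list_children_root(path)
        local i, t, popen = 0, {}, io.popen
        for file_name in popen('ls -a ' .. path):lines() do
                if file_name ~= '.' and file_name ~= '..' then
                        i = i + 1
                        t[i] = file_name
                end
        end
        return t
end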

-- Return the names of the entries under `path` whose name contains 'jpg'.
function list_img(path)
        --print(path)
        local i, t, popen = 0, {}, io.popen
        for file_name in popen('ls -a ' .. path .. ' |grep jpg'):lines() do
                i = i + 1
                t[i] = file_name
        end
        return t
end

print('obtain children root path ...')
train_paths, train_labels = {}, {}
test_paths, test_labels = {}, {}
children_paths = list_children_root(setting.parent_root)
print(children_paths)
-- Next free (1-based) slot in the train and test tables.
num_train, num_test = 1, 1

print('split data begin')
for i = 1, #children_paths do
        -- The index i of the class directory doubles as its numeric label.
        children_root = setting.parent_root .. '/' .. children_paths[i]
        print(children_root)
        img_names = list_img(children_root)
        -- Random permutation of 1..n, so each run produces a different split.
        ranIdx = torch.randperm(#img_names)
        -- The first 60% of the shuffled images go to training, the rest to testing.
        local n_train = math.floor(0.6 * #img_names)
        for j = 1, #img_names do
                local idx = ranIdx[j]
                if j <= n_train then
                        train_paths[num_train] = children_root .. '/' .. img_names[idx]
                        train_labels[num_train] = i
                        num_train = num_train + 1
                else
                        test_paths[num_test] = children_root .. '/' .. img_names[idx]
                        test_labels[num_test] = i
                        num_test = num_test + 1
                end
        end
end

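print('begin copy')
-- The target directories (train/<label>/ and test/<label>/) must already exist,
-- because cp does not create them.
local nTrain, nTest = #train_paths, #test_paths
for i = 1, nTrain do
        local aimpath = '/home/yqcui/image/train/' .. train_labels[i] .. '/' .. i .. '.jpg'
        local todo = 'cp ' .. train_paths[i] .. ' ' .. aimpath
        print(todo)
        os.execute(todo)
end
for i = 1, nTest do
        -- Test images go under test/ so they cannot overwrite training files with the same index.
        local aimpath = '/home/yqcui/image/test/' .. test_labels[i] .. '/' .. i .. '.jpg'
        local todo = 'cp ' .. test_paths[i] .. ' ' .. aimpath
        print(todo)
        os.execute(todo)
end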

The script splits the images of each class into a training set and a test set at a ratio of 6:4.
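The 6:4 split itself can be tried without Torch. The following is only a minimal sketch in plain Lua: the function name `split_items` and its `ratio` and `seed` parameters are made up for illustration, and a Fisher-Yates shuffle built on `math.random` stands in for `torch.randperm`, so the resulting order will not match what Torch produces.

```lua
-- Hypothetical helper (not part of the script above): shuffle a copy of
-- `items` with Fisher-Yates, then cut it into a train part and a test part.
local function split_items(items, ratio, seed)
    if seed then math.randomseed(seed) end   -- optional, for repeatable runs

    -- Work on a copy so the caller's table is left untouched.
    local shuffled = {}
    for k = 1, #items do shuffled[k] = items[k] end

    -- Fisher-Yates: swap each position with a random earlier (or same) one.
    for k = #shuffled, 2, -1 do
        local r = math.random(k)              -- uniform integer in [1, k]
        shuffled[k], shuffled[r] = shuffled[r], shuffled[k]
    end

    -- The first floor(ratio * n) items are training data, the rest test data.
    local n_train = math.floor(ratio * #shuffled)
    local train, test = {}, {}
    for k = 1, #shuffled do
        if k <= n_train then
            train[#train + 1] = shuffled[k]
        else
            test[#test + 1] = shuffled[k]
        end
    end
    return train, test
end

-- Example file names, invented for this sketch.
local names = {'a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg'}
local train, test = split_items(names, 0.6, 42)
print(#train, #test)
```

With five names and a ratio of 0.6, three names land in `train` and two in `test`. Which names end up where depends on the seed and on the Lua version's random number generator.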
