PointNet++: classification/train.py

1. Loading the dataset

if FLAGS.normal:

    assert(NUM_POINT<=10000)

    DATA_PATH = os.path.join(ROOT_DIR, 'data/modelnet40_normal_resampled')

    TRAIN_DATASET = modelnet_dataset.ModelNetDataset(root=DATA_PATH, npoints=NUM_POINT, split='train', normal_channel=FLAGS.normal, batch_size=BATCH_SIZE)

    TEST_DATASET = modelnet_dataset.ModelNetDataset(root=DATA_PATH, npoints=NUM_POINT, split='test', normal_channel=FLAGS.normal, batch_size=BATCH_SIZE)

else:

    assert(NUM_POINT<=2048)

    TRAIN_DATASET = modelnet_h5_dataset.ModelNetH5Dataset(os.path.join(BASE_DIR, 'data/modelnet40_ply_hdf5_2048/train_files.txt'), batch_size=BATCH_SIZE, npoints=NUM_POINT, shuffle=True)

    TEST_DATASET = modelnet_h5_dataset.ModelNetH5Dataset(os.path.join(BASE_DIR, 'data/modelnet40_ply_hdf5_2048/test_files.txt'), batch_size=BATCH_SIZE, npoints=NUM_POINT, shuffle=False)

The training data (TRAIN_DATASET) consists of five .h5 files:

data/modelnet40_ply_hdf5_2048/ply_data_train0.h5
data/modelnet40_ply_hdf5_2048/ply_data_train1.h5
data/modelnet40_ply_hdf5_2048/ply_data_train2.h5
data/modelnet40_ply_hdf5_2048/ply_data_train3.h5
data/modelnet40_ply_hdf5_2048/ply_data_train4.h5

Before training, the order of the five training files is shuffled:

if self.shuffle: np.random.shuffle(self.file_idxs)

The test data (TEST_DATASET) consists of two .h5 files:

data/modelnet40_ply_hdf5_2048/ply_data_test0.h5
data/modelnet40_ply_hdf5_2048/ply_data_test1.h5

The key step in loading the dataset is batching: 2048*2048*3 ----> 16*1024*3, 16*1024*3, 16*1024*3, ...

Note: the order of the 2048 objects is shuffled.

In modelnet_h5_dataset.py:

        # Key statement: from the 2048 already-shuffled objects of one .h5 file, take 16 objects,
        # and from the 2048 points of each object take the first 1024.
        data_batch = self.current_data[start_idx:end_idx, 0:self.npoints, :].copy()

        label_batch = self.current_label[start_idx:end_idx].copy()

self.npoints=1024

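The 1024 points are taken in storage order, i.e. the first 1024 points of each object. (These points nevertheless turn out to be quite evenly distributed over the object; I do not know why.)

Note: the 1024 points of each object are shuffled before training.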
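The batching described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the repository's loader: `iterate_batches` is a hypothetical helper, the array is a small random stand-in for one .h5 file, and the points are shuffled independently per object (the original code may apply its shuffle differently).

```python
import numpy as np

def iterate_batches(data, labels, batch_size=16, npoints=1024, rng=None):
    """Yield (data_batch, label_batch) pairs from one file's worth of objects."""
    rng = np.random.default_rng() if rng is None else rng
    num_objects = data.shape[0]
    for start_idx in range(0, num_objects, batch_size):
        end_idx = min(start_idx + batch_size, num_objects)
        # Keep the first `npoints` points of every object in the slice.
        data_batch = data[start_idx:end_idx, :npoints, :].copy()
        label_batch = labels[start_idx:end_idx].copy()
        # Shuffle the point order inside each object (in place on the copy).
        # Labels are per object, so they are unaffected.
        for obj in data_batch:
            rng.shuffle(obj)
        yield data_batch, label_batch

# Small stand-in for one file: 48 objects with 2048 points each.
rng = np.random.default_rng(0)
data = rng.normal(size=(48, 2048, 3))
labels = rng.integers(0, 40, size=48)
batches = list(iterate_batches(data, labels, rng=rng))
print(len(batches), batches[0][0].shape, batches[0][1].shape)
```

Each yielded batch has shape (16, 1024, 3), and the shuffle only reorders points within an object, so the set of points is unchanged.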

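A. Following 2048*2048*3 ----> 16*1024*3, save the first 16 objects of the 16*1024*3 batch to .txt files and the first 16 objects of the 2048*2048*3 array to .txt files, then compare the two in CloudCompare to see how a point cloud object differs before and after downsampling.

B. The following code saves the 16 objects after downsampling (16*1024*3).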

for i in range(data_batch.shape[0]):

    filename=''.join(["/media/dell/D/qcc/code/pointnet/code/pointnet2-master/data/contemporaryfile/train_",str(i),'.txt'])

    np.savetxt(filename, data_batch[i],fmt="%.13f,%.13f,%.13f", delimiter=',')

With labels:

for i in range(data_batch.shape[0]):

    filename=''.join(["/media/dell/D/qcc/code/pointnet/code/pointnet2-master/data/contemporaryfile/train_",str(i),'.txt'])

    # np.column_stack joins the two arrays column-wise, appending the label as a fourth column.
    traindata_and_label = np.column_stack((data_batch[i], np.ones((data_batch.shape[1], 1), dtype=int) * label_batch[i]))

    np.savetxt(filename, traindata_and_label,fmt="%.13f,%.13f,%.13f,%d", delimiter=',')

C. The following code saves the 16 objects before downsampling (16*2048*3).

for i in range(16):

    filename=''.join(["/media/dell/D/qcc/code/pointnet/code/pointnet2-master/data/contemporaryfile/initial_train_",str(i),'.txt'])

    np.savetxt(filename, self.current_data[i],fmt="%.13f,%.13f,%.13f", delimiter=',')

With labels:

for i in range(16):

    filename=''.join(["/media/dell/D/qcc/code/pointnet/code/pointnet2-master/data/contemporaryfile/initial_train_",str(i),'.txt'])

    # np.column_stack joins the two arrays column-wise, appending the label as a fourth column.
    traindata_and_label = np.column_stack((self.current_data[i], np.ones((self.current_data.shape[1], 1), dtype=int) * self.current_label[i]))

    np.savetxt(filename, traindata_and_label,fmt="%.13f,%.13f,%.13f,%d", delimiter=',')

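D. Comparison.

The first object appears to be a recliner and the second a piano. I do not yet know how the points were sampled, but the sampling looks uniform.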

--------------------------------------------------------------------------------

# Consider building your own training dataset at this point. #

--------------------------------------------------------------------------------

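Each .h5 training or test file contains up to 2048 objects; each object contains 2048 points, and each point has x, y, z coordinates.

Before training, these objects are shuffled randomly. After shuffling, each object still corresponds to its own label.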
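A minimal sketch of how objects and labels can be shuffled together so that they stay paired. The helper name `shuffle_objects` and the toy arrays are assumptions made for illustration; the idea is simply to index both arrays with the same permutation.

```python
import numpy as np

def shuffle_objects(data, labels, rng=None):
    """Shuffle objects and labels with one shared permutation."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(data.shape[0])
    # Indexing both arrays with the same permutation keeps object i
    # attached to label i after the shuffle.
    return data[perm], labels[perm]

# Toy data: every coordinate of object i equals its label i,
# so the pairing is easy to check after shuffling.
toy_labels = np.arange(10)
toy_data = np.broadcast_to(toy_labels[:, None, None], (10, 4, 3)).astype(float)
shuffled_data, shuffled_labels = shuffle_objects(toy_data, toy_labels, np.random.default_rng(0))
print(shuffled_labels)
```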

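The steps for building the h5 training and test files are:

Run the MATLAB script ready_for_make_hdf5.m to write each individual road-marking point cloud object to its own file.
Run the Python script putfilenamesintofile.py to store the names of the training and test files in a single file.
Run the Python script make_hdf5_c.py to build the h5 files.
Run the Python script putfilenamesintofile.py again to write the h5 file names to a single file.
Run the training script train.py.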

2. Loading the training model

pointnet2_cls_ssg.py

    l1_xyz, l1_points, l1_indices = pointnet_sa_module(l0_xyz, l0_points, npoint=512, radius=0.2, nsample=32, mlp=[64,64,128], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, scope='layer1', use_nchw=True) #a

    l2_xyz, l2_points, l2_indices = pointnet_sa_module(l1_xyz, l1_points, npoint=128, radius=0.4, nsample=64, mlp=[128,128,256], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, scope='layer2') #b

    l3_xyz, l3_points, l3_indices = pointnet_sa_module(l2_xyz, l2_points, npoint=None, radius=None, nsample=None, mlp=[256,512,1024], mlp2=None, group_all=True, is_training=is_training, bn_decay=bn_decay, scope='layer3') #c

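l0_xyz: (16, 1024, 3). The initial input point cloud: 16 objects, each with 1024 points, each point with x, y, z coordinates.

npoint=512: 512 centroid points are selected from the 1024 points by farthest point sampling.

radius=0.2: the radius of the ball neighborhood is 0.2, in the coordinate units of the input (the ModelNet40 point clouds are normalized, so this is not a physical length in meters).

nsample=32: 32 points are sampled around each centroid.

Return values:

l1_xyz: (16, 512, 3). The point coordinates fed to the second layer. The first layer selects 512 center points; 3 is the x, y, z coordinates of each center.
l1_points: (16, 512, 128). The local point region features extracted by the first layer: 512 groups, each with a 128-dimensional local feature.
l1_indices: (16, 512, 32). 512 groups with 32 members each; the last dimension holds the indices of those 32 points.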
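To make npoint, radius and nsample concrete, here is a NumPy sketch of farthest point sampling followed by a ball query for a single object. It is an illustration, not the repository's CUDA ops: the function names are hypothetical, the sampling starts from index 0 by assumption, and padding short groups with the first hit is a choice made for this sketch.

```python
import numpy as np

def farthest_point_sample(xyz, npoint, start=0):
    """Return indices of `npoint` points chosen by greedy farthest point sampling."""
    chosen = np.zeros(npoint, dtype=np.int64)
    # Squared distance from every point to its nearest already-chosen point.
    min_dist = np.full(xyz.shape[0], np.inf)
    current = start
    for i in range(npoint):
        chosen[i] = current
        d = np.sum((xyz - xyz[current]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)
        # The next centroid is the point farthest from all chosen ones.
        current = int(np.argmax(min_dist))
    return chosen

def ball_query(xyz, centers, radius, nsample):
    """For each center, return `nsample` indices of points inside the ball."""
    idx = np.zeros((centers.shape[0], nsample), dtype=np.int64)
    for i, c in enumerate(centers):
        d = np.sum((xyz - c) ** 2, axis=1)
        inside = np.flatnonzero(d < radius ** 2)[:nsample]
        # Centers are drawn from xyz, so each ball holds at least its center.
        # Pad with the first hit when fewer than nsample points are inside.
        idx[i] = inside[0]
        idx[i, :len(inside)] = inside
    return idx

rng = np.random.default_rng(0)
xyz = rng.uniform(-1.0, 1.0, size=(1024, 3))      # one object, 1024 points
fps_idx = farthest_point_sample(xyz, npoint=512)
new_xyz = xyz[fps_idx]                            # (512, 3), like l1_xyz for one object
idx = ball_query(xyz, new_xyz, radius=0.2, nsample=32)   # (512, 32), like l1_indices
grouped_xyz = xyz[idx] - new_xyz[:, None, :]      # (512, 32, 3), relative to each center
print(new_xyz.shape, idx.shape, grouped_xyz.shape)
```

Subtracting the center from each group, as in the last step, expresses the neighbors in local coordinates before they are passed to the feature extraction layers.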

# Sample and Grouping layer

        if group_all:

            nsample = xyz.get_shape()[1].value

            new_xyz, new_points, idx, grouped_xyz = sample_and_group_all(xyz, points, use_xyz)

        else:

            new_xyz, new_points, idx, grouped_xyz = sample_and_group(npoint, radius, nsample, xyz, points, knn, use_xyz)

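        # Find the center points (new_xyz), the grouped local features of each group (new_points), and the indices of each group's members (idx).

        # new_xyz is the result of farthest point sampling: 16*512*3.

        # idx holds the indices of the points found by the ball query (r=0.2).

        # grouped_xyz: 16*512*32*3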

        # Point Feature Embedding layer

        if use_nchw: new_points = tf.transpose(new_points, [0,3,1,2])

        for i, num_out_channel in enumerate(mlp):

            new_points = tf_util.conv2d(new_points, num_out_channel, [1,1],

                                        padding='VALID', stride=[1,1],

                                        bn=bn, is_training=is_training,

                                        scope='conv%d'%(i), bn_decay=bn_decay,

                                        data_format=data_format)

        if use_nchw: new_points = tf.transpose(new_points, [0,2,3,1])

        # PointNet layer: the convolution layers above extract features from new_points.

        # Pooling in Local Regions

        # Pool the features within each group to get one local feature per center point.

        if pooling=='max':

            new_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')

        elif pooling=='avg':

            new_points = tf.reduce_mean(new_points, axis=[2], keep_dims=True, name='avgpool')

        elif pooling=='weighted_avg':

            with tf.variable_scope('weighted_avg'):

                dists = tf.norm(grouped_xyz,axis=-1,ord=2,keep_dims=True)

                exp_dists = tf.exp(-dists * 5)

                weights = exp_dists/tf.reduce_sum(exp_dists,axis=2,keep_dims=True) # (batch_size, npoint, nsample, 1)

                new_points *= weights # (batch_size, npoint, nsample, mlp[-1])

                new_points = tf.reduce_sum(new_points, axis=2, keep_dims=True)

        elif pooling=='max_and_avg':

            max_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')

            avg_points = tf.reduce_mean(new_points, axis=[2], keep_dims=True, name='avgpool')

            new_points = tf.concat([avg_points, max_points], axis=-1)

        # [Optional] Further Processing

        if mlp2 is not None:

            if use_nchw: new_points = tf.transpose(new_points, [0,3,1,2])

            for i, num_out_channel in enumerate(mlp2):

                new_points = tf_util.conv2d(new_points, num_out_channel, [1,1],

                                            padding='VALID', stride=[1,1],

                                            bn=bn, is_training=is_training,

                                            scope='conv_post_%d'%(i), bn_decay=bn_decay,

                                            data_format=data_format)

            if use_nchw: new_points = tf.transpose(new_points, [0,2,3,1])

        new_points = tf.squeeze(new_points, [2]) # (batch_size, npoints, mlp2[-1])

        return new_xyz, new_points, idx

The annotated code above is based on: https://zhuanlan.zhihu.com/p/57761392
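The four pooling branches in the listing above can be reproduced in NumPy to check the resulting shapes. This is a sketch for illustration only: `pool_groups` is a hypothetical helper, the tensors are small random stand-ins, and the singleton axis that the TensorFlow code squeezes at the end is dropped directly here.

```python
import numpy as np

def pool_groups(features, grouped_xyz=None, pooling="max"):
    """Pool over the nsample axis: (B, npoint, nsample, C) -> (B, npoint, C or 2C)."""
    if pooling == "max":
        return features.max(axis=2)
    if pooling == "avg":
        return features.mean(axis=2)
    if pooling == "weighted_avg":
        # Points closer to the group center get larger weights.
        dists = np.linalg.norm(grouped_xyz, axis=-1, keepdims=True)
        exp_dists = np.exp(-5.0 * dists)
        weights = exp_dists / exp_dists.sum(axis=2, keepdims=True)
        return (features * weights).sum(axis=2)
    if pooling == "max_and_avg":
        return np.concatenate([features.mean(axis=2), features.max(axis=2)], axis=-1)
    raise ValueError("unknown pooling: %s" % pooling)

rng = np.random.default_rng(0)
features = rng.normal(size=(2, 8, 4, 16))     # (batch, npoint, nsample, channels)
grouped_xyz = np.zeros((2, 8, 4, 3))          # all neighbors at the center
pooled_max = pool_groups(features, pooling="max")
pooled_both = pool_groups(features, pooling="max_and_avg")
pooled_weighted = pool_groups(features, grouped_xyz, pooling="weighted_avg")
print(pooled_max.shape, pooled_both.shape, pooled_weighted.shape)
```

With every neighbor at distance zero, the weights are equal and the weighted average reduces to the plain mean, which is a convenient sanity check.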

0============================================================0

The content between the two horizontal rules comes from: https://zhuanlan.zhihu.com/p/57761392

0============================================================0

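Explanation of the SA (set abstraction) layer:

1. Improved feature extraction: PointNet++ extracts features hierarchically, and each level is called a set abstraction. A set abstraction has three parts: a sampling layer, a grouping layer, and a feature extraction layer. The sampling layer picks a set of relatively important center points out of the dense point cloud using FPS (farthest point sampling); these points do not necessarily carry semantic meaning, and random sampling could be used instead. The grouping layer then searches within a certain range around each center point chosen by the sampling layer and takes its k nearest neighbors to form a group. The feature extraction layer runs these k points through a small PointNet (convolutions followed by pooling), uses the result as the feature of that center point, and passes it on to the next level. As a result, the center points of each level are a subset of the previous level's center points; as the network gets deeper there are fewer center points, but each one carries more information.

2. Handling varying point density: because the sampling density can be uneven at acquisition time, picking a fixed number of neighbors within a fixed range is not appropriate. PointNet++ proposes two solutions.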

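2.1. Multi-scale grouping

As shown on the left of the figure above, every grouping layer forms its groups at several scales (several radius values). After PointNet extracts a feature at each scale, the features are concatenated into the new feature.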
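The concatenation step of multi-scale grouping can be sketched as follows. Everything here is illustrative: `group_feature` is a hypothetical stand-in for one PointNet branch (a random linear map, ReLU and max pooling), the centers are simply the first 32 points rather than FPS output, the radii and feature widths are made-up values, and the balls keep a variable number of points instead of a fixed nsample.

```python
import numpy as np

def group_feature(xyz, centers, radius, weight):
    """Stand-in for one PointNet branch: linear map + ReLU + max pool per ball."""
    feats = []
    for c in centers:
        # Points inside the ball, expressed relative to the center.
        local = xyz[np.sum((xyz - c) ** 2, axis=1) < radius ** 2] - c
        feats.append(np.maximum(local @ weight, 0.0).max(axis=0))
    return np.stack(feats)

rng = np.random.default_rng(0)
xyz = rng.uniform(-1.0, 1.0, size=(256, 3))
centers = xyz[:32]            # assumption: stand-in for sampled centroids
radii = [0.1, 0.2, 0.4]       # illustrative scales
out_dims = [8, 16, 16]        # illustrative feature widths per scale

branches = [
    group_feature(xyz, centers, r, rng.normal(size=(3, c)))
    for r, c in zip(radii, out_dims)
]
# One feature vector per center: the per-scale features joined end to end.
msg_feature = np.concatenate(branches, axis=-1)
print(msg_feature.shape)
```

The output width is the sum of the per-scale widths, which is why multi-scale layers produce wider features than a single-scale layer with the same MLP sizes.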

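2.2. Multi-resolution grouping

As shown on the right of the figure above. The left feature vector is obtained after two set abstractions, each with a different radius. The right feature vector is obtained by applying PointNet convolutions directly to all points in the current level. When the point density is uneven, the two vectors can be weighted differently according to the density of the current patch. For example, when the density in a patch is low, the left vector is less reliable than the feature extracted from all points in the patch, so the weight of the right vector is raised. This addresses the density problem while also reducing the amount of computation.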

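I. Classification task

See the lower branch of the network.

Hierarchical feature extraction layer: set abstraction layer

It consists of the following three parts:

1. sample layer: the sampling layer. Selects the important center points (using farthest point sampling).
2. group layer: the grouping layer. Finds the k points nearest each center point to form a local point region. The source describes this as kNN; the code shown in these notes groups with a ball query and takes kNN only as an option.
3. pointnet layer: the feature extraction layer. Extracts a feature from each local point region.

As a result, the center points of each level are a subset of the previous level's center points; as the network gets deeper there are fewer center points, but each one carries more information.

Now for the concrete implementation. This parameter setting is SSG (single-scale grouping); what the authors mainly propose in the paper is MSG (multi-scale grouping), which in the code differs only in the parameter settings. See the comments for explanations.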

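Point cloud convolution:

Input: (16, 3, 512, 32)

Output: (16, 64, 512, 32)

(a): Multi-scale grouping. The local features extracted at different scales are concatenated.

(b): Multi-resolution grouping. On the left, a certain number of centroids are sampled from the input point cloud (by farthest point sampling); on the right, a group of points (for example 32) is sampled within a neighborhood around each centroid.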

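    # Set abstraction layers: each module first samples, then finds neighborhoods, then extracts features with three 1*1 convolution layers acting as a shared fully connected network, and finally pools and outputs.

    # Note: When using NCHW for layer 2, we see increased GPU memory usage (in TF1.4).

    # So we only use NCHW for layer 1 until this issue can be resolved.

    # In total, 9 MLP layers are used for feature extraction.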

    l1_xyz, l1_points, l1_indices = pointnet_sa_module(l0_xyz, l0_points, npoint=512, radius=0.2, nsample=32, mlp=[64,64,128], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, scope='layer1', use_nchw=True)

    l2_xyz, l2_points, l2_indices = pointnet_sa_module(l1_xyz, l1_points, npoint=128, radius=0.4, nsample=64, mlp=[128,128,256], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, scope='layer2') #b

    l3_xyz, l3_points, l3_indices = pointnet_sa_module(l2_xyz, l2_points, npoint=None, radius=None, nsample=None, mlp=[256,512,1024], mlp2=None, group_all=True, is_training=is_training, bn_decay=bn_decay, scope='layer3') #c

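l2_xyz: <tf.Tensor 'layer2/GatherPoint:0' shape=(16, 128, 3) dtype=float32>. 16 objects, 128 centroid points per object, x, y, z coordinates per centroid.
l2_points: <tf.Tensor 'layer2/Squeeze:0' shape=(16, 128, 256) dtype=float32>. 16 objects, 128 centroid points per object; 256 is the length of the feature vector of each local region.
l2_indices: <tf.Tensor 'layer2/QueryBallPoint:0' shape=(16, 128, 64) dtype=int32>. 16 objects, 128 centroid points per object, 64 points selected around each centroid; the last dimension holds the indices of those 64 points.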

c.
(<tf.Tensor 'layer3/Const:0' shape=(16, 1, 3) dtype=float32>,
<tf.Tensor 'layer3/Squeeze:0' shape=(16, 1, 1024) dtype=float32>,
<tf.Tensor 'layer3/Const_1:0' shape=(16, 1, 128) dtype=int64>)

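l3_xyz: 16 objects, 1 centroid point per object, x, y, z coordinates per centroid.

l3_points: 16 objects, 1 centroid point per object, with a 1024-dimensional feature vector per centroid.

l3_indices: 16 objects, 1 centroid point per object, with all 128 input points grouped around it; the last dimension holds the indices of those 128 points.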
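The shapes listed for the three layers follow directly from the arguments passed to each set abstraction call. A small pure-Python tracer makes the rule explicit; `sa_output_shapes` is a hypothetical helper written for this note and only computes shapes, not features.

```python
def sa_output_shapes(batch, n_in, npoint, nsample, mlp, group_all=False):
    """Return the shapes of (xyz, points, indices) produced by one SA layer."""
    if group_all:
        # A single group that contains every input point.
        npoint, nsample = 1, n_in
    return {
        "xyz": (batch, npoint, 3),
        "points": (batch, npoint, mlp[-1]),
        "indices": (batch, npoint, nsample),
    }

l1 = sa_output_shapes(16, 1024, npoint=512, nsample=32, mlp=[64, 64, 128])
l2 = sa_output_shapes(16, l1["xyz"][1], npoint=128, nsample=64, mlp=[128, 128, 256])
l3 = sa_output_shapes(16, l2["xyz"][1], npoint=None, nsample=None,
                      mlp=[256, 512, 1024], group_all=True)
for name, shapes in (("l1", l1), ("l2", l2), ("l3", l3)):
    print(name, shapes)
```

The last layer uses group_all=True, so its single group spans the 128 points produced by layer 2, which is where the 128 in the l3_indices shape comes from.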

3. The overall classification process is as follows:

How the point cloud convolution works (how 3 dimensions become 64):
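A [1,1] convolution over a tensor in NCHW layout applies the same small linear map to the channel vector of every point, so each 3-dimensional coordinate becomes a 64-dimensional feature independently of its neighbors. The NumPy sketch below shows this under simplifying assumptions: the weights are random, batch normalization is omitted, and the batch size is 2 rather than 16 to keep the arrays small.

```python
import numpy as np

rng = np.random.default_rng(0)
# NCHW layout: (batch, channels, npoint, nsample). The notes use batch 16.
grouped = rng.normal(size=(2, 3, 512, 32))
weight = rng.normal(size=(64, 3))    # one 1x1 kernel: 3 input -> 64 output channels
bias = np.zeros(64)

# Apply the same 64x3 matrix to the 3-vector of every point in every group.
out = np.einsum("oc,bcpn->bopn", weight, grouped) + bias[None, :, None, None]
out = np.maximum(out, 0.0)           # ReLU
print(out.shape)                     # (2, 64, 512, 32)

# The same result for a single point, written as an ordinary matrix product.
single = np.maximum(weight @ grouped[0, :, 5, 7] + bias, 0.0)
```

Because the kernel covers exactly one point, the weights are shared across all points and groups, which is what makes this a per-point fully connected layer.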