Environment dependencies:

PyTorch 0.4 or later

tensorboardX: pip install tensorboardX and pip install tensorflow (the TensorBoard viewer itself ships with TensorFlow, which is why it is installed here)

Add tensorboardX logging calls to your project code; they write event files whose contents can then be displayed as visualizations in a browser.

Official example:

By default, a runs folder is created under the working directory, which stores the summary data.

From the directory that contains runs, enter on the command line:

tensorboard --logdir runs            (the command is tensorboard, not tensorboardX)

This prints a URL; copy it into a browser to visualize how loss, accuracy, learning rate, and other values change over the course of training.
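A minimal sketch of this workflow (the tag names and logged values below are purely illustrative, not taken from any project):

import math
from tensorboardX import SummaryWriter

writer = SummaryWriter()  # by default writes event files under ./runs/
for step in range(100):
    # log two scalar curves under the "train" group
    writer.add_scalar('train/loss', math.exp(-step / 30.0), step)
    writer.add_scalar('train/lr', 0.01 * (0.1 ** (step // 50)), step)
writer.close()

Running this and then tensorboard --logdir runs shows both curves grouped under "train" in the browser.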

A fuller example of setting up summaries in a PyTorch project (a DeepLabV3+ training script adapted to PSPNet):

import argparse
import os
import numpy as np
import torch
from tqdm import tqdm

from mypath import Path
from dataloaders import make_data_loader
from modeling.sync_batchnorm.replicate import patch_replication_callback
from modeling.deeplab import *
from modeling.psp_net import *
from utils.loss import SegmentationLosses
from utils.calculate_weights import calculate_weigths_labels
from utils.lr_scheduler import LR_Scheduler
from utils.saver import Saver
from utils.summaries import TensorboardSummary
from utils.metrics import Evaluator
from utils.misc import CrossEntropyLoss2d


class Trainer(object):
    def __init__(self, args):
        self.args = args

        # Define Saver
        self.saver = Saver(args)
        self.saver.save_experiment_config()

        # Define Tensorboard Summary -- this is the tensorboardX writer on the PyTorch side
        self.summary = TensorboardSummary(self.saver.experiment_dir)
        self.writer = self.summary.create_summary()

        # Define Dataloader; adapt this loader to the dataset in use
        kwargs = {'num_workers': args.workers, 'pin_memory': True}
        self.train_loader, self.val_loader, self.test_loader, self.nclass = make_data_loader(args, **kwargs)

        # Define network; what needs changing is the number of classes
        model = PSPNet(num_classes=self.nclass).cuda()
        # Original DeepLabV3+ model:
        # model = DeepLab(num_classes=self.nclass,
        #                 backbone=args.backbone,
        #                 output_stride=args.out_stride,
        #                 sync_bn=args.sync_bn,
        #                 freeze_bn=args.freeze_bn)

        # Define Optimizer (DeepLabV3+):
        # train_params = [{'params': model.get_1x_lr_params(), 'lr': args.lr},
        #                 {'params': model.get_10x_lr_params(), 'lr': args.lr * 10}]
        # optimizer = torch.optim.SGD(train_params, momentum=args.momentum,
        #                             weight_decay=args.weight_decay, nesterov=args.nesterov)

        # PSPNet: modified optimizer; note that the learning rate must be passed via args.lr
        optimizer = torch.optim.SGD([
            {'params': [param for name, param in model.named_parameters() if name[-4:] == 'bias'],
             'lr': 2 * args.lr},
            {'params': [param for name, param in model.named_parameters() if name[-4:] != 'bias'],
             'lr': args.lr, 'weight_decay': args.weight_decay}
        ], momentum=args.momentum, nesterov=True)

        # Define Criterion; redefined in utils/loss.py and invoked via self.criterion
        # whether to use class balanced weights
        if args.use_balanced_weights:
            classes_weights_path = os.path.join(Path.db_root_dir(args.dataset), args.dataset + '_classes_weights.npy')
            if os.path.isfile(classes_weights_path):
                weight = np.load(classes_weights_path)
            else:
                weight = calculate_weigths_labels(args.dataset, self.train_loader, self.nclass)
            weight = torch.from_numpy(weight.astype(np.float32))
        else:
            weight = None
        self.criterion = SegmentationLosses(weight=weight, cuda=args.cuda).build_loss(mode=args.loss_type)
        self.model, self.optimizer = model, optimizer

        # Define Evaluator
        self.evaluator = Evaluator(self.nclass)

        # Define lr scheduler
        self.scheduler = LR_Scheduler(args.lr_scheduler, args.lr,
                                      args.epochs, len(self.train_loader))

        # Using cuda
        if args.cuda:
            self.model = torch.nn.DataParallel(self.model, device_ids=self.args.gpu_ids)
            patch_replication_callback(self.model)
            self.model = self.model.cuda()

        # Resuming checkpoint
        self.best_pred = 0.0
        if args.resume is not None:
            if not os.path.isfile(args.resume):
                raise RuntimeError("=> no checkpoint found at '{}'".format(args.resume))
            checkpoint = torch.load(args.resume)
            args.start_epoch = checkpoint['epoch']
            if args.cuda:
                self.model.module.load_state_dict(checkpoint['state_dict'])
            else:
                self.model.load_state_dict(checkpoint['state_dict'])
            if not args.ft:
                self.optimizer.load_state_dict(checkpoint['optimizer'])
            self.best_pred = checkpoint['best_pred']
            print("=> loaded checkpoint '{}' (epoch {})"
                  .format(args.resume, checkpoint['epoch']))

        # Clear start epoch if fine-tuning
        if args.ft:
            args.start_epoch = 0
    # Training loop
    def training(self, epoch):
        train_loss = 0.0
        self.model.train()
        tbar = tqdm(self.train_loader)
        num_img_tr = len(self.train_loader)
        # Original DeepLabV3+ loading scheme; the loss must be adapted when switching to PSPNet:
        # for inputs_slice, gts_slice in zip(inputs, gts):
        #     inputs_slice = Variable(inputs_slice).cuda()
        #     gts_slice = Variable(gts_slice).cuda()
        #
        #     optimizer.zero_grad()
        #     outputs, aux = net(inputs_slice)
        #     assert outputs.size()[2:] == gts_slice.size()[1:]
        #     assert outputs.size()[1] == voc.num_classes
        #
        #     main_loss = criterion(outputs, gts_slice)
        #     aux_loss = criterion(aux, gts_slice)
        #     loss = main_loss + 0.4 * aux_loss
        #     loss.backward()
        #     optimizer.step()
        #
        #     train_main_loss.update(main_loss.item(), slice_batch_pixel_size)
        #     train_aux_loss.update(aux_loss.item(), slice_batch_pixel_size)
        for i, sample in enumerate(tbar):
            image, target = sample['image'], sample['label']
            if self.args.cuda:
                image, target = image.cuda(), target.cuda()
            self.scheduler(self.optimizer, i, epoch, self.best_pred)
            self.optimizer.zero_grad()
            outputs, aux = self.model(image)  # outputs is the predicted label map
            assert outputs.size()[2:] == target.size()[1:]
            assert outputs.size()[1] == self.nclass
            loss = self.criterion(outputs, target)
            loss.backward()
            # DeepLabV3+ version:
            # self.optimizer.zero_grad()
            # output = self.model(image)
            # loss = self.criterion(output, target)
            # loss.backward()
            self.optimizer.step()
            train_loss += loss.item()
            tbar.set_description('Train loss: %.3f' % (train_loss / (i + 1)))
            self.writer.add_scalar('train/total_loss_iter', loss.item(), i + num_img_tr * epoch)

            # Show 10 * 3 inference results each epoch
            if i % (num_img_tr // 10) == 0:
                global_step = i + num_img_tr * epoch
                self.summary.visualize_image(self.writer, self.args.dataset, image, target, outputs, global_step)

        self.writer.add_scalar('train/total_loss_epoch', train_loss, epoch)
        print('[Epoch: %d, numImages: %5d]' % (epoch, i * self.args.batch_size + image.data.shape[0]))
        print('Loss: %.3f' % train_loss)

        if self.args.no_val:
            # save checkpoint every epoch
            is_best = False
            self.saver.save_checkpoint({
                'epoch': epoch + 1,
                'state_dict': self.model.module.state_dict(),
                'optimizer': self.optimizer.state_dict(),
                'best_pred': self.best_pred,
            }, is_best)

    def validation(self, epoch):
        self.model.eval()
        self.evaluator.reset()
        tbar = tqdm(self.val_loader, desc='\r')
        test_loss = 0.0
        for i, sample in enumerate(tbar):
            image, target = sample['image'], sample['label']
            if self.args.cuda:
                image, target = image.cuda(), target.cuda()
            with torch.no_grad():
                output = self.model(image)
            loss = self.criterion(output, target)
            test_loss += loss.item()
            tbar.set_description('Test loss: %.3f' % (test_loss / (i + 1)))
            pred = output.data.cpu().numpy()
            target = target.cpu().numpy()
            pred = np.argmax(pred, axis=1)
            # Add batch sample into evaluator
            self.evaluator.add_batch(target, pred)

        # Fast test during the training
        Acc = self.evaluator.Pixel_Accuracy()
        Acc_class = self.evaluator.Pixel_Accuracy_Class()
        mIoU = self.evaluator.Mean_Intersection_over_Union()
        FWIoU = self.evaluator.Frequency_Weighted_Intersection_over_Union()
        self.writer.add_scalar('val/total_loss_epoch', test_loss, epoch)
        self.writer.add_scalar('val/mIoU', mIoU, epoch)
        self.writer.add_scalar('val/Acc', Acc, epoch)
        self.writer.add_scalar('val/Acc_class', Acc_class, epoch)
        self.writer.add_scalar('val/fwIoU', FWIoU, epoch)
        print('Validation:')
        print('[Epoch: %d, numImages: %5d]' % (epoch, i * self.args.batch_size + image.data.shape[0]))
        print("Acc:{}, Acc_class:{}, mIoU:{}, fwIoU: {}".format(Acc, Acc_class, mIoU, FWIoU))
        print('Loss: %.3f' % test_loss)

        new_pred = mIoU
        if new_pred > self.best_pred:
            is_best = True
            self.best_pred = new_pred
            self.saver.save_checkpoint({
                'epoch': epoch + 1,
                'state_dict': self.model.module.state_dict(),
                'optimizer': self.optimizer.state_dict(),
                'best_pred': self.best_pred,
            }, is_best)


def main():
    # Hyper-parameter settings
    parser = argparse.ArgumentParser(description="PyTorch DeeplabV3Plus Training")
    # Backbone network used for feature extraction
    parser.add_argument('--backbone', type=str, default='resnet',
                        choices=['resnet', 'xception', 'drn', 'mobilenet'],
                        help='backbone name (default: resnet)')
    parser.add_argument('--out-stride', type=int, default=16,
                        help='network output stride (default: 16)')
    parser.add_argument('--dataset', type=str, default='pascal',
                        choices=['pascal', 'coco', 'cityscapes'],
                        help='dataset name (default: pascal)')
    parser.add_argument('--use-sbd', action='store_true', default=False,
                        help='whether to use SBD dataset (default: False)')
    parser.add_argument('--workers', type=int, default=4,
                        metavar='N', help='dataloader threads')
    parser.add_argument('--base-size', type=int, default=513,
                        help='base image size')
    # Reduce this when CUDA memory is insufficient; the original value was 513
    parser.add_argument('--crop-size', type=int, default=256,
                        help='crop image size')
    parser.add_argument('--sync-bn', type=bool, default=None,
                        help='whether to use sync bn (default: auto)')
    parser.add_argument('--freeze-bn', type=bool, default=False,
                        help='whether to freeze bn parameters (default: False)')
    parser.add_argument('--loss-type', type=str, default='ce',
                        choices=['ce', 'focal'],
                        help='loss func type (default: ce)')
    # training hyper params
    parser.add_argument('--epochs', type=int, default=None, metavar='N',
                        help='number of epochs to train (default: auto)')
    parser.add_argument('--start_epoch', type=int, default=0,
                        metavar='N', help='start epochs (default: 0)')
    parser.add_argument('--batch-size', type=int, default=None,
                        metavar='N', help='input batch size for training (default: auto)')
    parser.add_argument('--test-batch-size', type=int, default=None,
                        metavar='N', help='input batch size for testing (default: auto)')
    parser.add_argument('--use-balanced-weights', action='store_true', default=False,
                        help='whether to use balanced weights (default: False)')
    # optimizer params
    parser.add_argument('--lr', type=float, default=None, metavar='LR',
                        help='learning rate (default: auto)')
    parser.add_argument('--lr-scheduler', type=str, default='poly',
                        choices=['poly', 'step', 'cos'],
                        help='lr scheduler mode: (default: poly)')
    parser.add_argument('--momentum', type=float, default=0.9,
                        metavar='M', help='momentum (default: 0.9)')
    parser.add_argument('--weight-decay', type=float, default=5e-4,
                        metavar='M', help='w-decay (default: 5e-4)')
    parser.add_argument('--nesterov', action='store_true', default=False,
                        help='whether use nesterov (default: False)')
    # cuda, seed and logging
    parser.add_argument('--no-cuda', action='store_true', default=False,
                        help='disables CUDA training')
    parser.add_argument('--gpu-ids', type=str, default='0',
                        help='use which gpu to train, must be a comma-separated list of integers only (default=0)')
    parser.add_argument('--seed', type=int, default=1, metavar='S',
                        help='random seed (default: 1)')
    # checking point
    parser.add_argument('--resume', type=str, default=None,
                        help='put the path to resuming file if needed')
    parser.add_argument('--checkname', type=str, default=None,
                        help='set the checkpoint name')
    # finetuning pre-trained models
    parser.add_argument('--ft', action='store_true', default=False,
                        help='finetuning on a different dataset')
    # evaluation option
    parser.add_argument('--eval-interval', type=int, default=1,
                        help='evaluation interval (default: 1)')
    parser.add_argument('--no-val', action='store_true', default=False,
                        help='skip validation during training')

    args = parser.parse_args()
    args.cuda = not args.no_cuda and torch.cuda.is_available()
    if args.cuda:
        try:
            args.gpu_ids = [int(s) for s in args.gpu_ids.split(',')]
        except ValueError:
            raise ValueError('Argument --gpu_ids must be a comma-separated list of integers only')

    if args.sync_bn is None:
        if args.cuda and len(args.gpu_ids) > 1:
            args.sync_bn = True
        else:
            args.sync_bn = False

    # default epochs, batch_size and lr
    if args.epochs is None:
        epoches = {
            'coco': 30,
            'cityscapes': 200,
            'pascal': 50,
        }
        args.epochs = epoches[args.dataset.lower()]

    if args.batch_size is None:
        args.batch_size = 2 * len(args.gpu_ids)  # originally 4 *

    if args.test_batch_size is None:
        args.test_batch_size = args.batch_size

    if args.lr is None:
        lrs = {
            'coco': 0.1,
            'cityscapes': 0.01,
            'pascal': 0.007,
        }
        args.lr = lrs[args.dataset.lower()] / (2 * len(args.gpu_ids)) * args.batch_size

    if args.checkname is None:
        args.checkname = 'deeplab-' + str(args.backbone)
    print(args)
    torch.manual_seed(args.seed)
    trainer = Trainer(args)
    print('Starting Epoch:', trainer.args.start_epoch)
    print('Total Epoches:', trainer.args.epochs)
    for epoch in range(trainer.args.start_epoch, trainer.args.epochs):
        trainer.training(epoch)
        if not trainer.args.no_val and epoch % args.eval_interval == (args.eval_interval - 1):
            trainer.validation(epoch)
    trainer.writer.close()


if __name__ == "__main__":
    main()
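A note on TensorboardSummary: the utils/summaries.py helper imported above is project code, not part of tensorboardX itself. For readers without that file, a minimal sketch of what such a wrapper might look like follows. Assumptions: torchvision is available for make_grid, and the segmentation masks are logged here as normalized grayscale grids, whereas the real project may colorize them with a dataset-specific palette.

import os
import torch
from tensorboardX import SummaryWriter
from torchvision.utils import make_grid


class TensorboardSummary(object):
    def __init__(self, directory):
        # every experiment gets its own event-file directory
        self.directory = directory

    def create_summary(self):
        return SummaryWriter(log_dir=self.directory)

    def visualize_image(self, writer, dataset, image, target, output, global_step):
        # tile the first three input images into a single grid
        grid_image = make_grid(image[:3].clone().cpu().data, 3, normalize=True)
        writer.add_image('Image', grid_image, global_step)
        # predictions: argmax over the class dimension, logged as a 1-channel grid
        pred = torch.argmax(output[:3], dim=1, keepdim=True).float().cpu().data
        writer.add_image('Predicted label', make_grid(pred, 3, normalize=True), global_step)
        # ground truth: add a channel dimension so make_grid accepts it
        gt = target[:3].unsqueeze(1).float().cpu().data
        writer.add_image('Groundtruth label', make_grid(gt, 3, normalize=True), global_step)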
