Common TensorBoard usage
1. Example use case: extract the performance values saved by TensorBoard and plot them yourself (see the sketch below)
- Reference: https://blog.csdn.net/nima1994/article/details/82844988#commentBox
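A minimal sketch of how this can be done with TensorBoard's `EventAccumulator`; the log directory `./runs/exp1` and the scalar tag `val/mIoU` are hypothetical placeholders, not names from the original notes.

```python
# Sketch: read scalar values saved by TensorBoard and plot them yourself.
import matplotlib.pyplot as plt
from tensorboard.backend.event_processing import event_accumulator

# Load all scalar events from the (hypothetical) log directory.
ea = event_accumulator.EventAccumulator(
    "./runs/exp1",
    size_guidance={event_accumulator.SCALARS: 0},  # 0 = keep every scalar point
)
ea.Reload()

print(ea.Tags()["scalars"])          # list the available scalar tags
events = ea.Scalars("val/mIoU")      # each event has .wall_time, .step, .value

steps = [e.step for e in events]
values = [e.value for e in events]

plt.plot(steps, values)
plt.xlabel("step")
plt.ylabel("val/mIoU")
plt.savefig("miou_curve.png")
```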
Teacher networks:
For Supervisely Fine and Supervisely Manual Coarse, PointRend, DeepLabv3+, OCRNet, and HRNet are selected as the teacher networks.
For Cityscapes Fine and Cityscapes Official Coarse, we also use PointRend, DeepLabv3+, OCRNet, and HRNet as the teacher networks.
DeepLabv3+ uses ResNet101 as its backbone.
Student networks:
For Supervisely Fine and Supervisely Manual Coarse, U-Net is selected as the student network. For Cityscapes Fine and Cityscapes Official Coarse, DeepLabv3+ with a ResNet18 backbone is selected as the student network.
Teacher weight inference
For Supervisely Fine and Supervisely Manual Coarse, we leverage the original weight files of the trained models from HRNet~\cite{WangSCJDZLMTWLX19}, OCRNet~\cite{YuanCW19}, PointRend~\cite{kirillov2020pointrend}, and DeepLabv3+~\cite{chen2018encoder}, all trained on the Cityscapes Fine dataset, as our teacher networks. The results of these methods are reproduced with the publicly available models under the recommended test settings.
To initially verify the cross-domain performance of these teacher models, the weights pretrained on Cityscapes Fine are used to initialize the teacher networks, which then run inference directly on the Supervisely dataset. Only the portrait (person) predictions are retained to generate the teacher pseudo-labels.
As shown in Tab.~\ref{t_param_flops}, the four methods (together with a student network, DeepLabv3+) achieve 60-90% mIoU with 33-79 BF scores on the Supervisely Fine test set. PointRend, the most recent of the four segmentation methods, achieves the best performance on both mIoU and BF score.
For Cityscapes Fine and Cityscapes Official Coarse, the teacher networks are pretrained on the ADE20k dataset. It is worth mentioning that Cityscapes and ADE20k share only 13 overlapping categories, which are retained to generate the teacher pseudo-labels: road, sidewalk, building, wall, fence, pole, traffic light, sky, person, car, truck, bus, and bicycle.
In the following experiments, through various comparisons, we verify whether the proposed method is effective for knowledge distillation from teacher models with different capacities and different segmentation performances.
For Supervisely Fine and Supervisely Manual Coarse, the teacher networks are trained on the Cityscapes dataset.
For the Supervisely experiments, all teacher models infer pseudo-labels with weights pretrained on Cityscapes, keeping only the portrait (person) results.
For the Cityscapes experiments, all teacher models infer pseudo-labels with weights pretrained on ADE20k, keeping only the results of the 13 overlapping categories.
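As an illustration of this label filtering, here is a minimal sketch (not the authors' exact pipeline): the person train id 11, the ignore value 255, and the tensor shapes are assumptions based on the standard Cityscapes train-id convention.

```python
import torch

# Hypothetical teacher output: per-pixel logits over Cityscapes' 19 train classes,
# shape (N, 19, H, W).
PERSON_TRAIN_ID = 11   # assumption: Cityscapes train id of the "person" class
IGNORE_INDEX = 255     # assumption: label value for pixels that are not kept

def portrait_pseudo_label(logits: torch.Tensor) -> torch.Tensor:
    """Keep only the person class: 1 = portrait foreground, 0 = background."""
    pred = logits.argmax(dim=1)              # (N, H, W) predicted train ids
    return (pred == PERSON_TRAIN_ID).long()

def overlap_pseudo_label(pred: torch.Tensor, kept_ids: list[int]) -> torch.Tensor:
    """Keep only the categories shared by both datasets; others become IGNORE_INDEX."""
    keep = torch.zeros_like(pred, dtype=torch.bool)
    for cid in kept_ids:
        keep |= pred == cid
    return torch.where(keep, pred, torch.full_like(pred, IGNORE_INDEX))
```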
Training the student networks
For Supervisely Fine and Supervisely Manual Coarse, the learning rate starts at 0.1 and is multiplied by 0.5 every 25 epochs. We set the number of training epochs to 100 and the batch size to 32 for all trials. The optimizer is Ranger~\cite{yong2020gradient} with a weight decay of 5e-4. The only data augmentation is normalization.
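A minimal sketch of this schedule in PyTorch: halving the learning rate every 25 epochs corresponds to `StepLR(step_size=25, gamma=0.5)`. The model is a placeholder, and plain SGD stands in for Ranger (which comes from a third-party package not shown here).

```python
import torch

# Hypothetical student model; SGD is only a stand-in for the Ranger optimizer.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=5e-4)

# lr starts at 0.1 and is multiplied by 0.5 every 25 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.5)

for epoch in range(100):        # 100 training epochs
    # ... one epoch of training with batch size 32 ...
    scheduler.step()
```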
For Cityscapes Fine and Cityscapes Official Coarse, the student network is trained for 80,000 iterations with a batch size of 2 per GPU. The learning rate follows the poly policy, decaying from 0.01 to 0.0001. The optimizer is SGD with a weight decay of 0.0001 and a momentum of 0.9. The data augmentation consists of random cropping to (512, 1024), random flipping, photometric distortion, and normalization.
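The poly policy can be sketched as lr = (base_lr - min_lr) * (1 - iter/max_iter)^power + min_lr; the power value 0.9 below is an assumption (a common default), since the text only specifies the start and end rates.

```python
import torch

model = torch.nn.Conv2d(3, 19, kernel_size=3, padding=1)   # hypothetical student
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)

MAX_ITERS = 80_000
BASE_LR, MIN_LR, POWER = 1e-2, 1e-4, 0.9   # POWER is an assumption (common default)

def poly_factor(cur_iter: int) -> float:
    """Poly decay from BASE_LR toward MIN_LR over MAX_ITERS iterations."""
    coeff = (1 - cur_iter / MAX_ITERS) ** POWER
    lr = (BASE_LR - MIN_LR) * coeff + MIN_LR
    return lr / BASE_LR                     # LambdaLR multiplies the base lr by this

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=poly_factor)

for it in range(MAX_ITERS):
    # ... one training iteration (batch size 2 per GPU) ...
    scheduler.step()
```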
All experiments are performed on four RTX 2080 Ti GPUs with mixed-precision training. Besides, λ is initialized to 0 during the first 10% of the training epochs/iterations and then increases linearly to 0.5 over the rest of the training phase.
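A minimal sketch of this λ schedule, assuming `total_steps` is the total number of training epochs/iterations and `step` is the current one (function and argument names are placeholders):

```python
def distill_lambda(step: int, total_steps: int, max_lambda: float = 0.5,
                   warmup_frac: float = 0.1) -> float:
    """Return the distillation weight λ at a given training step.

    λ stays 0 during the first `warmup_frac` of training, then increases
    linearly to `max_lambda` at the final step.
    """
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return 0.0
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return max_lambda * min(progress, 1.0)
```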