EOLO: Deep Machine Learning Algorithm for Embedded Object Segmentation that Only Looks Once

AUTHORS

Longfei Zeng, Lakehead University, Smart Health FabLab, Department of Computer Science, Canada
Sabah Mohammed, Lakehead University, Smart Health FabLab, Department of Computer Science, Canada

ABSTRACT

In this paper, we introduce an anchor-free, single-shot instance segmentation method that is conceptually simple, consists of three independent branches, is fully convolutional, and can easily be embedded into mobile and embedded devices. Our method, referred to as EOLO, reformulates instance segmentation as the joint problem of predicting semantic segmentation and distinguishing overlapping objects, through instance center classification and 4D distance regression at each pixel. Moreover, we propose an effective loss function that handles the sampling of high-quality center-of-gravity examples and the optimization of the 4D distance regression, which significantly improves mask mAP. Without any bells and whistles, EOLO achieves 27.7% mask mAP at IoU50 and reaches 30 FPS on a 1080Ti GPU, with single-model, single-scale training and testing on the challenging COCO2017 dataset. We first survey how recent methods understand instance segmentation, in terms of the top-down, bottom-up, and direct-prediction paradigms; we then describe our model and present the related experiments and results. We hope that the proposed EOLO framework can serve as a fundamental baseline for single-shot instance segmentation in real-time industrial scenarios.
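The "4D distance regression on each pixel" described in the abstract follows the FCOS-style formulation, in which each pixel inside an object regresses its distances to the four sides of the instance's bounding box. A minimal NumPy sketch of how such per-pixel 4D targets could be built (the function name and the `(x1, y1, x2, y2)` box format are illustrative assumptions, not the authors' actual code):

```python
import numpy as np

def regression_targets(h, w, box):
    """Build FCOS-style per-pixel 4D regression targets for one box.

    h, w : feature-map height and width
    box  : (x1, y1, x2, y2) bounding box in pixel coordinates

    Returns an (h, w, 4) array of (left, top, right, bottom) distances
    and an (h, w) boolean mask of pixels strictly inside the box.
    """
    x1, y1, x2, y2 = box
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Distances from each pixel to the four box sides.
    left = xs - x1
    top = ys - y1
    right = x2 - xs
    bottom = y2 - ys
    targets = np.stack([left, top, right, bottom], axis=-1).astype(np.float32)
    # Only pixels strictly inside the box are positive regression samples.
    inside = targets.min(axis=-1) > 0
    return targets, inside
```

For example, the pixel at the center of the box `(2, 2, 8, 8)` on a 10x10 grid regresses the target `(3, 3, 3, 3)`; pixels on or outside the box edges are excluded from the regression loss.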


KEYWORDS

Deep machine learning, Image segmentation, Instance segmentation, Embedded platforms

REFERENCES

[1]       K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” IEEE International Conference on Computer Vision (ICCV), (2017)
[2]       J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016)
[3]       T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object detection,” in The IEEE International Conference on Computer Vision (ICCV), (2017)
[4]       Z. Tian, C. Shen, H. Chen, and T. He, “Fcos: Fully convolutional one-stage object detection,” in The IEEE International Conference on Computer Vision (ICCV), (2019)
[5]       P. K. Xingyi Zhou and Dequan Wang, “Objects as points,” in arXiv:1904.07850, (2019)
[6]       Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, and Ping Luo, “Polarmask: Single shot instance segmentation with polar representation,” in arXiv:1909.13226, (2020)
[7]       Xinlong Wang, Tao Kong, Chunhua Shen, Yuning Jiang, and Lei Li, “Solo: Segmenting objects by locations,” in arXiv:1912.04488, (2019)
[8]       X. Chen, R. Girshick, K. He, and P. Dollar, “Tensormask: A foundation for dense object segmentation,” in The IEEE International Conference on Computer Vision (ICCV), (2019)
[9]       R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2014)
[10]    R. Girshick, “Fast R-CNN,” in The IEEE International Conference on Computer Vision (ICCV), December, (2015)
[11]    S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds. Curran Associates, Inc., pp.91–99. [Online]. Available: http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf, (2015)
[12]    J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July, (2017)
[13]    J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” in arXiv:1804.02767, April, (2018)
[14]    W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” in arXiv:1512.02325, (2016)
[15]    L. Huang, Y. Yang, Y. Deng, and Y. Yu, “DenseBox: Unifying landmark localization with end to end object detection,” in arXiv:1509.04874, Sep, (2015)
[16]    Z. Yang, S. Liu, H. Hu, L. Wang, and S. Lin, “RepPoints: Point set representation for object detection,” in The IEEE International Conference on Computer Vision (ICCV), October, (2019)
[17]    K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, “CenterNet: Keypoint triplets for object detection,” in The IEEE International Conference on Computer Vision (ICCV), October, (2019)
[18]    T. Kong, F. Sun, H. Liu, Y. Jiang, and J. Shi, “FoveaBox: Beyond anchor-based object detector,” in arXiv:1904.03797, April, (2019)
[19]    D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, “YOLACT: Real-time instance segmentation,” in The IEEE International Conference on Computer Vision (ICCV), October, (2019)
[20]    X. Zhou, J. Zhuo, and P. Krähenbühl, “Bottom-up object detection by grouping extreme and center points,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June, (2019)
[21]    J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June, (2015)
[22]    J. Dai, K. He, Y. Li, S. Ren, and J. Sun, “Instance-sensitive fully convolutional networks,” in Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham: Springer International Publishing, pp.534–549, (2016)
[23]    J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” in Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds. Curran Associates, Inc., pp.379–387. [Online]. Available: http://papers.nips.cc/paper/6465-r-fcn-object-detection-via-region-based-fully-convolutional-networks.pdf, (2016)
[24]    Y. Li, H. Qi, J. Dai, X. Ji, and Y. Wei, “Fully convolutional instance-aware semantic segmentation,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017)
[25]    N. Gao, Y. Shan, Y. Wang, X. Zhao, Y. Yu, M. Yang, and K. Huang, “SSAP: Single-shot instance segmentation with affinity pyramid,” in The IEEE International Conference on Computer Vision (ICCV), (2019)
[26]    H. Law and J. Deng, “CornerNet: Detecting objects as paired keypoints,” in Proceedings of the European Conference on Computer Vision (ECCV), pp.734–750, (2018)

CITATION

  • APA:
    Zeng, L., & Mohammed, S. (2020). EOLO: Deep Machine Learning Algorithm for Embedded Object Segmentation that Only Looks Once. International Journal of Multimedia and Ubiquitous Engineering, 15(1), 35-48. doi:10.21742/IJMUE.2020.15.1.04
  • Harvard:
    Zeng, L. and Mohammed, S. (2020). "EOLO: Deep Machine Learning Algorithm for Embedded Object Segmentation that Only Looks Once". International Journal of Multimedia and Ubiquitous Engineering, 15(1), pp.35-48. doi:10.21742/IJMUE.2020.15.1.04
  • IEEE:
    [1] L. Zeng and S. Mohammed, "EOLO: Deep Machine Learning Algorithm for Embedded Object Segmentation that Only Looks Once," International Journal of Multimedia and Ubiquitous Engineering, vol.15, no.1, pp.35-48, May 2020
  • MLA:
    Zeng, Longfei, and Sabah Mohammed. "EOLO: Deep Machine Learning Algorithm for Embedded Object Segmentation that Only Looks Once". International Journal of Multimedia and Ubiquitous Engineering, vol.15, no.1, May 2020, pp.35-48, doi:10.21742/IJMUE.2020.15.1.04

ISSUE INFO

  • Volume 15, No. 1, 2020
  • ISSN(p):1975-0080
  • ISSN(e):2652-1954
  • Published: May 2020
