[Computer Vision] Summary of subtasks and dataset types in the image and video domains

Task types and related datasets (ordered by popularity)

  • Video Object Tracking
    • Dataset
      • OTB(Object Tracking Benchmark)
  • Video Action Classification / Action Recognition
    • Dataset
      • UCF101: 13,320 video clips in 101 action categories, grouped into 5 types: 1) Human-Object Interaction, 2) Body-Motion Only, 3) Human-Human Interaction, 4) Playing Musical Instruments, 5) Sports
      • Kinetics: 500,000 high-quality video clips (around 10 seconds each) covering 600 human action classes
      • HMDB51: 6,849 video clips in 51 action categories
  • Video Object Segmentation / Semantic Segmentation / Panoptic Segmentation
    • Dataset
      • DAVIS (Densely Annotated VIdeo Segmentation): 50 video sequences with densely annotated frames
      • Cityscapes: images (5,000 finely annotated, 20,000 coarsely annotated, X video), pixel-level annotations for 30 classes
      • NYUv2 (NYU-Depth V2): a variety of indoor scenes recorded with both RGB and depth cameras
  • Video Understanding
  • Video Classification (not action classification)
    • Task of producing a label that is relevant to the video, given its frames
  • Video Prediction
  • Video Super-Resolution
  • Video Compression
  • Human Pose Estimation
    • Dataset
      • Human3.6M: motion capture dataset with accurate 3D joint positions (3.6 million human poses) and high-resolution video
  • Video Depth Estimation
    • Task of estimating depth
    • Dataset
      • NYUv2(NYU-Depth V2)
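
To make the video classification definition above ("a label relevant to the video, given its frames") a bit more concrete, here is a minimal sketch of one simple way to do it: average per-frame class scores and pick the top label. This is an illustration only, not how any particular model or benchmark does it. The function name `classify_video`, the labels, and the scores are all hypothetical; in practice the per-frame scores would come from an image classifier.

```python
from collections import defaultdict


def classify_video(frame_scores):
    """Average per-frame class scores and return (top_label, averaged_scores).

    frame_scores: list of dicts mapping label -> score, one dict per frame.
    The scores are assumed to come from some per-frame image classifier.
    """
    if not frame_scores:
        raise ValueError("need at least one frame")

    # Sum each label's score over all frames.
    totals = defaultdict(float)
    for scores in frame_scores:
        for label, score in scores.items():
            totals[label] += score

    # Turn the sums into means so clip length does not matter.
    num_frames = len(frame_scores)
    averaged = {label: total / num_frames for label, total in totals.items()}

    # The video-level label is the one with the highest mean score.
    top_label = max(averaged, key=averaged.get)
    return top_label, averaged


# Hypothetical scores for a three-frame clip (made up for illustration).
frames = [
    {"cooking": 0.7, "sports": 0.2, "music": 0.1},
    {"cooking": 0.5, "sports": 0.4, "music": 0.1},
    {"cooking": 0.6, "sports": 0.1, "music": 0.3},
]
label, averaged = classify_video(frames)
print(label)
```

Averaging frame scores ignores the order of frames, which is usually acceptable when the label describes the whole video. Tasks like action recognition or video prediction generally need models that also use temporal information.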

Image dataset

  • CIFAR-10
    • CIFAR-100
  • ImageNet
  • COCO
  • MNIST
    • Fashion-MNIST
  • Cityscapes
  • CelebA
  • CUB-200-2011
  • Visual Question Answering
  • SVHN