I'm training an object detector for wildlife in videos. I've been using YOLOv3 with reasonable results. Since accuracy matters more to me than speed, I now want to try Faster R-CNN.
With YOLO, because the network is trained on the entire image (in this case a video frame), it was important that every animal in the training data was labelled. For example, if you train on an image that contains two animals but only one of them is labelled, the network receives conflicting signals: the labelled animal trains it to detect the animal, while the unlabelled one, treated as background, trains it to ignore it.
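To make the concern concrete, here is a simplified sketch of how I understand YOLO-style objectness targets to be built. It is not the actual YOLOv3 code: the 4x4 grid, the helpers `centre_cell` and `objectness_targets`, and the box coordinates are all made up for illustration, and the real network also uses anchors, several scales, and an ignore threshold.

```python
def centre_cell(box, grid_size=4):
    """Return the (row, col) of the grid cell containing the box centre.

    box is (x_min, y_min, x_max, y_max) in normalised [0, 1] image coordinates.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    # Clamp so a centre at exactly 1.0 still maps to the last cell.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col


def objectness_targets(labelled_boxes, grid_size=4):
    """Build a grid of objectness targets: 1 where a labelled box is centred, else 0."""
    targets = [[0] * grid_size for _ in range(grid_size)]
    for box in labelled_boxes:
        row, col = centre_cell(box, grid_size)
        targets[row][col] = 1
    return targets


# Two animals are visible in the frame (hypothetical coordinates).
animal_a = (0.05, 0.05, 0.25, 0.25)
animal_b = (0.55, 0.55, 0.85, 0.85)

fully_labelled = objectness_targets([animal_a, animal_b])
partially_labelled = objectness_targets([animal_a])  # animal_b was missed by the annotator

row_b, col_b = centre_cell(animal_b)
print("target at animal_b's cell, fully labelled:    ", fully_labelled[row_b][col_b])
print("target at animal_b's cell, partially labelled:", partially_labelled[row_b][col_b])
```

In the partially labelled case, the cell that contains the second animal gets the same target as empty background, so, as far as I can tell, the objectness loss pushes the network towards not detecting an animal that is plainly there.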
Is this also an issue with Faster R-CNN, or is the network (or networks) trained only on the labelled regions of the image?