I am new to machine learning and am trying to detect faces in a video stream using the code in this blog post. I am using a pre-trained ResNet 101 caffemodel instead of the ResNet 10 caffemodel used in the blog post. My model expects a 224x224 input, so I changed the line
blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0, (300, 300), (104.0, 177.0, 123.0))
to
blob = cv2.dnn.blobFromImage(cv2.resize(frame, (224, 224)), 1.0, (224, 224), (104.0, 177.0, 123.0))
After this change I get a “tuple index out of range” error at the line
for i in range(0, detections.shape[2]):
This happens because the detections returned by net.forward() have shape (1, 1000). I do not understand how to change the code to handle this. Because of this problem I cannot extract the confidence associated with each prediction or compute the (x, y) coordinates of the bounding box for the object. I would be grateful if someone could tell me what I am doing wrong and help me solve this.
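To make the failure easier to discuss, here is a minimal sketch that reproduces it without OpenCV, since a shape is just a tuple. It assumes the blog post's model returns an array of shape (1, 1, N, 7), one row of seven values per detection, while my model returns (1, 1000), which looks like one score per class rather than a list of boxes. The helper names describe_output and top_class are hypothetical and only for illustration.

```python
def describe_output(shape):
    """Guess what kind of network output a shape tuple represents (assumption-based)."""
    if len(shape) == 4 and shape[3] == 7:
        return "detections"    # assumed detector layout: (1, 1, N, 7)
    if len(shape) == 2:
        return "class scores"  # classifier-style layout: (1, num_classes)
    return "unknown"


def top_class(scores):
    """Return (index, score) of the highest value in a flat list of scores."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# Shape reported for my model's output.
classifier_shape = (1, 1000)

# The loop header reads shape[2], but a 2-element tuple has no index 2.
try:
    classifier_shape[2]
    error_message = None
except IndexError as err:
    error_message = str(err)

print(error_message)                      # the same message I see in my script
print(describe_output(classifier_shape))  # class scores
print(describe_output((1, 1, 200, 7)))    # detections
print(top_class([0.1, 0.7, 0.2]))         # index and value of the best score
```

If this reading is right, an output of this shape carries a score per class and no box coordinates, so the loop over detections would have nothing to iterate over regardless of the input size.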