I just went through @PulkitS's Video Classification Using Deep Learning post and would like to ask about, and point out, the difference between image and video classification.
Is video classification just a matter of modelling videos as frames and training a classification model on the individual frames? I believe that is still image classification, not video classification.
Or is video classification modelling videos as frames but keeping the frames of each video together as an ordered sequence, and then training a classification model on each sequence of frames to capture the temporal dynamics in the video? To me, this is actual video classification.
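To make the distinction concrete, here is a minimal sketch of how the training samples differ between the two setups. Everything in it is a hypothetical illustration of mine and not code from the blog: the names `videos`, `labels`, `frame_level_samples`, `sequence_level_samples` and `majority_vote` are made up, and the strings stand in for real image arrays.

```python
from collections import Counter

# Toy data (assumption): each video is an ordered list of frames.
# The strings are placeholders for real image arrays.
videos = {
    "video_a": ["a_f0", "a_f1", "a_f2"],
    "video_b": ["b_f0", "b_f1", "b_f2"],
}
labels = {"video_a": "running", "video_b": "jumping"}


def frame_level_samples(videos, labels):
    """One sample per frame: each frame simply inherits its video's label.

    Frame order and video membership are discarded, so a model trained
    on these samples is effectively an image classifier.
    """
    return [
        (frame, labels[vid])
        for vid, frames in videos.items()
        for frame in frames
    ]


def sequence_level_samples(videos, labels):
    """One sample per video: frames stay together in temporal order.

    A sequence model could consume these samples and use the ordering.
    """
    return [(list(frames), labels[vid]) for vid, frames in videos.items()]


def majority_vote(frame_predictions):
    """Turn per-frame predictions into a single label for the video."""
    return Counter(frame_predictions).most_common(1)[0][0]


frame_samples = frame_level_samples(videos, labels)
sequence_samples = sequence_level_samples(videos, labels)

print(len(frame_samples))     # one sample per frame
print(len(sequence_samples))  # one sample per video
```

With the frame-level samples, the only way to get a label for a whole video is to aggregate the per-frame predictions afterwards, for example with something like `majority_vote`, and the order of the frames never matters. With the sequence-level samples, the model sees the frames of one video together and in order, which is what I mean by capturing temporal dynamics.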
What do you think? And @PulkitS, can we adapt your blog post to model and train on sequences of frames rather than just individual frames?