Video Understanding Models in AI
Today I did a deep dive on video understanding models. I believe true video understanding is the next milestone in AI. Most people in the space understand how image understanding models work, and have for awhile. But the truth of a scene is in what happens over time.
Video understanding is a much more complex subject than image understanding because not only are you dealing with static image data, which can have a ton of frame level contextual variation — but you’re also dealing with temporal evolutions of the frame sequence over time, which requires understanding how the time sequence evolves.
Here, I go into some of the models, the benchmarks and evaluations, along with with demos I’ve written in python into building functional video understanding applications. Again, there is a bit of depth in this video, but hopefully it’s still balanced enough to keep you engaged. Thanks for watching!