Multimodal Machine Learning — A Deep Dive

Shafik Quoraishee
Aug 5, 2024


About two months ago, in June of 2024, I did a deep dive into multimodal machine learning at PyData NYC. If you aren't familiar with PyData, it's a varied and popular multinational community that hosts talks on machine learning, A.I., data science, Python, and the whole nine yards.

Multimodal applications have been around for a long time, and they are now being thrust into the spotlight more and more because they augment A.I.'s capabilities beyond what LLMs can do on their own.

When it comes to vision, sound, and all the information the universe carries in forms other than human-readable text, we need specialized models to train on, understand, and recreate this data.

This is where multimodality comes into play. Only through techniques beyond plain old text-based transformers and their variants will we truly be able to access the full gamut of cross-modal, and even cross-spectral, intelligence. Here is my breakdown lecture on the subject for PyData NYC, given in June at the Microsoft headquarters in New York City.

I specifically attempt to go a bit into some fundamental models for multimodal perception and multimodal fusion, such as deep correlational autoencoders.

These models facilitate fusion by compressing the input from multiple modalities into the model's latent space during training and then restoring a correlated version of the output. They truly are fascinating and relatively complex, but it's important to understand them, and the concept of correlational loss, in order to appreciate how multimodality can be approached for AI applications.
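To make the idea concrete, here is a minimal NumPy sketch of the forward pass of a two-modality correlational autoencoder. All of it is illustrative: the toy "image" and "audio" features, the single tanh layer per encoder, and the weighting `lam` are my assumptions, not details from the talk. The key point is the loss, which combines per-modality reconstruction error with a *negative* correlation term, so that minimizing the loss pushes the two latent codes to be correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-modality data: 100 samples of hypothetical "image" features
# (dim 8) and "audio" features (dim 6), built to be correlated.
x_img = rng.normal(size=(100, 8))
x_aud = 0.5 * x_img[:, :6] + 0.1 * rng.normal(size=(100, 6))

latent_dim = 4

# One encoder/decoder weight matrix per modality. In a real model these
# would be deep networks trained by gradient descent; here they are
# random, since we only illustrate the loss computation.
W_enc_img = rng.normal(scale=0.3, size=(8, latent_dim))
W_dec_img = rng.normal(scale=0.3, size=(latent_dim, 8))
W_enc_aud = rng.normal(scale=0.3, size=(6, latent_dim))
W_dec_aud = rng.normal(scale=0.3, size=(latent_dim, 6))

def encode(x, W):
    # A single nonlinear layer standing in for a deep encoder.
    return np.tanh(x @ W)

z_img = encode(x_img, W_enc_img)  # latent code for the image modality
z_aud = encode(x_aud, W_enc_aud)  # latent code for the audio modality

# Per-modality reconstruction losses (mean squared error).
recon_img = np.mean((z_img @ W_dec_img - x_img) ** 2)
recon_aud = np.mean((z_aud @ W_dec_aud - x_aud) ** 2)

def correlation(a, b):
    """Mean per-dimension Pearson correlation between two latent codes."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0)) + 1e-8
    return float(np.mean(num / den))

# Correlational loss: reconstruct each modality AND maximize the
# correlation between the two latent codes (hence the minus sign).
lam = 1.0
loss = recon_img + recon_aud - lam * correlation(z_img, z_aud)
print(f"correlational loss: {loss:.4f}")
```

Training would then backpropagate through this loss, so the shared latent space learns representations where the modalities agree; at inference time you can encode one modality and decode the other.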

Here is the video in which I deep dive into the details. If you enjoy it and extract anything useful from it, please like, subscribe, and share. It's somewhat of a long talk, with questions and answers at the end, but there was a lot of material to cover and I'm thankful it's been recorded!

LinkedIn: https://www.linkedin.com/in/shafik-quoraishee/


Written by Shafik Quoraishee

I'm an Engineer, currently working at the New York Times. In my spare time I'm also a computational biology and physics enthusiast. Hope you enjoy my work!
