
What are key references for retrieving and classifying multimodal files (text, image, audio, video)?

Can you recommend essential resources or methodologies for indexing, searching, and categorizing files across multiple modalities like text, images, audio, and video?

All Answers (1)

By Rani, answered 4 months ago

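A foundational reference is "Multimodal Machine Learning: A Survey and Taxonomy" by Baltrušaitis et al., which organizes the field around five challenges: representation, translation, alignment, fusion, and co-learning. For feature extraction, consult deep learning texts such as "Deep Learning" by Goodfellow, Bengio, and Courville. Papers on fusion techniques from conferences such as CVPR and ACL, along with early work on canonical correlation analysis (CCA) for learning joint multimodal representations, are also essential.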
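To illustrate how CCA supports cross-modal search, here is a minimal NumPy sketch of linear CCA. It assumes that features have already been extracted for each modality (for example, text and image embeddings from pretrained networks) and that row i of each matrix describes the same file. The data below is synthetic: two views generated from a shared latent factor. The function names (`inv_sqrt`, `fit_cca`, `rank_candidates`) and the ridge term `reg` are illustrative assumptions and are not taken from any of the works cited above.

```python
import numpy as np


def inv_sqrt(cov):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(X, Y, k, reg=1e-3):
    """Fit linear CCA and return (mean_x, Wx, mean_y, Wy, correlations).

    X: (n, p) features from one modality, Y: (n, q) features from another;
    row i of X and row i of Y describe the same item.
    reg: assumed ridge term added to each covariance for numerical stability.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # The SVD of the whitened cross-covariance yields paired canonical
    # directions; its singular values are the canonical correlations.
    U, S, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :k]
    Wy = Ky @ Vt.T[:, :k]
    return mx, Wx, my, Wy, S[:k]


def rank_candidates(query_x, cand_Y, model):
    """Rank rows of cand_Y by cosine similarity to query_x in the shared space."""
    mx, Wx, my, Wy, _ = model
    q = (query_x - mx) @ Wx
    C = (cand_Y - my) @ Wy
    q = q / np.linalg.norm(q)
    C = C / np.linalg.norm(C, axis=1, keepdims=True)
    return np.argsort(-(C @ q))  # candidate indices, best match first


# Synthetic paired data: both "modalities" are noisy linear views of a
# shared 8-dimensional latent factor Z.
rng = np.random.default_rng(0)
n, d = 500, 8
Z = rng.normal(size=(n, d))
X = Z @ rng.normal(size=(d, 20)) + 0.1 * rng.normal(size=(n, 20))
Y = Z @ rng.normal(size=(d, 30)) + 0.1 * rng.normal(size=(n, 30))

model = fit_cca(X, Y, k=d)
print(np.round(model[4], 3))  # canonical correlations; near 1 for this toy data
print(rank_candidates(X[0], Y, model)[:5])  # top-5 Y items for query X[0]
```

In a real retrieval system, the projected vectors `(Y - my) @ Wy` would be computed once and stored in an index, so a query from one modality only needs a single projection and a nearest-neighbour lookup. Linear CCA is the simplest variant. Kernel and deep extensions follow the same idea of maximizing correlation between modalities in a shared space.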
