📢 Announcement: We released X-InstructBLIP, a simple, effective, and scalable cross-modal framework that empowers LLMs to handle a diverse range of tasks across a variety of modalities (image, text, video, audio, and 3D) without requiring modality-specific pre-training. Check out our paper and code 🤖🤖