Multimodal AI: Building Systems That See, Hear, and Understand