Toward Multimodal Agent Intelligence: Perception, Reasoning, Generation and Interaction