MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
2024 · pp. 52481–52515
Citations over time: top 20% of 2024 papers
Amir H. Abdi, Surin Ahn, Zhenhua Han, Huiqiang Jiang, Dongsheng Li, Yucheng Li, Chin-Yew Lin, Xufang Luo, Lili Qiu, Qianhui Wu, Yuqing Yang, Chengruidong Zhang