Part of the International Conference on Learning Representations 2025 (ICLR 2025)
Peng Xu, Wei Ping, Xianchao Wu, Chejian Xu, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro
In this work, we introduce ChatQA 2, a Llama 3.0-based model with a 128K context window, designed to bridge the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo-2024-04-09) in long-context understanding and retrieval-augmented generation (RAG) capabilities. These two capabilities are complementary to each other and essential for LLMs to process large volumes of information that cannot fit into a single prompt. We present a detailed continued-training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens, along with a three-stage instruction tuning process to enhance the model's instruction-following, RAG performance, and long-context understanding capabilities. Our results demonstrate that the Llama3-ChatQA-2-70B model outperforms most existing state-of-the-art models, including GPT-4-Turbo-2024-04-09, Qwen2-72B-Instruct, and Llama3.1-70B-Instruct, on ultra-long tasks beyond 100K tokens, as well as on the RAG benchmark using only a 4K context window, showing strong long-context capability across varying sequence lengths. We further provide extensive comparisons between direct long-context and RAG solutions using the same state-of-the-art long-context LLMs. Interestingly, we find that the performance of strong long-context LLMs using RAG improves when retrieving a larger number of chunks. With a large set of top-k chunks, RAG consistently outperforms the direct long-context solution using the same state-of-the-art long-context models (e.g., Llama3-ChatQA-2-70B and Qwen2-72B-Instruct) on both 32K and 128K benchmarks. We open-source the model weights, training data, and the evaluation setup for the community: https://chatqa2-project.github.io/
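As a rough illustration of the comparison described in the abstract, the sketch below contrasts a direct long-context prompt (the full document in context) with a RAG-style prompt assembled from the top-k retrieved chunks. The chunk size, the lexical-overlap scorer, and the prompt templates are illustrative assumptions only; they do not reproduce the paper's retriever or evaluation setup.

```python
# Minimal sketch (not the authors' pipeline): build a RAG prompt from top-k
# chunks of a long document vs. a direct long-context prompt with the whole
# document. Chunk size, scorer, and templates are placeholder assumptions.

def chunk(text: str, chunk_tokens: int = 1200) -> list[str]:
    """Split a long document into fixed-size chunks of whitespace tokens."""
    words = text.split()
    return [" ".join(words[i:i + chunk_tokens])
            for i in range(0, len(words), chunk_tokens)]

def score(query: str, passage: str) -> float:
    """Toy lexical-overlap score standing in for a dense retriever."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / (len(q) or 1)

def build_rag_prompt(query: str, document: str, top_k: int = 5) -> str:
    """RAG setting: keep only the top-k most relevant chunks."""
    chunks = chunk(document)
    best = sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]
    context = "\n\n".join(best)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

def build_long_context_prompt(query: str, document: str) -> str:
    """Direct long-context baseline: feed the entire document to the model."""
    return f"Context:\n{document}\n\nQuestion: {query}\nAnswer:"

if __name__ == "__main__":
    doc = "lorem ipsum dolor " * 40_000   # stands in for a >100K-token document
    q = "What does the report conclude about lorem?"
    print(len(build_rag_prompt(q, doc)))           # small, retrieval-bounded prompt
    print(len(build_long_context_prompt(q, doc)))  # full-document prompt
```

In the paper's experiments, increasing top-k (i.e., giving the RAG prompt more retrieved chunks) is what lets RAG match or exceed the direct long-context solution with the same underlying model; the sketch only shows where that top-k knob sits in the prompt-construction step.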