Top ML Papers of the Week (April 15 — April 21)

Thongchan Thananate
2 min read · Apr 22, 2024
  1. Llama 3: This paper introduces Llama 3, a family of Large Language Models (LLMs) that includes 8B and 70B pretrained and instruction-tuned models. Llama 3 8B outperforms Gemma 7B and Mistral 7B Instruct, while Llama 3 70B broadly outperforms Gemini Pro 1.5 and Claude 3 Sonnet.
  2. Mixtral 8x22B: Mixtral 8x22B is a new open-source sparse mixture-of-experts model. Compared to other community models, it reports the best performance-to-cost ratio on MMLU and strong performance on reasoning, knowledge, mathematics, and coding; a minimal sketch of sparse expert routing appears after this list.
  3. Chinchilla Scaling: A replication attempt: This paper attempts to replicate the third estimation procedure of the compute-optimal scaling law proposed in Hoffmann et al. (2022), i.e., Chinchilla scaling (the functional form being re-fit is reproduced after this list). The authors find that the reported estimates are inconsistent with the paper's first two estimation methods, fail to fit the extracted data, and imply implausibly narrow confidence intervals.
  4. How Faithful are RAG Models?: This paper investigates the faithfulness of Retrieval-Augmented Generation (RAG) models. It finds that supplying the correct retrieved information fixes most model mistakes (94% accuracy). However, when the documents contain incorrect values and the LLM's internal prior is weak, the model is more likely to recite the incorrect information; a sketch of this probe follows the list.
  5. A Survey on Retrieval-Augmented Text Generation for LLMs: This survey paper presents a comprehensive overview of the RAG domain, its evolution, and its challenges.
  6. The Illusion of State in State-Space Models: This paper investigates the expressive power of state-space models (SSMs) and shows that, like transformers, they are confined to the complexity class TC⁰, so they cannot express inherently sequential state-tracking computations; a toy version of such a task is sketched below.
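
On Mixtral's design (item 2): its cost advantage comes from sparse routing, where a learned gate activates only a few experts per token, so inference touches a fraction of the total parameters. Below is a minimal numpy sketch of that idea. The 8-expert, top-2 configuration matches Mixtral's published design; the toy dimensions, the plain tanh experts, and all names are illustrative assumptions, not Mixtral's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, gate_w, experts, top_k=2):
    """Sparse MoE feed-forward: route each token to its top_k experts.

    x       : (tokens, d_model) token activations
    gate_w  : (d_model, n_experts) router weights
    experts : list of callables, each mapping (d_model,) -> (d_model,)
    """
    logits = x @ gate_w                        # (tokens, n_experts) router scores
    out = np.zeros_like(x)
    for t, tok in enumerate(x):
        top = np.argsort(logits[t])[-top_k:]   # indices of the top_k experts
        weights = softmax(logits[t][top])      # renormalize over the chosen experts
        for w, idx in zip(weights, top):
            out[t] += w * experts[idx](tok)    # weighted sum of expert outputs
    return out

# Toy usage: 8 experts, top-2 routing (as in Mixtral); dimensions are made up.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [(lambda W: (lambda v: np.tanh(v @ W)))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
x = rng.normal(size=(4, d))
gate_w = rng.normal(size=(d, n_experts))
print(moe_layer(x, gate_w, experts).shape)     # (4, 16): only 2 of 8 experts run per token
```

The point of the gate is that compute per token scales with top_k, not with the total expert count, which is where the performance/cost ratio claim comes from.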
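For the Chinchilla replication (item 3), the functional form under scrutiny is the parametric loss from Hoffmann et al. (2022), where N is the parameter count and D is the number of training tokens; Approach 3 estimates the constants (E, A, B, α, β) from observed losses, and the compute-optimal split follows from minimizing this fit under a budget C ≈ 6ND:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad
N_{\mathrm{opt}}(C) \propto C^{a}, \quad D_{\mathrm{opt}}(C) \propto C^{b},
\quad a = \frac{\beta}{\alpha+\beta}, \quad b = \frac{\alpha}{\alpha+\beta}
```

The replication's complaint is about the fitted constants and their confidence intervals from this third procedure, not about the functional form itself.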
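The RAG-faithfulness result (item 4) comes from comparing a model's closed-book answer (its internal prior) against its answers given correct and perturbed context. Here is a hypothetical harness sketching that comparison; the ask_llm callable, the prompt template, and the exact-match scoring are all assumptions for illustration, not the paper's actual setup.

```python
def probe(ask_llm, question, true_answer, correct_doc, perturbed_doc):
    """Compare an LLM's internal prior against correct and perturbed context.

    ask_llm: any callable mapping a prompt string to the model's answer string.
    """
    prior = ask_llm(question)  # closed-book: the model's internal prior
    with_correct = ask_llm(f"Context: {correct_doc}\n\nQuestion: {question}")
    with_wrong = ask_llm(f"Context: {perturbed_doc}\n\nQuestion: {question}")
    return {
        "prior_correct": prior.strip() == true_answer,        # right without retrieval?
        "fixed_by_rag": with_correct.strip() == true_answer,  # correct context repairs mistakes
        "recited_wrong": with_wrong.strip() != true_answer,   # wrong context overrode the prior
    }
```

Aggregating "fixed_by_rag" over a QA set yields the kind of accuracy-with-correct-context number (94%) the paper reports, while "recited_wrong" tracks how often wrong documents win the tug-of-war against the prior.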
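A concrete instance of the state tracking behind item 6 is composing permutations, the word problem over the group S5, which is NC¹-complete; a constant-depth TC⁰ model (a fixed-depth SSM or transformer) cannot solve it at arbitrary sequence lengths unless TC⁰ = NC¹. The toy Python task generator below is an illustrative reconstruction, not the paper's benchmark code.

```python
from itertools import permutations
import random

# State tracking as permutation composition: given a sequence of permutations
# of {0..4}, output their composition. Solving it requires carrying a state
# through every step, which is exactly what constant-depth models cannot do.

S5 = list(permutations(range(5)))        # all 120 permutations of 5 elements

def compose(p, q):
    """Apply q after p: (q ∘ p)(i) = q[p[i]]."""
    return tuple(q[p[i]] for i in range(5))

def make_example(length, rng=random):
    seq = [rng.choice(S5) for _ in range(length)]
    state = tuple(range(5))              # start from the identity permutation
    for p in seq:                        # inherently sequential: each step needs the last state
        state = compose(state, p)
    return seq, state                    # (input sequence, target label)

seq, target = make_example(8)
print(target)                            # the composed permutation a model must predict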
