Papers

  1. PaLM-2 Technical Report
    Google et al. (including Yi Tay, co-lead of Architecture & Modeling; authors ordered alphabetically in tiers).
    Google I/O launch | Technical Report

  2. Transcending Scaling Laws with 0.1% extra compute
    Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani.
    Proceedings of EMNLP 2023

  3. UL2: Unifying Language Learning Paradigms
    Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, Donald Metzler.
    Paper | Google AI Blogpost | Flan-UL2 blog-post | Proceedings of ICLR 2023

  4. Scaling Instruction-Finetuned Language Models
    Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei.
    Journal of Machine Learning Research

  5. PaLM: Scaling Language Models with Pathways
    Aakanksha Chowdhery, Sharan Narang, Jacob Devlin et al. (including Yi Tay).
    Journal of Machine Learning Research

  6. Emergent Abilities of Large Language Models
    Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus.
    Transactions on Machine Learning Research (TMLR)

  7. PaLI-X: On Scaling a Multilingual Vision and Language Model
    Chen et al. (including Yi Tay).
    Proceedings of CVPR 2024

  8. Scaling Vision Transformers to 22B Parameters
    Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby.
    Proceedings of ICML 2023.

  9. The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
    Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts

  10. Symbol tuning improves in-context learning in language models
    Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc V. Le
    Proceedings of EMNLP 2023

  11. Large language models do in-context learning differently
    Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

  12. CoLT5: Faster Long-Range Transformers with Conditional Computation
    Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai
    Proceedings of EMNLP 2023

  13. Inverse Scaling can become U-shaped
    Jason Wei, Najoung Kim, Yi Tay, Quoc V. Le.
    Proceedings of EMNLP 2023

  14. Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
    Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler.
    Proceedings of EMNLP 2023

  15. Recitation-Augmented Language Models
    Zhiqing Sun, Xuezhi Wang, Yi Tay, Yiming Yang, Denny Zhou.
    Proceedings of ICLR 2023

  16. Language Models are Multilingual Chain-of-Thought Reasoners
    Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei.
    Proceedings of ICLR 2023

  17. Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
    Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei.

  18. UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining
    Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant
    Proceedings of ICLR 2023

  19. Transformer Memory as a Differentiable Search Index
    Yi Tay*, Vinh Q. Tran*, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler.
    Proceedings of NeurIPS 2022.

  20. DSI++: Updating Transformer Memory with New Documents
    Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Jinfeng Rao, Marc Najork, Emma Strubell, Donald Metzler.

  21. Recommender Systems with Generative Retrieval
    Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, Maheswaran Sathiamoorthy

  22. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
    Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby.
    Proceedings of ICLR 2023

  23. Efficient Transformers: A Survey
    Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
    ACM Computing Surveys 2022.

  24. ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
    Vamsi Aribandi*, Yi Tay*, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler.
    Proceedings of ICLR 2022.

  25. Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
    Yi Tay*, Mostafa Dehghani*, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
    Proceedings of ICLR 2022

  26. Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
    Yi Tay*, Vinh Q. Tran*, Sebastian Ruder, Jai Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, Donald Metzler
    Proceedings of ICLR 2022.

  27. The Efficiency Misnomer
    Mostafa Dehghani*, Anurag Arnab*, Lucas Beyer*, Ashish Vaswani, Yi Tay*
    Proceedings of ICLR 2022.

  28. HyperPrompt: Prompt-based Task-Conditioning of Transformers
    Yun He*, Huaixiu Steven Zheng*, Yi Tay, Jai Gupta, Yu Du, Vamsi Aribandi, Zhe Zhao, YaGuang Li, Zhao Chen, Donald Metzler, Heng-Tze Cheng, Ed H. Chi.
    Proceedings of ICML 2022.

  29. SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption
    Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler
    Proceedings of ICLR 2022.

  30. Sharpness-Aware Minimization Improves Language Model Generalization
    Dara Bahri, Hossein Mobahi, Yi Tay
    Proceedings of ACL 2022.

  31. Improving Compositional Generalization with Self-Training for Data-to-Text Generation
    Sanket Vaibhav Mehta, Jinfeng Rao, Yi Tay, Mihir Kale, Ankur Parikh, Hongtao Zhong, Emma Strubell
    Proceedings of ACL 2022.

  32. ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference
    Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Gupta, Cicero Nogueira dos Santos, Yi Tay, Donald Metzler
    Proceedings of ACL 2022 (Findings).

  33. A New Generation of Perspective API: Efficient Multilingual Character-level Transformers
    Alyssa Lees*, Vinh Q. Tran*, Yi Tay*, Jeffrey Sorensen, Jai Gupta, Donald Metzler, Lucy Vasserman
    Proceedings of KDD 2022 (Applied Data Science Track)

  34. Confident Adaptive Language Modeling
    Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler
    Proceedings of NeurIPS 2022.

  35. Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification
    Jai Prakash Gupta, Yi Tay, Chaitanya Kamath, Vinh Tran, Donald Metzler, Shailesh Bavadekar, Mimi Sun, Evgeniy Gabrilovich
    EMNLP 2022 Industry Track

  36. PolyViT: Co-training Vision Transformers on Images, Videos and Audio
    Valerii Likhosherstov, Mostafa Dehghani, Anurag Arnab, Krzysztof Marcin Choromanski, Mario Lucic, Yi Tay, Adrian Weller
    Transactions on Machine Learning Research (TMLR), 2022.

  37. OmniNet: Omnidirectional Representations from Transformers
    Yi Tay*, Mostafa Dehghani*, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler.
    Proceedings of ICML 2021.

  38. Synthesizer: Rethinking Self-Attention in Transformer Models
    Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng
    Proceedings of ICML 2021

  39. Long Range Arena: A Benchmark for Efficient Transformers
    Yi Tay*, Mostafa Dehghani*, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
    Proceedings of ICLR 2021

  40. HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections
    Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
    Proceedings of ICLR 2021

  41. Are Neural Rankers still Outperformed by Gradient Boosted Decision Trees?
    Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork
    Proceedings of ICLR 2021.

  42. Are Pretrained Convolutions Better than Pretrained Transformers?
    Yi Tay, Mostafa Dehghani, Jai Gupta, Dara Bahri, Vamsi Aribandi, Zhen Qin, Donald Metzler
    Proceedings of ACL 2021 (Long Paper)

  43. StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling
    Yikang Shen, Yi Tay, Che Zheng, Dara Bahri, Donald Metzler, Aaron Courville
    Proceedings of ACL 2021 (Long Paper)

  44. Are Model Diagnostics Reliable?
    Vamsi Aribandi, Yi Tay, Donald Metzler
    Proceedings of ACL 2021 (Short Paper, Findings of ACL)

  45. Do Transformer Modifications Transfer Across Implementations and Applications?
    Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel
    Proceedings of EMNLP 2021.

  46. Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study
    Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew Tomkins
    Proceedings of WSDM 2021 (Best paper award runner-up)

  47. Rethinking Search: Making Domain Experts out of Dilettantes
    Donald Metzler, Yi Tay, Dara Bahri, Marc Najork
    ACM SIGIR forum

  48. Sparse Sinkhorn Attention
    Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
    Proceedings of ICML 2020

  49. Reverse Engineering Configurations of Neural Text Generation Models
    Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins
    Proceedings of ACL 2020 (Short Paper)

  50. Choppy: Cut Transformers for Ranked List Truncation
    Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Andrew Tomkins
    Proceedings of SIGIR 2020 (Short Paper)