Bio
Alexander Liu is a final-year Ph.D. candidate in Computer Science at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He is a member of the Spoken Language Systems (SLS) Group led by Dr. James Glass. His work focuses on natural language and speech processing for artificial intelligence, with the goal of building machines that humans can seamlessly interact with through voice. Examples include multimodal audio representation learning, multimodal alignment, large language models, and generative models for audio.
Prior to joining MIT, Alex received his M.S. and B.S. degrees in Computer Science & Information Engineering (CSIE) from National Taiwan University (NTU). He was a member of the Speech Processing Lab, working with Prof. Lin-shan Lee and Prof. Hung-yi Lee in the area of machine learning and speech processing. During his undergraduate years, he worked with Prof. Yu-Chiang Frank Wang on computer vision and representation learning. Outside of academic labs, he also spent time at Facebook AI Research (now FAIR at Meta AI) and NVIDIA Applied Deep Learning Research (ADLR) as a research intern.
News
I’m currently on leave until Spring 2026, working at Mistral AI to build frontier open audio models such as Voxtral. (Our speech team is growing, please reach out if you are interested!)
Inspired by Wei-Chiu Ma, I would like to commit 1-2 hours per week to providing suggestions and/or mentorship to junior students in need, especially those from underrepresented groups. Please fill out this form if you are interested.
Selected Publications
For the complete list, please visit Google Scholar.
SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction
Shester Gueuwou, Xiaoyi Du, Gregory Shakhnarovich, Karen Livescu, Alexander H. Liu
Annual Meeting of the Association for Computational Linguistics (ACL) 2025 (Oral)
[ paper | project page | code | interactive demo ]
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Alexander H. Liu, Sungho Lee, Chien-Hsin H. Yang, Yu Gong, Yu-Chiang Frank Wang, James R. Glass, Rafael Valle, Bryan Catanzaro
International Conference on Learning Representations (ICLR) 2025
[ paper ]
Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
Po-Jui Ku, Alexander H. Liu, Roman Korostik, Sung-Feng Huang, Szu-Wei Fu, Ante Jukić
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
[ paper | code | demo ]
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu, Matthew Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu
International Conference on Learning Representations (ICLR) 2024
[ paper ]
Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective
Alexander H. Liu (co-first), Sung-Lin Yeh (co-first), James Glass
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
[ paper ]
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Alexander H. Liu, Heng-Jui Chang, Michael Auli (co-last), Wei-Ning Hsu (co-last), James Glass (co-last)
Advances in Neural Information Processing Systems (NeurIPS) 2023
[ paper | code ]
Joint Audio and Speech Understanding
Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
[ paper | interactive demo ]
Listen, Think, and Understand
Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass
International Conference on Learning Representations (ICLR) 2024
[ paper | interactive demo ]
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass
International Conference on Learning Representations (ICLR) 2023
[ paper | code ]
Simple and Effective Unsupervised Speech Synthesis
Alexander H. Liu (co-first), Cheng-I Jeff Lai (co-first), Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass
InterSpeech 2022
[ paper | demo ]
Towards End-to-end Unsupervised Speech Recognition
Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski
Spoken Language Technology Workshop (SLT) 2022
[ paper | code ]
Cross-Modal Discrete Representation Learning
Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass
Annual Meeting of the Association for Computational Linguistics (ACL) 2022
[ paper ]
Spoken Moments: Learning Joint Audio-visual Representations from Video Descriptions
Mathew Monfort (co-first), SouYoung Jin (co-first), Alexander H. Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021
[ paper | dataset ]
Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies
Alexander H. Liu, Yu-An Chung, James Glass
InterSpeech 2021
[ paper | code ]
Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning
Alexander H. Liu (co-first), Tao Tu (co-first), Hung-yi Lee, Lin-shan Lee
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
[ paper | demo ]
Towards Scene Understanding: Unsupervised Monocular Depth Estimation with Semantic-Aware Representation
Alexander H. Liu (co-first), Po-Yi Chen (co-first), Yen-Cheng Liu, Yu-Chiang Frank Wang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019
[ paper | oral | supplementary ]
A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation
Alexander H. Liu, Yen-Cheng Liu, Yu-Ying Yeh, Yu-Chiang Frank Wang
Advances in Neural Information Processing Systems (NeurIPS) 2018
[ paper | code | supplementary & reviews ]
Selected Reports
Voxtral
Alexander H. Liu, Andy Ehrenberg, Andy Lo, Clément Denoix, Corentin Barreau, Guillaume Lample, Jean-Malo Delignon, Khyathi Raghavi Chandu, Patrick von Platen, Pavankumar Reddy Muddireddy, Sanchit Gandhi, Soham Ghosh, Srijan Mishra, Thomas Foubert, et al.
Mistral AI, 2025
[ technical report | model weights | blog ]
Towards Audio Language Modeling - An Overview
Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-yi Lee
arXiv preprint, 2024
[ paper ]
Teaching
Guest Lecturer, Spoken Language Processing, MIT, Spring 2024
TA, Natural Language Processing, MIT, Fall 2021
Head TA, Fundamentals of Speech Signal Processing, NTU, Academic Year 2018
TA, Deep Learning for Human Language Processing, NTU, Fall 2018
TA, Machine Learning and Having It Deep and Structured, NTU, Spring 2018
TA, Deep Learning for Computer Vision, NTU, Spring 2018
TA, Advanced Deep Learning, NTU, Spring 2018
Honors
MOE Taiwan Scholarship, Ministry of Education, Taiwan, 2022-2023
Outstanding Master's Thesis Award, ACLCLP, 2020
Excellent Teaching Assistant Award, NTU CSIE Dept., 2019
Advanced Speech Technologies Scholarship, NTU EECS, 2019
Verizon Media AI Scholarship, Verizon (Taiwan), 2019
Best Student Speaker Award, 3rd Augmented Intelligence and Interaction (AII) Workshop, 2019
1st Prize, Formosa Spoken QA Challenge, Ministry of Science and Technology, Taiwan, 2019
Technology Scholarship, Foxconn Education Foundation, 2018
Presidential Awards (top 5%), National Taiwan University, 2017/2018
Open-source Contributions
Voxtral
Core contributor -- state-of-the-art large spoken language models, 400k total downloads, 2025
Fairseq -- wav2vec-U 2.0
Contributor -- speech recognition algorithm for the well-known NLP toolkit, 31.9k stars, 2022
End-to-end-ASR-Pytorch
Creator -- one of the first deep speech recognition models in PyTorch, 1.2k stars, 2018
