Bio

Alexander Liu is a final-year Ph.D. candidate in Computer Science at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He is a member of the Spoken Language Systems (SLS) Group led by Dr. James Glass. His work focuses on natural language and speech processing for artificial intelligence, with the goal of building machines that humans can seamlessly interact with through voice. Examples include multimodal audio representation learning, multimodal alignment, large language models, and generative models for audio.

Prior to joining MIT, Alex received his M.S. and B.S. degrees in Computer Science & Information Engineering (CSIE) from National Taiwan University (NTU). He was a member of the Speech Processing Lab working with Lin-shan Lee and Prof. Hung-yi Lee in the area of machine learning and speech processing. During his undergraduate years, he worked with Yu-Chiang Frank Wang in computer vision and representation learning. Besides academic labs, he also spent time working at Facebook AI Research (now FAIR at Meta AI) and Nvidia Applied Deep Learning Research (ADLR) as a research intern.

Selected Publications / Selected Reports / Teaching / Honors / Open-source Contributions

News

  • I’m currently on leave until Spring 2026, working at Mistral AI to build frontier open audio models such as Voxtral. (Our speech team is growing; please reach out if you are interested!)

  • Inspired by Wei-Chiu Ma, I would like to commit 1-2 hours per week to providing suggestions and/or mentorship to junior students in need, especially those from underrepresented groups. Please fill out this form if you are interested.

Selected Publications

For the complete list, please visit Google Scholar.

  • SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction
    Shester Gueuwou, Xiaoyi Du, Gregory Shakhnarovich, Karen Livescu, Alexander H. Liu
    Annual Meeting of the Association for Computational Linguistics (ACL) 2025 (Oral)
    [ paper | project page | code | interactive demo ]

  • UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
    Alexander H. Liu, Sungho Lee, Chien-Hsin H. Yang, Yu Gong, Yu-Chiang Frank Wang, James R. Glass, Rafael Valle, Bryan Catanzaro
    International Conference on Learning Representations (ICLR) 2025
    [ paper ]

  • Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
    Po-Jui Ku, Alexander H. Liu, Roman Korostik, Sung-Feng Huang, Szu-Wei Fu, Ante Jukić
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
    [ paper | code | demo ]

  • Generative Pre-training for Speech with Flow Matching
    Alexander H. Liu, Matthew Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu
    International Conference on Learning Representations (ICLR) 2024
    [ paper ]

  • Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective
    Alexander H. Liu (co-first), Sung-Lin Yeh (co-first), James Glass
    In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
    [ paper ]

  • DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
    Alexander H. Liu, Heng-Jui Chang, Michael Auli (co-last), Wei-Ning Hsu (co-last), James Glass (co-last)
    In Advances in Neural Information Processing Systems (NeurIPS) 2023
    [ paper | code ]

  • Joint Audio and Speech Understanding
    Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass
    IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
    [ paper | interactive demo ]

  • Listen, Think, and Understand
    Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass
    International Conference on Learning Representations (ICLR) 2024
    [ paper | interactive demo ]

  • Contrastive Audio-Visual Masked Autoencoder
    Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass
    International Conference on Learning Representations (ICLR) 2023
    [ paper | code ]

  • Simple and Effective Unsupervised Speech Synthesis
    Alexander H. Liu (co-first), Cheng-I Jeff Lai (co-first), Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass
    InterSpeech 2022
    [ paper | demo ]

  • Towards End-to-end Unsupervised Speech Recognition
    Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski
    Spoken Language Technology Workshop (SLT) 2022
    [ paper | code ]

  • Cross-Modal Discrete Representation Learning
    Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass
    Annual Meeting of the Association for Computational Linguistics (ACL) 2022
    [ paper ]

  • Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
    Mathew Monfort (co-first), SouYoung Jin (co-first), Alexander H. Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva
    In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
    [ paper | dataset ]

  • Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies
    Alexander H. Liu, Yu-An Chung, James Glass
    InterSpeech 2021
    [ paper | code ]

  • Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning
    Alexander H. Liu (co-first), Tao Tu (co-first), Hung-yi Lee, Lin-shan Lee
    In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
    [ paper | demo ]

  • Towards Scene Understanding: Unsupervised Monocular Depth Estimation with Semantic-Aware Representation
    Alexander H. Liu (co-first), Po-Yi Chen (co-first), Yen-Cheng Liu, Yu-Chiang Frank Wang
    In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
    [ paper | oral | supplementary ]

  • A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation
    Alexander H. Liu, Yen-Cheng Liu, Yu-Ying Yeh, Yu-Chiang Frank Wang
    In Advances in Neural Information Processing Systems (NeurIPS) 2018
    [ paper | code | supplementary & reviews ]

Selected Reports

  • Voxtral
    Alexander H. Liu, Andy Ehrenberg, Andy Lo, Clément Denoix, Corentin Barreau, Guillaume Lample, Jean-Malo Delignon, Khyathi Raghavi Chandu, Patrick von Platen, Pavankumar Reddy Muddireddy, Sanchit Gandhi, Soham Ghosh, Srijan Mishra, Thomas Foubert, et al.
    Mistral AI, 2025
    [ technical report | model weights | blog ]

  • Towards Audio Language Modeling - An Overview
    Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-yi Lee
    arXiv preprint, 2024
    [ paper ]

Teaching

Honors

  • MOE Taiwan Scholarship -- Ministry of Education, Taiwan, 2022-2023

  • Outstanding Master's Thesis Award -- ACLCLP, 2020

  • Excellent Teaching Assistant Award -- NTU CSIE Dept., 2019

  • Advanced Speech Technologies Scholarship -- NTU EECS, 2019

  • Verizon Media AI Scholarship -- Verizon (Taiwan), 2019

  • Best Student Speaker Award -- 3rd Augmented Intelligence and Interaction (AII) Workshop, 2019

  • 1st Prize, Formosa Spoken QA Challenge -- Ministry of Science and Technology, Taiwan, 2019

  • Technology Scholarship -- Foxconn Education Foundation, 2018

  • Presidential Awards (top 5%) -- National Taiwan University, 2017/2018

Open-source Contributions

  • Voxtral
    Core contributor -- state-of-the-art large spoken language models, 400k total downloads
    2025

  • Fairseq -- wav2vec-U 2.0
    Contributor -- added a speech recognition algorithm to the well-known NLP toolkit, 31.9k stars
    2022

  • End-to-end-ASR-Pytorch
    Creator -- one of the first deep speech recognition models in PyTorch, 1.2k stars
    2018