Bio

Alexander Haojan (浩然) Liu is a 4th-year Ph.D. student in Computer Science at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He is a member of the Spoken Language Systems (SLS) Group led by Dr. James Glass. His research interests are in machine learning, natural language processing (speech processing in particular), and computer vision. Currently, his work focuses on self-supervised learning of audio and its applications.

Prior to joining MIT, Alex received his M.S. and B.S. degrees in Computer Science & Information Engineering (CSIE) from National Taiwan University (NTU). He was a member of the Speech Processing Lab, working with Prof. Lin-shan Lee and Prof. Hung-yi Lee on machine learning and speech processing. During his undergraduate years, he worked with Prof. Yu-Chiang Frank Wang on computer vision and representation learning. Beyond academic labs, he also spent time at Meta AI (formerly Facebook AI Research) as a research intern.

News

  • Inspired by Wei-Chiu Ma, I would like to commit 1-2 hours per week to providing suggestions and/or mentorship to junior students in need, especially those from underrepresented groups. Please fill out this form if you are interested.

  • I’m actively looking for research collaboration with students outside of MIT on machine learning for audio and multi-modal data. Please feel free to drop me an email if you find my recent work interesting!

Selected Publications

  • Generative Pre-training for Speech with Flow Matching
    Alexander H. Liu, Matthew Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu
    International Conference on Learning Representations (ICLR) 2024
    [ paper ]

  • Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective
    Alexander H. Liu (co-first), Sung-Lin Yeh (co-first), James Glass
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
    [ paper ]

  • DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
    Alexander H. Liu, Heng-Jui Chang, Michael Auli (co-last), Wei-Ning Hsu (co-last), James Glass (co-last)
    Advances in Neural Information Processing Systems (NeurIPS) 2023
    [ paper | code ]

  • Joint Audio and Speech Understanding
    Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass
    IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
    [ paper | interactive demo ]

  • Listen, Think, and Understand
    Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass
    International Conference on Learning Representations (ICLR) 2024
    [ paper | interactive demo ]

  • Contrastive Audio-Visual Masked Autoencoder
    Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass
    International Conference on Learning Representations (ICLR) 2023
    [ paper | code ]

  • Simple and Effective Unsupervised Speech Synthesis
    Alexander H. Liu (co-first), Cheng-I Jeff Lai (co-first), Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass
    InterSpeech 2022
    [ paper | demo ]

  • Towards End-to-end Unsupervised Speech Recognition
    Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski
    Spoken Language Technology Workshop (SLT) 2022
    [ paper | code ]

  • Cross-Modal Discrete Representation Learning
    Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass
    Annual Meeting of the Association for Computational Linguistics (ACL) 2022
    [ paper ]

  • Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
    Mathew Monfort (co-first), SouYoung Jin (co-first), Alexander H. Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021
    [ paper | dataset ]

  • Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies
    Alexander H. Liu, Yu-An Chung, James Glass
    InterSpeech 2021
    [ paper | code ]

  • Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning
    Alexander H. Liu (co-first), Tao Tu (co-first), Hung-yi Lee, Lin-shan Lee
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
    [ paper | demo ]

  • Towards Scene Understanding: Unsupervised Monocular Depth Estimation with Semantic-Aware Representation
    Alexander H. Liu (co-first), Po-Yi Chen (co-first), Yen-Cheng Liu, Yu-Chiang Frank Wang
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019
    [ paper | oral | supplementary ]

  • A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation
    Alexander H. Liu, Yen-Cheng Liu, Yu-Ying Yeh, Yu-Chiang Frank Wang
    Advances in Neural Information Processing Systems (NeurIPS) 2018
    [ paper | code | supplementary & reviews ]

For the complete list, please visit Google Scholar.

Teaching

Honors

  • Advanced Speech Technologies Scholarship, NTU EECS, 2019

  • Verizon Media AI Scholarship, Verizon Media, Taiwan, 2019

  • Best Student Speaker Award, 3rd AII Workshop, 2019

  • 1st Prize, Formosa Spoken QA Challenge, Ministry of Science and Technology, Taiwan, 2019

  • Excellent Teaching Assistant Award, NTU CSIE Dept., 2019

  • Technology Scholarship, Foxconn Education Foundation, 2019

  • Presidential Awards, NTU CSIE, 2017/2018

Projects

  • Open-Source End-to-End Speech Recognition System [ code ]
  • Mandarin Spoken QA System [ demo ]