Multimodal LLM based Gemini Model for ASR

I work on the multimodal Gemini model for ASR (Automatic Speech Recognition). I improved the ASR results at Google on the Gemini 1.5 architecture by supervised fine-tuning of the pre-trained model with several new algorithms designed for multimodal LLMs. Due to confidentiality, I can only say that it was the best Gemini ASR result as of July 2024.

On-Device Federated Learning of Large Models

I led the training of full-Conformer-based ASR models under federated learning. By leveraging real data from real users, a model trained on device with federated learning can improve ASR performance. However, devices impose several constraints, including RAM size, CPU compute power, and network bandwidth. I was the first person at Google to achieve federated learning of a large Conformer model, and I achieved the best ASR model under federated learning.
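
The general idea of federated training can be illustrated with federated averaging (FedAvg), a standard algorithm in which each device trains on its own data and the server only aggregates the resulting model weights. The following is a minimal sketch on a toy one-parameter linear model, not the Conformer training setup described above; the function names (`local_update`, `federated_average`), the learning rate, and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (input x, target y)


def local_update(weight: float, data: List[Example],
                 lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few epochs of SGD on one device for the model y = weight * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (weight * x - y) * x  # gradient of the squared error
            weight -= lr * grad
    return weight


def federated_average(weight: float, clients: List[List[Example]],
                      rounds: int = 20) -> float:
    """FedAvg: average the client weights, weighted by local dataset size."""
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        new_weight = 0.0
        for data in clients:
            local_w = local_update(weight, data)  # raw data never leaves the device
            new_weight += local_w * len(data) / total
        weight = new_weight
    return weight


# Hypothetical toy data: three devices whose examples all follow y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5)],
    [(3.0, 9.0), (1.5, 4.5), (2.5, 7.5)],
]
trained = federated_average(0.0, clients)
```

In a real deployment the per-device step is where the RAM, CPU, and bandwidth constraints apply, since both the local training and the upload of the updated weights happen on the phone.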

LocLok: Protecting Location Privacy in Smartphones

Concerns about location privacy frequently arise with the rapid development of GPS-enabled devices and location-based applications. However, there are no official privacy-preserving cell phone apps that can protect users' location privacy while still enabling GPS-based services. In academia, the popular method, called spatial cloaking, is to replace a latitude-longitude position with a randomly generated area. For more details, please go to http://forum.loclok.com.

Click here to view a demo system.

Our technique can rigorously protect location privacy even when the attacker is acquainted with the user, for example when the attacker knows the user's exact movement habits and previously visited places. Even then, our technique provably protects the current location under state-of-the-art differential privacy. Furthermore, we prove that our technique is the optimal solution satisfying the differential privacy guarantee.

An example follows. Figure 1 shows a real trajectory on a real map, where the two axes represent longitude and latitude. The user traveled over 500 timestamps. The corresponding grid map of the same trajectory is shown in Figure 2. Figure 3 shows the trajectory released by the existing method, while Figure 4 shows the trajectory released by our method. Our method clearly provides more utility, which is also verified in Figure 5, where the distances between the true locations and the released locations are shown. LM is the existing Laplace Mechanism; PIM is our Planar Isotropic Mechanism. The formal proof and experiment settings can be found in our paper.
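
For reference, the general idea behind a Laplace Mechanism baseline can be sketched as adding independent Laplace noise to each coordinate of a location. This is only an illustration of the standard technique, not the code used in the experiments and not PIM; the function names, the `sensitivity` parameter, and the coordinates are assumptions made for the example.

```python
import random
from typing import Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_location(loc: Tuple[float, float], epsilon: float,
                     sensitivity: float, rng: random.Random) -> Tuple[float, float]:
    """Release a noisy location; a smaller epsilon means more noise."""
    scale = sensitivity / epsilon
    return (loc[0] + laplace_noise(scale, rng),
            loc[1] + laplace_noise(scale, rng))


def mean_error(epsilon: float, trials: int = 2000) -> float:
    """Average Euclidean distance between true and released locations."""
    rng = random.Random(0)   # fixed seed so the sketch is reproducible
    true_loc = (10.0, 20.0)  # hypothetical grid coordinates
    total = 0.0
    for _ in range(trials):
        x, y = perturb_location(true_loc, epsilon, sensitivity=1.0, rng=rng)
        total += ((x - true_loc[0]) ** 2 + (y - true_loc[1]) ** 2) ** 0.5
    return total / trials
```

Calling `mean_error` with several values of `epsilon` shows the privacy-utility trade-off: the smaller the privacy budget, the larger the distance between the true and released locations.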

DPCube: Releasing Differentially Private Data Cubes for Health Information

Health data is highly private. For example, a patient's disease should never be exposed to anyone without the patient's consent. On the other hand, health data is very useful for improving clinical treatment and for medical research institutes. The question is how to release health data so that the sensitive part is protected while the useful part is released. More generally, if a database contains both private and useful information, how can we use the data without breaching privacy?

DPCube is a practical and rigorous method to tackle this problem. It satisfies the state-of-the-art differential privacy guarantee for privacy preservation while still providing useful information about the data. The main components of DPCube are described as follows.

We also have MATLAB code implementing DPCube, which is available upon request. The following figure shows the interface for a data release scenario.

FedAQT: Accurate Quantized Training with Federated Learning

This is our FedAQT work under federated learning.

Federated Pruning: Improving Neural Network Efficiency with Federated Learning

This is our FedPruning work under federated learning.

An Adversarial Machine Learning Model

ML models are known for their vulnerability to adversarial attacks. Even state-of-the-art ML models can be easily fooled by adding a small amount of noise.

In the following example, we show that a cat (on the left) from Tiny ImageNet can be misclassified as other objects after adding small noise that is not even perceptible to humans. Although the three images on the right look similar to the original cat, they are classified as sea cucumber, desk, and dragonfly. Note that the ML model is ResNet18, one of the best vision models as of 2018, with about 76% accuracy.

Generating one adversarial sample takes about 60~90 seconds. Our code uses the Carlini algorithm and is available upon request. Note also that the perturbation under the attack can be less than 1.0 pixel (on the 0~255 image scale).

A P2P Video Sharing System

Many of us watch online videos every day, on YouTube or other video providers. However, the technique behind video sharing is not hard. Here I show a simple video sharing system with the original code.

My system has three major parts: a tracker, a super peer, and many peers. The tracker is a MySQL database service that contains all the information about the available videos. If a user, which is a peer, wants to browse the videos in our system, the peer sends a query request to the tracker, which returns all the channels. Each channel has a super peer that holds the detailed information of the channel. When a user selects a channel, the peer opens a TCP connection to the super peer. The super peer then accepts the TCP connection and adds the user to the neighbor list of the channel's audience.

Over the TCP connection, the video content is transferred in units of data packets. As shown in the following figure, a peer has a data buffer area that holds these packets.
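
A minimal sketch of such a packet buffer is given below. It is written in Python for brevity and is not the original C++ code; the class name `PacketBuffer`, the sequence-number scheme, and the capacity are assumptions made for illustration.

```python
from typing import Dict, List


class PacketBuffer:
    """Fixed-capacity buffer that hands packets to the player in order."""

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity
        self.next_seq = 0  # sequence number the player needs next
        self.packets: Dict[int, bytes] = {}

    def insert(self, seq: int, payload: bytes) -> bool:
        """Store a packet if it falls inside the current buffer window."""
        if seq < self.next_seq or seq >= self.next_seq + self.capacity:
            return False  # already played, or too far ahead to buffer
        self.packets[seq] = payload
        return True

    def pop_ready(self) -> List[bytes]:
        """Return the longest run of consecutive packets ready to play."""
        ready: List[bytes] = []
        while self.next_seq in self.packets:
            ready.append(self.packets.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = PacketBuffer(capacity=4)
buf.insert(1, b"B")       # arrives out of order
first = buf.pop_ready()   # nothing is playable until packet 0 arrives
buf.insert(0, b"A")
second = buf.pop_ready()  # packets 0 and 1 are released in order
```

Keying the buffer by sequence number lets a peer accept packets in any arrival order while still playing them back in sequence.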

The classes used in this program are summarized as follows.

To download the original code written in C++, please click the following: super peer, tracker01, tracker01.client and tracker02.

Deep Learning Certificate

I obtained the Deep Learning certificate from deeplearning.ai.

Machine Learning Certificate

I obtained the Machine Learning certificate from Stanford University.

I-Corps Innovation Program

I was very lucky to participate in the I-Corps program, supported by the National Science Foundation, in 2016. Although my startup did not work out after exploring the business opportunities, I am grateful to the I-Corps program, shown in the following image.

Yonghui Xiao

Introduction

Hi, I am a staff software engineer at Meta, leading the AI agent for real-world applications. The AI agent was created from 0 to 1 with an LLM, RAG, toolsets with APIs, and our innovative approaches to boost agent quality. We build the end-to-end solution, including the LLM, data mining and crawling tools, a backend database for RAG with embedding-based search, image and multimodal content support, an advanced agent architecture, backend APIs integrated with the LLM, an innovative multi-step task resolution method, offline evaluation with LLM auto-eval, and online evaluation with LLM-based eval agents. The AI agent has been launched for real customers with real production traffic.

I was a software engineer at Google from May 2017 to Dec 2024. I worked on the multimodal Gemini model for ASR (Automatic Speech Recognition) and improved the ASR results at Google on the Gemini 1.5 architecture by supervised fine-tuning of the pre-trained model with several new algorithms designed for multimodal LLMs. My new algorithms tackled fundamental problems of multimodal LLMs and achieved the best Gemini ASR result as of July 2024. My previous project before Gemini was training ASR models under federated learning. I led the training of the full-Conformer-based ASR model, which was launched on millions of users' smartphones. The federated training improved the Conformer model by leveraging real data from real users under a differential privacy guarantee. Please check my publications for details.

Before joining Google, I was a Ph.D. student in the CS department at Emory University, advised by Prof. Li Xiong. Prior to Emory, I received three bachelor's degrees from Xi'an Jiaotong University in 2005. After graduation, I worked at IVO in China. In 2008, I became a graduate student in the CS department at Tsinghua University, where I was lucky to join the collaborative Tsinghua-Emory research project. I interned at Samsung Research America (SRA) in summer 2014.

Click here to download my CV.

My Research

My research at Google is in the AI area of ASR models. I believe the direction of LLMs is the multimodal approach, which combines all modalities, including audio, video, and text, in one model. This is an interesting and exciting challenge.

My Ph.D. research mainly focused on data privacy protection. I was the top expert on differential privacy in 2017, and I am the first person in the world to propose the optimal algorithm under differential privacy, which achieves the best data utility under a constrained differential privacy budget.

Professional Service

Program Committee: ICASSP 2023, 2024, INTERSPEECH 2024, SLT 2024, student PC of IEEE SP (IEEE Symposium on Security and Privacy) 2016, 2017; paper reviewer of TDSC (IEEE Transactions on Dependable and Secure Computing) 2017; paper reviewer of TKDE (IEEE Transactions on Knowledge and Data Engineering) 2016; paper reviewer of TIFS (IEEE Transactions on Information Forensics and Security) 2016; paper reviewer of TOPS (ACM Transactions on Privacy and Security) 2016; paper reviewer of TMC (IEEE Transactions on Mobile Computing) 2016.

Miscellaneous

I love Math, especially vector space and matrix computations. I am fascinated by black hole theory, string theory and 11-dimensional universe (or maybe multiverse).

Patent

Yonghui Xiao, Li Xiong. Methods and systems for determining protected location information based on temporal correlations. US Patent 9,867,041

Publications

Yonghui Xiao, Yuxin Ding, Changwan Ryu et al. Federated Learning of Large ASR Models in the Real World.
Renkun Ni, Yonghui Xiao et al. FedAQT: Accurate Quantized Training with Federated Learning. ICASSP 2024
Yuxin Ding, Yonghui Xiao et al. Improved Federated Learning for Handling Long-tail Words. Defensive publication
Tien-Ju Yang, Yonghui Xiao et al. Online model compression for federated learning with large models. ICASSP 2023
Rongmei Lin, Yonghui Xiao et al. Federated pruning: Improving neural network efficiency with federated learning. INTERSPEECH 2023
Dhruv Guliani, Yonghui Xiao et al. Enabling on-device training of speech recognition models with federated dropout. ICASSP 2022
Qiuchen Zhang, Jing Ma, Yonghui Xiao, Jian Lou, and Li Xiong. Broadening Differential Privacy for Deep Learning Against Model Inversion Attacks. IEEE BigData 2020
Yang Cao, Yonghui Xiao, Shun Takagi, Li Xiong, Masatoshi Yoshikawa, Yilin Shen, Jinfei Liu, Hongxia Jin, and Xiaofeng Xu. Customizable and Rigorous Location Privacy through Policy Graph, 25th European Symposium on Research in Computer Security (ESORICS), 2020
Yang Cao, Shun Takagi, Yonghui Xiao, Li Xiong, Masatoshi Yoshikawa. PANDA: Policy-aware Location Privacy for Epidemic Surveillance. 46th International Conference on Very Large Data Bases (VLDB) demo 2020
Yang Cao, Yonghui Xiao, Li Xiong, Liquan Bai and Masatoshi Yoshikawa. Protecting Spatiotemporal Event Privacy in Continuous Location-Based Services. IEEE Transactions on Knowledge and Data Engineering (TKDE) 2019
Yang Cao, Yonghui Xiao, Li Xiong, Liquan Bai, Masatoshi Yoshikawa. PriSTE: Protecting Spatiotemporal Event Privacy in Continuous Location-Based Services. 45th International Conference on Very Large Data Bases (VLDB) demo 2019.
Yang Cao, Yonghui Xiao, Li Xiong, Liquan Bai. PriSTE: From Location Privacy to Spatiotemporal Event Privacy. International Conference on Data Engineering (ICDE) 2019, poster
Yang Cao, Li Xiong, Masatoshi Yoshikawa, Yonghui Xiao, Si Zhang. ConTPL: Controlling Temporal Privacy Leakage in Differentially Private Continuous Data Release. Proceedings of the VLDB Endowment (PVLDB), demo, 2018
Yang Cao, Masatoshi Yoshikawa, Yonghui Xiao and Li Xiong. Quantifying Differential Privacy in Continuous Data Release under Temporal Correlations. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018
Yonghui Xiao, Li Xiong, Si Zhang, Yang Cao. LocLok: Location Cloaking with Differential Privacy via Hidden Markov Model. 43rd International Conference on Very Large Data Bases (VLDB) demo, 2017
Yang Cao, Masatoshi Yoshikawa, Yonghui Xiao, Li Xiong. Quantifying Differential Privacy under Temporal Correlations. IEEE International Conference on Data Engineering (ICDE), 2017
Xiaofeng Xu, Li Xiong, Vaidy Sunderam, Yonghui Xiao. A Markov Chain Based Pruning Method for Predictive Range Queries. ACM SIGSPATIAL, 2016
Yonghui Xiao, Li Xiong. Protecting Locations with Differential Privacy under Temporal Correlations. 22nd ACM Conference on Computer and Communications Security (CCS), 2015
Yonghui Xiao, Li Xiong, Liyue Fan, Slawomir Goryczka, Haoran Li. DPCube: Differentially Private Histogram Release through Multidimensional Partitioning. Transactions on Data Privacy (TDP), 7:3, 195-222, 2014.
James Gardner, Li Xiong, Yonghui Xiao, Jingjing Gao, Andrew Post, Xiaoqian Jiang, Lucila Ohno-Machado. SHARE: System Design and Case Studies for Statistical Health Information Release. Journal of the American Medical Informatics Association (JAMIA), 20(1), 2013
Xiao Y, Gardner J, Xiong L. DPCube: Releasing Differentially Private Data Cubes for Health Information. International Conference on Data Engineering (ICDE) demo 2012
Xiao Y, Xiong L, Yuan C. Differentially Private Data Release through Multidimensional Partitioning. 7th VLDB Workshop on Secure Data Management, Singapore, Sep 17, 2010.

Awards

Amazon graduate research symposium, 2017
IEEE S&P student PC travel award, 2017
NSF I-Corps award as entrepreneur lead, 2016
CCS travel award, 2015.
NSF ICDE scholarship, 2012.
Ph.D. Fellowship, Laney Graduate School of Emory University, 2011-Present
Scholarship of Foxconn at Tsinghua University, Beijing, China 2009.

Contact Information

Feel free to contact me if you have any questions (^_^).
Please find my contact information below.

To Contact me

Email: yhandxiao AT gmail dot com

Office: Mountain View office of Google

Phone: 404-772-0x1c2d9401 where the last four digits were encrypted by RSA :)