Multimodal LLM based Gemini Model for ASR
On-Device Federated Learning of Large Models
LocLok: Protecting Location Privacy in Smartphones
DPCube: Releasing Differentially Private Data Cubes for Health Information
Multimodal LLM based Gemini Model for ASR
I work on the multimodal Gemini model for ASR (automatic speech recognition) at Google. On the Gemini 1.5 architecture, I improved ASR results by supervised fine-tuning of the pre-trained model with several new algorithms designed for multimodal LLMs. Due to confidentiality, I can only say that this was the best Gemini ASR result as of July 2024.
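ASR quality is conventionally reported as word error rate (WER): the word-level edit distance (substitutions, deletions, and insertions) between the recognized text and the reference transcript, divided by the number of reference words. The sketch below is a generic illustration of that standard metric, not the Gemini evaluation pipeline; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row 0 of the DP table: turning an empty reference into hyp[:j] needs j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # curr[j] = edit distance between ref[:i] and hyp[:j]
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


print(word_error_rate("turn on the kitchen lights", "turn of the kitchen light"))  # 0.4
```

Two of the five reference words are substituted, so the WER is 2/5 = 0.4; lower is better.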
On-Device Federated Learning of Large Models
I led the training of a full Conformer-based ASR model under federated learning. Because it learns from real data on real users' devices, the federated on-device model can improve ASR performance. However, devices impose several constraints, including limited RAM, CPU compute, and network bandwidth. I was the first person at Google to achieve federated learning of a large Conformer model, and this work produced the best ASR model trained under federated learning.
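A common baseline for this setting is federated averaging (FedAvg): each device trains on its local data, and the server averages the returned models weighted by the number of examples each client used, so raw audio never leaves the device. The sketch below shows only that server-side weighted averaging over flattened parameter lists. It is a generic illustration under that assumption, not the production training system, and `ClientUpdate` and `fedavg` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClientUpdate:
    weights: List[float]  # model parameters after local training (flattened)
    num_examples: int     # number of local training examples used


def fedavg(updates: List[ClientUpdate]) -> List[float]:
    """Average client models, weighting each by its number of examples."""
    total = sum(u.num_examples for u in updates)
    if total == 0:
        raise ValueError("clients reported no training examples")
    dim = len(updates[0].weights)
    averaged = [0.0] * dim
    for u in updates:
        if len(u.weights) != dim:
            raise ValueError("all clients must return the same model shape")
        share = u.num_examples / total  # this client's weight in the average
        for k, w in enumerate(u.weights):
            averaged[k] += share * w
    return averaged


new_global = fedavg([
    ClientUpdate(weights=[1.0, 2.0], num_examples=10),
    ClientUpdate(weights=[3.0, 6.0], num_examples=30),
])
print(new_global)  # [2.5, 5.0]
```

The averaging step itself is cheap; the RAM, compute, and bandwidth limits above are what make the local training and upload of a large model hard.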
LocLok: Protecting Location Privacy in Smartphones
Concerns about location privacy arise frequently with the rapid growth of GPS-enabled devices and location-based applications. However, there are no official privacy-preserving smartphone apps that protect users' location privacy while still enabling GPS-based services. In academia, a popular method, called spatial cloaking, replaces a precise latitude/longitude position with a randomly generated area containing it. For more details, please go to http://forum.loclok.com.
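To illustrate what spatial cloaking means, here is a minimal sketch: the exact position is replaced by a box of a chosen size that contains it, shifted by a random offset so the true point is not simply the center. The function `cloak`, the box size, and the example coordinates are hypothetical. This is the generic idea from the literature, not the LocLok mechanism, which adds a differential privacy guarantee.

```python
import random
from typing import Tuple

Box = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def cloak(lat: float, lon: float, height: float, width: float,
          rng: random.Random) -> Box:
    """Return a height x width box (in degrees) that contains (lat, lon).

    The box is shifted by a uniform random offset, so the true location
    can be anywhere inside it rather than at its center.
    """
    min_lat = lat - rng.uniform(0.0, height)
    min_lon = lon - rng.uniform(0.0, width)
    return (min_lat, min_lon, min_lat + height, min_lon + width)


rng = random.Random(7)
box = cloak(33.7925, -84.3240, height=0.01, width=0.01, rng=rng)
print(box)  # an app would share this area instead of the exact point
```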
Click here to view a demo system.
Our technique can rigorously protect location privacy even when the attacker knows the user well, for example, when the attacker knows the user's movement habits and historically visited places exactly. Even then, we can prove that the user's current location is protected under differential privacy, the state-of-the-art privacy notion. Furthermore, we prove that our technique is the optimal solution for satisfying the differential privacy guarantee.
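For reference, the textbook ε-differential privacy condition, written for a randomized mechanism $\mathcal{A}$ that takes the user's true location and releases a perturbed one, is shown below. Smaller $\epsilon$ means the released output reveals less about which location is the true one. How the set of "plausible" locations is chosen, so that the attacker's prior knowledge of the user's habits is accounted for, is defined formally in our paper; the notation here is an assumption, not a quotation of that definition.

```latex
\Pr[\mathcal{A}(x_1) \in S] \le e^{\epsilon} \, \Pr[\mathcal{A}(x_2) \in S]
\quad \text{for all plausible locations } x_1, x_2 \text{ and all output sets } S
```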
Here is an example. Figure 1 shows a real trajectory on a real map, where the two axes represent longitude and latitude. The user traveled over 500 timestamps. Figure 2 shows the same trajectory on the corresponding grid map. Figure 3 shows the trajectory released by an existing method, while Figure 4 shows the trajectory released by our method. Our method clearly preserves more utility, which is confirmed by Figure 5, where the distances between the true locations and the released locations are plotted. LM is the existing Laplace Mechanism; PIM is our Planar Isotropic Mechanism. The formal proof and experimental settings can be found in our paper.
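The LM baseline can be sketched as adding independent Laplace noise to each coordinate of every location and then measuring utility as the Euclidean distance between each true and released point, the kind of error plotted in Figure 5. This is a minimal sketch assuming grid-map coordinates and a known sensitivity (the largest L1 distance between two plausible true locations); PIM, which shapes the noise to the set of plausible locations, is not reproduced here. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_release(trajectory: List[Point], epsilon: float,
                    sensitivity: float, rng: random.Random) -> List[Point]:
    """Add Laplace(sensitivity / epsilon) noise to x and y independently.

    With sensitivity equal to the L1 diameter of the plausible locations,
    each released point is epsilon-DP for its own timestamp.
    """
    scale = sensitivity / epsilon
    return [(x + laplace(scale, rng), y + laplace(scale, rng))
            for x, y in trajectory]


def errors(truth: List[Point], released: List[Point]) -> List[float]:
    """Per-timestamp Euclidean distance between true and released points."""
    return [math.dist(p, q) for p, q in zip(truth, released)]


rng = random.Random(0)
truth = [(float(t), float(t)) for t in range(500)]  # toy 500-step trajectory
noisy = laplace_release(truth, epsilon=1.0, sensitivity=2.0, rng=rng)
errs = errors(truth, noisy)
print(sum(errs) / len(errs))  # mean distance; smaller means more utility
```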
DPCube: Releasing Differentially Private Data Cubes for Health Information
Health data is highly private. For example, a patient's disease should never be exposed to anyone without the patient's consent. On the other hand, health data is very useful for improving clinical treatment and for medical research institutes. The question, then, is how to release health data so that the sensitive part is protected while the useful part is released. More generally, if a database contains both private and useful information, how can we use the data without breaching anyone's privacy?
DPCube is a practical and rigorous method for tackling this problem. It satisfies differential privacy, the state-of-the-art privacy guarantee, while still providing useful information about the data. The components of DPCube are briefly described as follows.
We also have MATLAB code implementing DPCube, which is available upon request. The following figure shows the interface for a data release scenario.
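The basic building block behind differentially private histogram release, which DPCube builds on, is the Laplace mechanism applied to cell counts: adding or removing one patient record changes exactly one count by 1, so adding Laplace noise with scale 1/ε to every cell gives ε-differential privacy. The sketch below shows only this step; DPCube's multidimensional partitioning of the cells is not reproduced. The function `noisy_histogram` and the example attributes are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Iterable


def noisy_histogram(records: Iterable[Hashable], cells: Iterable[Hashable],
                    epsilon: float, rng: random.Random) -> Dict[Hashable, float]:
    """Release every cell count plus Laplace(1/epsilon) noise.

    One record changes one count by 1 (L1 sensitivity 1), so the output is
    epsilon-DP. The cell domain must be public and fully released, including
    empty cells, so that the set of published cells leaks nothing.
    """
    counts = Counter(records)
    # Difference of two Exp(epsilon) draws is Laplace(0, 1/epsilon).
    return {c: counts[c] + rng.expovariate(epsilon) - rng.expovariate(epsilon)
            for c in cells}


rng = random.Random(42)
# Hypothetical (age group, diagnosis) records
patients = [("30-39", "flu"), ("30-39", "flu"), ("40-49", "diabetes")]
domain = [(a, d) for a in ("30-39", "40-49") for d in ("flu", "diabetes")]
print(noisy_histogram(patients, domain, epsilon=0.5, rng=rng))
```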
FedAQT: Accurate Quantized Training with Federated Learning
This is our FedAQT work under federated learning.
Federated Pruning: Improving Neural Network Efficiency with Federated Learning
This is our FedPruning work under federated learning.
An Adversarial Machine Learning Model
ML models are known to be vulnerable to adversarial attacks. Even state-of-the-art ML models can easily be fooled by adding a small amount of noise.
In the following example, we show that a cat image (left) from Tiny ImageNet can be misclassified as other objects when a small amount of noise, imperceptible to humans, is added. Although the three images on the right look like the original cat, they are classified as a sea cucumber, a desk, and a dragonfly. Note that the ML model is ResNet18, one of the best vision models as of 2018, with about 76% accuracy.
Generating one adversarial sample takes about 60 to 90 seconds. Our code uses the Carlini (Carlini-Wagner) attack algorithm and is available upon request. Note also that under this attack the perturbation can be smaller than 1.0 per pixel value on the 0 to 255 image scale.
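The Carlini attack on ResNet18 is beyond a short example, but its core idea, finding a small perturbation that pushes an input across the decision boundary, can be shown exactly on a linear classifier. For a score s(x) = w·x + b, moving every feature by ε in the direction -sign(s)·sign(w_i) changes the score by ε·‖w‖₁ toward zero, so the smallest L∞ perturbation that flips the label is |s(x)|/‖w‖₁. This is a swapped-in illustration (the fast-gradient-sign direction, which is optimal for linear models), not the Carlini-Wagner algorithm in our code; the weights and pixels are random and hypothetical. It also shows why sub-pixel perturbations suffice: ‖w‖₁ grows with the input dimension, so ε shrinks.

```python
import random
from typing import List, Tuple


def score(w: List[float], b: float, x: List[float]) -> float:
    """Linear classifier score; the predicted class is positive if score > 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def minimal_linf_attack(w: List[float], b: float, x: List[float],
                        margin: float = 1e-6) -> Tuple[List[float], float]:
    """Flip the predicted class of x with the smallest L-infinity change.

    Each feature moves by eps = |score| / ||w||_1 + margin in the direction
    -sign(score) * sign(w_i). The valid pixel range is ignored for brevity.
    """
    s = score(w, b, x)
    l1 = sum(abs(wi) for wi in w)
    if l1 == 0:
        raise ValueError("a zero weight vector cannot be attacked")
    eps = abs(s) / l1 + margin
    direction = -1.0 if s > 0 else 1.0
    x_adv = [xi + eps * direction * sign(wi) for xi, wi in zip(x, w)]
    return x_adv, eps


rng = random.Random(0)
d = 64 * 64 * 3                                   # a 64x64 RGB input
w = [rng.gauss(0.0, 0.01) for _ in range(d)]      # hypothetical weights
x = [float(rng.randint(0, 255)) for _ in range(d)]
b = 5.0 - score(w, 0.0, x)                        # place x at score +5 on purpose
x_adv, eps = minimal_linf_attack(w, b, x)
print(score(w, b, x) > 0, score(w, b, x_adv) > 0, round(eps, 4))
```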
A P2P Video Sharing System
Many of us watch online videos every day, on YouTube or other video platforms. The technique behind video sharing, however, is not hard. Here I show a simple video sharing system along with its original code.
My system has three major parts: a tracker, a super peer for each channel, and many peers. The tracker is a MySQL database service that contains all the information about the available videos. When a user (a peer) wants to browse the videos in our system, it sends a query request to the tracker, which returns all the channels to the peer. Each channel has a super peer that holds the detailed information about the channel. When a user selects a channel, it establishes a TCP connection with that channel's super peer, which accepts the connection and adds the user to its neighbors in the audience.
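The join flow described above can be sketched in a few lines. This is an in-memory illustration only: a dictionary stands in for the MySQL tracker, a method call stands in for the TCP connection, and the names `Tracker`, `SuperPeer`, `list_channels`, `accept`, and `join` are hypothetical rather than taken from the original C++ code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SuperPeer:
    channel: str
    description: str                                   # detailed channel info
    neighbors: List[str] = field(default_factory=list)  # audience peers

    def accept(self, peer_id: str) -> None:
        """Accept a (simulated) TCP connection and add the peer as a neighbor."""
        if peer_id not in self.neighbors:
            self.neighbors.append(peer_id)


class Tracker:
    """Stands in for the MySQL database that lists all available channels."""

    def __init__(self) -> None:
        self._channels: Dict[str, SuperPeer] = {}

    def register(self, sp: SuperPeer) -> None:
        self._channels[sp.channel] = sp

    def list_channels(self) -> List[str]:
        return sorted(self._channels)

    def super_peer_for(self, channel: str) -> SuperPeer:
        return self._channels[channel]


def join(tracker: Tracker, peer_id: str, channel: str) -> SuperPeer:
    """Peer side: query the tracker, pick a channel, connect to its super peer."""
    if channel not in tracker.list_channels():
        raise KeyError(f"unknown channel {channel!r}")
    sp = tracker.super_peer_for(channel)
    sp.accept(peer_id)
    return sp


tracker = Tracker()
tracker.register(SuperPeer("news", "24h news"))
tracker.register(SuperPeer("sports", "live matches"))
print(tracker.list_channels())                    # ['news', 'sports']
print(join(tracker, "peer-1", "news").neighbors)  # ['peer-1']
```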
Over the TCP connection, the video content is transmitted in units of data packets. As shown in the following figure, each peer has a data buffer that holds these packets.
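A peer's data buffer can be modeled as a fixed-size sliding window indexed by packet sequence number: packets may arrive out of order, and the player consumes them in order from the front. The sketch below is an assumption about how such a buffer might work; the capacity and the class and method names are hypothetical, not the buffer in the original code.

```python
from typing import Dict, List, Optional


class PacketBuffer:
    """Sliding window of video packets keyed by sequence number."""

    def __init__(self, capacity: int, start_seq: int = 0) -> None:
        self.capacity = capacity
        self.play_seq = start_seq          # next packet the player needs
        self.slots: Dict[int, bytes] = {}

    def insert(self, seq: int, payload: bytes) -> bool:
        """Store a packet if it falls inside the current window."""
        if not (self.play_seq <= seq < self.play_seq + self.capacity):
            return False                   # too old, or too far ahead
        self.slots[seq] = payload
        return True

    def missing(self) -> List[int]:
        """Sequence numbers inside the window that still need to be fetched."""
        return [s for s in range(self.play_seq, self.play_seq + self.capacity)
                if s not in self.slots]

    def pop_next(self) -> Optional[bytes]:
        """Hand the next in-order packet to the player, or None if not yet received."""
        payload = self.slots.pop(self.play_seq, None)
        if payload is not None:
            self.play_seq += 1             # slide the window forward
        return payload


buf = PacketBuffer(capacity=4)
buf.insert(1, b"frame-1")                  # arrives out of order
buf.insert(0, b"frame-0")
print(buf.missing())                       # [2, 3]
print(buf.pop_next(), buf.pop_next())      # b'frame-0' b'frame-1'
print(buf.pop_next())                      # None (packet 2 not yet received)
```

The `missing()` list is what a peer would request from its neighbors or the super peer.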
The classes used in this program are summarized as follows.
To download the original C++ code, use the following links: super peer, tracker01, tracker01.client, and tracker02.
Deep Learning Certificate
I obtained the Deep Learning certificate from deeplearning.ai.
Machine Learning Certificate
I obtained the Machine Learning certificate from Stanford University.
I-Corps Innovation Program
I was very lucky to participate in the I-Corps program supported by the National Science Foundation in 2016. Although my startup did not work out after exploring the business opportunities, I am grateful to the I-Corps program, shown in the following image.
Yonghui Xiao
Introduction
Hi, I have been a software engineer at Google since May 2017. I work on the multimodal Gemini model for ASR (automatic speech recognition), and I improved Google's ASR results on the Gemini 1.5 architecture by supervised fine-tuning of the pre-trained model with several new algorithms designed for multimodal LLMs. My new algorithms tackle fundamental problems of multimodal LLMs and achieved the best Gemini ASR result as of July 2024. Before Gemini, I worked on training ASR models under federated learning. I led the training of the full Conformer-based ASR model, which was launched on millions of users' smartphones. The federated model improved the Conformer model by leveraging real data from real users under a differential privacy guarantee. Please see my publications for details.
Before joining Google, I was a Ph.D. student in the CS department at Emory University, advised by Prof. Li Xiong. Prior to Emory, I received three bachelor's degrees from Xi'an Jiaotong University in 2005. After graduation, I worked at IVO in China. In 2008, I became a graduate student in the CS department at Tsinghua University, where I was lucky to join the collaborative Tsinghua-Emory research project. I interned at Samsung Research America (SRA) in summer 2014.
Click here to download my CV.
My Research
My research at Google focuses on AI for ASR models. I believe the direction of LLMs is a multimodal approach that combines all modalities, including audio, video, and text, in one model, which is an interesting and exciting challenge.
My Ph.D. research mainly focused on data privacy protection. I was the top expert on differential privacy in 2017, and I was the first person in the world to propose the optimal algorithm under differential privacy, which achieves the best data utility under a constrained differential privacy budget.
Professional Service
- Program Committee: ICASSP 2023, 2024, INTERSPEECH 2024, SLT 2024, student PC of IEEE S&P (IEEE Symposium on Security and Privacy) 2016, 2017; paper reviewer of TDSC (IEEE Transactions on Dependable and Secure Computing) 2017; paper reviewer of TKDE (IEEE Transactions on Knowledge and Data Engineering) 2016; paper reviewer of TIFS (IEEE Transactions on Information Forensics and Security) 2016; paper reviewer of TOPS (ACM Transactions on Privacy and Security) 2016; paper reviewer of TMC (IEEE Transactions on Mobile Computing) 2016.
Miscellaneous
I love math, especially vector spaces and matrix computations. I am fascinated by black hole theory, string theory, and the 11-dimensional universe (or maybe multiverse).
Patent
- Yonghui Xiao, Li Xiong. Methods and systems for determining protected location information based on temporal correlations. US Patent 9,867,041
Publications
- Yonghui Xiao, Yuxin Ding, Changwan Ryu et al. Federated Learning of Large ASR Models in the Real World.
- Renkun Ni, Yonghui Xiao et al. FedAQT: Accurate Quantized Training with Federated Learning. ICASSP 2024
- Yuxin Ding, Yonghui Xiao et al. Improved Federated Learning for Handling Long-tail Words. Defensive publication
- Tien-Ju Yang, Yonghui Xiao et al. Online model compression for federated learning with large models. ICASSP 2023
- Rongmei Lin, Yonghui Xiao et al. Federated pruning: Improving neural network efficiency with federated learning. INTERSPEECH 2023
- Dhruv Guliani, Yonghui Xiao et al. Enabling on-device training of speech recognition models with federated dropout. ICASSP 2022
- Qiuchen Zhang, Jing Ma, Yonghui Xiao, Jian Lou, and Li Xiong. Broadening Differential Privacy for Deep Learning Against Model Inversion Attacks. IEEE BigData 2020
- Yang Cao, Yonghui Xiao, Shun Takagi, Li Xiong, Masatoshi Yoshikawa, Yilin Shen, Jinfei Liu, Hongxia Jin, and Xiaofeng Xu. Customizable and Rigorous Location Privacy through Policy Graph, 25th European Symposium on Research in Computer Security (ESORICS), 2020
- Yang Cao, Shun Takagi, Yonghui Xiao, Li Xiong, Masatoshi Yoshikawa. PANDA: Policy-aware Location Privacy for Epidemic Surveillance. 46th International Conference on Very Large Data Bases (VLDB), demo, 2020
- Yang Cao, Yonghui Xiao, Li Xiong, Liquan Bai and Masatoshi Yoshikawa. Protecting Spatiotemporal Event Privacy in Continuous Location-Based Services. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2019
- Yang Cao, Yonghui Xiao, Li Xiong, Liquan Bai, Masatoshi Yoshikawa. PriSTE: Protecting Spatiotemporal Event Privacy in Continuous Location-Based Services. 45th International Conference on Very Large Data Bases (VLDB), demo, 2019
- Yang Cao, Yonghui Xiao, Li Xiong, Liquan Bai. PriSTE: From Location Privacy to Spatiotemporal Event Privacy. International Conference on Data Engineering (ICDE), poster, 2019
- Yang Cao, Li Xiong, Masatoshi Yoshikawa, Yonghui Xiao, Si Zhang. ConTPL: Controlling Temporal Privacy Leakage in Differentially Private Continuous Data Release. Proceedings of the VLDB Endowment (PVLDB), demo, 2018
- Yang Cao, Masatoshi Yoshikawa, Yonghui Xiao and Li Xiong. Quantifying Differential Privacy in Continuous Data Release under Temporal Correlations. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018
- Yonghui Xiao, Li Xiong, Si Zhang, Yang Cao. LocLok: Location Cloaking with Differential Privacy via Hidden Markov Model. 43rd International Conference on Very Large Data Bases (VLDB), demo, 2017
- Yang Cao, Masatoshi Yoshikawa, Yonghui Xiao, Li Xiong. Quantifying Differential Privacy under Temporal Correlations. IEEE International Conference on Data Engineering (ICDE), 2017
- Xiaofeng Xu, Li Xiong, Vaidy Sunderam, Yonghui Xiao. A Markov Chain Based Pruning Method for Predictive Range Queries. ACM SIGSPATIAL, 2016
- Yonghui Xiao, Li Xiong. Protecting Locations with Differential Privacy under Temporal Correlations. 22nd ACM Conference on Computer and Communications Security (CCS), 2015
- Yonghui Xiao, Li Xiong, Liyue Fan, Slawomir Goryczka, Haoran Li. DPCube: Differentially Private Histogram Release through Multidimensional Partitioning. Transactions on Data Privacy (TDP), 7(3):195-222, 2014
- James Gardner, Li Xiong, Yonghui Xiao, Jingjing Gao, Andrew Post, Xiaoqian Jiang, Lucila Ohno-Machado. SHARE: System Design and Case Studies for Statistical Health Information Release. Journal of the American Medical Informatics Association (JAMIA), 20(1), 2013
- Xiao Y, Gardner J, Xiong L. DPCube: Releasing Differentially Private Data Cubes for Health Information. International Conference on Data Engineering (ICDE), demo, 2012
- Xiao Y, Xiong L, Yuan C. Differentially Private Data Release through Multidimensional Partitioning. 7th VLDB Workshop on Secure Data Management, Singapore, Sep 17, 2010
Awards
- Amazon graduate research symposium, 2017
- IEEE S&P student PC travel award, 2017
- NSF I-Corps award as entrepreneur lead, 2016
- CCS travel award, 2015
- NSF ICDE scholarship, 2012
- Ph.D. Fellowship, Laney Graduate School of Emory University, 2011-Present
- Foxconn Scholarship, Tsinghua University, Beijing, China, 2009
Contact Information
Feel free to contact me if you have any questions (^_^).
Please find my contact information below.
To Contact me
Email: yhandxiao AT gmail dot com
Office: Mountain View office of Google
Phone: 404-772-0x1c2d9401 where the last four digits were encrypted by RSA :)