- Research Staff
Biography
I am a postdoctoral researcher in the CORE Lab working with Cengiz Öztireli. I work at the intersection of computer science and human cognition to develop intelligent systems that augment and enhance human perception and creativity. My research aims to model and recreate the world from imagination by building models that integrate diverse multisensory perceptual data, including visual, textual, neural, and tactile inputs, as multimodal control signals.
Research
Multimodal Learning, Brain Decoding, Computer Vision, Generative Models, NeuroAI, Tactile Representation Learning
Teaching
I will be supervising Part II, Part III, and M.Phil. research projects for students currently enrolled at Cambridge. If you are interested in working with me, please feel free to get in touch via email. Before reaching out, please have a look at my Google Scholar profile or personal page to get a better sense of my research topics. Broadly, my work focuses on multimodal learning for human sensory understanding, with a particular emphasis on cross-modal translation and representation learning using large multimodal models and generative models.
Some examples of relevant work include:
- TediGAN: Combines pretrained CLIP and StyleGAN for text-to-image generation and editing.
- DREAM: Decodes semantics, color, and depth from brain activations and integrates these into T2I-Adapter for image reconstruction.
- UMBRAE: Interprets brain activations into multimodal explanations with pretrained MLLMs using task prompts.
- RETRO: Learns tactile representations and enables tactile-based image material editing.
Here are some potential topics for Part II research projects:
- Brain: multimodal brain alignment, multimodal brain explanation (language, location), brain visual decoding (image, video, 3D, etc.), brain-based art creation.
- Touch: tactile representation, tactile-based 2D image material editing, 3D editing and reconstruction.
- Benchmark and evaluation for brain and touch sensory modalities.
For Part III and M.Phil. projects, the topics are more open-ended research questions. You are welcome to build on one of the projects above or propose your own idea. Feel free to email me to discuss further.
Publications
[1] Weihao Xia, Raoul de Charette, Cengiz Öztireli, Jing-Hao Xue. UMBRAE: Unified Multimodal Brain Decoding. In European Conference on Computer Vision (ECCV), 2024.
[2] Weihao Xia, Raoul de Charette, Cengiz Öztireli, Jing-Hao Xue. DREAM: Visual Decoding from Reversing Human Visual System. In Winter Conference on Applications of Computer Vision (WACV), 2024.
[3] Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu. TediGAN: Text-Guided Diverse Face Image Generation and Manipulation. In Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[4] Weihao Xia, Yujiu Yang, Jing-Hao Xue, Wensen Feng. Controllable Continuous Gaze Redirection. In ACM Multimedia (MM), 2020.
[5] Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang. GAN Inversion: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.