Zilong Wang

PhD Student at UC San Diego, CSE.


Welcome! I am a fourth-year PhD student at UC San Diego, advised by Prof. Jingbo Shang. I have had a wonderful time doing research at Google Cloud AI, Google Research, Adobe Research, and Microsoft Research Asia. I received my B.S. in Computer Science from Peking University in 2020, where I was advised by Prof. Xiaojun Wan.

My research focuses on applying NLP to real-world problems. I am particularly interested in building systems that can process and understand a wide range of data, such as tabular data, visually-rich documents, and web content. My goal is to bridge the gap between vast knowledge sources and practical NLP applications.

news

Nov 6, 2023 Joined Google Cloud AI as a Student Researcher in Fall 2023!
Sep 19, 2023 New preprint on understanding visually-rich documents with LLMs: LMDX: Language Model-based Document Information Extraction and Localization
Aug 20, 2023 New preprint: A Study on Robustness and Reliability of Large Language Model Code Generation

selected publications

  1. VRDU: A Benchmark for Visually-rich Document Understanding
    Zilong Wang, Yichao Zhou, Wei Wei, and 2 more authors
    In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, CA, USA, August 6-10, 2023
  2. LMDX: Language Model-based Document Information Extraction and Localization
    Vincent Perot, Kai Kang, Florian Luisier, and 7 more authors
    CoRR, 2023
  3. Towards Zero-shot Relation Extraction in Web Mining: A Multimodal Approach with Relative XML Path
    Zilong Wang and Jingbo Shang
    CoRR, 2023
  4. A Study on Robustness and Reliability of Large Language Model Code Generation
    Li Zhong and Zilong Wang
    CoRR, 2023
  5. Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework
    Zilong Wang and Jingbo Shang
    In Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, May 22-27, 2022
  6. MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding
    Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, and 5 more authors
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022
  7. LayoutReader: Pre-training of Text and Layout for Reading Order Detection
    Zilong Wang, Yiheng Xu, Lei Cui, and 2 more authors
    In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021
  8. DocStruct: A Multimodal Method to Extract Hierarchy Structure in Document for General Form Understanding
    Zilong Wang, Mingjie Zhan, Xuebo Liu, and 1 more author
    In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020