Shannon Zejiang Shen

I am a second-year PhD student at MIT CSAIL,
working at the intersection of NLP and HCI.


On the NLP side, I am interested in language understanding in scientific, legal, and clinical text: documents that are typically authored and used by domain experts.

On the HCI side, I explore how humans, especially domain experts, and AI models, e.g., Large Language Models, can communicate and collaborate.

I also developed a suite of tools for document understanding and parsing. Please check my projects on Document Intelligence for more information.



Towards Verifiable Text Generation with Symbolic References New

Lucas Torroba Hennigen, Shannon Zejiang Shen, Ani Nrusimha, Bernhard Gapp, David Sontag, and Yoon Kim

PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents New

Kyle Lo, Shannon Zejiang Shen, Benjamin Newman, Joseph Chang, Russell Authur, Erin Bransom, Stefan Candra, Yoganand Chandrasekhar, Regan Huff, Bailey Kuehl, Amanpreet Singh, Chris Wilhelm, Angele Zamarron, Marti A. Hearst, Daniel Weld, Doug Downey, Luca Soldaini

EMNLP 2023 Demo Track

American Stories: A Large-Scale Structured Text Dataset of Historical US Newspapers New

Melissa Dell, Jacob Carlson, Tom Bryan, Emily Silcock, Abhishek Arora, Shannon Zejiang Shen, Luca D’Amico-Wong, Quan Le, Pablo Querubin, and Leander Heldring

Conceptualizing Machine Learning for Dynamic Information Retrieval of Electronic Health Record Notes

Sharon Jiang, Shannon Zejiang Shen, Monica Agrawal, Barbara Lam, Nicholas Kurtzman, Steven Horng, David Karger, and David Sontag

Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents

Catherine Chen, Shannon Zejiang Shen, Dan Klein, Gabriel Stanovsky, Doug Downey, and Kyle Lo

Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks

Shannon Zejiang Shen, Tal August, Pao Siangliulue, Kyle Lo, Jonathan Bragg, Jeff Hammerbacher, Doug Downey, Joseph Chee Chang, and David Sontag

The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces

With the Semantic Scholar Team
Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney, Aniket Kittur, Hyeonsu Kang, Egor Klevak, Bailey Kuehl, Michael Langan, Matt Latzke, Jaron Lochner, Kelsey MacMillan, Eric Marsh, Tyler Murray, Aakanksha Naik, Ngoc-Uyen Nguyen, Srishti Palani, Soya Park, Caroline Paulic, Napol Rachatasumrit, Smita Rao, Paul Sayre, Shannon Zejiang Shen, Pao Siangliulue, Luca Soldaini, Huy Tran, Madeleine van Zuylen, Lucy Lu Wang, Christopher Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Marti A Hearst, and Daniel S Weld

The Semantic Scholar Open Data Platform

With the Semantic Scholar Team
Rodney Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, David Graham, Fangzhou Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, Bailey Kuehl, Michael Langan, Daniel Lin, Haokun Liu, Kyle Lo, Jaron Lochner, Kelsey MacMillan, Tyler Murray, Chris Newell, Smita Rao, Shaurya Rohatgi, Paul Sayre, Shannon Zejiang Shen, Amanpreet Singh, Luca Soldaini, Shivashankar Subramanian, Amber Tanaka, Alex D Wade, Linda Wagner, Lucy Lu Wang, Chris Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Madeleine Van Zuylen, and Daniel S Weld


Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities Featured

Shannon Zejiang Shen, Kyle Lo, Lauren Yu, Nathan Dahlberg, Margo Schlanger, and Doug Downey

NeurIPS 2022 Datasets and Benchmarks Track

Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search

Daniel King, Shannon Zejiang Shen, Nishant Subramani, Daniel S. Weld, Iz Beltagy, and Doug Downey


Generating Object Stamps

Youssef Alami Mejjati, Shannon Zejiang Shen, Michael Snower, Aaron Gokaslan, Oliver Wang, James Tompkin, and Kwang In Kim


Deep Learning based Framework for Automatic Damage Detection in Aircraft Engine Borescope Inspection

Shannon Zejiang Shen, Xili Wan, Feng Ye, Xinjie Guan, and Shuwen Liu

2019 International Conference on Computing, Networking and Communications (ICNC)



Besides research, I've worked on various open-source projects. Here are a few of them:

Websites & Design

A platform where current and past grad students share the statements of purpose from their applications to help future applicants. It is a full-fledged website built on Notion, and we developed an automated submission system that connects the Notion database with a Google Form (code available here).

The layout-parser project website is built with Jekyll and Bulma. Most interestingly, the layout-parser platform subpage is rendered by fetching, live, the model metadata stored in GitHub issues.

Avalanche: a personal website theme for academics

Also built with Jekyll and Bulma, the Avalanche theme can be used out of the box to create an academic site that cleanly displays a personal research description, publications, and recent news.

Productivity & Utils


A JupyterLab extension that seamlessly connects GPT-4 to your coding environment. It features a code interpreter that can translate your natural language description into Python code and automatically execute it.


A Python package that seamlessly connects Notion databases and pandas DataFrames. It makes it easy to download a Notion database into a pandas DataFrame and to upload a DataFrame back to Notion.


An Obsidian plugin that streamlines bibliography management.


If you have any questions about my research (or just want to say hi), the best email address to reach me is zejiangshen^