Shannon Zejiang Shen

I am a fourth-year PhD student at MIT CSAIL, where I develop collaborative AI systems that augment human intelligence.

RESEARCH

At MIT, I am fortunate to be advised by David Sontag, and I work closely with Yoon Kim. During my PhD, I have interned at Meta FAIR, Ai2, and Microsoft Research. Previously, I was a predoctoral researcher at Ai2 and Harvard University, and I obtained my M.S. from Brown University.

My long-term research goal is to build AI that enables long-term collaboration with people to solve challenging, knowledge-intensive problems. To that end, I focus on three directions:

Understanding human-LLM collaboration: How to scalably and quantitatively evaluate human-LLM collaboration? What are the metrics/objectives to optimize for?
Improving the underlying LLM: How to train LLMs for effective collaboration? What are the needed algorithmic innovations?
Deploying collaborative AI systems in practice: What are the desired interactions for effective collaboration? How to efficiently collect user feedback and improve models?

Collaborative Effort Scaling in Human-AI Collaboration

2025 Paper Code Website

We propose collaborative effort scaling to measure how agent utility grows with user involvement. We find that existing agents may struggle to sustain interaction or translate user effort into improved outputs.

LaText: Interleave Latent and Text Chain-of-Thought

2025 In preparation

We train an LM to interleave latent and text reasoning while keeping critical tokens such as math in context. It achieves performance close to text-only CoT with 50% of the inference compute.

DR Tulu: RL with Evolving Rubrics for Deep Research

2025 Paper Code Website

We propose RLER, an RL method in which LM-generated rubrics co-evolve with the policy during training. We then train DR Tulu-8B, a strong open model for long-form deep research.

Co-LLM: Training LLMs to Decode Collaboratively

2024 Paper Tweet MIT News

We train a latent variable model that learns to call other "expert" LLMs to decode some "hard" tokens during generation. We show improvements on expert tasks like math reasoning and medical QA.

Verifiable Text Generation via Symbolic References

2024 Paper Demo MIT News

We prompt LLMs to generate symbolic links to values in the input structured data alongside regular text. We can then provide the provenance of the generated text, which reduces human verification effort by 20%.

Chapyter: LLM coding assistant in JupyterLab

2023 Blogpost Github

Chapyter is a JupyterLab extension that seamlessly connects GPT-4 to your coding environment and allows transparent use of LLM for programming assistance.

LATEST

Check out the latest news about my research updates, talks & lectures, and more.

2026 January

Launching the Augmented Mind Podcast

2025 December

Awards at NeurIPS 2025 Workshops

Our recent papers were recognized at several NeurIPS 2025 workshops

2025 November

Talk at the Scale ML Seminar Series @ MIT

LaText: Interleave Latent and Text Chain-of-Thought for efficient reasoning

2025 October

Workshop Organizing

Co-organizing the LM4Sci Workshop at COLM 2025

2025 May

Talk at Stanford HCI Group Lunch Seminar

Rethinking the Design and Evaluation of Human and LLM Collaboration

2024 October

News

Co-LLM and SymGen are covered by MIT News

2024 September

Organizing a New Seminar Series at MIT

MIT NLP Meetings Seminar Series

2024 August

Talk at University of Washington

Co-LLM: Training LLMs to Decode Collaboratively

2024 July

News

Student Spotlight interview by CSAIL Alliances

2024 May

Talk at Google Research

Developing User-Friendly Language Model Systems

2024 May

RSAP panel at the American Literature Association conference

LayoutParser and Historical Document Image Processing

2024 March

Talk at Ranjay Krishna’s Group @ UW

Developing User-Friendly Language Model Systems

2024 March

Talk at MIT Sloan AI/ML Conference

Towards Verifiable Text Generation for Developing Trustworthy LLMs

2024 March

Discussion on Image Extraction, hosted by Thomas Smits at University of Amsterdam

LayoutParser and Historical Document Image Processing

2024 January

Instructor for an MIT IAP Class

Visual Design in Scholarly Communication

2023 July

Blog Post

Introducing Chapyter

2023 April

Talk at Nigam Shah’s Group Meeting @ Stanford

Redesigning Clinical Documentation

2022 December

Talk at Natural Legal Language Processing workshop @ EMNLP 2022

Multi-LexSum: Real-world Summaries of Civil Rights Lawsuits at Multiple Granularities

2022 November

Guest Lecture in CSE 599D @ UW, hosted by Prof. Jeff Heer

Visual Content Extraction for Scientific Documents

Together with Yijia Shao and Michael Ryan, I started a new podcast series, the Augmented Mind Podcast, focusing on technical human-centered AI work.

Our recent papers were recognized at NeurIPS 2025 workshops:

The Collaborative Effort Scaling framework was recognized as the best paper at the NeurIPS 2025 Workshop on Socially Responsible and Trustworthy Foundation Models (ResponsibleFM).
The Hybrid CoT (LaText) paper was recognized as a spotlight paper at the NeurIPS 2025 Workshop on Efficient Reasoning.

I gave a talk on our recent work on LaText, a novel approach to interleave latent and text chain-of-thought for efficient reasoning.

I’m co-organizing the Workshop on Large Language Modeling for Scientific Discovery (LM4Sci) at COLM 2025 in Montreal.

I shared an initial version of our collaborative effort scaling paper, and discussed the HCI aspects of our previous work on Symbolic Generation.

Check out the MIT News articles covering our recent projects:

(ACL ’24) Learning to Decode Collaboratively with Multiple Language Models (article / paper)
(COLM ’24) Towards Verifiable Text Generation with Symbolic References (article / paper)

Pratyusha Sharma and I started organizing a new NLP seminar series at MIT. It features NLP researchers working on a diverse set of topics, including LLMs, interpretability, human-AI collaboration, and more.

This talk is hosted by Luke Zettlemoyer’s group. We go through the details of our ACL paper Co-LLM. You can find the slides here.

In a recent interview by CSAIL Alliances, I shared our recent work on Co-LLM and SymGen and described my vision for building better language models and AI systems with a human-centered perspective.

This talk was hosted by Chiyuan Zhang and Yangsibo Huang. We focused on the Co-LLM project and took a deep dive into the methodology and experiments. Slides available upon request.

We reviewed the LayoutParser design and functionality, as well as approaches to tackle historical image processing and extraction in 2024. Slides available upon request.

We start with an analogy between web interface development and LLM development: LLMs produce raw text (much like the HTML of a web page), so what are the CSS and JavaScript in the context of LLMs? We then talk about two recent projects, Co-LLM and SymGen, drawing connections between our methods and web technologies like CSS, API calls, etc. Slides available upon request.

In this short talk, we cover our latest research on SymGen, a novel approach to generating verifiable text for developing trustworthy LLMs. Slides available upon request.

We reviewed the LayoutParser design and functionality, as well as approaches to tackle historical image processing and extraction in 2024. Slides available upon request.

A series of lectures over the MIT IAP period, co-taught with Lucas Torroba Hennigen, focused on visual design in scholarly communication. Visual design is a crucial element in various forms of scientific communication, ranging from papers and slides to videos. While there is an increasing need for researchers to produce high-quality visuals, doing so remains a time-consuming and sometimes very challenging task. Despite the significant role visuals play, there is a noticeable lack of formal education dedicated to this aspect. This subject aims to cover several key topics in visual design for scholarly communication.

Chapyter is a JupyterLab extension that seamlessly connects GPT-4 to your coding environment. It features a code interpreter that can translate your natural language description into Python code and automatically execute it.

We took inspiration from our position paper on AI-supported expository writing and discussed how to apply such ideas to clinical documentation. This was a joint presentation with Monica Agrawal and Hunter Lang.

A presentation of our work on the Multi-LexSum dataset, containing real-world summaries of civil rights lawsuits at multiple granularities.

We reviewed the general problem of visual content extraction in scientific documents, as well as the current state-of-the-art methods and challenges. Slides available upon request.

2025

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Featured

Paper, Code, Website

Rulin Shao^†, Akari Asai^†, Shannon Zejiang Shen^†, Hamish Ivison^†, Varsha Kishore, Jingming Zhuo^†, Xinran Zhao, Molly Park, Samuel Finlayson, David Sontag, Tyler Murray, Sewon Min, Pradeep Dasigi, Luca Soldaini, Faeze Brahman, Wen-tau Yih, Tongshuang Wu, Luke Zettlemoyer, Yoon Kim, Hannaneh Hajishirzi, Pang Wei Koh

LaText: Interleave Latent and Text Chain-of-Thought for efficient reasoning Featured

In preparation

Shannon Zejiang Shen, Rulin Shao, Chenyu Wang, Songlin Yang, Vincent-Pierre Berges, Gargi Ghosh, Pang Wei Koh, Luke Zettlemoyer, Yoon Kim, Jason E Weston, David Sontag, Wen-tau Yih

Completion ≠ Collaboration: Scaling Collaborative Effort with Agents Featured

Paper, Code, Website

Shannon Zejiang Shen^†, Valerie Chen^†, Ken Gu, Alexis Ross, Zixian Ma, Alex Gu, Chenglei Si, Jillian Ross, Jocelyn J Shen, Wayne Chi, Andi Peng, Ameet Talwalkar, Tongshuang Wu^†, David Sontag^†

2024

Learning to Decode Collaboratively with Multiple Language Models Featured

Paper, Code, Tweet, Poster, Slides, MIT News

Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, and David Sontag

ACL 2024

Towards Verifiable Text Generation with Symbolic References Featured

Preprint, Website, Poster, Tweet, MIT News

Lucas Torroba Hennigen^†, Shannon Zejiang Shen^†, Ani Nrusimha, Bernhard Gapp, David Sontag, and Yoon Kim

COLM 2024

Dolma: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Best Resource Paper | Preprint, Code, Blog

Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E Peters, Abhilasha Ravichander, Kyle Richardson, Shannon Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, and Kyle Lo

ACL 2024

2023

PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents

Best Demo Paper | Paper, Website, Code

Kyle Lo, Shannon Zejiang Shen, Benjamin Newman, Joseph Chang, Russell Authur, Erin Bransom, Stefan Candra, Yoganand Chandrasekhar, Regan Huff, Bailey Kuehl, Amanpreet Singh, Chris Wilhelm, Angele Zamarron, Marti A. Hearst, Daniel Weld, Doug Downey, and Luca Soldaini

EMNLP 2023 Demo Track

2022

Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities

VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups

Paper, Poster, Video, Code

Shannon Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, and Doug Downey

Transactions of the Association for Computational Linguistics (TACL), Volume 10 2022

LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis

3M+ total downloads on PyPI | Website, Paper, Video, Code

Shannon Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, and Weining Li

International Conference on Document Analysis and Recognition (ICDAR) 2021 (Oral)

2025

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models

Code, Paper

Chenyu Wang^†, Paria Rashidinejad, DiJia Su, Song Jiang, Sid Wang, Siyan Zhao, Cai Zhou, Shannon Zejiang Shen, Feiyu Chen, Tommi Jaakkola, Yuandong Tian, Bo Liu^†

OLMo 3

Paper

With the Ai2 OLMo Team
Allyson Ettinger, Amanda Bertsch, Bailey Kuehl, David Graham, David Heineman, Dirk Groeneveld, Faeze Brahman, Finbarr Timbers, Hamish Ivison, Jacob Morrison, Jake Poznanski, Kyle Lo, Luca Soldaini, Matt Jordan, Mayee Chen, Michael Noukhovitch, Nathan Lambert, Pete Walsh, Pradeep Dasigi, Robert Berry, Saumya Malik, Saurabh Shah, Scott Geng, Shane Arora, Shashank Gupta, Taira Anderson, Teng Xiao, Tyler Murray, Tyler Romero, Victoria Graf, Akari Asai, Akshita Bhagia, Alex Wettig, Alisa Liu, Aman Rangapur, Chloe Anastasiades, Costa Huang, Dustin Schwenk, Harsh Trivedi, Ian Magnusson, Jaron Lochner, Jiacheng Liu, Lj Miranda, Maarten Sap, Malia Morgan, Michael Schmitz, Michal Guerquin, Michael Wilson, Regan Huff, Ronan Le Bras, Rui Xin, Rulin Shao, Sam Skjonsberg, Shannon Zejiang Shen, Shuyue Stella Li, Tucker Wilde, Valentina Pyatkin, Will Merrill, Yapei Chang, Yuling Gu, Zhiyuan Zeng, Ashish Sabharwal, Luke Zettlemoyer, Pang Wei Koh, Ali Farhadi, Noah A. Smith, Hannaneh Hajishirzi

SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Paper, Code

Yung-Sung Chuang, Benjamin Cohen-Wang, Shannon Zejiang Shen, Zhaofeng Wu, Hu Xu, Xi Victoria Lin, James Glass, Shang-Wen Li, Wen-tau Yih

ICML 2025

Retrieval-augmented systems can be dangerous medical communicators

Paper, Code

Lionel Wong, Ayman Ali, Raymond Xiong, Shannon Zejiang Shen, Yoon Kim, Monica Agrawal

ICML 2025 (Position Paper Track)

When One LLM Drools, Multi-LLM Collaboration Rules

Paper

Shangbin Feng, Wenxuan Ding, Alisa Liu, Zifeng Wang, Weijia Shi, Yike Wang, Shannon Zejiang Shen, Xiaochuang Han, Hunter Lang, Chen-Yu Lee, Tomas Pfister, Yejin Choi, Yulia Tsvetkov

2024

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

Preprint, Code, Dataset

David Wadden, Kejian Shi, Jacob Morrison, Aakanksha Naik, Shruti Singh, Nitzan Barzilay, Kyle Lo, Tom Hope, Luca Soldaini, Shannon Zejiang Shen, Doug Downey, Hannaneh Hajishirzi, and Arman Cohan

EMNLP 2025

Machine learning to predict notes for chart review in the oncology setting: a proof of concept strategy for improving clinician note-writing

Paper

Sharon Jiang, Barbara Lam, Monica Agrawal, Shannon Zejiang Shen, Nicholas Kurtzman, Steven Horng, David Karger, and David Sontag

Journal of the American Medical Informatics Association

A Design Space for Intelligent and Interactive Writing Assistants

Preprint, Tweet, Website

Mina Lee, Katy Ilonka Gero, John Joon Young Chung, Simon Buckingham Shum, Vipul Raheja, Hua Shen, Subhashini Venugopalan, Thiemo Wambsganss, David Zhou, Emad A. Alghamdi, Tal August, Avinash Bhat, Madiha Zahrah Choksi, Senjuti Dutta, Jin L.C. Guo, Md Naimul Hoque, Yewon Kim, Simon Knight, Seyed Parsa Neshaei, Antonette Shibani, Disha Shrivastava, Lila Shroff, Agnia Sergeyuk, Jessi Stark, Sarah Sterman, Sitong Wang, Antoine Bosselut, Daniel Buschek, Joseph Chee Chang, Sherol Chen, Max Kreminski, Joonsuk Park, Roy Pea, Eugenia Ha Rim Rho, Shannon Zejiang Shen, and Pao Siangliulue

Conference on Human Factors in Computing Systems (CHI) 2024

A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models

Preprint, Tweet, Code

Stefan Hegselmann, Shannon Zejiang Shen, Florian Gierse, Monica Agrawal, David Sontag, and Xiaoyi Jiang

Conference on Health, Inference, and Learning (CHIL) 2024

2023

American Stories: A Large-Scale Structured Text Dataset of Historical US Newspapers

Paper, Code

Melissa Dell, Jacob Carlson, Tom Bryan, Emily Silcock, Abhishek Arora, Shannon Zejiang Shen, Luca D’Amico-Wong, Quan Le, Pablo Querubin, and Leander Heldring

NeurIPS 2023 Datasets and Benchmarks Track

Conceptualizing Machine Learning for Dynamic Information Retrieval of Electronic Health Record Notes

Paper

Sharon Jiang, Shannon Zejiang Shen, Monica Agrawal, Barbara Lam, Nicholas Kurtzman, Steven Horng, David Karger, and David Sontag

Machine Learning for Healthcare 2023

Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents

Paper, Code

Catherine Chen, Shannon Zejiang Shen, Dan Klein, Gabriel Stanovsky, Doug Downey, and Kyle Lo

Findings of ACL 2023

Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks

Paper

Shannon Zejiang Shen, Tal August, Pao Siangliulue, Kyle Lo, Jonathan Bragg, Jeff Hammerbacher, Doug Downey, Joseph Chee Chang, and David Sontag

In2Writing Workshop at CHI 2023

The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces

Paper

With the Semantic Scholar Team
Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney, Aniket Kittur, Hyeonsu Kang, Egor Klevak, Bailey Kuehl, Michael Langan, Matt Latzke, Jaron Lochner, Kelsey MacMillan, Eric Marsh, Tyler Murray, Aakanksha Naik, Ngoc-Uyen Nguyen, Srishti Palani, Soya Park, Caroline Paulic, Napol Rachatasumrit, Smita Rao, Paul Sayre, Shannon Zejiang Shen, Pao Siangliulue, Luca Soldaini, Huy Tran, Madeleine van Zuylen, Lucy Lu Wang, Christopher Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Marti A Hearst, and Daniel S Weld

Communications of the ACM

The semantic scholar open data platform

Paper

With the Semantic Scholar Team
Rodney Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, David Graham, Fangzhou Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, Bailey Kuehl, Michael Langan, Daniel Lin, Haokun Liu, Kyle Lo, Jaron Lochner, Kelsey MacMillan, Tyler Murray, Chris Newell, Smita Rao, Shaurya Rohatgi, Paul Sayre, Shannon Zejiang Shen, Amanpreet Singh, Luca Soldaini, Shivashankar Subramanian, Amber Tanaka, Alex D Wade, Linda Wagner, Lucy Lu Wang, Chris Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Madeleine Van Zuylen, and Daniel S Weld

2022

Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search

Paper

Daniel King^†, Shannon Zejiang Shen^†, Nishant Subramani, Daniel S. Weld, Iz Beltagy, and Doug Downey

The GEM Workshop at EMNLP 2022

OLALA: Object-Level Active Learning for Efficient Document Layout Annotation

Paper, Code

Shannon Zejiang Shen, Jian Zhao, Melissa Dell, Yaoliang Yu, and Weining Li

5th Workshop on NLP and Computational Social Science at EMNLP 2022

2021

PAWLS: PDF Annotation With Labels and Structure

Website, Paper, Poster, Video, Code

Mark Neumann, Shannon Zejiang Shen, and Sam Skjonsberg

ACL-IJCNLP 2021, Demo Track

2020

A Large Dataset of Historical Japanese Documents with Complex Layouts

Website, Paper, Slides, Video

Shannon Zejiang Shen, Kaixuan Zhang, and Melissa Dell

CVPR 2020 Workshop on Text and Documents in the Deep Learning Era

Generating Object Stamps

Website, Paper, Code

Youssef Alami Mejjati, Shannon Zejiang Shen, Michael Snower, Aaron Gokaslan, Oliver Wang, James Tompkin, and Kwang In Kim

CVPR 2020 AI for Content Creation Workshop

2019

Information Extraction from Text Regions with Complex Tabular Structure

Paper, Poster

Kaixuan Zhang, Shannon Zejiang Shen, Jie Zhou, and Melissa Dell

Workshop on Document Intelligence (DI 2019) at NeurIPS 2019

Deep Learning based Framework for Automatic Damage Detection in Aircraft Engine Borescope Inspection

Paper, Video

Shannon Zejiang Shen, Xili Wan, Feng Ye, Xinjie Guan, and Shuwen Liu

2019 International Conference on Computing, Networking and Communications (ICNC)

PROJECTS

Besides research, I've worked on various open-source projects. Here are a few of them:

Productivity & Utils

Chapyter

A JupyterLab extension that seamlessly connects GPT-4 to your coding environment. It features a code interpreter that can translate your natural language description into Python code and automatically execute it.

notion-df

A Python package that seamlessly connects Notion databases and pandas DataFrames. It allows easy uploading and downloading of Notion databases to and from pandas DataFrames.

Obsidian-Scholar

An Obsidian plugin that streamlines bibliography management.

Websites & Design

cs-sop.org

A platform for current and past grad students to share the statements of purpose from their applications to help future applicants. It is a full-fledged website based on Notion, and we developed an automated submission system that connects the Notion database with a Google Form (code available here).

layout-parser.github.io

The layout-parser project website is built with Jekyll and Bulma. Most interestingly, the layout-parser platform subpage is rendered by live-fetching the model metadata stored in GitHub issues.

Avalanche: a personal website theme for academics

Also based on Jekyll and Bulma, the Avalanche theme can be used out of the box to create an academic site that beautifully displays a personal research description, publications, and recent news.

CONTACT

If you have any questions about my research (or just want to say hi), the best email address to reach me is zejiangshen AT gmail.com.

You can also find me on Twitter, LinkedIn, and GitHub.