Posts

Yanjun Gao's PhD Thesis Defense

Title: Analysis of Text To Identify, Represent, And Group Distinct Propositions
Atomic propositions are the semantic building blocks of discursive text, and are organized into simple or complex sentences in diverse syntactic structures. They are critical for many NLP applications. In this thesis, we study the identification, representation and grouping of propositions for text analysis. At the beginning of the thesis, we will introduce NLP resources for identifying and representing propositions, which include a newly annotated corpus and two corpora modified from publicly released datasets. We will also present educational data collection and annotation for the purpose of developing educational technologies to analyze and assess student writing. Later we will present the NLP contributions of this thesis, which include EDUA, an algorithm to group propositions from different texts that mean essentially the same thing; ABCD, a neural model to learn edit operations to identify and ext...

One Paper in ACL 2021 Main Conference: ABCD, A Linguistic-Aware, Off-the-Shelf, General-Purpose Model to Decompose Complex Sentences into Simple Sentences!

Looking for a great preprocessor that decomposes complex sentences into atomic clauses? We have one! Atomic clauses are fundamental text units for understanding complex sentences. The ability to decompose complex sentences facilitates research that aims to identify, rank, or relate distinct predications, such as content selection in summarization (Fang et al., 2016; Peyrard and Eckle-Kohler, 2017), labeling argumentative discourse units in argument mining (Jo et al., 2019) or elementary discourse units in discourse analysis (Mann and Thompson, 1986; Burstein et al., 1998; Demir et al., 2010), or extracting atomic propositions for question answering (Pyatkin et al., 2020). Previous methods rely entirely on either hand-crafted rules or encoder-decoder models to extract clauses from complex sentences. A paper by Yanjun Gao, "ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences", is accepted to ACL 2021 Main con...

Yanjun's thesis gets accepted to AIED Doctoral Consortium 2021!

A 4-page thesis presentation from Yanjun Gao, "Automated Assessment of Quality and Coverage of Ideas in Students' Source-based Writing", has been accepted to AIED Doctoral Consortium 2021! In this presentation, Yanjun discusses how PyrEval, an automated summarization evaluation tool, has been applied to different sets of students' summaries. This work includes the efforts made by NLP lab members and instructors across universities to design the assignment and rubric, collect students' source-based writing submissions, and annotate the rubric for reliability studies. PyrEval has been shown to correlate well with rubrics in scoring students' summaries. In the case studies, the scoring justification also helped instructors correct their scores. AIED will be held online June 14-18.

A TextGraphs Acceptance from Yanjun: Learning Clause Representation from Dependency-Anchor Graph for Connective Prediction

Clauses are the fundamental text units for complex sentences. A recent paper from Yanjun Gao, "Learning Clause Representation from Dependency-Anchor Graph for Connective Prediction", addresses the problem of clause representation and has been accepted to the NAACL Workshop on Graph-Based Natural Language Processing (TextGraphs 2021)! In this work, Yanjun proposed a dependency-anchor graph representation that encodes two different kinds of syntactic information and highlights the most important constituents, the subject and the verb phrase. A graph-based neural model, DAnCE, is designed to encode this graph representation and generate a "linguistic-aware" clause representation. This paper focuses on clause representation for connective prediction. Discourse connectives are critical indicators of text coherence and connect clauses into complex sentences. Thus the study of connectives could facilitate many downstream applications, such as coherence modeling, writing assessments,...

NLP Lab Contribution to State College Area School District CEEL (Community Education Extended Learning) Program

The NLP Lab has joined the efforts of several EECS departments to contribute to the State College Area School District CEEL (Community Education Extended Learning) Program, an after-school program with a focus on extending learning for students in grades K-5. We contributed a CEEL unit called "Artificial Agents that Help Teachers Help Students in STEM Writing Activities." Our unit presents an introduction to the ideas motivating our new NSF Award, "Supporting Science Learning and Teaching in Middle School Classrooms through Automated Analysis of Students' Writing" (in collaboration with Sadhana Puntambekar, U Wis, Madison). It explains why good science writing goes hand in hand with good science, why teachers and students struggle with science writing, and how NLP can help. It gives a simple introductory lesson in distributional semantics, with activities for the students. You can find our unit on the EECS CEEL page, which has a downloa...

New NSF Award

We have an announcement about a new project funded by NSF! Becky Passonneau has a new 4-year NSF project on automated analysis of middle school students' STEM writing, as of 08/01/2020. This project is a collaboration among Sadhana Puntambekar, one of the leaders in the field of learning design; Becky Passonneau, a leader in the field of natural language processing technology applied to education; and ChanMin Kim (PSU, Education), a leader in the interface between learning design and educational technology. The project will develop a collaborative "Writer's Notebook" for teaching middle school science, for use interactively in the classroom by students and teachers. Scientific writing skills are an important part of communicating about science and learning new science concepts. The Writer's Notebook will be used in several middle schools in Wisconsin. The component piece of the Writer's Notebook developed at PSU for ana...

Success Stories: Yanjun Gao's paper got accepted at CoNLL 2019!

A recent paper on pyramid summarization evaluation, "Automated Pyramid Summarization Evaluation", has appeared in CoNLL 2019, with Yanjun giving a presentation in Hong Kong on Nov 3, 2019. This paper presents a new software package, PyrEval, for evaluating machine and human summarization, which has been shown to be accurate and efficient. You can find more information about this paper on GitHub as well. The pyramid method is known to be NP-complete, and is challenging due to linguistic and content complexity. This work proposes a novel set partition algorithm that allocates clause-like units to summary content units, with two variants that generate equally good pyramids. PyrEval could be used in many scenarios, including educational applications (see BEA paper) and multilingual summarization evaluation (a recent LREC paper applied PyrEval to German text summarization evaluation; a link will be released soon). This paper also received posit...