Posts

Showing posts from May, 2021

Yanjun Gao's PhD Thesis Defense

Title: Analysis of Text To Identify, Represent, And Group Distinct Propositions

Atomic propositions are the semantic building blocks of discursive text, and are organized into simple or complex sentences with diverse syntactic structures. They are critical for many NLP applications. In this thesis, we study the identification, representation, and grouping of propositions for text analysis. At the beginning of the thesis, we will introduce NLP resources for identifying and representing propositions, which include a newly annotated corpus and two modified corpora derived from publicly released datasets. We will also present educational data collection and annotation for the purpose of developing educational technologies to analyze and assess student writing. Later we will present the NLP contributions of this thesis, which include EDUA, an algorithm to group propositions from different texts that mean essentially the same thing; ABCD, a neural model to learn edit operations to identify and extract

One Paper in ACL 2021 Main Conference: ABCD, A Linguistic-Aware, Off-the-Shelf, General-Purpose Model to Decompose Complex Sentences into Simple Sentences!

Looking for a great preprocessor that decomposes complex sentences into atomic clauses? We have one! Atomic clauses are fundamental text units for understanding complex sentences. The ability to decompose complex sentences facilitates research that aims to identify, rank, or relate distinct predications, such as content selection in summarization (Fang et al., 2016; Peyrard and Eckle-Kohler, 2017), labeling argumentative discourse units in argument mining (Jo et al., 2019) or elementary discourse units in discourse analysis (Mann and Thompson, 1986; Burstein et al., 1998; Demir et al., 2010), or extracting atomic propositions for question answering (Pyatkin et al., 2020). Previous methods rely entirely on either hand-crafted rules or encoder-decoder models to extract clauses from complex sentences. A paper by Yanjun Gao, "ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences", has been accepted to the ACL 2021 Main Conference. In this paper,

Yanjun's thesis gets accepted to AIED Doctoral Consortium 2021!

A 4-page thesis presentation from Yanjun Gao, "Automated Assessment of Quality and Coverage of Ideas in Students' Source-based Writing", has been accepted to the AIED Doctoral Consortium 2021! In this presentation, Yanjun discusses how PyrEval, an automated summarization evaluation tool, has been applied to different sets of students' summaries. This work includes the efforts made by NLP lab members and instructors across universities to design the assignment and rubric, collect students' source-based writing submissions, and annotate them against the rubric for reliability studies. PyrEval has been shown to correlate well with rubrics in scoring students' summaries. In the case studies, the scoring justification also helped instructors correct their scores. AIED will be held online from June 14 to 18.

A TextGraphs Acceptance from Yanjun: Learning Clause Representation from Dependency-Anchor Graph for Connective Prediction

Clauses are the fundamental text units of complex sentences. A recent paper from Yanjun Gao, "Learning Clause Representation from Dependency-Anchor Graph for Connective Prediction", addresses the problem of clause representation and has been accepted to the NAACL Workshop on Graph-Based Natural Language Processing (TextGraphs 2021)! In this work, Yanjun proposes a dependency-anchor graph representation that encodes two different kinds of syntactic information and highlights the most important constituents, the subject and the verb phrase. A graph-based neural model, DAnCE, is designed to encode this graph representation and generate a "linguistic-aware" clause representation. This paper focuses on clause representation for connective prediction. Discourse connectives are critical indicators of text coherence and connect clauses into complex sentences. Thus the study of connectives could facilitate many downstream applications, such as coherence modeling, writing assessments,