Yanjun Gao's PhD Thesis Defense
Title: Analysis of Text To Identify, Represent, And Group Distinct Propositions
Abstract: Atomic propositions are the semantic building blocks of discursive text and are organized into simple or complex sentences with diverse syntactic structures. They are critical for many NLP applications. In this thesis, we study the identification, representation, and grouping of propositions for text analysis. We first introduce NLP resources for identifying and representing propositions: a newly annotated corpus and two corpora modified from publicly released datasets. We also present the collection and annotation of educational data, with the aim of developing educational technologies that analyze and assess student writing. We then present the NLP contributions of this thesis, which include EDUA, an algorithm that groups propositions from different texts when they mean essentially the same thing; ABCD, a neural model to learn edit operations to identify and ext...