One Paper in the ACL 2021 Main Conference: ABCD, a Linguistically Aware, Off-the-Shelf, General-Purpose Model That Decomposes Complex Sentences into Simple Sentences!

Looking for a great preprocessor that decomposes complex sentences into atomic clauses? We have one! 


Atomic clauses are fundamental text units for understanding complex sentences. The ability to decompose complex sentences facilitates research that aims to identify, rank, or relate distinct predications, such as content selection in summarization (Fang et al., 2016; Peyrard and Eckle-Kohler, 2017), labeling argumentative discourse units in argument mining (Jo et al., 2019) or elementary discourse units in discourse analysis (Mann and Thompson, 1986; Burstein et al., 1998; Demir et al., 2010), and extracting atomic propositions for question answering (Pyatkin et al., 2020). Previous methods rely entirely on either hand-crafted rules or encoder-decoder models to extract clauses from complex sentences.

A paper by Yanjun Gao, "ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences", has been accepted to the ACL 2021 main conference. In this paper, Yanjun formulates clause identification as a graph decomposition problem and proposes a novel framework, ABCD, which constructs a graph from a sequence of words and their syntactic relations. The name ABCD stands for the four edit operations that a neural model learns to apply to the graph: Accept, Break, Copy, and Drop. A distant supervision label creator generates edit labels from ground-truth clauses, providing the supervision signal for the neural model. An engineered module constructs the input graph, and a DFS-based postprocessor segments the graph according to the predicted labels and outputs the clauses.
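To make the decomposition step concrete, here is a minimal sketch of how a DFS-based postprocessor could segment a word graph once each edge carries one of the four predicted labels. This is an illustrative reconstruction under simplifying assumptions, not the paper's implementation: the data structures and the `segment` function are hypothetical, and Copy is treated like Accept here, whereas the actual model duplicates the shared word into every clause that needs it.

```python
# Hypothetical sketch of ABCD-style postprocessing: keep or cut edges
# according to the predicted labels, then run DFS so that each remaining
# connected component is read off as one simple clause. Not the authors'
# implementation; data structures are simplified for illustration.
from collections import defaultdict

ACCEPT, BREAK, COPY, DROP = "accept", "break", "copy", "drop"

def segment(words, edges):
    """words: list of (index, token) in sentence order;
    edges: list of (head_index, dep_index, predicted_label)."""
    adj = defaultdict(set)
    dropped = set()
    for head, dep, label in edges:
        if label in (ACCEPT, COPY):   # keep the connection between the two words
            adj[head].add(dep)        # (simplification: COPY handled like ACCEPT)
            adj[dep].add(head)
        elif label == DROP:           # discard the dependent word entirely
            dropped.add(dep)
        # BREAK: leave the edge out, so the two sides fall into separate clauses

    seen, clauses = set(), []
    for index, _ in words:
        if index in seen or index in dropped:
            continue
        # Depth-first search: each connected component becomes one clause.
        stack, component = [index], set()
        while stack:
            node = stack.pop()
            if node in seen or node in dropped:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adj[node])
        clauses.append(" ".join(tok for i, tok in words if i in component))
    return clauses

if __name__ == "__main__":
    # "I ran and she slept" -> two clauses; the connective "and" is dropped
    # and the edge between the two predicates is broken.
    words = [(0, "I"), (1, "ran"), (2, "and"), (3, "she"), (4, "slept")]
    edges = [(1, 0, ACCEPT), (1, 2, DROP), (1, 4, BREAK), (4, 3, ACCEPT)]
    print(segment(words, edges))  # ['I ran', 'she slept']
```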

ABCD is trained and evaluated on two datasets. One is DeSSE, a new dataset annotated through Amazon Mechanical Turk with a novel annotation scheme for sentence rewriting, connective prediction, and sentence simplification tasks (also introduced in a TextGraphs paper from Yanjun). The other contains text from Wikipedia. On both datasets, ABCD achieves competitive results compared with both an encoder-decoder model and a parsing-based model. ABCD even outperforms the encoder-decoder baseline on DeSSE, where the text is more challenging because it covers a wider range of linguistic phenomena.

As the meta-review noted, "This task is important not only for discourse, but for semantic analysis as well, and so far as I am aware, currently there is no general-purpose, off-the-shelf system for performing this task in a consistent and linguistically-motivated way," and ABCD "is well presented, linguistically motivated, well evaluated".

We are preparing the camera-ready version now. We thank our collaborator Ting-hao Huang (IST) and all the reviewers for their helpful comments! 

ACL 2021 will be held online on August 2-4, 2021. The code, paper, and dataset will be available by then.
