Identifying the expressions in a text that refer to the same entity, known as coreference resolution, is an important problem in natural language processing. Abstract anaphora are distinct from other types of reference because they refer to abstract entities in discourse, such as events, facts, and propositions, and their antecedents can have non-nominal phrase structure. Non-nominal antecedents pose a particular challenge for coreference resolution because the pronoun provides little information about the syntactic structure or semantics of the antecedent. Much of the work on corpus annotation for coreference and on coreference resolution has focused on newspaper text. The goal of this study is to investigate how patterns in the use of abstract pronominal anaphora vary across three text types. I compiled a corpus of newswire text, spontaneous dialog, and planned speech, and I annotated all instances of the pronouns ‘it’, ‘this’, and ‘that’, as well as any non-nominal antecedents used with these pronouns. I compared the frequencies of these pronouns, their referential functions, and the characteristics of their non-nominal antecedents. I found variation across text types in the frequencies of referential functions, in the relationship between pronoun choice and referential function, in the grammatical structure of non-nominal antecedents, and in the difficulty of the annotation task. The results indicate that the range of pronominal reference, pronominal anaphora, and non-nominal antecedents found in spoken discourse may not be retrievable from even very large collections of newswire text.



This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
