Text reuse in the form of textual parallels within and between early Chinese transmitted texts is extensive, widespread, and typically unattributed. It often reflects complex textual histories of repeated transcription, compilation, and editing that span many centuries and involve contributions from multiple authors and editors. In later works, a related but distinct type of text reuse appears: intentional but unattributed allusion to the contents of earlier influential or well-known works. Identifying concrete instances of both types of reuse can assist in the interpretation of obscure or disputed passages and, for early texts in particular, can also shed light on difficult issues of authorship and textual history.
Digital methods offer the prospect of locating individual instances of both types of reuse automatically and, to the extent that they can do so reliably, also make possible the systematic study of text reuse across a corpus of works as a whole. I describe methods of identifying concrete instances of text reuse in the classical Chinese corpus, evaluate the degrees of accuracy achieved, and demonstrate how the resulting data allow text reuse patterns to be explored at a corpus level.
DHAsia gratefully acknowledges support for Prof. Sturgeon's residency from the Center for Spatial and Textual Analysis (CESTA), the Center for Interdisciplinary Digital Research, the Confucius Institute, the Center for East Asian Studies, and other partners.
About the Speaker
Since 2005, Prof. Sturgeon has managed the Chinese Text Project (http://ctext.org), an online digital library of pre-modern Chinese, which is now the largest such library in the world and attracts tens of thousands of visitors and large numbers of crowd-sourced contributions every day. His current projects include large-scale Optical Character Recognition (OCR) of historical Chinese documents, the application of machine learning to the dating of pre-modern Chinese texts, and the development and evaluation of automated methods for analyzing pre-modern Chinese documents and their relationship to the wider corpus of pre-modern Chinese writing.
Free and Open to the Public
Stanford University Libraries, East Asia Library, Program in Modern Thought and Literature, Center for Spatial and Textual Analysis (CESTA), History Department, Center for East Asian Studies, Department of East Asian Languages and Cultures