Rectifying Names: Digital Approaches to Name Disambiguation for Chinese Historical Figures
Presenter Lightning Session(s)
Lik Hang Tsui
City University of Hong Kong, Hong Kong
When integrating biographical data extracted from more than 2,000 local gazetteers (difangzhi) into the China Biographical Database (CBDB), we need to identify and link records that refer to the same person, that is, to “disambiguate” them. Traditional Chinese naming customs make this very difficult, especially for a gazetteer dataset containing 120,000 records and 90,000 unique names of imperial government officials. In addition, many gazetteer entries lack useful variables. My presentation analyzes solutions for disambiguating identical personal names in Chinese script. First, we computationally identified individuals who repeatedly held official posts in the same locality. Then, we cross-tabulated the overlapping content of multiple gazetteers. Finally, we corroborated the remaining data with external datasets, e.g., CGED-Q. In this way we have disambiguated 51,000 personal names with optimal precision; such a task is feasible only when done digitally. The techniques will be useful for disambiguation and Named Entity Recognition in other large-scale unstructured data in non-Latin scripts.
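The first step, finding individuals who repeatedly held posts in the same locality, might be sketched roughly as follows. This is an illustration only, not the project's actual pipeline: the record layout, the sample names and years, the function name `cluster_same_locality`, and the 30-year gap threshold are all assumptions made for the example, and real gazetteer entries often lack some of these fields.

```python
from collections import defaultdict

# Hypothetical records: (record_id, name, locality, year of appointment).
# These values are invented for illustration and are not from the dataset.
records = [
    (1, "王安", "杭州", 1702),
    (2, "王安", "杭州", 1710),
    (3, "王安", "成都", 1850),
    (4, "李贵", "杭州", 1705),
    (5, "王安", "杭州", 1820),
]

MAX_GAP_YEARS = 30  # assumed threshold for one plausible career span


def cluster_same_locality(records, max_gap=MAX_GAP_YEARS):
    """Group records that share a name and locality and lie close in time.

    Records with the same (name, locality) are sorted by year; a new
    cluster starts whenever the gap to the previous record exceeds max_gap.
    """
    by_key = defaultdict(list)
    for rec_id, name, locality, year in records:
        by_key[(name, locality)].append((year, rec_id))

    clusters = []
    for (name, locality), items in by_key.items():
        items.sort()
        current = [items[0][1]]
        last_year = items[0][0]
        for year, rec_id in items[1:]:
            if year - last_year <= max_gap:
                current.append(rec_id)  # likely the same person
            else:
                clusters.append((name, locality, current))
                current = [rec_id]  # too far apart: treat as a namesake
            last_year = year
        clusters.append((name, locality, current))
    return clusters


clusters = cluster_same_locality(records)
for name, locality, ids in clusters:
    print(name, locality, ids)
```

In this toy data, records 1 and 2 are linked as one candidate person, while record 5 (same name and place, more than a century later) and record 3 (same name, different locality) stay separate. Candidates left unresolved by such a rule would then go on to the cross-gazetteer comparison and the external-dataset check described above.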