This hands-on workshop introduces participants to a complete text and data mining workflow, from digital transcription and annotation of premodern works through to extraction of data derived from their contents. It consists of four parts:
1. Using the Chinese Text Project crowdsourced editing platform to create and obtain accurate, linked digital transcriptions of premodern Chinese texts.
2. Interactive text mining: extracting and visualizing statistical properties and relationships from transcribed texts. Types of analysis include pattern matching of words and phrases, identification of text reuse, and comparison of vocabulary usage; visualizations include networks, charts, and textual heatmaps.
3. Annotating and disambiguating references to entities in a text (such as names of people, places, and eras) and linking them to authority databases, then extracting knowledge claims about these entities (such as dates of birth, death, or appointment to a particular bureaucratic office) and contributing those claims to a crowdsourced knowledge base.
4. Interactive data mining: extracting and visualizing data from annotated texts and extracted knowledge claims. This includes querying the knowledge base for particular types of information and summarizing the results via networks, charts, and maps.
This workshop does not assume any prior background in digital methods, and requires only a web browser. Participants are encouraged to create a free account on ctext.org prior to the workshop: https://ctext.org/account.pl
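
For readers curious about what the identification of text reuse in part 2 involves, the following is a minimal sketch of one common approach: finding character n-grams shared between two passages. It is not the method used by the workshop tools, which run in the browser and require no programming. The function names, the choice of n = 4, and the two sample strings are assumptions made purely for illustration.

```python
def char_ngrams(text, n):
    """Return the set of all length-n character sequences in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def shared_ngrams(a, b, n=4):
    """Return the character n-grams that occur in both a and b."""
    return char_ngrams(a, n) & char_ngrams(b, n)


# Two short sample passages (illustrative only) that share a run of text.
text_a = "學而時習之不亦說乎有朋自遠方來不亦樂乎"
text_b = "子曰學而時習之不亦說乎人不知而不慍"

overlap = shared_ngrams(text_a, text_b, n=4)

# Print each shared 4-gram; a long run of overlapping n-grams
# suggests that one passage reuses text from the other.
for gram in sorted(overlap):
    print(gram)
```

Character n-grams are a convenient unit for premodern Chinese because the texts are not divided into words by spaces; a larger n reduces chance matches at the cost of missing shorter shared phrases.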