A personalized database can be extremely valuable to humanists in Asian Studies, but building one can seem daunting to non-programmers, because common methods of database development usually require specialized knowledge of programming and data science. This workshop teaches database-building for non-programmers. We will walk participants through the key steps of database development using free, easy-to-use software. Regardless of their prior exposure to programming and data science, participants will learn how to acquire textual and visual data from Asian materials and how to manage these data in common file types, such as txt and jpeg, that are easily accessible on most computers.
Ashley Liu will teach participants how to legally convert Chinese eBooks, including ones locked by publishers, into plain text with Calibre. Because eBooks are abundant online, this method allows scholars to quickly develop a large personalized text corpus, which can be used for keyword searches or literary analysis. Jean-Baptiste Clais will discuss using common Windows tools and browser extensions to create a visual documentation system for artworks with Asian-language texts. This method bypasses the issue of language compatibility by saving textual references from scanned images and screenshots as jpeg files. The data harvested through his method are easily accessible via Windows File Explorer. To improve the legibility of acquired data, Junting Huang will introduce participants to OpenRefine, a free, open-source tool for cleaning up messy data that runs locally in a web browser. This method is particularly pertinent to processing large digital archives because it can handle inconsistent spelling, format, and name order, all of which are common issues in dealing with materials from Asian cultures.
In summary, this workshop enables non-programmers to develop a database from scratch. It addresses specific challenges that Asian Studies scholars may encounter in database development, such as country-specific copyright issues, language differences, and naming conventions. Upon completion, participants will be able to build a personalized database from a wide range of textual and visual sources.