Non-Consumptive Research Services for Asia-Related Databases by CrossAsia
Presenter Lightning Session(s)
Hou Ieong Ho
Staatsbibliothek zu Berlin, Germany
CrossAsia’s Full-text Search and N-gram is an experimental project by the Berlin State Library (Staatsbibliothek zu Berlin) that indexes the full-text contents of Asia-related databases subscribed to by CrossAsia and provides non-consumptive research services for scholars within and outside of Germany. As of July 2020, we have indexed 54.2 million pages from 355,000 titles in 33 different data sources, mainly in Chinese (both traditional and simplified), English, and Japanese. Without exposing the licensed materials to unauthorized users, the Full-text Search enables scholars to observe the contexts and usages of keywords by showing relevant snippets and statistical overviews of metadata. Building on pre-calculated N-grams of all the indexed contents, further large-scale natural language processing (NLP) methods, such as text similarity, language models, and topic modeling, can be applied to analyze terminology changes and to trace knowledge dissemination, using these databases as representative linguistic corpora.
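As a rough illustration of how pre-calculated N-grams can support text similarity without exposing the underlying full text, the following Python sketch counts character N-grams per title and compares two count tables by cosine similarity. It is a minimal sketch under stated assumptions, not the project's implementation: the function names (`char_ngrams`, `ngram_counts`, `cosine_similarity`), the choice of character-level N-grams, and the sample strings are all hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n):
    """Yield overlapping character n-grams, skipping any that contain whitespace.

    Character n-grams are an assumption here; they avoid the need for
    word segmentation in Chinese and Japanese text.
    """
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        if not any(ch.isspace() for ch in gram):
            yield gram


def ngram_counts(pages, n):
    """Aggregate n-gram counts over an iterable of page texts (hypothetical helper)."""
    counts = Counter()
    for page in pages:
        counts.update(char_ngrams(page, n))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count tables; 0.0 if either is empty."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Invented sample strings standing in for pages of two titles.
    title_a = ngram_counts(["知識傳播與術語變遷", "術語變遷的研究"], 2)
    title_b = ngram_counts(["術語變遷與知識傳播"], 2)
    print(title_a.most_common(3))
    print(round(cosine_similarity(title_a, title_b), 3))
```

Only the count tables are needed for the comparison, so an analysis of this kind can be run on derived N-gram data while the licensed page texts stay inaccessible to the user.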