Digital sustainability: relaunch of the MHDBDB

Project Details

Description

Relaunch of the Middle High German Conceptual Database (Mittelhochdeutsche Begriffsdatenbank, MHDBDB), guided by best practices and standards of the Digital Humanities

From the project proposal:

The Middle High German Conceptual Database (MHDBDB) of the Univ. of Salzburg has been in operation since 1972. Today it is one of the most important tools of German Medieval Studies. The content of the DB amounts to over 10m tokens with annotations such as grammatical data, conceptual meanings or metadata. Its core elements are a complex search engine and a dictionary in which the meanings of word articles are indexed by a conceptual system. Tokens are linked to these word articles and to the meaning valid in the context, as well as to further annotations.
All data has been managed in an Oracle DB since 1998; functions are implemented in Java. While this technology was state of the art in the 1990s, there are now clear deficits: networking with other resources, e.g. LOD, is hardly possible, nor is the import of metadata. The current frontend is past its prime and in urgent need of a general overhaul.
Therefore preparations are already being made for a relaunch, for which, however, neither the required financial nor the human resources have been made available to date. The aim of the project is to provide the necessary IT and development basis to enable interoperability, accessibility, re-usability and sustainability of existing data.

The most important innovations of the new MHDBDB are the use of established standards, standardised data formats (esp. XML-TEI, RDF, SKOS, OWL), a connection to the Semantic Web and authority files. The new OA policy will be based on the FAIR principles: The text corpus will be readable and downloadable; all annotations such as conceptual system, name system, PoS, NER, word-, phrase- and sentence structure, word fields, metadata and links to authority files will be made available under a CC licence (presumably CC BY-NC-SA 3.0 AT).
The MHDBDB will use a data model based on various Semantic Web technologies such as RDF vocabularies and ontologies, as well as TEI for encoding texts. All 666 texts have already been fully converted into TEI. The TEI texts must now be linked to LOD in a stand-off process. All other data will be available as RDF and relates directly to the tokens of the texts. Cross-linking between RDF and TEI will be done with the Web Annotation Vocabulary (a W3C Recommendation).
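As an illustration of the stand-off linking described above, the following minimal sketch builds a single Web Annotation in JSON-LD that points from a token in a TEI file to an RDF resource. It is not taken from the MHDBDB code base: the function name, all example.org IRIs and the token id "w42" are hypothetical, and the choice of an XPathSelector on xml:id is only one possible way to address a token.

```python
import json


def make_token_annotation(annotation_id, tei_url, token_xml_id, body_iri):
    """Build a Web Annotation (JSON-LD) linking one TEI token to an RDF resource.

    All IRIs passed in are placeholders chosen by the caller; the structure
    follows the W3C Web Annotation Data Model.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "classifying",
        # The body is the RDF resource (e.g. a concept or word article).
        "body": body_iri,
        # The target is the token inside the TEI document, addressed stand-off.
        "target": {
            "source": tei_url,
            "selector": {
                "type": "XPathSelector",
                "value": f"//*[@xml:id='{token_xml_id}']",
            },
        },
    }


if __name__ == "__main__":
    annotation = make_token_annotation(
        "https://example.org/annotation/1",      # hypothetical annotation IRI
        "https://example.org/tei/sample.xml",    # hypothetical TEI file
        "w42",                                   # hypothetical token id
        "https://example.org/concept/12345",     # hypothetical concept IRI
    )
    print(json.dumps(annotation, indent=2))
```

Because the annotation lives outside the TEI file, the text encoding can stay unchanged while annotations are added, corrected or removed independently.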

The new MHDBDB will focus on sustainability in all areas to ensure stable references for resources as well as interoperability:
Word articles have already been coded according to the specifications of the OntoLex Lemon Lexicography Module and are awaiting a new web presentation. Other ontologies/vocabularies to be used are BibFrame 2.0, GND Ontology and ONAMA, an ontology for the analysis of narratological patterns.
The old, hierarchically rigid conceptual system has been transformed into a SKOS thesaurus. The names will be detached from the conceptual system and redesigned into a complex name index based on SKOS. The functional implementation in the MHDBDB-frontend is part of this project proposal.
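To illustrate what a SKOS-based conceptual system allows, the sketch below stores a few skos:broader statements as plain triples and collects all broader concepts of a given concept. The concepts and their example.org IRIs are invented for illustration and do not come from the MHDBDB thesaurus; a real implementation would more likely query a triple store than walk a Python list.

```python
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def broader_transitive(triples, concept):
    """Return every concept reachable from `concept` via skos:broader."""
    # Index the broader links: narrower concept -> set of broader concepts.
    parents = {}
    for subject, predicate, obj in triples:
        if predicate == SKOS_BROADER:
            parents.setdefault(subject, set()).add(obj)

    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    ex = "https://example.org/concept/"  # hypothetical namespace
    triples = [
        (ex + "sword", SKOS_BROADER, ex + "weapon"),
        (ex + "weapon", SKOS_BROADER, ex + "artefact"),
        # A concept may have more than one broader concept in SKOS.
        (ex + "sword", SKOS_BROADER, ex + "knighthood"),
    ]
    for iri in sorted(broader_transitive(triples, ex + "sword")):
        print(iri)
```

In this example "sword" has two broader concepts, something a strict single-parent tree cannot express.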
Further metadata to be considered within the framework of the proposed project:
· The existing text types are to be transformed, as a joint effort of various working groups (e.g. "Netzwerk Offenes Mittelalter" on LOD in medieval studies), into a new genre typology described in SKOS. The final implementation in the frontend is still pending; connecting the typology to the Semantic Web as a controlled vocabulary is a goal of this project. The aim is to publish the entire typology on GitHub for further use.
· Poet biographies, which were created in the old DB as descriptive plain texts, are to be enriched with Wikidata/GND identifiers. Vice versa, MHDBDB's own annotations on poets and works are to be connected to the Semantic Web. For example, author-specific vocabulary, co-occurrences or phraseologisms can be queried.
· Bibliographic metadata on underlying editions and manuscripts (in collaboration with internat. working groups)
· Further standardised metadata on persons, time, places and events, based on CIDOC-CRM
· A RESTful interface will enable machine retrieval of word articles, text passages, annotations and metadata, thus providing a more low-threshold access than SPARQL.
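
The lower threshold of the RESTful interface mentioned in the list above can be sketched as follows: a client only needs to assemble a URL, whereas SPARQL requires knowledge of the query language and the data model. The base URL, the "articles" path segment and the "format" parameter are assumptions made for this illustration; the actual routes of the MHDBDB interface are not specified in the proposal.

```python
from urllib.parse import quote, urlencode


def build_article_url(base_url, lemma, fmt="json"):
    """Assemble the URL for one word article (hypothetical route layout)."""
    # Percent-encode the lemma so that characters such as "â" are URL-safe.
    path = f"{base_url.rstrip('/')}/articles/{quote(lemma, safe='')}"
    return f"{path}?{urlencode({'format': fmt})}"


if __name__ == "__main__":
    # api.example.org is a placeholder, not the real MHDBDB endpoint.
    print(build_article_url("https://api.example.org/v1/", "minne"))
    print(build_article_url("https://api.example.org/v1", "âventiure", fmt="xml"))
```

The sketch only constructs the request URL and performs no network call, so it runs without access to any service.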

The main aim of the project is to transfer the described methods and data into a secure DH infrastructure:
· Support, storage and computing power to process the data during the research process
· Long-term archiving of research data as part of the Austrian DH infrastructure landscape
· Definition of a technology stack for the web application
· Development of back-/frontend
· Store information about the web presentation and other access methods of the research data in terms of the OAIS RM
· Documentation that helps to make the used methods reproducible
· Support in preparing data + frontend: code maintenance, conversion/migration of data and code for a successive transfer into a trusted certified repository of the Austrian DH community (if the PLUS enters into a contract)

It is evident that sustainable DH work requires common standards, durable solutions and services such as LOD, FAIR, OAIS RM. Therefore, the project team wants to turn the new MHDBDB into an innovative tool for knowledge transfer and collaboration that can connect the environments of knowledge of the community. Existing data must be made accessible in order to be (re-)usable, for example by means of a web presentation and visualisations. With this in mind, the following guiding questions will be answered:
· How can the quality of existing data be improved through DH methods and how can the scientific community benefit from the workflow?
· How can the large data pool of this (very) long-term project provide new information for users with innovative approaches and open up follow-up research?
· How must information architecture and tool design be implemented so that advanced researchers and students both feel addressed?
Short title: MHDBDB-Relaunch
Acronym: MHDBDB-Relaunch
Status: Finished
Effective start/end date: 1/11/22 → 30/11/23