Using the Middle High German Conceptual Database

Revision 1.1 (May 27th, 2003)

Introduction

Preamble

When using MHDBDB, you should be aware that the entire database system is an ongoing construction site: the dictionary and the entire text base are subject to continuous revision and expansion. Texts are being lemmatised and disambiguated; that is, homographs are being separated and ambiguous words are being assigned a specific meaning within their respective contexts. From time to time, we will report on the ongoing work on our News Page.
In addition, we are continuously working on the program system and the help tools. We therefore urge you to send us comments and error reports so that we can improve the system and make it more user-friendly.

If you want to submit a query to MHDBDB, you first have to do three different things:

Text Selection

The box for text selection contains three different types you can choose from:

When you conduct a context search, you may set the size of your context to between 1 and 99 words or lines before and after your keyword by changing the values in the smaller windows on the right-hand side of your screen. In addition, you may adjust your output by selecting Varianten anzeigen (show variants) or Lemmas anzeigen (show lemmas), which gives you either lemmas with all their variants or the lemma forms only. Furthermore, you may determine whether your output shows the results for each text individually (Einzeltexte anzeigen, show individual texts), according to text groups (Textgruppen anzeigen, show text groups), or simply as the sum total for all texts (Summe aller Texte, sum of all texts).

Please keep in mind that the speed with which results are returned depends on two things:

Below the project image you will find links to additional Help Pages for both the Dictionary and the Analyse Text modules, two help functions for finding conceptual categories (Browse Categories and Search Word in Category System), and a page explaining the grammatical categories that may be searched (Explain grammar tags).

Search Strings

Simple Searches

All queries that consist of simple search strings only, or that combine such simple search strings through Boolean operators, are "simple searches." A "simple search string" may be, for instance, a string of letters of which a lemma or one of its variants may consist. Through Boolean operators you may determine which search strings are to be connected with each other and which are to exclude each other. The search possibilities with such "simple search strings" are explained in detail in the following paragraph.


Boolean Operators

By means of the Boolean operators 'und' (and), written &, and 'oder' (or), written |, you may connect individual search strings with each other. For instance, the combined search string $ich & #a retrieves all references for "ich" at the beginning of a text line; the combined string 1402* | 1403* gives you all words that belong to a subcategory of either 1402 (birds) or 1403 (fishes). In an extended search string, you may use as many Boolean operators as you wish in order to combine simple search strings (see above) with each other. A valid extended search string would be, for instance, haben | [;:!] | 1402* & #e. Whenever an extended search string contains both & and |, the logical 'und' (and) takes priority over the logical 'oder' (or): e.g. 1402 & #e | $ha* is identical to the statement (1402 & #e) | $ha*. If you want to set a different priority, you have to use parentheses in your statement, for instance, 1402 & (#e | $ha*).
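The priority rule can be illustrated with a small parser. This is only a sketch of the rule as described above, not the program MHDBDB itself uses; the function name parse and the tuple representation of the result are assumptions made for this illustration.

```python
import re

def parse(query):
    """Parse a query into nested tuples, giving & priority over |.

    Illustrative only: simple search strings are kept as plain text
    and are not interpreted in any way.
    """
    tokens = re.findall(r"[()&|]|[^\s()&|]+", query)
    pos = 0

    def parse_or():
        nonlocal pos
        node = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            node = ("or", node, parse_and())
        return node

    def parse_and():
        nonlocal pos
        node = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            node = ("and", node, parse_atom())
        return node

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("search string expected at end of query")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        if tok in (")", "&", "|"):
            raise ValueError("search string expected, found " + tok)
        return tok

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return tree

# Without parentheses, & is grouped first.
print(parse("1402 & #e | $ha*"))
# Parentheses override the default priority.
print(parse("1402 & (#e | $ha*)"))
```

Because parse_or calls parse_and for each of its operands, every run of &-connected strings is grouped before any | is applied, which is the behaviour described for the unparenthesised statement.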

Please be careful when using the Boolean 'und' (and) operator, since not every combination yields a meaningful statement. For instance, the extended search string [*] & haben would not be meaningful, since a word cannot be a punctuation mark and a variant of "haben" at the same time. The Boolean operator 'und' (and) should generally be used in order to link otherwise unmarked search strings with conceptual categories or to link conceptual categories with text line positions.

Examples of meaningful uses of the Boolean 'und' (and) operator are: haben & niht, by which you will find all word combinations of "haben" and "nicht"; 21071 & 2322, which searches for intransitive verbs within the conceptual area "Horse and Horsemanship"; <NOM> & <ADJ>, which retrieves words that may be used either as nouns or as adjectives; and #>2 & #<8, which searches for all words in positions 3 to 7 of a text line.

Serial Searches

Serial searches allow you to search for a series of two or more words within the text base. Such serial search statements consist of a series of simple search statements separated by commas, e.g. ich, 21071 | haben, #e. The words you search for must occur in the same text line and in the same sequence as stated in the serial search string. However, you may also allow for intervals of one or more words between the words of your search series by using wild cards, as, for instance, in the query ich, *, 21071 | haben, *, *, #e. This query searches for the word 'ich' or any of its variants, followed by any word, followed by a word within the conceptual category 21071 (intransitive verb) or the word 'haben' or any of its variants, followed by any two words, followed by the last word in a text line.

It is also possible to set an upper limit for the number of words in the interval between two words in a serial search. For instance, the query ich, {1}, 21071 | haben, *, *, {3}, #e will find all occurrences of the word 'ich' or any of its variants, followed by a maximum of one word, followed by a word within the conceptual category 21071 (intransitive verb) or the word 'haben' or any of its variants, followed by at least two but no more than five words, followed by the last word in a text line.
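The way fixed wild cards and upper limits combine can be sketched as follows. This is an illustration under stated assumptions, not the database's own matching routine: words are compared literally instead of through lemma or category lookup, the helper names (word_in, last_word, find_serial) are hypothetical, and the word forms listed for category 21071 and for 'haben', like the sample text lines, are invented for the example.

```python
import re

GAP = re.compile(r"\{(\d+)\}")

def word_in(*forms):
    """Slot accepting any of the given forms (stand-in for lemma/category lookup)."""
    accepted = set(forms)
    return lambda words, i: words[i] in accepted

def last_word(words, i):
    """Slot for #e: the word must be the last one in the text line."""
    return i == len(words) - 1

def match_at(pattern, words, start):
    """True if the pattern matches the line beginning exactly at index start."""
    def step(p, w):
        if p == len(pattern):
            return True
        item = pattern[p]
        if item == "*":                      # exactly one arbitrary word
            return w < len(words) and step(p + 1, w + 1)
        if isinstance(item, str):            # {n}: between 0 and n arbitrary words
            gap = GAP.fullmatch(item)
            if gap is None:
                raise ValueError(f"unknown pattern item: {item!r}")
            most = min(int(gap.group(1)), len(words) - w)
            return any(step(p + 1, w + k) for k in range(most + 1))
        # otherwise the item is a slot function that tests the word at index w
        return w < len(words) and item(words, w) and step(p + 1, w + 1)
    return step(0, start)

def find_serial(pattern, words):
    """Return every start index at which the serial pattern matches."""
    return [i for i in range(len(words)) if match_at(pattern, words, i)]

# ich, {1}, 21071 | haben, *, *, {3}, #e  (toy word forms, not database content)
verb_or_haben = word_in("reit", "kam", "haben", "han", "habe")
pattern = [word_in("ich"), "{1}", verb_or_haben, "*", "*", "{3}", last_word]

print(find_serial(pattern, "ich han dir vil wol geseit".split()))
print(find_serial(pattern, "ich han dir geseit".split()))
```

The second line yields no match because only one word stands between 'han' and the last word of the line, whereas the two * wild cards require at least two.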

Naturally, the maximum number of words within a serial search is limited by the context you have set for the search. For instance, if your context is set to "word" and the size of your context to 4, your serial search may not have more than two words between the first word in your query statement and the last word within a text line.

Context Queries

Context queries are the most common type of queries. They may include any number of simple searches or 'und/oder' (and/or) queries that must be separated by a plus sign. For instance, the statement ich, habe + <NAM> combines a serial search statement with a simple search statement.

A context query finds any context that combines words determined by the query statements within a given context frame. For example, when the context unit is set to "Zeilen" (lines) and the context scope to "3", the query string alphart + dietrich will find all context areas where these two names (or their variants) appear with no more than two lines in between.

However, you have to consider one special condition when submitting a context search: if your context search contains simple search strings, each word within a context frame can satisfy only one of these criteria. For instance, the query $a* + $*e searches for at least one word that begins with the character "a" and one that ends with the character "e" within the context frame. If a word happens both to begin with "a" and to end with "e", the query will look for at least one more word that ends with "e". In other words, the word that fulfills both search criteria is counted only for the search for words beginning with "a" ($a*), not for the search for words ending in "e" ($*e).

The limitation that simple search criteria may not overlap each other does not apply to serial searches. For example, when you submit the query $a*, $*e, you are searching for a word that begins with "a" followed by a word ending in "e". Since there might be too many overlaps, you must not combine serial searches with each other by using a + sign.
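The condition that one word cannot serve two criteria at once can be sketched as an assignment problem. The following is a simplified illustration, not the actual search routine: the function name frame_matches is hypothetical, the $ prefix is omitted, and Python's shell-style wild cards stand in for the database's own pattern matching. The sketch tries every possible assignment, whereas the real system may assign words in a fixed order.

```python
from fnmatch import fnmatchcase

def frame_matches(criteria, words):
    """True if each criterion can be given its own, distinct word in the frame."""
    def assign(c, used):
        if c == len(criteria):
            return True
        for i, word in enumerate(words):
            # A word already claimed by an earlier criterion is not reused.
            if i not in used and fnmatchcase(word, criteria[c]):
                if assign(c + 1, used | {i}):
                    return True
        return False
    return assign(0, frozenset())

# "ane" begins with "a" and ends with "e", but it can count for one criterion only.
print(frame_matches(["a*", "*e"], ["ane", "der"]))
print(frame_matches(["a*", "*e"], ["ane", "minne"]))
```

In the first frame the only candidate for both criteria is "ane", so the frame is rejected; in the second, "minne" supplies the additional word ending in "e".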

The Representation of Query Results

Your query results are presented either in the form of a table (simple searches) or in the form of a single line (serial searches and context searches). In both cases, the figures in the second column and in all further columns to the right represent the sum of occurrences for each individual text or text group. The column on the far right contains the sum total for all texts or text groups. If a text belongs to more than one text group, there will be two columns for the same text, but the sum total will contain the frequency for this text only once. The columns for text groups and those for the sum total are highlighted in different colors.
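The way the sum total treats a text that belongs to several text groups can be shown with a short calculation. The text codes, group names, and frequencies below are invented for the illustration, and the function name summarize is hypothetical.

```python
def summarize(counts, groups):
    """Return per-group totals and a sum total that counts each text once."""
    group_totals = {
        name: sum(counts.get(text, 0) for text in texts)
        for name, texts in groups.items()
    }
    # Collect the selected texts in a set so that a text listed in
    # two groups contributes to the sum total only once.
    selected = set().union(*groups.values())
    sum_total = sum(counts.get(text, 0) for text in selected)
    return group_totals, sum_total

counts = {"TXA": 5, "TXB": 3, "TXC": 2}            # invented frequencies per text
groups = {"G1": ["TXA", "TXB"], "G2": ["TXB", "TXC"]}  # TXB belongs to both groups

group_totals, sum_total = summarize(counts, groups)
print(group_totals)   # TXB is included in both group columns
print(sum_total)      # but only once in the sum total
```

Here the two group columns add up to 13, while the sum total is 10, because the 3 occurrences in TXB are counted only once.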

In simple searches, frequencies of occurrence are given for each variant separately. The lemma line, which is highlighted, contains the sum total for the entire lemma/variant group. If you click on the lemma, you will get to the entry for the entire lemma/variant group in the Dictionary. There you will find all variants, not only the ones that occur in the given text selection, along with all possible meanings of the lemma.

Clicking on a frequency number takes you to the text references only if the number appears in blue. Any frequency number higher than 1000 appears in black, which means that you cannot go directly to those text references, since the output would simply be too large. For those occurrences you will have to limit your search by setting the maximum number of references to less than 1000. Depending on the number of references you retrieve, you will either see a compact list containing one line for each reference or be taken directly to the full references within a larger context. In all text references, the words fulfilling the search criteria are set off from the context by a different color. When your results appear in the form of a compact list, you can get to the individual full text references by clicking on the blue text line numbers. You also have the option of selecting your own sublist by checking the little boxes next to the line numbers and then clicking the button "Auswahl anzeigen" (show selection) at the bottom of the list.
 

Examples

Below you will find a number of query examples that were submitted for test purposes and which yield real results. In parentheses you find the short codes for the texts, which had been selected for the queries. In some cases we also give you the context parameters that had been set for the searches. We recommend that you actually submit these examples in order to get a feel for how to work with the information system.

Dictionary

You may work with the dictionary in the same way as with the Analyse Text module, except for all queries that deal with context or text idiosyncrasies. The dictionary provides you with lemmas and all their possible variants that have been accounted for within the text base up to the current state of the lemmatisation process. The frequencies behind each variant reflect this current state of lemmatisation, except for homographs that may not yet have been disambiguated. In addition, the dictionary provides you with all possible meanings for each lemma selected. A "meaning" may consist of one or several conceptual categories (e.g. vuoz = 1. 2103 = Körper und Gliedmaßen (body and bodily parts); 2. 312412 = Formen (forms) / 315 = Raum (space); 3. 3134 = Maße und Gewichte (measures and weights); 4. 2512 = Literatur (literature)).

In the search window for the dictionary you may enter character strings containing the wild cards * or ?, as well as lemmas preceded by @ or simply words preceded by $. You will always retrieve a lemma or a selection of lemmas that meet the given search criteria. From the group of selected lemmas you may click on any given lemma to arrive at its full entry. In addition, you may also submit categories (e.g. 231125 = Ehe/Familie/Namen (marriage/family/names)) and retrieve all lemmas (proper names) that have been included in the dictionary according to the current state of lemmatisation. Note that the list of all proper names by far exceeds the maximum table space. You may therefore want to retrieve the names in groups according to the letter of the alphabet with which they begin. Thus, for instance, the entry 231125 & a* retrieves all names that begin with the letter "a".
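The combination of a category with a wild card string, as in 231125 & a*, can be pictured with a toy lexicon. This is a sketch under stated assumptions: the lexicon below is invented sample data and not an extract from the dictionary, the function name lookup is hypothetical, and Python's shell-style wild cards are assumed to behave like the * and ? wild cards of the search window.

```python
from fnmatch import fnmatchcase

# Invented toy lexicon: lemma -> set of conceptual categories.
LEXICON = {
    "alphart": {"231125"},
    "artus": {"231125"},
    "dietrich": {"231125"},
    "vuoz": {"2103", "312412", "315", "3134", "2512"},
}

def lookup(category, pattern):
    """Return all lemmas that belong to the category and match the wild card string."""
    return sorted(
        lemma
        for lemma, categories in LEXICON.items()
        if category in categories and fnmatchcase(lemma, pattern)
    )

print(lookup("231125", "a*"))    # names beginning with "a"
print(lookup("231125", "?ietrich"))  # ? stands for exactly one character
```

Restricting the category by an initial letter in this way keeps each result list short, which is the purpose of retrieving the names in alphabetical groups.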



Comments and questions:
Horst Pütz - puetz@germsem.uni-kiel.de or
Klaus M. Schmidt - schmidt@bgnet.bgsu.edu
