**Important**: If you are reading this in GitHub the results are not shown. Please view it using this link: Tutorial: Greek Syntax Queries using PROIEL and Jupyter Notebooks.
This tutorial is based on an earlier tutorial that uses the Lowfat trees:
There is more than one treebank for the Greek New Testament, and each treebank has its own model and its own analysis, so it is useful to be able to compare multiple treebanks. One treebank may well be better for analyzing a particular construction, and another may be better for analyzing a different one. This tutorial illustrates some of the kinds of queries that can be done using the PROIEL treebanks for the Greek New Testament and Jupyter notebooks. It uses the `greeksyntax` package, written to simplify the task of writing queries for this environment. It is aimed at someone who knows Greek fairly well but may not have experience with query languages or programming.
To allow the same kinds of queries used for Lowfat trees, PROIEL data has been converted to Lowfat format. The original PROIEL data can be found here; the Lowfat representation can be found here; and the query used to convert PROIEL to Lowfat format can be found here. To understand the attributes in this treebank, see this document:
This tutorial does not cover installation. It assumes that you have installed BaseX and that the current PROIEL Lowfat Syntax Trees are installed in a database called "proiel-lowfat", the name used when the database is opened below. It also assumes that you are running a Jupyter notebook from the `labnotes` subdirectory in the greek-new-testament repo from biblicalhumanities.org. See this post for details on installation: Exploring Greek Syntax with Jupyter Notebooks.
Jupyter notebooks allow headings, text, and query results to appear together. This document is a Jupyter notebook. If you have properly installed the software, you can run the queries in this notebook and see the results, or modify the queries to see different results.
The following code imports the functions we need and opens the database:
from greeksyntax.lowfat import *
q = lowfat("proiel-lowfat")
Let's make sure that we have successfully opened the database using a simple query. Since the PROIEL database does not contain book elements, let's count words.
q.xquery("count(//w)")
If the query works, you are up and running. Let's get on with the tutorial.
You should be aware that Jupyter limits the amount of data a query can return. Queries can return large results, even entire books, but if your query returns too much data, you will see the following error:
# This query attempts to return every word in the Greek New Testament. Jupyter returns an error.
q.xquery("//w")
The solution is to write a more specific query. You will see how to do that in the following sections.
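One way to keep a result small while you are still exploring is to wrap a query in a positional filter. The helper below is a minimal sketch and is not part of the `greeksyntax` package: `first_n` is a hypothetical name, and it only builds a query string using the standard XPath `position()` function.

```python
def first_n(query: str, n: int) -> str:
    """Wrap an XQuery expression so that only its first n items are returned.

    Hypothetical helper for illustration; it builds a string and does not
    talk to the database itself.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Parenthesizing the query makes position() count over the whole result.
    return f"({query})[position() <= {n}]"


limited = first_n("//w", 10)
print(limited)  # (//w)[position() <= 10]
```

Assuming `q` has been opened as shown earlier, `q.xquery(first_n("//w", 10))` would then ask for only the first ten words rather than every word in the Greek New Testament.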
Let's start by looking up specific texts. The following query returns the sentences in Matthew 5. If you hover a mouse over a word in the results, it displays morphological information about the word.
q.find(milestone("Matt.5"))
You can use the `interlinear()` function to display morphological data for a given verse. (Because we do not have contextualized English glosses for the PROIEL dataset, no English translation is displayed.)
q.interlinear(milestone("Matt.5.6"))
The above query uses the `milestone()` function, which generates a query that looks for sentences corresponding to a particular reference. You can execute it by itself to see the query it generates.
milestone("Matt.5.1")
If you use the `milestone()` function inside of `q.find()`, it finds the sentences specified by this query:
q.find(milestone("Matt.5.1"))
Milestones have the following structure:

- `Matt`: an entire book
- `Matt.5`: a chapter
- `Matt.5.6`: a verse
- `Matt.5.6!1`: a word (not implemented for PROIEL)

If you specify a large result like a chapter, it will be displayed in a scrollable window.
For PROIEL, word-level milestones have not been implemented, but they can be emulated by specifying the number of a descendant of a verse:
q.find(milestone("Matt.5.6") + "/descendant::w[1]")
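The emulation above can be wrapped in a small helper if you prefer to keep writing references in the `Matt.5.6!1` style. This is a minimal sketch under stated assumptions: `split_word_milestone` is a hypothetical name that is not part of `greeksyntax`, and it only splits the reference into a verse milestone and the XPath step to append.

```python
from typing import Tuple


def split_word_milestone(ref: str) -> Tuple[str, str]:
    """Split 'Matt.5.6!1' into ('Matt.5.6', '/descendant::w[1]').

    Hypothetical helper for illustration. A reference without '!' is
    returned unchanged with an empty XPath step.
    """
    verse, sep, word = ref.partition("!")
    if not sep:
        return verse, ""
    if not word.isdigit() or int(word) < 1:
        raise ValueError(f"word position must be a positive integer: {ref!r}")
    return verse, f"/descendant::w[{int(word)}]"


verse, step = split_word_milestone("Matt.5.6!1")
print(verse, step)  # Matt.5.6 /descendant::w[1]
```

In the notebook, assuming `q` and `milestone()` are available as above, the two parts could be combined as `q.find(milestone(verse) + step)`.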
Many queries are based on the characteristics of individual words. The Lowfat treebanks contain morphological data for each word. You can use the glosses() method to see this information in a user-friendly manner. (For the Nestle1904, they also contain contextualized glosses taken from the Berean Interlinear Bible, but we do not have glosses for the PROIEL treebanks.)
q.glosses(milestone("Matt.5.6"))
Let's look at the structure of a word in our representation. First, let's look up an individual word the way we did previously:
q.find(milestone("Matt.5.6") + "/descendant::w[1]")
In this tutorial, most results are presented as readable text, but words have a rich structure that contains a great deal of information. Let's use the `xquery()` function to see the raw structure of that same word:
q.xquery(milestone("Matt.5.6") + "/descendant::w[1]")
If you like color, you can use the `pretty()` function to make that a little more readable:
pretty(q.xquery(milestone("Matt.5.6") + "/descendant::w[1]"))
We can use this information to look for specific characteristics of words. Let's take a look at the individual parts of this:

- `<w>`: each word is wrapped in a `w` element. You can count the words in the Greek New Testament with this query: `count(//w)`.
- `class="adjective"`: this word is an adjective. By the same pattern, you can count the verbs in the Greek New Testament with this query: `count(//w[@class='verb'])`, which counts the `w` elements that have `class` attributes with the value `verb`.
- `role`: the grammatical role of the word within its clause. In this case the role is `xobj`, as defined in the PROIEL documentation. You can count individual words that occur in this role using this query: `count(//w[@role='xobj'])`.
- `n`: an integer that can be used to sort words into sentence order.
- `lemma`: the dictionary form of the word. You can look up other instances of this word with this query: `//w[@lemma='μακάριος']`.
- `number`, `gender`, `case`, etc.: morphology of the word. You can look up other adjectives that are plural, masculine, and nominative using this query: `//w[@class='adjective' and @number='plural' and @gender='masculine' and @case='nominative']`.

You can play with the queries shown above by creating new cells with the + button in the menu bar and putting your conditions in a string like this:
query = "//w[@role='xobj' and @number='plural' and @gender='masculine' and @case='nominative']"
We can search for all instances by calling `q.find()` like this:
q.find(query)
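If you find yourself typing many of these condition strings, a small helper can assemble them from attribute names and values. The sketch below is only an illustration: `word_query` is a hypothetical name, not a `greeksyntax` function, and it takes a dictionary because `class` is a reserved word in Python and cannot be used as a keyword argument.

```python
from typing import Mapping


def word_query(conditions: Mapping[str, str]) -> str:
    """Build an XPath selecting w elements that satisfy every attribute test.

    Hypothetical helper for illustration; it only builds a string.
    """
    for value in conditions.values():
        if "'" in value:
            # Values are wrapped in single quotes, so reject embedded ones.
            raise ValueError(f"value may not contain a single quote: {value!r}")
    tests = " and ".join(f"@{name}='{value}'" for name, value in conditions.items())
    return f"//w[{tests}]" if tests else "//w"


query = word_query({
    "role": "xobj",
    "number": "plural",
    "gender": "masculine",
    "case": "nominative",
})
print(query)
# //w[@role='xobj' and @number='plural' and @gender='masculine' and @case='nominative']
```

The string it produces is the same as the hand-written `query` above, so it can be passed to `q.find()` in the same way.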
To search for instances in a given scope, we can use the `milestone()` function to specify the scope like this:
q.find(milestone("Matt.5") + query)
Let's look for instances of this in Matthew 5.
q.highlight(milestone("Matt.5") + query)
The `highlight()` function gives more useful output for queries like this, showing the result highlighted in context of the original sentence. Let's use `highlight()` instead of `find()`, using the same query.
q.highlight(milestone("Matt.5") + query)
A similar function, `sentence()`, shows the matching item after the sentence. This can be useful for posting to some online forums that strip formatting.
q.sentence(milestone("Matt.5") + query)
We can search for results in a set of scopes by specifying each one in the same cell. Let's look for instances of our query in Luke 1 and Acts 1:
q.highlight(milestone("Luke.1") + query)
q.highlight(milestone("Acts.1") + query)
Syntax is largely about exploring relationships within a clause. The `@role` attribute identifies these relationships. Clauses can contain other clauses and phrases in complex recursive structures.
Groups of words are found in `<wg>` elements ("word group"). A clause is identified by the attribute `class='cl'`. Like words, word groups can have `role` attributes that identify their role in a clause.
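To make the nesting concrete without a database, here is a minimal sketch that walks a tiny hand-written fragment using Python's standard `xml.etree.ElementTree` module. The fragment is an assumption made up for illustration: it borrows the element and attribute names used in this tutorial (`wg`, `w`, `role`, `class`, `mood`), but the words are placeholders and the structure is not actual treebank output.

```python
import xml.etree.ElementTree as ET

# Hand-written toy data, NOT real treebank output: placeholder words only.
TOY = """
<wg>
  <w role="xobj" class="adjective">word1</w>
  <wg role="sub">
    <w class="verb" mood="participle">word2</w>
    <wg role="obj">
      <w>word3</w>
    </wg>
  </wg>
</wg>
"""
root = ET.fromstring(TOY.strip())

# Roles of all word groups in document order (compare //wg/@role).
roles = [wg.get("role") for wg in root.iter("wg") if wg.get("role") is not None]
print(roles)  # ['sub', 'obj']

# A word group contains the words of any word groups nested inside it.
subject = root.find(".//wg[@role='sub']")
print([w.text for w in subject.iter("w")])  # ['word2', 'word3']
```

The second result shows the recursive structure: the object group is nested inside the subject group, so its word is also a descendant of the subject group.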
The syntactic relationships for PROIEL are significantly different from those for Lowfat (see PROIEL Guidelines for Annotation for details), so let's take a look at a verse and think about what queries might be interesting.
pretty(q.xquery(milestone("Matt.5.6")))
Let's look at the different roles used in this sentence and see where other word groups with some of these roles occur in Matthew 5. Here are the roles we see above:
print(q.xquery(milestone("Matt.5.6")+"//wg/@role ! string(.)"))
q.highlight(milestone("Matt.5") + "//wg[@role='sub']")
q.highlight(milestone("Matt.5") + "//wg[@role='obj']")
Queries can combine conditions on individual words and conditions on word groups. Let's modify that query to show only clauses in Acts that contain participles and function as objects of other clauses. The query tests for a participle that is a direct child of the word group (`w[@mood='participle']`) rather than any descendant, so that we find only clauses in which the participle belongs to the clause itself and not to a nested word group.
q.highlight(milestone("Acts") + "//wg[@role='obj' and w[@mood='participle']]")
The PROIEL Lowfat treebank does not have class attributes for word groups, but you can find particular kinds of phrases based on what a word group contains.
Let's look for prepositional phrases that contain the word πίστις. A prepositional phrase is a word group in which the first word is a preposition:
q.highlight(milestone("Acts") + "//wg[descendant::w[1]/@class='preposition' and .//w[@lemma='πίστις']]")
And let's narrow that to prepositional phrases where the preposition is ἐν. But let's also broaden the scope, looking for all instances in the Greek New Testament instead of specifying a milestone.
q.highlight("//wg[descendant::w[1][@class='preposition' and @lemma='ἐν'] and .//w[@lemma='πίστις']]")
Now let's narrow these results further, showing only phrases where πίστις occurs in the same word group as ἐν or the word group immediately below it.
q.highlight("//wg[descendant::w[1][@class='preposition' and @lemma='ἐν'] and (w,wg/w)[@lemma='πίστις']]")
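The difference between `.//w` and `(w,wg/w)` in the last two queries is one of depth. The sketch below emulates both tests in plain Python with the standard `xml.etree.ElementTree` module on a hand-written fragment. The fragment and the helper names `starts_with_en` and `has_pistis` are assumptions made up for illustration; only the element names, attribute names, and lemmas come from the queries above.

```python
import xml.etree.ElementTree as ET

EN, PISTIS = "ἐν", "πίστις"

# Hand-written toy data, NOT real treebank output: two word groups that both
# start with the preposition and contain the noun, at different depths.
TOY = f"""
<text>
  <wg id="near">
    <w class="preposition" lemma="{EN}"/>
    <wg>
      <w lemma="{PISTIS}"/>
    </wg>
  </wg>
  <wg id="far">
    <w class="preposition" lemma="{EN}"/>
    <wg>
      <wg>
        <w lemma="{PISTIS}"/>
      </wg>
    </wg>
  </wg>
</text>
"""


def starts_with_en(wg):
    """True if the first descendant word is the preposition (descendant::w[1])."""
    first = next(wg.iter("w"), None)
    return (first is not None
            and first.get("class") == "preposition"
            and first.get("lemma") == EN)


def has_pistis(words):
    """True if any of the given w elements has the lemma we are looking for."""
    return any(w.get("lemma") == PISTIS for w in words)


root = ET.fromstring(TOY.strip())
groups = [wg for wg in root.iter("wg") if starts_with_en(wg)]

# Broad test, like .//w: the noun may sit at any depth.
broad = [wg.get("id") for wg in groups if has_pistis(wg.iter("w"))]
# Narrow test, like (w, wg/w): only children and grandchildren count.
narrow = [wg.get("id") for wg in groups
          if has_pistis(wg.findall("w") + wg.findall("wg/w"))]

print(broad)   # ['near', 'far']
print(narrow)  # ['near']
```

In this toy fragment the narrow test drops the group in which the noun is buried two word groups down, which is the kind of match the narrower query is meant to exclude.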
The transformation to Lowfat format preserves the dependency relationships from PROIEL in identifiers, so we can use them together with the Lowfat structure. Let's look at the ids in one of the verses we saw in the last query result:
pretty(q.xquery(milestone("1Cor.16.13")))
The `head-id` of πίστει is `421479`, which is the same as the `n` attribute of the preposition ἐν. We can use that relationship to identify other prepositional phrases containing πίστις.
query = """
for $pp in //wg[descendant::w[1][@class='preposition']]
let $prep := $pp/descendant::w[1]
let $pistis := $pp//w[@lemma='πίστις']
where $prep/@lemma = 'ἐν'
and $pistis/@head-id = $prep/@n
return $pp
"""
q.highlight(query)
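The logic of this query can be sketched in plain Python to show what the `head-id` test adds. The fragment below is hand-written for illustration and is not treebank output: only the value `421479` comes from the text above, every other identifier is invented, and `headed_by_first_preposition` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

EN, PISTIS = "ἐν", "πίστις"

# Hand-written toy data, NOT real treebank output. Only 421479 comes from the
# tutorial text; every other id here is invented for illustration.
TOY = f"""
<text>
  <wg id="linked">
    <w n="421479" class="preposition" lemma="{EN}"/>
    <w n="900002" head-id="421479" lemma="{PISTIS}"/>
  </wg>
  <wg id="unlinked">
    <w n="900010" class="preposition" lemma="{EN}"/>
    <w n="900011" head-id="900099" lemma="{PISTIS}"/>
  </wg>
</text>
"""


def headed_by_first_preposition(wg, prep_lemma, lemma):
    """True if a word with the given lemma has the group's first word as head.

    The first descendant word must be a preposition with lemma prep_lemma,
    and the other word's head-id must equal that preposition's n attribute.
    """
    prep = next(wg.iter("w"), None)
    if prep is None or prep.get("class") != "preposition":
        return False
    if prep.get("lemma") != prep_lemma or prep.get("n") is None:
        return False
    return any(w.get("lemma") == lemma and w.get("head-id") == prep.get("n")
               for w in wg.iter("w"))


root = ET.fromstring(TOY.strip())
linked = [wg.get("id") for wg in root.iter("wg")
          if headed_by_first_preposition(wg, EN, PISTIS)]
print(linked)  # ['linked']
```

Both toy groups would satisfy a purely structural test, since each starts with ἐν and contains πίστις, but only the first survives the dependency check.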
This result presumably contains fewer false positives than the previous query.
This is only an introductory tutorial showing a small number of queries. It is meant to whet your appetite, to inspire you to think of queries that will teach you about aspects of biblical Greek you are interested in.
I plan to follow this up with more Jupyter notebooks, illustrating specific questions I would like to explore. I also expect to add more resources to the `greeksyntax` package. If you want to follow this work, I encourage you to follow my blog.