Tutorial: Greek Syntax Queries using PROIEL and Jupyter Notebooks

**Important**: If you are reading this on GitHub, the results are not shown. Please view it using this link: Tutorial: Greek Syntax Queries using PROIEL and Jupyter Notebooks.

This tutorial is based on an earlier tutorial that uses the Lowfat trees:

There is more than one treebank for the Greek New Testament, and each treebank has its own model and its own analysis, so it is useful to be able to compare them. One treebank may also be better suited to analyzing a particular construction, and another to analyzing a different one. This tutorial illustrates some of the kinds of queries that can be done using the PROIEL treebanks for the Greek New Testament and Jupyter notebooks. It uses the greeksyntax package, written to simplify the task of writing queries for this environment. It is aimed at someone who knows Greek fairly well but may not have experience with query languages or programming.

To allow the same kinds of queries used for Lowfat trees, the PROIEL data has been converted to Lowfat format. The original PROIEL data can be found here, the Lowfat representation here, and the query used to convert PROIEL to Lowfat format here. To understand the attributes in this treebank, see this document:

This tutorial does not cover installation. It assumes that you have installed BaseX and that the current PROIEL Lowfat Syntax Trees are installed in a database called "proiel-lowfat". It also assumes that you are running a Jupyter notebook from the labnotes subdirectory in the greek-new-testament repo from biblicalhumanities.org. See this post for details on installation: Exploring Greek Syntax with Jupyter Notebooks.

Jupyter notebooks allow headings, text, and query results to appear together. This document is a Jupyter notebook. If you have properly installed the software, you can run the queries in this notebook and see the results, or modify the queries to see different results.

Opening the Database

The following code imports the functions we need and opens the database:

In [1]:
from greeksyntax.lowfat import *

q = lowfat("proiel-lowfat")
Database 'proiel-lowfat' was opened in 2.24 ms.

Let's make sure that we have successfully opened the database using a simple query. Since the PROIEL database does not contain book elements, let's count words.

In [2]:
q.xquery("count(//w)")
Out[2]:
'140757'

If the query works, you are up and running. Let's get on with the tutorial.

Don't Try to Return the Whole Database

Jupyter limits how much output a notebook cell can send to the browser. Queries can return large results, even entire books, but if your query returns too much data, you will see the following error:

In [3]:
# This query attempts to return every word in the Greek New Testament.  Jupyter returns an error.
q.xquery("//w")
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

The solution is to write a more specific query. You will see how to do that in the following sections.

Book, Chapter, Verse, Word

Let's start by looking up specific texts. The following query returns the sentences in Matthew 5. If you hover a mouse over a word in the results, it displays morphological information about the word.

In [4]:
q.find(milestone("Matt.5"))

Ἰδὼν δὲ τοὺς ὄχλους ἀνέβη εἰς τὸ ὄρος·

καὶ καθίσαντος αὐτοῦ προσῆλθαν αὐτῷ οἱ μαθηταὶ αὐτοῦ·

καὶ ἀνοίξας τὸ στόμα αὐτοῦ ἐδίδασκεν αὐτοὺς λέγων,

μακάριοι οἱ πτωχοὶ τῷ πνεύματι, ὅτι αὐτῶν ἐστιν βασιλεία τῶν οὐρανῶν.

μακάριοι οἱ πραεῖς, ὅτι αὐτοὶ κληρονομήσουσιν τήν γῆν.

μακάριοι οἱ πενθοῦντες, ὅτι αὐτοὶ παρακληθήσονται.

μακάριοι οἱ πεινῶντες καὶ διψῶντες τὴν δικαιοσύνην, ὅτι αὐτοὶ χορτασθήσονται.

μακάριοι οἱ ἐλεήμονες, ὅτι αὐτοὶ ἐλεηθήσονται.

μακάριοι οἱ καθαροὶ τῇ καρδίᾳ, ὅτι αὐτοὶ τὸν θεὸν ὄψονται.

μακάριοι οἱ εἰρηνοποιοί, ὅτι υἱοὶ θεοῦ κληθήσονται.

μακάριοι οἱ δεδιωγμένοι ἕνεκεν δικαιοσύνης, ὅτι αὐτῶν ἐστιν βασιλεία τῶν οὐρανῶν.

μακάριοί ἐστε ὅταν ὀνειδίσωσιν ὑμᾶς καὶ διώξωσιν καὶ εἴπωσιν πᾶν πονηρὸν καθ’ ὑμῶν ψευδόμενοι ἕνεκεν ἐμοῦ·

χαίρετε καὶ ἀγαλλιᾶσθε, ὅτι μισθὸς ὑμῶν πολὺς ἐν τοῖς οὐρανοῖς·

οὕτως γὰρ ἐδίωξαν τοὺς προφήτας τοὺς πρὸ ὑμῶν.

Ὑμεῖς ἐστε τὸ ἅλας τῆς γῆς·

ἐὰν δὲ τὸ ἅλας μωρανθῇ, ἐν τίνι ἁλισθήσεται;

εἰς οὐδὲν ἰσχύει ἔτι εἰ μὴ βληθὲν ἔξω καταπατεῖσθαι ὑπὸ τῶν ἀνθρώπων.

Ὑμεῖς ἐστε τὸ φῶς τοῦ κόσμου.

οὐ δύναται πόλις κρυβῆναι ἐπάνω ὄρους κειμένη·

οὐδὲ καίουσιν λύχνον καὶ τιθέασιν αὐτὸν ὑπὸ τὸν μόδιον ἀλλ’ ἐπὶ τὴν λυχνίαν, καὶ λάμπει πᾶσιν τοῖς ἐν τῇ οἰκίᾳ.

οὕτως λαμψάτω τὸ φῶς ὑμῶν ἔμπροσθεν τῶν ἀνθρώπων, ὅπως ἴδωσιν ὑμῶν τὰ καλὰ ἔργα καὶ δοξάσωσιν τὸν πατέρα ὑμῶν τὸν ἐν τοῖς οὐρανοῖς.

Μὴ νομίσητε ὅτι ἦλθον καταλῦσαι τὸν νόμον τοὺς προφήτας·

οὐκ ἦλθον καταλῦσαι ἀλλὰ πληρῶσαι.

ἀμὴν γὰρ λέγω ὑμῖν,

ἕως ἂν παρέλθῃ οὐρανὸς καὶ γῆ, ἰῶτα ἓν μία κεραία οὐ μὴ παρέλθῃ ἀπὸ τοῦ νόμου ἕως ἂν πάντα γένηται.

ὃς ἐὰν οὖν λύσῃ μίαν τῶν ἐντολῶν τούτων τῶν ἐλαχίστων καὶ διδάξῃ οὕτως τοὺς ἀνθρώπους, ἐλάχιστος κληθήσεται ἐν τῇ βασιλείᾳ τῶν οὐρανῶν·

ὃς δ’ ἂν ποιήσῃ καὶ διδάξῃ, οὗτος μέγας κληθήσεται ἐν τῇ βασιλείᾳ τῶν οὐρανῶν.

λέγω γὰρ ὑμῖν ὅτι ἐὰν μὴ περισσεύσῃ ὑμῶν δικαιοσύνη πλεῖον τῶν γραμματέων καὶ Φαρισαίων, οὐ μὴ εἰσέλθητε εἰς τὴν βασιλείαν τῶν οὐρανῶν.

Ἠκούσατε ὅτι ἐρρέθη τοῖς ἀρχαίοις,

οὐ φονεύσεις·

ὃς δ’ ἂν φονεύσῃ, ἔνοχος ἔσται τῇ κρίσει.

ἐγὼ δὲ λέγω ὑμῖν ὅτι πᾶς ὀργιζόμενος τῷ ἀδελφῷ αὐτοῦ ἔνοχος ἔσται τῇ κρίσει·

ὃς δ’ ἂν εἴπῃ τῷ ἀδελφῷ αὐτοῦ, ῥακά, ἔνοχος ἔσται τῷ συνεδρίῳ·

ὃς δ’ ἂν εἴπῃ, μωρέ, ἔνοχος ἔσται εἰς τὴν γέενναν τοῦ πυρός.

ἐὰν οὖν προσφέρῃς τὸ δῶρόν σου ἐπὶ τὸ θυσιαστήριον καὶ ἐκεῖ μνησθῇς ὅτι ἀδελφός σου ἔχει τι κατὰ σοῦ, ἄφες ἐκεῖ τὸ δῶρόν σου ἔμπροσθεν τοῦ θυσιαστηρίου, καὶ ὕπαγε πρῶτον διαλλάγηθι τῷ ἀδελφῷ σου, καὶ τότε ἐλθὼν πρόσφερε τὸ δῶρόν σου.

ἴσθι εὐνοῶν τῷ ἀντιδίκῳ σου ταχὺ ἕως ὅτου εἶ μετ’ αὐτοῦ ἐν τῇ ὁδῷ, μήποτέ σε παραδῷ ἀντίδικος τῷ κριτῇ, καὶ κριτὴς τῷ ὑπηρέτῃ, καὶ εἰς φυλακὴν βληθήσῃ·

ἀμὴν λέγω σοι,

οὐ μὴ ἐξέλθῃς ἐκεῖθεν ἕως ἂν ἀποδῷς τὸν ἔσχατον κοδράντην.

Ἠκούσατε ὅτι ἐρρέθη,

οὐ μοιχεύσεις.

ἐγὼ δὲ λέγω ὑμῖν ὅτι πᾶς βλέπων γυναῖκα πρὸς τὸ ἐπιθυμῆσαι ἤδη ἐμοίχευσεν αὐτὴν ἐν τῇ καρδίᾳ αὐτοῦ.

εἰ δὲ ὀφθαλμός σου δεξιὸς σκανδαλίζει σε, ἔξελε αὐτὸν καὶ βάλε ἀπὸ σοῦ·

συμφέρει γάρ σοι ἵνα ἀπόληται ἓν τῶν μελῶν σου καὶ μὴ ὅλον τὸ σῶμά σου βληθῇ εἰς γέενναν.

καὶ εἰ δεξιά σου χεὶρ σκανδαλίζει σε, ἔκκοψον αὐτὴν καὶ βάλε ἀπὸ σοῦ·

συμφέρει γάρ σοι ἵνα ἀπόληται ἓν τῶν μελῶν σου καὶ μὴ ὅλον τὸ σῶμά σου εἰς γέενναν ἀπέλθῃ.

Ἐρρέθη δέ,

ὃς ἂν ἀπολύσῃ τὴν γυναῖκα αὐτοῦ, δότω αὐτῇ ἀποστάσιον.

ἐγὼ δὲ λέγω ὑμῖν ὅτι πᾶς ἀπολύων τὴν γυναῖκα αὐτοῦ παρεκτὸς λόγου πορνείας ποιεῖ αὐτὴν μοιχευθῆναι, καὶ ὃς ἐὰν ἀπολελυμένην γαμήσῃ μοιχᾶται.

Πάλιν ἠκούσατε ὅτι ἐρρέθη τοῖς ἀρχαίοις,

οὐκ ἐπιορκήσεις,

ἀποδώσεις δὲ τῷ κυρίῳ τοὺς ὅρκους σου.

ἐγὼ δὲ λέγω ὑμῖν μὴ ὀμόσαι ὅλως·

μήτε ἐν τῷ οὐρανῷ, ὅτι θρόνος ἐστὶν τοῦ θεοῦ·

μήτε ἐν τῇ γῇ, ὅτι ὑποπόδιόν ἐστιν τῶν ποδῶν αὐτοῦ·

μήτε εἰς Ἱεροσόλυμα, ὅτι πόλις ἐστὶν τοῦ μεγάλου βασιλέως·

μήτε ἐν τῇ κεφαλῇ σου ὀμόσῃς, ὅτι οὐ δύνασαι μίαν τρίχα λευκὴν ποιῆσαι μέλαιναν.

ἔστω δὲ λόγος ὑμῶν ναὶ ναί, οὒ οὔ·

τὸ δὲ περισσὸν τούτων ἐκ τοῦ πονηροῦ ἐστιν.

Ἠκούσατε ὅτι ἐρρέθη,

ὀφθαλμὸν ἀντὶ ὀφθαλμοῦ καὶ ὀδόντα ἀντὶ ὀδόντος.

ἐγὼ δὲ λέγω ὑμῖν μὴ ἀντιστῆναι τῷ πονηρῷ·

ἀλλ’ ὅστις σε ῥαπίζει εἰς τὴν δεξιὰν σιαγόνα, στρέψον αὐτῷ καὶ τὴν ἄλλην·

καὶ τῷ θέλοντί σοι κριθῆναι καὶ τὸν χιτῶνά σου λαβεῖν, ἄφες αὐτῷ καὶ τὸ ἱμάτιον·

καὶ ὅστις σε ἀγγαρεύσει μίλιον ἕν, ὕπαγε μετ’ αὐτοῦ δύο.

τῷ αἰτοῦντί σε δός, καὶ τὸν θέλοντα ἀπὸ σοῦ δανίσασθαι μὴ ἀποστραφῇς.

Ἠκούσατε ὅτι ἐρρέθη,

ἀγαπήσεις τὸν πλησίον σου καὶ μισήσεις τὸν ἐχθρόν σου.

ἐγὼ δὲ λέγω ὑμῖν,

ἀγαπᾶτε τοὺς ἐχθροὺς ὑμῶν καὶ προσεύχεσθε ὑπὲρ τῶν διωκόντων ὑμᾶς, ὅπως γένησθε υἱοὶ τοῦ πατρὸς ὑμῶν τοῦ ἐν οὐρανοῖς, ὅτι τὸν ἥλιον αὐτοῦ ἀνατέλλει ἐπὶ πονηροὺς καὶ ἀγαθοὺς καὶ βρέχει ἐπὶ δικαίους καὶ ἀδίκους.

ἐὰν γὰρ ἀγαπήσητε τοὺς ἀγαπῶντας ὑμᾶς, τίνα μισθὸν ἔχετε;

οὐχὶ καὶ οἱ τελῶναι τὸ αὐτὸ ποιοῦσιν;

καὶ ἐὰν ἀσπάσησθε τοὺς ἀδελφοὺς ὑμῶν μόνον, τί περισσὸν ποιεῖτε;

οὐχὶ καὶ οἱ ἐθνικοὶ τὸ αὐτὸ ποιοῦσιν;

ἔσεσθε οὖν ὑμεῖς τέλειοι ὡς πατὴρ ὑμῶν οὐράνιος τέλειός ἐστιν.

You can use the interlinear() function to display morphological data for a given verse. (Because we do not have contextualized English glosses for the PROIEL dataset, no English translation is displayed.)

In [5]:
q.interlinear(milestone("Matt.5.6"))
μακάριοι adjective : μακάριος plural masculine nominative
οἱ article : ὁ plural masculine nominative
πεινῶντες verb : πεινάω plural masculine nominative present active participle
καὶ conjunction : καί
διψῶντες verb : διψάω plural masculine nominative present active participle
τὴν article : ὁ singular feminine accusative
δικαιοσύνην noun : δικαιοσύνη singular feminine accusative
ὅτι subjunction : ὅτι
αὐτοὶ pronoun : αὐτός plural masculine nominative
χορτασθήσονται verb : χορτάζω plural future passive indicative
:

The above query uses the milestone() function, which generates a query that looks for sentences corresponding to a particular reference. You can execute it by itself to see the query it generates.

In [6]:
milestone("Matt.5.1")
Out[6]:
"//sentence[milestone[@id='Matt.5.1']]"

If you use the milestone() function inside of q.find(), it finds the sentences specified by this query:

In [7]:
q.find(milestone("Matt.5.1"))

Ἰδὼν δὲ τοὺς ὄχλους ἀνέβη εἰς τὸ ὄρος·

καὶ καθίσαντος αὐτοῦ προσῆλθαν αὐτῷ οἱ μαθηταὶ αὐτοῦ·
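The string in Out[6] suggests that milestone() only builds query text, which q.find() then sends to the database. A minimal sketch of such a function, assuming it simply inserts the reference into the pattern shown (milestone_sketch() is a hypothetical name, and the real greeksyntax function may handle chapter and book references differently):

```python
def milestone_sketch(ref):
    """Hypothetical stand-in for greeksyntax's milestone(): return an XPath
    expression selecting the sentences that contain a milestone whose id
    equals ref. Assumes ref contains no apostrophes."""
    return "//sentence[milestone[@id='{}']]".format(ref)


print(milestone_sketch("Matt.5.1"))
# //sentence[milestone[@id='Matt.5.1']]
```

Because the result is an ordinary string, it can be extended with further path steps before it is passed to q.find(), which is what the following sections do.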

Milestones have the following structure:

  • Matt - an entire book
  • Matt.5 - a chapter
  • Matt.5.6 - a verse
  • Matt.5.6!1 - a word - not implemented for PROIEL.
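The reference format is regular enough to take apart mechanically. A minimal sketch, assuming references always follow the book.chapter.verse!word pattern listed above (parse_milestone() is a hypothetical helper, not part of greeksyntax):

```python
def parse_milestone(ref):
    """Hypothetical helper: split a milestone reference such as 'Matt.5.6!1'
    into (book, chapter, verse, word). Missing parts are returned as None."""
    path, _, word = ref.partition("!")
    parts = path.split(".")
    if not 1 <= len(parts) <= 3:
        raise ValueError("unexpected reference: " + ref)
    numbers = [int(p) for p in parts[1:]]
    numbers += [None] * (2 - len(numbers))  # pad chapter/verse with None
    return (parts[0], numbers[0], numbers[1], int(word) if word else None)


for ref in ["Matt", "Matt.5", "Matt.5.6", "Matt.5.6!1", "1Cor.16.13"]:
    print(ref, parse_milestone(ref))
# Matt ('Matt', None, None, None)
# Matt.5 ('Matt', 5, None, None)
# Matt.5.6 ('Matt', 5, 6, None)
# Matt.5.6!1 ('Matt', 5, 6, 1)
# 1Cor.16.13 ('1Cor', 16, 13, None)
```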

If you specify a large result like a chapter, it will be displayed in a scrollable window.

For PROIEL, word-level milestones have not been implemented, but they can be emulated by selecting a word by its position among the descendants of a verse:

In [8]:
q.find(milestone("Matt.5.6") + "/descendant::w[1]")

μακάριοι
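One subtlety: in milestone(...) + "/descendant::w[1]", the position [1] is evaluated separately for each matching sentence. For a verse that spans two sentences, such as Matt.5.1 above, the query therefore returns the first word of each sentence. Wrapping the path in parentheses before applying the position counts words across all matching sentences instead. The sketch below builds such a query; word_query() and milestone_sketch() are hypothetical helpers, and it assumes q.find() accepts any XPath expression. Even this form counts from the start of a sentence, so it may be off if a sentence begins in an earlier verse.

```python
def milestone_sketch(ref):
    # Hypothetical stand-in reproducing the milestone() output shown in Out[6].
    return "//sentence[milestone[@id='{}']]".format(ref)


def word_query(ref):
    """Hypothetical helper: turn a word reference such as 'Matt.5.6!1' into
    a query for the n-th word of the verse's sentences, counted across all
    matching sentences in document order."""
    verse, sep, n = ref.partition("!")
    if not sep:
        raise ValueError("expected a word reference like 'Matt.5.6!1'")
    # The parentheses make [n] apply to the whole sequence of words rather
    # than separately to each sentence.
    return "({}/descendant::w)[{}]".format(milestone_sketch(verse), int(n))


print(word_query("Matt.5.6!1"))
# (//sentence[milestone[@id='Matt.5.6']]/descendant::w)[1]
```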

Words, Lemmas, and Morphology

Many queries are based on the characteristics of individual words. The Lowfat treebanks contain morphological data for each word. You might expect to use a glosses() method to see this information in a user-friendly manner, but as the following cell shows, the lowfat object used here does not provide one; interlinear(), shown above, displays the morphology of a verse instead. (The Nestle1904 Lowfat treebank also contains contextualized glosses taken from the Berean Interlinear Bible, but we do not have glosses for the PROIEL treebanks.)

In [9]:
q.glosses(milestone("Matt.5.6"))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-7e7e293e8af4> in <module>()
----> 1 q.glosses(milestone("Matt.5.6"))

AttributeError: 'lowfat' object has no attribute 'glosses'

Let's look at the structure of a word in our representation. First, let's look up an individual word the way we did previously:

In [ ]:
q.find(milestone("Matt.5.6") + "/descendant::w[1]")

In this tutorial, most results are presented as readable text, but words have a rich structure that contains a great deal of information. Let's use the xquery() function to see the raw structure of that same word:

In [ ]:
q.xquery(milestone("Matt.5.6") + "/descendant::w[1]")

If you like color, you can use the pretty() function to make that a little more readable:

In [ ]:
pretty(q.xquery(milestone("Matt.5.6") + "/descendant::w[1]"))

We can use this information to look for specific characteristics of words. Let's take a look at the individual parts of this:

  • <w> - Each word is wrapped in a w element. You can count the words in the Greek New Testament with this query: count(//w).
  • class="adjective" - this word is an adjective. You can count the words of another class, such as verbs, with this query: count(//w[@class='verb']), which counts the w elements that have class attributes with the value verb.
  • role - the grammatical role of the word within its clause; in this case the role is xobj, as defined in the PROIEL documentation. You can count the words that occur in this role using this query: count(//w[@role='xobj']).
  • n - an integer that can be used to sort words into sentence order.
  • lemma - the dictionary form of the word. You can look up other instances of this word with this query: //w[@lemma='μακάριος'].
  • number, gender, case, etc. - morphology of the word. You can look up other adjectives that are plural, masculine, and nominative using this query: //w[@class='adjective' and @number='plural' and @gender='masculine' and @case='nominative'].
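To see what these attribute tests do without a database, here is a sketch using Python's standard xml.etree.ElementTree on a hand-made fragment. The fragment is not real PROIEL output: the attribute names follow the list above, the values come from the interlinear display of Matt 5:6, and only the first word carries a role. match_words() is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-made fragment in the style of the <w> elements described above.
# NOT real PROIEL output.
SAMPLE = """<sentence>
  <w class="adjective" role="xobj" lemma="μακάριος" number="plural"
     gender="masculine" case="nominative">μακάριοι</w>
  <w class="article" lemma="ὁ" number="plural" gender="masculine"
     case="nominative">οἱ</w>
  <w class="verb" lemma="πεινάω" number="plural" gender="masculine"
     case="nominative" mood="participle">πεινῶντες</w>
  <w class="conjunction" lemma="καί">καὶ</w>
</sentence>"""


def match_words(root, conditions):
    """Return the text of every <w> whose attributes equal all of the given
    values, like //w[@a='x' and @b='y'] in XPath."""
    return [w.text for w in root.iter("w")
            if all(w.get(name) == value for name, value in conditions.items())]


root = ET.fromstring(SAMPLE)
print(len(list(root.iter("w"))))  # like count(//w): 4
print(match_words(root, {"class": "verb"}))  # ['πεινῶντες']
print(match_words(root, {"class": "adjective", "number": "plural",
                         "gender": "masculine", "case": "nominative"}))  # ['μακάριοι']
```

A dictionary is used for the conditions because class is a reserved word in Python and cannot be passed as a keyword argument.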

You can play with the queries shown above by creating new cells with the + button in the menu bar and putting your conditions in a string like this:

In [ ]:
query = "//w[@role='xobj' and @number='plural' and @gender='masculine' and @case='nominative']"

We can search for all instances by calling q.find() like this:

q.find(query)

To search for instances in a given scope, we can use the milestone() function to specify the scope like this:

q.find(milestone("Matt.5") + query)
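The concatenation works because query begins with //, so it becomes a descendant step under the selected sentences. A query written without the leading slashes would produce an invalid path. The sketch below guards against that; milestone_sketch() and scoped() are hypothetical helpers, milestone_sketch() only reproduces the verse-style string shown earlier (the real output for a chapter may differ), and the approach applies to path queries, not to FLWOR (for ... return) expressions.

```python
def milestone_sketch(ref):
    # Hypothetical stand-in reproducing the milestone() output shown in Out[6].
    return "//sentence[milestone[@id='{}']]".format(ref)


def scoped(ref, query):
    """Hypothetical helper: restrict a path query to the sentences selected
    by ref. A query starting with '/' is appended unchanged; otherwise '//'
    is inserted so the combined string is still a valid path."""
    if not query.startswith("/"):
        query = "//" + query
    return milestone_sketch(ref) + query


query = "//w[@role='xobj' and @number='plural' and @gender='masculine' and @case='nominative']"
print(scoped("Matt.5", query))
print(scoped("Matt.5", "w[@lemma='μακάριος']"))
# //sentence[milestone[@id='Matt.5']]//w[@lemma='μακάριος']
```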

Let's look for instances of this in Matthew 5.

In [ ]:
q.find(milestone("Matt.5") + query)

The highlight() function gives more useful output for queries like this, showing the result highlighted in the context of the original sentence. Let's use highlight() instead of find(), using the same query.

In [ ]:
q.highlight(milestone("Matt.5") + query)

A similar function, sentence(), shows the matching item after the sentence. This can be useful for posting to some online forums that strip formatting.

In [ ]:
q.sentence(milestone("Matt.5") + query)

We can search for results in a set of scopes by specifying each one in the same cell. Let's look for instances of our query in Luke 1 and Acts 1:

In [ ]:
q.highlight(milestone("Luke.1") + query)
q.highlight(milestone("Acts.1") + query)

Syntax

Syntax is largely about exploring relationships within a clause. The @role attribute identifies these relationships. Clauses can contain other clauses and phrases in complex recursive structures.

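Groups of words are found in <wg> elements ("word group"). Like words, word groups can have role attributes that identify their role in a clause. (In the Nestle1904 Lowfat trees, a clause is also identified by the attribute class='cl'; the PROIEL Lowfat data has no such class attribute on word groups.)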
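The relationship between nested <wg> elements and their role attributes can be illustrated locally with Python's xml.etree.ElementTree. The fragment below is a toy: its nesting and roles are invented and are not PROIEL's analysis of any verse.

```python
import xml.etree.ElementTree as ET

# Toy fragment: nesting and roles invented for illustration only.
SAMPLE = """<sentence>
  <wg>
    <wg role="sub"><w>οἱ</w><w>μαθηταὶ</w></wg>
    <w role="pred">προσῆλθαν</w>
    <wg role="obj"><w>αὐτὸν</w></wg>
  </wg>
</sentence>"""

root = ET.fromstring(SAMPLE)

# Like //wg/@role ! string(.): roles of all word groups that have one,
# in document order (the outer wg has no role and is skipped).
roles = [wg.get("role") for wg in root.iter("wg") if wg.get("role") is not None]
print(roles)  # ['sub', 'obj']

# Like //wg[@role='sub']: the text of each subject word group.
for wg in root.iter("wg"):
    if wg.get("role") == "sub":
        print(" ".join(w.text for w in wg.iter("w")))  # οἱ μαθηταὶ
```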

The syntactic relationships for PROIEL are significantly different from those for Lowfat (see PROIEL Guidelines for Annotation for details), so let's take a look at a verse and think about what queries might be interesting.

In [ ]:
pretty(q.xquery(milestone("Matt.5.6")))

Let's look at the different roles used in this sentence and see where other word groups with some of these roles occur in Matthew 5. Here are the roles we see above:

In [ ]:
print(q.xquery(milestone("Matt.5.6")+"//wg/@role ! string(.)"))
In [ ]:
q.highlight(milestone("Matt.5") + "//wg[@role='sub']")
In [ ]:
q.highlight(milestone("Matt.5") + "//wg[@role='obj']")

Queries can combine conditions on individual words and conditions on word groups. Let's look in Acts for word groups that function as objects and contain a participle. The condition w[@mood='participle'] tests only the words that are direct children of the word group, so participles nested in smaller word groups inside the object are not matched; use .//w[@mood='participle'] to include them.

In [ ]:
q.highlight(milestone("Acts") + "//wg[@role='obj' and w[@mood='participle']]")
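The difference between w[...] and .//w[...] can be tried out with Python's xml.etree.ElementTree, whose findall() understands both path forms. The fragment is invented for illustration (including the id attributes used as labels) and only mimics the wg/w nesting described above.

```python
import xml.etree.ElementTree as ET

# Invented fragment: one object word group with a participle as a direct
# child, and one where the participle is nested one level deeper.
SAMPLE = """<sentence>
  <wg role="obj" id="direct"><w mood="participle">λέγοντα</w></wg>
  <wg role="obj" id="nested"><wg><w mood="participle">λέγοντα</w></wg></wg>
</sentence>"""

root = ET.fromstring(SAMPLE)


def has_participle(wg, path):
    """True if any word reached by path from wg is a participle."""
    return any(w.get("mood") == "participle" for w in wg.findall(path))


objects = [wg for wg in root.iter("wg") if wg.get("role") == "obj"]
# Like //wg[@role='obj' and w[@mood='participle']]: child words only.
child_only = [wg.get("id") for wg in objects if has_participle(wg, "w")]
# Like //wg[@role='obj' and .//w[@mood='participle']]: words at any depth.
any_depth = [wg.get("id") for wg in objects if has_participle(wg, ".//w")]
print(child_only)  # ['direct']
print(any_depth)   # ['direct', 'nested']
```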

The PROIEL Lowfat treebank does not have class attributes for word groups, but you can find particular kinds of phrases based on what a word group contains.

Let's look for prepositional phrases that contain the word πίστις. Since word groups have no class attribute, we treat a prepositional phrase as a word group whose first word is a preposition:

In [ ]:
q.highlight(milestone("Acts") + "//wg[descendant::w[1]/@class='preposition' and .//w[@lemma='πίστις']]")
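The heuristic relies on descendant::w[1] meaning the first word in document order at any depth. A small Python sketch with xml.etree.ElementTree, whose iter() also walks descendants in document order, makes that explicit; the fragment, the id labels, and the helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented fragment (not real PROIEL data): one word group that starts with a
# preposition and one that does not; both contain a form of πίστις.
SAMPLE = """<sentence>
  <wg id="pp"><w class="preposition" lemma="ἐν">ἐν</w><w class="noun" lemma="πίστις">πίστει</w></wg>
  <wg id="np"><w class="article" lemma="ὁ">τῇ</w><w class="noun" lemma="πίστις">πίστει</w></wg>
</sentence>"""


def first_word(wg):
    """Like descendant::w[1]: the first word inside wg in document order."""
    return next(wg.iter("w"), None)


def is_pp_containing(wg, lemma):
    """Like wg[descendant::w[1]/@class='preposition' and .//w[@lemma=...]]."""
    first = first_word(wg)
    return (first is not None
            and first.get("class") == "preposition"
            and any(w.get("lemma") == lemma for w in wg.iter("w")))


root = ET.fromstring(SAMPLE)
matches = [wg.get("id") for wg in root.iter("wg") if is_pp_containing(wg, "πίστις")]
print(matches)  # ['pp']
```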

And let's narrow that to prepositional phrases where the preposition is ἐν. Let's also broaden the scope, looking for all instances in the Greek New Testament instead of specifying a milestone.

In [ ]:
q.highlight("//wg[descendant::w[1][@class='preposition' and @lemma='ἐν'] and .//w[@lemma='πίστις']]")

Now let's narrow these results further, showing only phrases where πίστις occurs in the same word group as ἐν or the word group immediately below it.

In [ ]:
q.highlight("//wg[descendant::w[1][@class='preposition' and @lemma='ἐν'] and (w,wg/w)[@lemma='πίστις']]")
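The sequence (w, wg/w) collects the words of the word group itself plus the words of the groups directly below it, but nothing deeper. A Python sketch of the same selection with ElementTree's findall(), on an invented fragment where πίστις sits two word-group levels below the preposition:

```python
import xml.etree.ElementTree as ET

# Invented fragment: πίστις is two word-group levels below the preposition.
SAMPLE = """<wg id="outer">
  <w class="preposition" lemma="ἐν">ἐν</w>
  <wg>
    <w lemma="ὁ">τῇ</w>
    <wg><w lemma="πίστις">πίστει</w></wg>
  </wg>
</wg>"""

root = ET.fromstring(SAMPLE)


def lemmas(wg, path):
    """Lemmas of the words reached from wg by an ElementTree path."""
    return [w.get("lemma") for w in wg.findall(path)]


# Like (w, wg/w): words in the group itself or in a group directly below it.
near = lemmas(root, "w") + lemmas(root, "wg/w")
# Like .//w: words at any depth.
anywhere = lemmas(root, ".//w")
print("πίστις" in near, "πίστις" in anywhere)  # False True
```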

Using Dependency Relationships

The transformation to Lowfat format preserves the dependency relationships from PROIEL in identifiers, so we can use them together with the Lowfat structure. Let's look at the ids in one of the verses we saw in the last query result:

In [ ]:
pretty(q.xquery(milestone("1Cor.16.13")))

The head-id of πίστει is 421479, which is the same as the n attribute of the preposition ἐν. We can use that relationship to identify other prepositional phrases containing πίστις.

In [ ]:
query = """
    for $pp in //wg[descendant::w[1][@class='preposition']]
    let $prep := $pp/descendant::w[1]
    let $pistis := $pp//w[@lemma='πίστις']
    where $prep/@lemma = 'ἐν'
      and $pistis/@head-id = $prep/@n
    return $pp
"""

q.highlight(query)

Because it requires πίστις to depend directly on ἐν, this result should contain fewer false positives than the previous query.
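The FLWOR expression above can be mirrored in plain Python to see exactly what it tests. This is a sketch over an invented fragment (all n and head-id values are made up), not a replacement for running the query in BaseX; it assumes, as described above, that head-id holds the n value of a word's head.

```python
import xml.etree.ElementTree as ET

# Invented fragment: the n and head-id values are made up. In phrase "a"
# πίστει depends on ἐν; in phrase "b" πίστεως depends on some other word.
SAMPLE = """<sentence>
  <wg id="a">
    <w class="preposition" lemma="ἐν" n="1">ἐν</w>
    <w class="noun" lemma="πίστις" n="2" head-id="1">πίστει</w>
  </wg>
  <wg id="b">
    <w class="preposition" lemma="ἐν" n="3">ἐν</w>
    <w class="noun" lemma="ἀγάπη" n="4" head-id="3">ἀγάπῃ</w>
    <w class="noun" lemma="πίστις" n="5" head-id="9">πίστεως</w>
  </wg>
</sentence>"""

root = ET.fromstring(SAMPLE)
results = []
for pp in root.iter("wg"):
    words = list(pp.iter("w"))  # descendant words in document order
    if not words or words[0].get("class") != "preposition":
        continue  # like //wg[descendant::w[1][@class='preposition']]
    prep = words[0]
    pistis = [w for w in words if w.get("lemma") == "πίστις"]
    # XQuery's general comparison "=" is existential: it is true if ANY
    # πίστις word has a head-id equal to the preposition's n.
    if prep.get("lemma") == "ἐν" and any(w.get("head-id") == prep.get("n") for w in pistis):
        results.append(pp.get("id"))
print(results)  # ['a']
```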

Next Steps

This is only an introductory tutorial showing a small number of queries. It is meant to whet your appetite, to inspire you to think of queries that will teach you about aspects of biblical Greek you are interested in.

I plan to follow this up with more Jupyter notebooks, illustrating specific questions I would like to explore. I also expect to add more resources to the greeksyntax package. If you want to follow this work, I encourage you to follow my blog.