An example of fact extraction as dialogue

This example attempts to construct the Julio-Claudian family tree.

I derive arguments from historical source text: I quote extracts from the text and use them as evidence to support claims. I present all this in a 'workbook' style. This needs to be human readable, but with sufficient structure for it to be possible (ideally simple) to extract an AIF (Argument Interchange Format) argument map. The idea is that the process of developing knowledge (by elaborating the arguments) is iterative: the 'workbook' can be updated, extended and re-evaluated repeatedly. This document is an example of such a workbook. Sections with coloured backgrounds represent dialogue, with different colours representing different agents (human or machine). The rest is narrative to expound the process.

My aim is to construct very simple arguments as might reasonably be produced by natural language processing. These will be about entities and relationships. They are justified by the source text. They may or may not be correct. They can be rebutted or confirmed (or questioned) by other agents 'reading' the same text.

Suet. Aug. 4:

After quitting Macedonia, before he could declare himself a candidate for the consulship, he died suddenly, leaving behind him a daughter, the elder Octavia, by Ancharia; and another daughter, Octavia the younger, as well as Augustus, by Atia, who was the daughter of Marcus Atius Balbus, and Julia, sister to Caius Julius Caesar.

An entity and relationship extraction process might give:

... and perhaps also resolve pronouns:

The above translates into this argument map:

argument map for Suet. Aug. 4
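As a rough illustration of what such a map might hold, here is a minimal Python sketch of an AIF-like graph. It assumes only two node kinds: information nodes (I-nodes) that carry statements, and inference nodes (RA-nodes) that link premises to a conclusion. The class and method names are hypothetical, and the single claim shown is just one that can be read off the quote, not necessarily the output of any particular extraction agent.

```python
from dataclasses import dataclass, field


@dataclass
class ArgumentMap:
    """A tiny AIF-like graph (hypothetical structure, for illustration only)."""
    i_nodes: dict = field(default_factory=dict)   # node id -> statement text
    ra_nodes: list = field(default_factory=list)  # (premise ids, conclusion id)

    def add_statement(self, text):
        """Add an I-node and return its id."""
        node_id = f"I{len(self.i_nodes) + 1}"
        self.i_nodes[node_id] = text
        return node_id

    def add_inference(self, premises, conclusion):
        """Add an RA-node: the premises are offered as support for the conclusion."""
        self.ra_nodes.append((tuple(premises), conclusion))

    def support_for(self, conclusion):
        """Return the statements offered as premises for a conclusion."""
        return [
            self.i_nodes[p]
            for premises, target in self.ra_nodes
            if target == conclusion
            for p in premises
        ]


amap = ArgumentMap()
quote = amap.add_statement("Suet. Aug. 4: '... as well as Augustus, by Atia ...'")
claim = amap.add_statement("Atia is a parent of Augustus")
amap.add_inference([quote], claim)
```

Keeping the quote itself as a node means the evidence travels with the claim, so another agent can later attack either the claim or the reading of the quote.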

At this stage, I'm identifying entities by the names expressed in the text. This is fine in the context of this single source reference, but is likely to be a problem when I try to relate entities across a number of source references.

If I later run into problems resolving the above entities, I can come back and add more claims. For example, after I've processed the next source reference below, it becomes apparent that there are at least three different Julias; so I might come back and add:

"Julia" refers to Julia Minor.

... and later still, I'll find that I want to 'tidy up' the Octavia references:

"Octavia the younger" refers to Octavia the Younger.

"the elder Octavia" refers to Octavia the Elder.

These claims let me rewrite the earlier entity relationship arguments. The result is:

argument map for Suet. Aug. 4, with rewrite
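The rewrite step can be sketched in a few lines of Python. This assumes relationship claims are held as (parent, child) pairs and that each "refers to" claim becomes an entry in a lookup table; both representations, and the names `raw_claims`, `aliases` and `rewrite`, are assumptions made for illustration.

```python
# Parent-to-child claims using names exactly as they appear in Suet. Aug. 4.
raw_claims = [
    ("Ancharia", "the elder Octavia"),
    ("Atia", "Octavia the younger"),
    ("Atia", "Augustus"),
    ("Marcus Atius Balbus", "Atia"),
    ("Julia", "Atia"),
]

# Each '"X" refers to Y' claim becomes one alias entry.
aliases = {
    "Julia": "Julia Minor",
    "Octavia the younger": "Octavia the Younger",
    "the elder Octavia": "Octavia the Elder",
}


def rewrite(claims, aliases):
    """Replace textual names with resolved identities; unknown names pass through."""
    return [(aliases.get(p, p), aliases.get(c, c)) for p, c in claims]


rewritten = rewrite(raw_claims, aliases)
```

A table keyed on the name alone is only adequate within one source reference; once the same name denotes two people in one quote, the key needs more context, such as the role the name plays in the claim.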

The argument maps so far have no conflicts. I can assume the claims they make are facts and try to build some data structure, such as a family tree.

I can take the claims of familial relationships, rewritten or not, and make a graph of people linked parent to child. The results can be analysed to check the facts. For example, there's something wrong if the graph is not a tree, or if a child node has more than two parents. So far, the family tree has the same structure whether claims are rewritten or not. The result (rewritten) is:

family tree from Suet. Aug. 4
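One of the checks mentioned above, that no child has more than two parents, might look like this in Python. It is a sketch under the assumption that claims are (parent, child) pairs; the function names are hypothetical.

```python
from collections import defaultdict


def parents_by_child(claims):
    """Index (parent, child) claims by child."""
    index = defaultdict(set)
    for parent, child in claims:
        index[child].add(parent)
    return index


def too_many_parents(claims, limit=2):
    """Return every child with more than `limit` claimed parents."""
    return {
        child: sorted(parents)
        for child, parents in parents_by_child(claims).items()
        if len(parents) > limit
    }


claims = [
    ("Ancharia", "Octavia the Elder"),
    ("Atia", "Octavia the Younger"),
    ("Atia", "Augustus"),
    ("Marcus Atius Balbus", "Atia"),
    ("Julia Minor", "Atia"),
]
problems = too_many_parents(claims)  # empty: nothing suspicious so far
```

A non-empty result does not say which claim is wrong, only that the claims about that child cannot all be true at once.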

Next ...

Suet. Aug. 62:

He had three grandsons by Agrippa and Julia, namely, Caius, Lucius, and Agrippa; and two granddaughters, Julia and Agrippina.

Here we have parents naming their children after themselves, so ...

"Agrippa" is ambiguous, referring to Marcus Agrippa as parent and Agrippa Postumus as child.

"Julia" is ambiguous, referring to Julia the Elder as parent and Julia the Younger as child.

Now it does make a significant difference whether I construct a family tree from raw or rewritten claims. In the former case I'll get cycles: Julia and Agrippa each link to themselves, and the Julia in this quote is treated as the same Julia as above (Suet. Aug. 4), which creates a cycle in the graph. Rewriting sorts this out, and gives:

family tree, extended
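The difference between raw and rewritten claims can be demonstrated with a small Python sketch. It assumes (parent, child) pairs and an alias table keyed on both the name and the role it plays in the claim; that keying, and the function names, are assumptions for illustration. Cycle detection is a plain depth-first search.

```python
parents = ["Agrippa", "Julia"]
children = ["Caius", "Lucius", "Agrippa", "Julia", "Agrippina"]
raw = [(p, c) for p in parents for c in children]

# The same name resolves differently depending on its role in the claim.
role_aliases = {
    ("Agrippa", "parent"): "Marcus Agrippa",
    ("Agrippa", "child"): "Agrippa Postumus",
    ("Julia", "parent"): "Julia the Elder",
    ("Julia", "child"): "Julia the Younger",
}


def rewrite_by_role(claims):
    """Resolve each name using its role; unresolved names pass through."""
    return [
        (role_aliases.get((p, "parent"), p), role_aliases.get((c, "child"), c))
        for p, c in claims
    ]


def has_cycle(claims):
    """Depth-first search for a cycle in the parent-to-child graph."""
    graph = {}
    for p, c in claims:
        graph.setdefault(p, set()).add(c)
        graph.setdefault(c, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        found = any(visit(n) for n in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(n) for n in graph)
```

With these definitions, `has_cycle(raw)` is true because "Agrippa" and "Julia" each appear as their own child, while `has_cycle(rewrite_by_role(raw))` is false.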

The nodes labelled with question marks come from the grandparent relationship and indicate an unknown intermediate between grandparent and grandchild. It's clear from the graph that Augustus is the father of either Marcus Agrippa or Julia the Elder, but I haven't established which yet. Once I have, these nodes become redundant and can be deleted.
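A grandparent claim can be turned into two parent links through a placeholder, which is later merged with a named person once the intermediate is known. The following Python sketch assumes (parent, child) pairs and a hypothetical placeholder naming scheme; it is one possible approach, not a description of how the figures here were produced.

```python
def expand_grandparent(grandparent, grandchild):
    """Turn a grandparent claim into two parent links via a '?' placeholder."""
    unknown = f"?({grandparent}>{grandchild})"
    return [(grandparent, unknown), (unknown, grandchild)]


def resolve(claims, placeholder, person):
    """Merge a placeholder into a named person and drop duplicate links."""
    replaced = [
        (person if p == placeholder else p, person if c == placeholder else c)
        for p, c in claims
    ]
    return sorted(set(replaced))


claims = expand_grandparent("Augustus", "Caius") + [
    ("Julia the Elder", "Caius"),
    ("Marcus Agrippa", "Caius"),
]

# Illustration only: what the claims look like once the intermediate is identified.
resolved = resolve(claims, "?(Augustus>Caius)", "Julia the Elder")
```

Merging rather than deleting keeps the grandparent link alive: the placeholder's incoming edge becomes an ordinary parent link to the named person.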

I can ask a question ...

Is Augustus the father of Marcus Agrippa, or of Julia the Elder?

This becomes part of the dialogue, but doesn't necessarily become part of the argument. It can if you want it to, though. For example, you can argue that you're not prepared to believe any of the claims derived from the source quote until all questions about it have been answered.

Next ...

Suet. Aug. 61:

By Scribonia he had a daughter named Julia, but no children by Livia, although extremely desirous of issue.

"Julia" refers to Julia the Elder.

Adding these relationships to the family tree gives:

family tree, child of Augustus resolved

... and I'm now able to answer the question above:

Augustus is the father of Julia the Elder.

Suet. Cal. 7:

Germanicus married Agrippina, the daughter of Marcus Agrippa and Julia, by whom he had nine children, two of whom died in their infancy, and another a few years after; ...

"Julia" refers to Julia the Elder.

The rest survived their father; three daughters, Agrippina, Drusilla, and Livilla, who were born in three successive years; and as many sons, Nero, Drusus, and Caius Caesar.

I may not get this from NLP, but it's clear from the context that I can also claim:

And I spot the ambiguity immediately ...

"Agrippina" is ambiguous, referring to Agrippina the Elder as parent and Agrippina the Younger as child.

At this point, I might go back to the claims above (Suet. Aug. 62) and add '"Agrippina" refers to Agrippina the Elder'. If I don't (and I won't for the time being), the family tree looks like this:

family tree, two Agrippinas

... and it might prompt me to ask the question:

Is Agrippina (Suet. Aug. 62) the same person as Agrippina the Elder (Suet. Cal. 7)?

Later, I'll find that "Drusus" is ambiguous, and I'll come back here and add:

"Drusus" refers to Drusus Caesar.

So far, a few ambiguous names aside, the claims made by information extraction agents are all acceptable. Next, let's consider what happens when an agent makes a mistake ...

Tacitus, Annals 1.33:

Meantime Germanicus, while, as I have related, he was collecting the taxes of Gaul, received news of the death of Augustus. He was married to the granddaughter of Augustus, Agrippina, by whom he had several children, and though he was himself the son of Drusus, brother of Tiberius, and grandson of Augusta, he was troubled by the secret hatred of his uncle and grandmother, the motives for which were the more venomous because unjust.

... he was himself the son of Drusus, brother of Tiberius, and grandson of Augusta, ...

The phrase is ambiguous:

This says that some of the agent claims made above contradict each other. These contradictions get added to the argument map:

argument map for Tacitus, Annals, 1.33

I may (perhaps later) be in a position to resolve the ambiguity. If so, there are various ways I might express my choice. One option is to simply "cross out" the wrong answers (as shown here). This doesn't take the false claim out of the argument map, it just adds a "this is wrong" claim that contradicts it:

argument map for Tacitus, Annals, 1.33, with corrections
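The effect of "crossing out" can be sketched as an evaluation over claims and the conflicts between them. The Python below uses Dung-style grounded semantics, which is my choice for the illustration and not necessarily how this workbook is evaluated. The two competing readings are hypothetical examples of how an agent might parse the phrase; neither is asserted here as the correct one.

```python
def grounded_extension(arguments, attacks):
    """Accept an argument once all of its attackers are rejected;
    reject it once any attacker is accepted (grounded semantics)."""
    accepted, rejected = set(), set()
    changed = True
    while changed:
        changed = False
        for a in arguments:
            if a in accepted or a in rejected:
                continue
            attackers = {x for x, y in attacks if y == a}
            if attackers <= rejected:
                accepted.add(a)
                changed = True
            elif attackers & accepted:
                rejected.add(a)
                changed = True
    return accepted


# Two hypothetical readings of "brother of Tiberius" that contradict each other.
A = "Germanicus is the brother of Tiberius"
B = "Drusus is the brother of Tiberius"
conflicts = [(A, B), (B, A)]
undecided = grounded_extension([A, B], conflicts)

# "Crossing out" A adds a new claim that attacks it; A itself stays in the map.
C = "The claim 'Germanicus is the brother of Tiberius' is wrong"
decided = grounded_extension([A, B, C], conflicts + [(C, A)])
```

Before the correction neither reading is accepted, because each attacks the other. After it, the unattacked "this is wrong" claim defeats one reading, which in turn reinstates the other.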

And so on ...

The content of this workbook is structured to make it easy to manipulate. Each main source quote, and its accompanying claims, is contained in an HTML article element. I can evaluate each article independently, or evaluate sets of articles together. Evaluations generate conflicts and questions that lead to additions or amendments, and that focus and direct further analysis. When I have 'agreement', I can treat the acceptable claims as facts that can be added to a knowledge base.
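To show how per-article evaluation might start, here is a Python sketch that groups claims by their enclosing article element using the standard library's `html.parser`. The markup in `sample`, and in particular the `class="claim"` attribute, is an assumption made for the example; the actual markup of this workbook may differ.

```python
from html.parser import HTMLParser


class WorkbookParser(HTMLParser):
    """Collect claim text per <article>, assuming claims are <p class="claim">."""

    def __init__(self):
        super().__init__()
        self.articles = []      # one list of claim strings per article
        self._in_claim = False

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.articles.append([])
        elif tag == "p" and dict(attrs).get("class") == "claim" and self.articles:
            self._in_claim = True
            self.articles[-1].append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_claim = False

    def handle_data(self, data):
        if self._in_claim:
            self.articles[-1][-1] += data


# Hypothetical markup for two articles.
sample = """
<article>
  <blockquote>By Scribonia he had a daughter named Julia ...</blockquote>
  <p class="claim">"Julia" refers to Julia the Elder.</p>
</article>
<article>
  <p class="claim">Augustus is the father of Julia the Elder.</p>
</article>
"""

parser = WorkbookParser()
parser.feed(sample)
```

After parsing, `parser.articles` holds one list of claims per article, so any subset of articles can be passed on for evaluation together.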

The story so far is:

Julio-Claudian family tree, the story so far

The unknown parent of Drusus and Tiberius looks like a good next target ...

Rather than making this worked example more complicated, I'll continue the effort elsewhere: see my Julio-Claudian Family Tree page for further details.