M. Buliga version: 21.06.2019
chemlambda and hapax
This page explains the following:
The mol notation
Is a very compact notation for the kind of graphs used in all these graph-rewriting formalisms. In order to understand the notation, we need first to define these graphs.
Graphs
The most usual way to define a graph is the following:
- we have a set "N" of nodes (non empty)
- and a set "E" of edges.
- to any edge "e" from "E" we associate two nodes, say "a" and "b". These nodes are the extremities of the edge.
The valence of a node is the number of times the node appears at an extremity of an edge. There is no particular order of the edges which have this node at one of the extremities. We think that the node connects with an edge via a node port. So, in this definition there is no particular order of the node ports of a node.
Also, remark that in this definition the extremities of an edge are given in no particular order. We say this is a graph with un-oriented or undirected edges.
These graphs don't have enough structure for our needs. That is why we shall add some structure:
- we ask that each node of the graph has a type. We first define a family of node types, think about a node type like a prototype of a node.
In chemlambda the node type is defined by one of the names "A", "L", "FI", "FO", "FOE", "Arrow", "FRIN", "FROUT", "T".
In interaction combinators the node type is "γ", "δ", "ε".
- Each node type defines the valence of a node (i.e. the number of ports) and an order of the ports of the node. Therefore, compared to the general case of a graph, we do have now a particular order of the ports of a node
The mol notation of a graph
The mol notation for such a graph is a list of nodes of the graph, where for each node we have a line (item) of the list. Each line represents a node, in the following format: ( "LS" means line separator and "FS" means "field separator")
[type of node] FS [list of ports]
The [type of node] is, in chemlambda "A", "L", "FI", "FO", "FOE", "Arrow", "FRIN", "FROUT", "T", in interaction combinators is "γ", "δ", "ε".
The [list of ports] is an array (a list) with FS as field separator, where the element "i" of the list is the name of the edge connected at the edge port "i".
Let's revisit the definitions we have until now, by adding to them a bit of details, in order to pass from the mathematical notation to a notation we can use with computers.
A graph is:
- a finite set of nodes, enumerated by numbering them in an arbitrary order, from 1 to Nodes_number
- a finite set of edges tags (i.e. names). It is understood that each edge is referred by it's tag, so any two different edges have different tags.
- each node has a node type and connects to the edges via node ports.
The mol notation for such a graph is an array of node lines, where each line is in the mol format.
For example, in chemlambda we use the LS as "newline" and FS as " " and the mol notation
A a b c
L c b a
describes a graph which has 2 nodes and 3 edges. In JSON notation we would write:
graphMol = [["A", "a", "b", "c"],
["L", "c", "b", "a"]
]
In javascript we could define this as
graphMol = [{type:"A", ports:["a", "b", "c"]},
{type:"L", ports:["c", "b", "a"]}
]
This describes a graph with 2 nodes and 3 edges, where:
- node 1 is of type "A". The port 1 is connected to the edge "a", port 2 is connected to the edge "b", port "3" is connected to the edge "c"
- node 2 is of type "L". The port 1 is connected to the edge "c", port 2 is connected to the edge "b", port "3" is connected to the edge "a"
In the class of graphs that we use, this graph is different from the graph with the mol notation
A b a c
L c b a
because even if this second graph has also 2 nodes, of types "A" and "L", also 3 edges, named "a", "b", "c", there is a difference between the way edges are connected to ports. While for the first graph:
- edge "a" has the extremities: node 1, connected at port 1, and node 2, connected at port 3
- edge "b" has the extremities: node 1, connected at port 2, and node 2, connected at port 2
- edge "c" has the extremities: node 1, connected at port 3, and node 2, connected at port 1
for the second graph we have:
- edge "a" has the extremities: node 1, connected at port 2, and node 2, connected at port 3
- edge "b" has the extremities: node 1, connected at port 1, and node 2, connected at port 2
- edge "c" has the extremities: node 1, connected at port 3, and node 2, connected at port 1
therefore even if we rename the edges and even if we renumber the nodes, we can't transform a graph into the other.
Similarly, in interaction combinators the following mol notation
δ b a c
δ a b c
describes a graph with 2 nodes and 3 edges, etc.
As you see, the mol notation of such a graph does introduce a supplimentary structure, namely the ordering of the nodes of the graph. For the purposes of graph rewriting, this ordering is irrelevant. Otherwise we would not have a graph-rewriting formalism, we would have a term-rewriting formalism.
The mol notation abstractly
If we start from the mol notation of a graph, how can we know that there is a graph which has this as a mol notation?
There are two conditions:
- the [list of ports] should have length compatible with the [type of node]
- each edge name should appear exactly twice in the mol notation. Because each edge has two extremities.
However, we may work with mol files, which are mol notations for graph which may have free edges. In a mol file an edge name appears at most twice. Why? because this way a sub-array of a mol notation array is a mol file. This is important because when we shall discuss about graph rewrites, we shall need to define patterns, which are sets of nodes in a graph with certain properties. We shall denote patterns as mol files.
Also, we shall accept graphs which have some edge extremities "free", that is dangling without being connected to a node port. The mol notation for such a graph has the property that each edge name appears at most twice. All in all a mol file describes the graphs we need, as well as their subgraphs.
Oriented graphs, the first difference between chemlambda and interaction combinators
In chemlambda (and generally in hapax) we use graphs as previously, along with a supplimentary structure: the edges are oriented, that is
- each edge has a source [node port] and a target [node port]
- which implies that each node port has a type itself:
- "in" if the node port is a target of an edge
- "out" if the node port is a source of an edge
The "in" and "out" are with respect to the node: a target of an edge is an "in" of a node...
This supplimentary structure, which is not present in usual interaction combinators, forces us to modify the type of a node, namely, a node type specifies, as previously, the name of the type ("A", "L", "FI", etc.), the number of ports, but also the type "in" or "out" of each port.
For example, the nodes of type "A" and "FI" have 3 ports, with the types ["in", "in", "out"]. The nodes "L", "FO", FOE" have 3 ports, with the types ["in", "out", "out"]. In the hapax repository this is specified in the hapax-nodes.js.
In general, in hapax we shall strive to impose for 3-valent node types, the following: always the 1st port has type "in" and the 3rd port has type "out", while the middle port has either "in" or "out" type.
The second modification we have, due to the oriented edges, is that an admissible mol file for a graph with oriented edges has the supplimentary property:
- each edge name appears at most twice, and if it appears twice then it appears once in a node port of type "in" and once in a node port of type "out"
That is why, for example, the list
δ b a c
γ a b c
is a valid mol file of an interaction combinator graph, but the list
A b a c
L a b c
is not a valid mol file for chemlambda, because the edge "c" appears twice in a port of type "out" and the edge "a" appears twice in a port of type "in".
In chemlambda (in general in hapax) there is an unique way to add nodes to an oriented graph which has some edges with free extremities. We simply define two new node types:
- "FRIN" (i.e. "free in") which has only 1 port, of type "out"
- "FROUT" (i.e. "free out") which has only 1 port, of type "in"
Then, if in a mol file the edge "e" appears only once, in a node port of type "in", it means that the edge has a free source, therefore we add a node
FRIN e
which becomes the source of the edge "e".
In the same way, if the edge "e" appears only once, in a node port of type "out", it means that the edge has a free target, therefore we add a node
FROUT e
which becomes the target of the edge "e".
(We could do the same in the un-oriented case, by using only one new node type, say "FREE" with only one port.)
Rewrites in chemlambda vs. rewrites in hapax
The rewrites of chemlambda are explained in [1], [2]. With the mol notiation we can equally describe the rewrites from interaction combinators, or those from other small graph rewrite systems.
First we explain the way we do rewrites in chemlambda v2 and interaction combinators. Then we explain how we do the chemlambda rewrites in hapax, because we do them in a different way.
Rewrites in chemlambda v2 and in interaction combinators, in mol notation
With the mol notation, each rewrite consists in the following:
- the rewrite has a LEFT pattern and a RIGHT pattern, both are mol files
- the rewrite replaces the LEFT pattern with the RIGHT pattern
Here is a chemlambda rewrite ("A-L" or "beta" rewrite):
LEFT = [["L", "a", "b", "c"], ["A", "c", "d", "e"]] becomes RIGHT = [["Arrow", "a", "e"], ["Arrow", "d", "b"]]
And here is an interaction combinators rewrite, where we add the node types "Arrow", as in chemlambda (and we'll have to add the COMB rewrites as well, but this is for later):
LEFT = [["γ", "a", "b", "c"], ["γ", "d", "e", "c"]] becomes RIGHT = [["Arrow", "a", "e"], ["Arrow", "b", "d"]]
In chemlambda and in interaction combinators the LEFT and RIGHT patterns, as mol files, have certain properties:
- in the LEFT and RIGHT mol files the edges names which appear only once are the same. That is because in graph terms the LEFT and RIGHT patterns represents sub-graphs and the edges which appear only once in the patterns are those which connect the patterns with the rest of the graph.
- the patterns are simple, made of two nodes and there is an active edge, in the examples given is the edge "c", which connects them. This simplifies very much the search for LEFT patterns in the graph.
There are some problems with this kind of rewrites:
- they are not conservative, in the sense that some nodes and edges are replaced with others, in a way which does not conserve the number of (typed) nodes, nor the edges (number or name)
- in the case of rewrites which increase the number of edges and nodes, we need a mechanism of invention of new edge names, so that there is no clash with the edge names already present elsewhere in the graph (or mol file)
These problems are significant, especially if we want to use small graph rewrite systems as artificial chemistries, after all, even in an open system, a chemical reaction (which implements a graph rewrite) is conservative.
Chemlambda rewrites in hapax
In hapax we see a rewrite as a pair of permutations of the edges sources and targets. This way, any rewrite is conservative. Let's exemplify with the "A-L" rewrite from chemlambda.
In hapax the LEFT and RIGHT patterns are mol files of connected graphs, with a specified edge name. See
hapax-mol.js for a list of paterns used in chemlambda, as done in hapax.
Any rewrite needs a pattern and a "token", and outputs a pattern. The tokens are small graphs (patterns). In hapax these are specified in hapax-chem.js. For example the "A-L" rewrite, as done in hapax, is:
{kind:"A-L", needs:"Arrow-Arrow", gives:["Arrow", "Arrow", "L-A"], pisource:[4,0,3,2,1], pitarget:[3,4,0,2,1]}
which means:
LEFT pattern "A-L" + token "Arrow-Arrow" becomes Pattern "Arrow" + Pattern "Arrow" + token "L-A"
by using the following permutations of sources and targets: pisource:[4,0,3,2,1], pitarget:[3,4,0,2,1]
Let's explain the same in mol notation. Instead of the LEFT pattern from chemlambda v2, we have a pattern
{named:"A-L",
edge:"a",
mol:[["L", "c", "b", "a"],
["A", "a", "d", "e"]]
}
and a token
{named:"Arrow-Arrow",
edge:"a",
mol:[["Arrow", "b", "a"],
["Arrow", "a", "b"]]
}
which corresponds (i.e. recognizes the pattern in the graph) to the mol file:
L c b a
A a d e
Arrow b' a'
Arrow a' b'
Now, according to the type "in" or "out" of the ports, we see that we have 5 sources, colored red, and five targets, colored yellow:
L c b a
A a d e
Arrow b' a'
Arrow a' b'
This is transformed into another pattern, by permuting the sources and the targets. We get:
L b' a b'
A a' a a'
Arrow c e
Arrow d b
which is, as the rewrite claims, recognized as two patterns "Arrow"
{named:"Arrow",
edge:"a",
mol:[["Arrow", "a", "b"]]
}
and a token "L-A"
{named:"L-A",
edge:"a",
mol:[["A", "c", "a", "c"],
["L", "b", "a", "b"]]
}
The permutation of sources is described by the array pisources and the permutation of targets is described by the array pitarget. In order to understand how this is done, first recall that (in javascript in particular) arrays elements are numbered starting with 0, therefore the array pisource = [4,0,3,2,1], for example, represents a permutation in the sense that the element 0 goes to 4, 1 goes to 0, 2 goes to 3, etc.
Then, in order to apply the permutations pisource and pitarget correctly, we need to be sure in what order the pattern recognition function sees the sources and targets. The problem is that we could easily pick an order by hand of the sources and targets, but how does the program picks them? We need to know this only once, when we define the rewrites. For this we apply the pattern recognition program to the patterns themselves. (At some point in the future this has to be automatized). The page [9] tells us exactly this, namely that for the pattern recognition program the order of the sources for this rewrite is:
a, b, e, a', b'
and the order of the targets for this rewrite is:
a, c, d, a', b'
You can check by hand that pisource and pitarget permutation do indeed what they are expected to. For this see that, for example a which is source 0, appears in position 1 in pisource, therefore it will go in the place of source 1, which is b, and so on.
The hapax way of doing rewrites is more general and more flexible than the usual way. It can be applied to interaction combinators easily (one needs a way to add the sources-targets supplimentary information, in a way which respects the original formalism), but more interestingly, it can be applied to a variety of other graph rewrite systems which appear in science or mathematics. Even more, it can be used for random systems of graph rewrites, which are in vast numbers. This justifies the "hapax" name, which means "only once". See first descriptions of this idea in [10], [11].
References:
[1] chemlambda moves
[2] chemlambda v2 page
[3] Y. Lafont, Interaction Combinators
[4] hapax repository
[5] small graph rewrite systems
[6] Molecular computers
[7] hapax-mol.js
[8] hapax-chem.js
[9] hapax-explore.html
[10] 14400 alternatives to the beta rewrite
[11] What is the purpose of the project hapax?