We now concentrate on the proof of Buchi-Bruyer Theorem, probably the most important theorem concerning automatic structures, which says that the set of regular relations coincides with the set of relations definable in M
univ (defined below). By the way, this theorem is folklore in the sense that its proof is unpublished, but well-known to everybody in the community.
We first need some definitions. Suppose that
s := (s
1, ...,s
n) ∈ (Σ*)
n. Then, define a string [
s] over the alphabet (Σ∪ {#})
n whose length is max{s
1,...,s
n}, and whose ith symbol is (a
1,...,a
n) where a
j is the ith symbol of s
j, if i ≤ |s
j|, and a
j is #, otherwise. One might visualize [
s] as the string obtained by placing s
1,...,s
n in a left-aligned column and pad each string s
i with # so that each of the resulting rows is of equal length. After that, we consider this matrix as a string [
s] whose jth position is the jth column of the string. A subset S of (Σ*)
n is said to be
regular if the set { [
s] :
s ∈ S } is regular.
Fix an alphabet of size at least two. Consider the infinite structure M
univ := (Σ*, ≤, (L
a)
a ∈ Σ, el) where
- the universe is the set of all Σ-strings,
- x ≤ y iff x is a prefix of y,
- La(x) is true iff the rightmost symbol of x is a, and
- el(x,y) is true iff |x| = |y| (|x| denotes the length of x).
What properties are (first-order) definable in M
univ? Here are some simple ones:
- |x| ≤ |y| (i.e. the string x is no longer than the string y),
- |x| = |y| + k for some fixed constant k,
- im-pref(x,y) (i.e. x = y.a for some letter a ∈ Σ), and
- the kth symbol of x is a (where k is fixed and a ∈ Σ).
Apologize for the overloading of the symbol '≤' because of the lack of HTML symbols. For example, the first property above can be expressed as
∃ s( s ≤ y ∧ el(x,s) ).
The second property is also easily expressible, but you might need more quantifiers and make use of the relation im-pref. The third property can be expressed by saying that |x| ≤ |y| and there is no z with x < z < y (here '<' is the irreflexive version of the prefix relation ≤).
Now, a subset S of (Σ*)
n is said to be definable in M
univ if there exists a first-order formula φ(x
1,...,x
n) in the vocabulary of M
univ such that
S = { s : Muniv |= φ(s) }.
Theorem (Buchi-Bruyer): A subset S of (Σ*)
n is definable in M
univ iff S is regular.
Proof sketch:
(<=) Suppose that S is recognized by the automaton A = (Q,q
0,F,δ: Q x Σ -> Q), where Q = {q
0,...,q
l} is a finite set of states, q
0 ∈ Q is the initial state, F ⊆ Q is the set of final states, and δ the transition function. So, for all
s ∈ (Σ*)
n,
s ∈ S iff the string s
1...s
k = [
s] is accepted by A,i.e., there exists a run p
0...p
k such that p
0 = q
0, p
k ∈ F, and δ(p
i,s
i+1) = p
i+1. The defining formula for S is
φ(x1,...,xn) = ∃v0,...,vl( ψlen ∧ ψchar ∧ ψstart ∧ ψend ∧ ψtrans )
where:
- ψlen asserts that |vi| = max{|xj| : 1 ≤ j ≤ n} + 1.
- ψchar asserts that [(v1,...,vl)] = w0...wk is a characteristic sequence, i.e., each vi is a string of 0s and 1s, and that, for each position h, exactly one of the strings vis have value 1. [Intuitively, we want each wj to represent the state pj in our accepting run.]
- ψstart asserts that the first position of v0 has value 1.
- ψend asserts that the last position of some vj, where qj ∈ F, has value 1.
- ψtrans asserts that the transition from wj to wj+1 respects δ. [This will be a huge table, quite tedious to write down.]
The reader should convince herself that all of the above sentences are definable in M
univ.
(=>) The proof is by induction on the formula φ(x
1,...,x
n) defining S. The base case (i.e. atomic formulas) is very easy (left to the reader). Further, the case where φ is of the form ψ
1 ∨ ψ
2 or ¬ ψ follows immediately from the properties that regular languages are closed under union and complementation. What remains is to prove this for the case where φ is of the form ∃x
n+1 ψ(x
1,...,x
n+1). Suppose that the n+1-ary regular relation R defined by ψ is recognized by the automaton A = (Q,q
0,F,δ). To construct an automaton A' for S, one first applies the pumping lemma to show that: there exists a number K such that if (s
1,...,s
n+1) ∈ R, where |s
n+1| > |s
j| for 1 ≤ j ≤ n, then there exists another string s'
n+1 such that (s
1,...,s'
n+1) ∈ R and |s'
n+1| ≤ max{|s
j| : 1 ≤ j ≤ n} + K. We construct A' as follows. First, make K+1 isomorphic copies of A (with different labels), where the ith copy is denoted by A
i = (Q
i = {q
0i,...,q
li},q
0i,F
i,δ
i). The states of A' are the union of the Q
is. The start states consist of all the q
0is. The final states are the union of the F
is. The transition function δ' works as follows: whenever δ
i(q
ji,(c
1,...,c
n+1)) = q
hi, where c
m ∈ (Σ &cup {#}), we put δ'(q
ji,(c
1,...,c
n) = q
hi. Note that A' is non-deterministic. It is easy to check that A' recognizes S. (QED)