Tuesday, May 17, 2005

Unary-conjunctive-view Query Language (1)

Last December I completed my BSc(Honours) degree, the Australian equivalent of American BSc degree, at the University of Melbourne (Australia). So, before I am leaving for North America for graduate studies, I thought it might be a good idea to write something about the finite-model-theory related problem I worked on for my undergraduate thesis. It might also be a good topic to get this blog started.

The story is somewhat lengthy. So, I will divide it into a series of bite-size chunks.

To get under way, recall that pure first-order logic is undecidable. [When we say that a logic is (un)decidable, we mean that its satisfiability problem is (un)decidable.] Here pure first-order logic basically consists of all first-order sentences without equality and function symbols. Turing proved this by reducing the halting problem for Turing machines to deciding the satisfiability of first-order logic. We know on the other hand that pure monadic first-order logic (MFO) is decidable, actually complete for NEXPTIME. Here the term 'monadic' simply means that the relation symbols in each formula in MFO are of arity 1. This decidability result actually lies at the border of the decidable and the undecidable because permitting merely one binary relation symbol to MFO immediately results in undecidability.

So, the question boils down to how much more can we extend MFO before we get an undecidable logic? Strictly speaking, this is not really a mathematical question, and so has no single correct answer. I'll mention three known successful attempts to extend the decidability of MFO:

  • MFO with equality. Lowenheim, who proved the decidability of MFO, proved this too.
  • MFO with any number of function symbols. This is Lob-Gurevich theorem.
  • MFO with equality and one function symbol. This is proved by Rabin in one of his groundbreaking papers.

All of these results are optimum. For example, adding one more function symbol to the third logic in this list results in undecidability. For more information on these logics, the reader is referred to the reference book for the classical decision problem. We will however explore a different avenue to answer this question.

Database theorists often talk about conjunctive queries: pure first-order formulas that can be written as

F(x1,,xn):=(y1,,yk)(R1(u1)Rm(um))

where the x's are the free variables, the y's are the bound variables, and the u's are tuples of variables appropriate for the (arities of the) R's. Following notations from logic programming, we will write the formula F as

F(x1,,xn)R1(u1),,Rm(um)

The set of all these F's forms the conjunctive query language. Now, it is easy to see that any conjunctive query is satisfiable --- so its satisfiability problem is trivial. Here is the twist. We can combine MFO and a restricted version of conjunctive query language to make a decidable logic.

Let U be the set of all conjunctive queries with exactly one free variable (a.k.a. unary conjunctive queries). We call U a view vocabulary (signature). Let UCV be the smallest set that satisfies the following conditions:

  • U is a subset of UCV.
  • UCV is closed under boolean operations and first-order quantifications.


The Unary-Conjunctive-View query language is the logic UCV. To get a feel for UCV, let us start speaking in it. For example, this logic can assert the following graph-theoretic sentence "any vertex has a 4-walk but no self-loop". How? Consider the following two conjunctive queries:

  • CW4(x0)E(x0,x1),E(x1,x2),E(x2,x3),E(x3,x4)
  • Loop(x)E(x,x)

Then, the desired sentence is $AA x ( text{CW_4}(x) ^^ not text{Loop}(x) )$. We can readily see that this logic is quite expressive as it permits any number of relation symbols of any arity. This also shows that UCV is strictly more expressive than MFO and the conjunctive query language, considered individually. Nonetheless, the logic is much less expressive than the first-order logic. For example, UCV cannot even assert simple query like "there exists a path (ie, walks where none of the vertices are visited twice) of length k", for any fixed k > 0. [We'll prove this later.]

In the next posting, I will sketch a proof that UCV is decidable, actually solvable in 2-NEXPTIME but is NEXPTIME-hard. Our proof taps into the fact that UCV has the bounded model property, i.e., every satisfiable query Q in UCV has a model of size at most f(|Q|) for some computable function f:N->N where |Q| denotes the size of the encoding of Q. Note that bounded model property immediately implies decidability in SPACE(f(|Q|)), assuming that f is a proper complexity function, as a Turing machine can start by computing f(|Q|), enumerate every possible structure A appropriate for Q one-by-one, and for each A test whether A satisfies Q (which can be also be done in SPACE(f(|Q|)) ).

Later on we will explore the expressive power of UCV and a general method for proving inexpressibility in UCV.

1 comment:

SEO tools said...

Thanks for sharing your info. I really appreciate your efforts and I will be waiting for your further write ups thanks once again.