Abstract :
We study logics expressing properties of paths in graphs that are tailored to querying graph databases: a data model for new applications such as social networks, the Semantic Web, biological data, crime detection, and others. The basic construct of such logics, a regular path query, checks for paths whose labels belong to a regular language. These logics fail to capture two commonly needed features: counting properties, and the ability to compare paths. It is known that regular path-comparison relations (e.g., Prefix or equality) can be added without significant complexity overhead, however, adding common relations often demanded by applications (e.g., Sub word, subsequence, suffix) results in either undecidability or astronomical complexity. We propose, as a way around this problem, to use automata with counting functionalities, namely Parikh automata. They express many counting properties directly, and they approximate many relations of interest. We prove that with Parikh automata defining both languages and relations used in queries, we retain the low complexity of the standard path logics for graphs. In particular, this gives us efficient approximations to queries with prohibitively high complexity. We extend the best known decidability results by showing that even more expressive classes of relations are possible in query languages (sometimes with restriction on the shape of formulae). We also show that Parikh automata admit two convenient representations by analogs of regular expressions, making them usable in real-life querying.
Keywords :
automata theory; computational complexity; database management systems; decidability; graph theory; query languages; query processing; Parikh automata; astronomical complexity; biological data; complexity overhead; counting functionalities; counting properties; crime detection; graph database querying; logics expressing properties; path checks; path logics; path-comparison relations; query languages; regular path query; semantic Web; social networks; undecidability; Approximation methods; Automata; Complexity theory; Database languages; Query processing; Radiation detectors; RPQ; graph databases; parikh automata; rational relations;