Site icon Premium Researchers

Quality-Aware Subgraph Matching Over Inconsistent Probabilistic Graph Databases

Do You Have New or Fresh Topic? Send Us Your Topic

Quality-Aware Subgraph Matching Over Inconsistent Probabilistic Graph Databases

Abstract

Quality-Aware Subgraph Matching Over Inconsistent Probabilistic Graph Databases management report in data mining. Resource Description Framework (RDF) has been widely used in the Semantic Web to describe resources and their relationships.

The RDF graph is one of the most commonly used representations for RDF data. However, in many real applications such as the data extraction/integration,

RDF graphs integrated from different data sources may often contain uncertain and inconsistent information (e.g., uncertain labels or that violate facts/rules), due to the unreliability of data sources.

In this paper, we formalize the RDF data by inconsistent probabilistic RDF graphs, which contain both inconsistencies and uncertainty. With such a probabilistic graph model, we focus on an important problem, quality-aware subgraph matching over inconsistent probabilistic RDF graphs (QA-gMatch),

which retrieves subgraphs from inconsistent probabilistic RDF graphs that are isomorphic to a given query graph and with high quality scores (considering both consistency and uncertainty).

In order to efficiently answer QA-gMatch queries, we provide two effective pruning methods, namely adaptive label pruning and quality score pruning, which can greatly filter out false alarms of subgraphs.

We also design an effective index to facilitate our proposed pruning methods, and propose an efficient approach for processing QA-gMatch queries. Finally, we demonstrate the efficiency and effectiveness of our proposed approaches through extensive experiments.

INTRODUCTION

Graphs have been used to model various data in a wide range of applications, such as bioinformatics, social network analysis, and RDF data management. Furthermore, in these real applications, due to noisy measurements, inference models, ambiguities of data integration, and privacy-preserving mechanisms, uncertainties are often introduced in the graph data.

For example, in protein interaction (PPI) network, the pairwise interaction is derived from statistical models, and the STRING database is such a public data source that contains PPIs with uncertain edges provided by statistical predications.

In a social network, probabilities can be assigned to edges to model the degree of influence or trust between two social entities. In a RDF graph, uncertainties/ inconsistencies are introduced in data integration where various data sources are integrated into RDF graphs.

To model the uncertain graph data, a probabilistic graph model is introduced. In this model, each edge is associated with an edge existence probability to quantify the likelihood that this edge exists in the graph, and edge probabilities are independent of each other.

However, the proposed probabilistic graph model is invalid in many real scenarios. For example, for uncertain protein-protein interaction (PPI) networks, authors in first establish elementary interactions with probabilities between proteins, then use machine learning tools to predict other possible interactions based on the elementary links.

The predictive results show that interactions are correlated, especially with high dependence of interactions at the same proteins. Given another example, in communication networks or road networks, an edge probability is used to quantify the reliability of link or the degree of traffic jam.

RELATED WORK

Inconsistent databases: An inconsistent database contains those data that violate some integrity constraints (e.g., key constraints, functional dependencies, etc.), rules, or facts. Previous works often considered inconsistencies in relational databases or probabilistic databases where tuples are associated with probabilities.

In contrast, our QA-gMatch problem involves inconsistent vertex labels in probabilistic graphs (rather than tuples). Thus, previous techniques cannot be directly used in our problem. To resolve inconsistencies, there are 3 repair models: X-repair  that allows tuple deletions only, S-repair that performs both tuple insertions and deletions, and Unrepair that considers tuple value modifications.

Our QA-gMatch repair model is different, in that we delete graph edges (rather than tuples in relational tables).Different from the repair that changes data in databases, previous works also studied the consistent query answering (CQA) over inconsistent data, which does not update the database, but returns the aggregated query answers over (minimal or all) repaired databases.

The investigated query types include relational operations (e.g., selection, projection, and join) and spatial operations (e.g., range query, spatial join, and top-k) . Specific pruning methods are proposed for different CQA query types to reduce the search space.

In contrast, our QA-gMatch problem considers a different query type (i.e., subgraph matching) and different data model (i.e., graph data rather than relational data), which thus cannot borrow existing techniques for querying tuples or spatial objects.

RDF graph databases: RDF data can have different formats, such as triple store, column store, property tables, or graphs. In literature, Tran et al.  studied the keyword search query over certain RDF graph, which retrieves subgraphs that contain keywords with high ranking scores.

In contrast, we consider a different subgraph matching query (instead of keyword search) over a probabilistic graph model (rather than a certain one). Different from certain general graphs,

inconsistent probabilistic RDF graph in our QA-gMatch problem needs to consider inconsistent/probabilistic features, and has much more possible labels (to encode) or incurs high degrees in vertices, which are thus more challenging to tackle.

Moreover, there are some existing works that model probabilistic RDF data. However, they either focused on data modeling for probabilistic RDF data, or considered query types over consistent graphs, other than the quality-aware subgraph matching query over inconsistent probabilistic graphs. Yuan et al.

considered probabilistic consistent graphs with vertex/edge uncertainties, and studied the subgraph similarity search that obtains matching subgraphs with a given query graph with high probabilities. Moustafa et al. proposed a model for probabilistic entity graphs (PEGs), which incorporates identity, node attribute, and edge existence uncertainties.

This model also assumes that possible worlds of PEGs are consistent, and the subgraph pattern matching is conducted over such consistent PEGs to find the matching subgraphs with high confidence.

In contrast, our QA-gMatch problem models the graph by an inconsistent probabilistic graph (with vertex label uncertainties and edge repair confidences), which allows inconsistent labels in possible worlds that violate rules/facts (instead of consistent labels).

Moreover, when we answer QA-gMatch queries, we need to consider resolving inconsistencies, and retrieve subgraphs with high quality scores via repairs (rather than graph existence probabilities). Thus, QA-gMatch differs from prior works on consistent probabilistic graphs, in terms of data models and query types.

Probabilistic databases: A probabilistic database consists of x-tuples, and each x-tuple contains one or multiple mutually exclusive alternatives, associated with existence probabilities. It can represent an exponential number of possible worlds, where each possible world is a materialized instance of the database that can appear in the real world.

As a consequence, the query answering over probabilistic databases is equivalent to issuing queries over all possible worlds, and aggregating the returned query answers, which is quite inefficient.

Many existing works study various queries such as top-k queries to improve the query efficiency by avoiding enumerating possible worlds. In contrast, our QA-gMatch query is conducted on a

probabilistic RDF graph (rather than probabilistic relational tables), and thus prior techniques in probabilistic databases cannot be directly applied. This inspires us to design specific pruning techniques for inconsistent probabilistic RDF graphs.

 

Do You Have New or Fresh Topic? Send Us Your Topic 

Quality-Aware Subgraph Matching Over Inconsistent Probabilistic Graph Databases

Not What You Were Looking For? Send Us Your Topic

INSTRUCTIONS AFTER PAYMENT

After making payment, kindly send the following:

» Send the above details to our email; contact@premiumresearchers.com or to our support phone number; (+234) 0813 2546 417 . As soon as details are sent and payment is confirmed, your project will be delivered to you within minutes.