Home
This Title All WIREs
WIREs RSS Feed
How to cite this WIREs title:
WIREs Comp Stat

Bayesian Vertex Nomination Using Content and Context

Full article on Wiley Online Library:   HTML PDF

Can't access this content? Tell your librarian.

Using attributed graphs to model network data has become an attractive approach for various graph inference tasks. Consider a network containing a small subset of interesting entities whose identities are not fully known and that discovering them will be of some significance. Vertex nomination, a subclass of recommender systems relying on the exploitation of attributed graphs, is a task which seeks to identify the unknown entities that are similarly interesting or exhibit analogous latent attributes. This task is a specific type of community detection and is increasingly becoming a subject of current research in many disciplines. Recent studies have shown that information relevant to this task is contained in both the structure of the network and its attributes, and that jointly exploiting them can provide superior vertex nomination performance than either one used alone. We adopt this new approach to formulate a Bayesian model for the vertex nomination problem. Specifically, the goal here is to construct a ‘nomination list’ where entities that are truly interesting are concentrated at the top of the list. Inference with the model is conducted using a Metropolis‐within‐Gibbs algorithm. Performance of the model is illustrated by a Monte Carlo simulation study and on the well‐known Enron email dataset. WIREs Comput Stat 2015, 7:400–416. doi: 10.1002/wics.1365 This article is categorized under: Statistical and Graphical Methods of Data Analysis > Bayesian Methods and Theory Statistical and Graphical Methods of Data Analysis > Markov Chain Monte Carlo (MCMC)
Illustrative attributed graph with 12 vertices. Here, m′ = 2 vertices are observed to be red, mm′ = 3 are latent red vertices and nm = 7 are latent green vertices. Edges represent communication between connected vertices and edge attributes, denoting content of communication, are assumed to be binary: green or red (1 or 2, respectively). The frequency of communication and distribution of content among red vertices are governed by q = (q0, q1, q2), while p = (p0, p1, p2) quantifies these for the rest of the graph, i.e., among green vertices as well as between a red vertex and a green one. Assuming that all edges and their attributes are observed, these are used together with the observed red vertices to find the remaining red vertices.
[ Normal View | Magnified View ]
Conditional probability of correct nomination given that the marginal posterior probability that the nominated vertex is red exceeds p, with equal‐tail 95% BCA bootstrap confidence intervals. This is obtained from 1000 graphs, each with 1000 MCMC iterations from the two parallel chains after convergence.
[ Normal View | Magnified View ]
Kernel densities fitted to posterior means for p1, p2, and q2, obtained from the 252 combinations in Experiment 1.
[ Normal View | Magnified View ]
Kernal densities fitted to posterior means for p1, p2, q2, and ϕ, obtained from 1000 graph realizations. Red points on the horizontal axis indicate the true parameter values.
[ Normal View | Magnified View ]
Conditional probability of correct nomination given that the marginal posterior probability that the nominated vertex is red exceed p, with equal‐tail 95% BCA bootstrap confidence intervals for 1000 graph realizations.
[ Normal View | Magnified View ]
Marginal prior densities (dashed curves) and posterior densities (solid curves) for p1, p2, q2, and ψ. Red points on the horizontal axis indicate the true parameter values.
[ Normal View | Magnified View ]
Trace plots of the moving average estimates of the marginal posterior probabilities that each of the latent vertices is red. The top‐three ranking vertices (3, 4, and 7) are labeled as shown; the others (5, 6, 12, 9, 11, 10, and 8) are clustered together at the bottom. Recall that the three latent red vertices are 3, 4, and 5, and so we have a correct nomination in this case.
[ Normal View | Magnified View ]

Browse by Topic

Statistical and Graphical Methods of Data Analysis > Markov Chain Monte Carlo (MCMC)
Statistical and Graphical Methods of Data Analysis > Bayesian Methods and Theory

Access to this WIREs title is by subscription only.

Recommend to Your
Librarian Now!

The latest WIREs articles in your inbox

Sign Up for Article Alerts