This Title All WIREs
How to cite this WIREs title:
WIREs Comp Stat

# Scan statistics on graphs

Full article on Wiley Online Library:   HTML PDF

Can't access this content? Tell your librarian.

Abstract Scan statistics are used in spatial statistics and image analysis to detect regions of unusual or anomalous activity. A scan statistic is a maximum (or minimum) of a local statistic—one computed on a local region of the data. This is sometimes called ‘moving window analysis’; in the Engineering literature. The idea is to ‘slide’ a window around the image (or map or whatever spatial structure the data have), compute a statistic within each window, and look for outliers—anomalously high (or low) statistics. We discuss extending this idea to graphs, in which case the local region is defined in terms of the connectivity of the graph—the neighborhoods of vertices. WIREs Comput Stat 2012 doi: 10.1002/wics.1217 This article is categorized under: Data: Types and Structure > Graph and Network Data Data: Types and Structure > Social Networks

Three different definitions of local regions that could be used in computing the locality statistic. In all of these, the region is centered on the central vertex and indicated by the solid gray dots. The top plot uses N1(v;Gs) for s ∈ {t − 2,t − 1,t}. The middle plot uses N1(v;Gt) for all the graphs. The bottom plot uses ∪sN1(v;Gs).

[ Normal View | Magnified View ]

The central vertex v, its neighbors N1(v), and their neighbors N2(v) at the time of the detection. The bottom plot shows the neighborhood for the prior week (dark edges); the light edges in this plot are the other edges between these individuals in this week.

[ Normal View | Magnified View ]

The size and scan statistics for the Enron data. By convention, scan0 refers to using the degree of the vertex rather than the size of an induced neighborhood. On the left is the un‐normalized version, also showing the size of the graph, whereas on the right the statistics have been normalized as discussed in the text (Eq. (5)).

[ Normal View | Magnified View ]

The results from a 100‐sample Monte Carlo experiment in which each run is an experiment as depicted in Figure 2, with varying q as indicted by the horizontal axis. The vertical axis corresponds to the difference between the scan statistic at time t = 50, when the anomaly occurs, and the maximum scan statistic over all other times.

[ Normal View | Magnified View ]

Scan statistics for a series of random graphs. These are independent, and for time t≠1 these are independent Erdös–Renyí with n = 100 vertices and P = 0.05. At time t = 50, a small group (m = 10) of vertices have the probability of within‐group edges increased to q = 0.85.

[ Normal View | Magnified View ]

### Browse by Topic

Data: Types and Structure > Social Networks
Data: Types and Structure > Graph and Network Data