How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 2.111

A survey of Web crawlers for information retrieval

The performance of any search engine relies heavily on its Web crawler. Web crawlers are programs that retrieve webpages from the Web by following hyperlinks. These webpages are indexed by a search engine and can then be retrieved in response to a user query. The area of Web crawling still lacks an exhaustive study that covers all crawling techniques. This study follows the guidelines for systematic literature reviews and applies them to the field of Web crawling. We used the standard systematic literature review procedure on 248 studies selected from a total of 1488 articles published in 12 leading journals and in other premier conferences and workshops. The existing literature on Web crawlers is classified into key subareas, and each subarea is further divided according to the techniques used. We analyzed the distribution of the articles using multiple criteria and drew conclusions. Studies that use open source Web crawlers are also reported. We highlight future areas of research, call for increased awareness of the various subfields of Web crawling, and identify how techniques from other domains can be used for crawling the Web. Limitations and recommendations for future work are also discussed. WIREs Data Mining Knowl Discov 2017, 7:e1218. doi: 10.1002/widm.1218
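
As a rough illustration of what "following hyperlinks" means in practice, the sketch below shows a minimal breadth-first crawler. It is not taken from the article: the `crawl` function, the `LinkExtractor` class, and the in-memory `FAKE_WEB` dictionary (which stands in for real HTTP fetching) are all hypothetical names chosen for this example, and a real crawler would also need politeness delays, robots.txt handling, and URL normalization.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`.

    `fetch(url)` is an assumed callable returning HTML text, or None
    when the page cannot be retrieved.
    """
    frontier = deque([seed])   # URLs waiting to be visited
    seen = {seed}              # URLs already queued, to avoid revisits
    pages = {}                 # url -> HTML of successfully fetched pages
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages


# Hypothetical in-memory "Web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/missing">x</a>',
}

pages = crawl("http://example.test/", FAKE_WEB.get)
```

The `pages` dictionary returned here is what a search engine would hand to its indexer; swapping `FAKE_WEB.get` for a function that performs real HTTP requests would turn the sketch into a (very basic) live crawler.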

Architecture of a Web crawler.
Implementation language used by various studies.
Different techniques used by various categories of focused crawlers.
A taxonomy of Web crawlers.

Related Articles

A survey of fuzzy web mining
Data mining‐based tag recommendation system: an overview

Browse by Topic

Fundamental Concepts of Data and Knowledge > Information Repositories
Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining
Algorithmic Development > Web Mining
