Brin, S, Page, L. Reprint of: the anatomy of a large‐scale hypertextual Web search engine. Comput Networks 2012, 56:3825–3833. https://doi.org/10.1016/j.comnet.2012.10.007.
Kitchenham, B. Systematic literature reviews in software engineering—a systematic literature review. Inf Softw Technol 2009, 51:7–15. https://doi.org/10.1016/j.infsof.2008.09.009.
Budgen, D, Brereton, P. Performing systematic literature reviews in software engineering. In: ICSE ’06 Proceedings of the 28th International Conference on Software Engineering, Shanghai, China, May 20–28, 2006, 1051–1052. https://doi.org/10.1145/1134285.1134500.
Brereton, P, Kitchenham, B, Budgen, D, Turner, M, Khalil, M. Lessons from applying the systematic literature review process within the software engineering domain. J Syst Softw 2007, 80:571–583. https://doi.org/10.1016/j.jss.2006.07.009.
Rattan, D, Bhatia, R, Singh, M. Software clone detection: a systematic review. Inf Softw Technol 2013, 55:1165–1199. https://doi.org/10.1016/j.infsof.2013.01.008.
Chakrabarti, S, Van Den Berg, M, Dom, B. Focused crawling: a new approach to topic‐specific Web resource discovery. Comput Networks 1999, 31:1623–1640. https://doi.org/10.1016/S1389-1286(99)00052-3.
Yu, HL, Bingwu, L, Fang, Y. Similarity computation of web pages of focused crawler. In: 2010 International Forum on Information Technology and Applications, 2010, 2, 70–72. https://doi.org/10.1109/IFITA.2010.308.
Gravano, L, Ipeirotis, PG, Sahami, M. QProber: a system for automatic classification of hidden‐web databases. ACM Trans Inf Syst 2003, 21:1–41. https://doi.org/10.1145/635484.635485.
Hammer, J, Fiedler, J. Using mobile crawlers to search the Web efficiently. Int J Comput Inf Sci 2000, 1:36–58.
Badawi, M, Mohamed, A, Hussein, A, Gheith, M. Maintaining the search engine freshness using mobile agent. Egypt Informatics J 2013, 14:27–36. https://doi.org/10.1016/j.eij.2012.11.001.
Koster, M. A Standard for Robot Exclusion. 1994. Available at: http://ftp.nada.kth.se/pub/hacks/src3/linkchecker/norobots‐rfc.html.
Abiteboul, S. Querying semi‐structured data. In: International Conference on Database Theory. Berlin and Heidelberg: Springer; 1997, 1–18.
Olston, C, Najork, M. Web crawling. Found Trends Inf Retr 2010, 4:175–246. https://doi.org/10.1561/1500000017.
Turek, W, Opalinski, A, Kisiel‐Dorohinicki, M. Extensible Web crawler—towards multimedia material analysis. In: Multimedia Communications Services and Security. Dziech A, Czyżewski A, eds. Berlin and Heidelberg: Springer; 2011, 183–190. https://doi.org/10.1007/978-3-642-21512-4_22.
Zheng, Q, Wu, Z, Cheng, X, Jiang, L, Liu, J. Learning to crawl deep Web. Inf Syst 2013, 38:801–819. https://doi.org/10.1016/j.is.2013.02.001.
Arasu, A, Cho, J, Garcia‐Molina, H, Paepcke, A, Raghavan, S. Searching the Web. ACM Trans Internet Technol 2001, 1:2–43. https://doi.org/10.1145/383034.383035.
Rungsawang, A, Suebchua, T, Manaskasemsak, B. Thai related foreign language‐specific website segment crawler. In: 2014—The 28th IEEE International Conference on Advanced Information Networking and Applications, 2014, 293–298. https://doi.org/10.1109/WAINA.2014.56.
Abbasi, A, Fu, T, Zeng, D, Adjeroh, D. Crawling credible online medical sentiments for social intelligence. In: 2013 International Conference on IEEE Computer Society, 2013, 254–263. https://doi.org/10.1109/SocialCom.2013.43.
Achsan, HTY, Wibowo, WC. A fast distributed focused‐Web crawling. Procedia Eng 2014, 69:492–499. https://doi.org/10.1016/j.proeng.2014.03.017.
Agarwal, S, Sureka, A. A focused crawler for mining hate and extremism promoting videos on YouTube. In: Proceedings of the 25th ACM Conference on Hypertext and social media—HT ’14, New York, NY, ACM Press, 2014, 294–296. https://doi.org/10.1145/2631775.2631776.
Ahlers, D, Boll, S. Adaptive geospatially focused crawling. In: Proceedings of the 18th ACM Conference on Information Knowledge Management—CIKM ’09, New York, NY, ACM Press, 2009, 445. https://doi.org/10.1145/1645953.1646011.
Ahmadi‐Abkenari, F, Selamat, A. An architecture for a focused trend parallel Web crawler with the application of clickstream analysis. Inf Sci 2012, 184:266–281. https://doi.org/10.1016/j.ins.2011.08.022.
Almpanidis, G, Kotropoulos, C, Pitas, I. Combining text and link analysis for focused crawling—an application for vertical search engines. Inf Syst 2007, 32:886–908. https://doi.org/10.1016/j.is.2006.09.004.
Altingovde, IS, Ulusoy, O. Exploiting interclass rules for focused crawling. IEEE Intell Syst 2004, 19:66–73. https://doi.org/10.1109/MIS.2004.62.
Altingovde, IS, Ozcan, R, Cetintas, S, Yilmaz, H, Ulusoy, Ö. An automatic approach to construct domain‐specific Web portals. In: Proceedings of the Sixth ACM Conference on Information Knowledge Management—CIKM ’07, New York, NY, ACM Press, 2007, 849. https://doi.org/10.1145/1321440.1321558.
Avraam, I, Anagnostopoulos, I. A comparison over focused Web crawling strategies. In: 15th Panhellenic Conference on Informatics, 2011, 245–249. https://doi.org/10.1109/PCI.2011.53.
Babaria, R, Nath, JS, Krishnan, S, Sivaramakrishnan, KR, Bhattacharyya, C, Murty, MN. Focused crawling with scalable ordinal regression solvers. In: 24th International Conference on Machine Learning—ICML ’07, New York, NY, ACM Press, 2007, 57–64. https://doi.org/10.1145/1273496.1273504.
Barbosa, L, Bangalore, S. Focusing on novelty: a crawling strategy to build diverse language models. In: Proceedings of the 20th ACM International Conference on Information Knowledge Management—CIKM ’11, New York, NY, ACM Press, 2011, 755. https://doi.org/10.1145/2063576.2063687.
Barros, R, Rodrigues Nt, JA, Xexéo, GB, de Souza, JM. A collaborative approach to build evaluated web page datasets. Futur Gener Comput Syst 2011, 27:119–126. https://doi.org/10.1016/j.future.2010.06.007.
Batzios, A, Dimou, C, Symeonidis, AL, Mitkas, PA. BioCrawler: an intelligent crawler for the semantic Web. Expert Syst Appl 2008, 35:524–530. https://doi.org/10.1016/j.eswa.2007.07.054.
Baykan, E, Henzinger, M, Weber, I. A comprehensive study of techniques for URL‐based web page language classification. ACM Trans Web 2013, 7:1–37. https://doi.org/10.1145/2435215.2435218.
Bedi, P, Thukral, A, Banati, H, Behl, A, Mendiratta, V. A multi‐threaded semantic focused crawler. J Comput Sci Technol 2012, 27:1233–1242. https://doi.org/10.1007/s11390-012-1299-8.
Boanjak, M, Oliveira, E, Martins, J, Mendes Rodrigues, E, Sarmento, L. TwitterEcho—a distributed focused crawler to support open research with twitter data. In: Proceedings of the 21st International Conference Companion on World Wide Web—WWW ’12 Companion. New York, NY, ACM Press, 2012, 1233. https://doi.org/10.1145/2187980.2188266.
Caliskan, K, Ozcan, R. Comparing classification methods for link context based focused crawlers. In: 2013 International Conference on Electronics, Computer and Computation (ICECCO), IEEE, 2013, 143–146. https://doi.org/10.1109/ICECCO.2013.6718249.
Campos, R, Rojas, O, Marin, M, Mendoza, M. Distributed ontology‐driven focused crawling. In: Proceedings of the 2013 21st Euromicro International Conference on Parallel, Distributed and Network‐Based Processing (PDP 2013), IEEE, 2013, 108–115. https://doi.org/10.1109/PDP.2013.23.
Chen, D, Liying, F, Jianzhuo, Y, Shi, B. Semantic focused crawler based on Q‐learning and Bayes classifier. In: 2010 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT 2010), IEEE, 2010, 420–423. https://doi.org/10.1109/ICCSIT.2010.5563878.
Chen, R, Desai, BC. An enhanced Web robot for the CINDI system. In: Proceedings of the C3S2E ’08 Canadian Conference on Computer Science & Software Engineering, Montreal, QC, Canada, May 12–13, 2008. New York, NY, ACM Press, 2008, 133. https://doi.org/10.1145/1370256.1370278.
Chen, X, Zhang, X. HAWK: a focused crawler with content and link analysis. In: 2008 IEEE International Conference on e‐Business Engineering, IEEE, 2008, 677–680. https://doi.org/10.1109/ICEBE.2008.46.
Chen, Z, Ma, J, Han, X, Zhang, D. An effective relevance prediction algorithm based on hierarchical taxonomy for focused crawling. In: Information Retrieval Technology, Berlin and Heidelberg, Springer, 2008, 613–619. https://doi.org/10.1007/978-3-540-68636-1_72.
Chen, Z, Liu, J, Zhai, H, Jiang, L, Cao, B. Web page recognition algorithm based on link analysis in theme search engine. In: 2012 Second International Conference on Cloud and Green Computing, IEEE, 2012, 405–409. https://doi.org/10.1109/CGC.2012.42.
Cheng, Q, Beizhan, W, Pianpian, W. Efficient focused crawling strategy using combination of link structure and content similarity. In: 2008 IEEE International Symposium on IT Medical Education, 2008, 1045–1048. https://doi.org/10.1109/ITME.2008.4744029.
Wu, C, Hou, W, Shi, Y, Liu, T. A Web search contextual crawler using ontology relation mining. In: 2009 International Conference on Computational Intelligence and Software Engineering, IEEE, 2009, 1–4. https://doi.org/10.1109/CISE.2009.5365842.
Cho, J. Efficient crawling through URL ordering. Comput Networks ISDN Syst 1998, 30:161–172. https://doi.org/10.1016/S0169-7552(98)00108-1.
Chy, AN. Bangla news classification using Naive Bayes classifier. In: 16th International Conference on Computing and Information Technology, 2014, 8–10. https://doi.org/10.1109/ICCITechn.2014.6997369.
De Groc, C. Babouk: focused Web crawling for corpus compilation and automatic terminology extraction. In: Proceedings of the 2011 IEEE/WIC/ACM International Conference on Web Intelligence—WI 2011, 2011, 1, 497–498. https://doi.org/10.1109/WI-IAT.2011.253.
Dong, H, Hussain, FK. Focused crawling for automatic service discovery, annotation, and classification in industrial digital ecosystems. IEEE Trans Ind Electron 2011, 58:2106–2116. https://doi.org/10.1109/tie.2010.2050754.
Dong, H, Hussain, FK. SOF: a semi‐supervised ontology‐learning‐based focused crawler. Concurr Comput Pract Exp 2013, 25:1755–1770. https://doi.org/10.1002/cpe.2980.
Dong, H, Hussain, FK. Self‐adaptive semantic focused crawler for mining services information discovery. IEEE Trans Ind Informatics 2014, 10:1616–1626. https://doi.org/10.1109/TII.2012.2234472.
Dong, H, Hussain, FK, Chang, E. A transport service ontology‐based focused crawler. In: 2008 Fourth International Conference on Semantics, Knowledge and Grid, IEEE, 2008, 49–56. https://doi.org/10.1109/SKG.2008.56.
Dong, H, Hussain, FK, Chang, E. A framework for discovering and classifying ubiquitous services in digital health ecosystems. J Comput Syst Sci 2011, 77:687–704. https://doi.org/10.1016/j.jcss.2010.02.009.
Dong, Q. Search‐engine‐oriented theme crawler design. In: 2010 International Conference on Systems Science, Engineering Design and Manufacturing Informatization, IEEE, 2010, 303–306. https://doi.org/10.1109/ICSEM.2010.169.
Du, Y, Pen, Q, Gao, Z. A topic‐specific crawling strategy based on semantics similarity. Data Knowl Eng 2013, 88:75–93. https://doi.org/10.1016/j.datak.2013.09.003.
Du, Y, Hai, Y, Xie, C, Wang, X. An approach for selecting seed URLs of focused crawler based on user‐interest ontology. Appl Soft Comput J 2014, 14:663–676. https://doi.org/10.1016/j.asoc.2013.09.007.
Fan, H, Zeng, G, Li, X. Crawling strategy of focused crawler based on niche genetic algorithm. In: Proceedings of the 2009 Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing, IEEE, 2009, 591–594. https://doi.org/10.1109/DASC.2009.49.
Filipowski, K. Comparison of scheduling algorithms for domain specific Web crawler. In: 2014 Eur. Netw. Intell. Conf., IEEE, 2014, 69–74. https://doi.org/10.1109/ENIC.2014.14.
Fu, T, Abbasi, A, Zeng, D, Chen, H. Sentimental spidering: leveraging opinion information in focused crawlers. ACM Trans Inf Syst 2012, 30:1–30. https://doi.org/10.1145/2382438.2382443.
Gao, K, Gu, Y. Analyzing an agent‐based selective information retrieval. In: Proceedings of the IEEE International Conference on Services Computing (SCC 2004), IEEE, 2004, 427–430. https://doi.org/10.1109/SCC.2004.1358035.
Gao, W, Lee, HC, Miao, Y. Geographically focused collaborative crawling. In: Proceedings of the 15th international conference on World Wide Web—WWW ’06, New York, NY, ACM Press, 2006, 287. https://doi.org/10.1145/1135777.1135822.
Gao, Z, Du, Y, Yi, L, Peng, Q, Yang, Y. Incrementally updating concept context graph (CCG) for focused Web crawling based on FCA. In: 2009 Asia‐Pacific Conference on Information Processing, IEEE, 2009, 40–43. https://doi.org/10.1109/APCIP.2009.146.
Ghozia, A, Sorour, H, Aboshosha, A. Improved focused crawling using bayesian object based approach. In: 2008 National Radio Science Conference (NRSC), IEEE, 2008, 1–8. https://doi.org/10.1109/NRSC.2008.4542363.
González, I, Marcus, A, Meredith, DN, Nguyen, LA. Effective Web‐scale crawling through website analysis. In: Proceedings of the 15th international conference on World Wide Web—WWW ’06, New York, NY, ACM Press, 2006, 1041. https://doi.org/10.1145/1135777.1136005.
Gouriten, G, Maniu, S, Senellart, P. Scalable, generic, and adaptive systems for focused crawling. In: Proceedings of the 25th ACM conference on Hypertext and Social Media—HT ’14, New York, NY, ACM Press, 2014, 35–45. https://doi.org/10.1145/2631775.2631795.
Guerriero, A, Ragni, F, Martines, C. A dynamic URL assignment method for parallel Web crawler. In: 2010 IEEE International Conference on Computer Intelligent Measurement Systems and Application, IEEE, 2010, 119–123. https://doi.org/10.1109/CIMSA.2010.5611764.
Hao, H, Mu, C, Yin, X, Li, S, Wang, Z. An improved topic relevance algorithm for focused crawling. In: 2011 IEEE International Conference on Systems, Man and Cybernetics—SMC, IEEE, 2011, 850–855. https://doi.org/10.1109/ICSMC.2011.6083759.
Hijazi, HW, Itmazi, JA. Crawler based context aware model for distributed e‐courses through ubiquitous computing at higher education institutes. In: 2013 Fourth International Conference on e‐Learning “Best Practices in Management, Design and Development of e‐Courses: Standards of Excellence and Creativity,” IEEE, 2013, 9–14. https://doi.org/10.1109/ECONF.2013.28.
Liu, H, Milios, E. Probabilistic models for focused Web crawling. Comput Intell 2012, 28:289–328. https://doi.org/10.1111/j.1467-8640.2012.00411.x.
Hu, K, Wong, WS. A probabilistic model for intelligent Web crawlers. In: Proceedings of the 27th Annu. Int. Comput. Softw. Appl. Conf. COMPSAC 2003, IEEE Comput. Soc, 2003, 278–282. https://doi.org/10.1109/CMPSAC.2003.1245354.
Huang, R, Lin, F. Focused crawling with heterogeneous semantic information. In: International Conference of the Web Intell. Intell. Agent Technol. WI‐IAT, 2008, 525–531. https://doi.org/10.1109/WIIAT.2008.87.
Huang, W, Zhang, L, Zhang, J, Zhu, M. Focused crawling for retrieving e‐commerce information based on learnable ontology and link prediction. In: 2009 Int. Symp. Inf. Eng. Electron. Commer., IEEE, 2009, 574–579. https://doi.org/10.1109/IEEC.2009.127.
Huang, X, Zhou, L, Wang, C. Design and implementation of digital products vertical search engine based on android client. 2013 Information Science and Technology International Conference 2013, 828–831. https://doi.org/10.1109/ICIST.2013.6747669.
Huang, Y, Ye, Y. wHunter: a focused Web crawler – a tool for digital library. In: Chen, Z, Chen, H, Miao, Q, Fu, Y, Fox, E, Lim, E, eds. Lecture Notes in Computer Science (LNCS) 3334. Berlin Heidelberg: Springer; 2004, 519–522. https://doi.org/10.1007/978-3-540-30544-6_59.
Jamali, M, Sayyadi, H, Hariri, B, Abolhassani, H. A method for focused crawling using combination of link structure and content similarity. In: 2006 IEEE/WIC/ACM International Conf. Web Intell. (WI 2006 Main Conf. Proceedings)(WI’06), IEEE, 2006, 753–756. https://doi.org/10.1109/WI.2006.19.
Jannach, D, Shchekotykhin, K, Friedrich, G. Automated ontology instantiation from tabular Web sources—the allright system. Web semantics: science, services and agents on World Wide Web 2009, 7:136–153. https://doi.org/10.1016/j.websem.2009.04.002.
Jung, JJ. Towards open decision support systems based on semantic focused crawling. Expert Syst Appl 2009, 36:3914–3922. https://doi.org/10.1016/j.eswa.2008.02.057.
Ji, L, Yan, J, Liu, N, Zhang, W, Fan, W, Chen, Z. ExSearch. In: Proceedings of the 18th ACM Conference on Information Knowledge Management—CIKM ’09, New York, NY, ACM Press, 2009, 1357. https://doi.org/10.1145/1645953.1646125.
Jiang, J, Song, X, Yu, N, Lin, C. FoCUS: learning to crawl Web forums. IEEE Trans Knowl Data Eng 2013, 25:1293–1306. https://doi.org/10.1109/TKDE.2012.56.
Khalilian, M, Boroujeni, FZ, Mustapha, N. Improving performance in constructing specific Web directory using focused crawler: an experiment on botany domain. In: Elleithy, K, ed. Advanced Technology in Computer Science and Software Engineering. Dordrecht: Springer Netherlands; 2010, 461–466. https://doi.org/10.1007/978-90-481-3660-5_79.
Kozanidis, L. An ontology‐based focused crawler. In: Kapetanios, E, Sugumaran, V, Spiliopoulou, M, eds. Natural Language Processing and Information Systems. Berlin, Heidelberg: Springer; 2008, 376–379. https://doi.org/10.1007/978-3-540-69858-6_48.
Kumar, M, Vig, R. Learnable focused meta crawling through Web. Procedia Technol 2012, 6:606–611. https://doi.org/10.1016/j.protcy.2012.10.073.
Kumar, M, Vig, R. Term‐frequency inverse‐document frequency definition semantic (TIDS) based focused Web crawler. In: Krishna, VP, Babu Rajasekhara, M, Ariwa, E, eds. Global Trends in Information Systems and Software Applications, Communications in Computer and Information Science. Berlin and Heidelberg: Springer; 2012, 31–36. https://doi.org/10.1007/978-3-642-29216-3_5.
Lawless, S, Hederman, L, Wade, V. OCCS: enabling the dynamic discovery, harvesting and delivery of educational content from open corpus sources. In: 2008 Eighth IEEE International Conference of the Adv. Learn. Technol., IEEE, 2008, 676–678. https://doi.org/10.1109/ICALT.2008.28.
Li, J, Furuse, K, Yamaguchi, K. Focused crawling by exploiting anchor text using decision tree. In: Spec. Interes. Tracks Posters 14th International Conference on World Wide Web—WWW ’05, New York, NY, ACM Press, 2005, 1190. https://doi.org/10.1145/1062745.1062933.
Li, X, Xing, M, Zhang, J. A comprehensive prediction method of visit priority for focused crawler. In: 2011 2nd Int. Symp. Intell. Inf. Process. Trust. Comput., IEEE, 2011, 27–30. https://doi.org/10.1109/IPTC.2011.14.
Li, Y, Wang, Y, Du, J. E‐FFC: an enhanced form‐focused crawler for domain‐specific deep Web databases. J Intell Inf Syst 2013, 40:159–184. https://doi.org/10.1007/s10844-012-0221-8.
Li, Y, Wang, Y, Tian, E. A new architecture of an intelligent agent‐based crawler for domain‐specific deep Web databases. In: 2012 IEEE/WIC/ACM International Conf. Web Intell. Intell. Agent Technol., IEEE, 2012, 656–663. https://doi.org/10.1109/WI-IAT.2012.103.
Liu, H, Milios, E, Janssen, J. Focused crawling by learning HMM from user’s topic‐specific browsing. In: IEEE/WIC/ACM Int. Conf. Web Intell., IEEE, 2004, 732–732. https://doi.org/10.1109/WI.2004.10057.
Liu, H, Janssen, J, Milios, E. Using HMM to learn user browsing patterns for focused Web crawling. Data Knowl Eng 2006, 59:270–291. https://doi.org/10.1016/j.datak.2006.01.012.
Luo, N, Zuo, W, Yuan, F, Zhang, C. A new method for focused crawler cross tunnel. In: First Int. Conf. Rough Sets Knowl. Technol., Berlin and Heidelberg, Springer, 2006, 632–637. https://doi.org/10.1007/11795131_92.
Luong, HP, Gauch, S, Wang, Q. Ontology‐based focused crawling. In: 2009 International Conference of the Information, Process. Knowl. Manag., IEEE, 2009, 123–128. https://doi.org/10.1109/eKNOW.2009.26.
Van de Maele, F, Spyns, P, Meersman, R. An ontology‐based crawler for the semantic Web. In: Meersman, R, Tari, Z, Herrero, P, eds. Expert Systems with Applications. Berlin and Heidelberg: Springer; 2008, 1056–1065. https://doi.org/10.1007/978-3-540-88875-8_133.
Makris, C, Panagis, Y, Sakkopoulos, E, Tsakalidis, A. Category ranking for personalized search. Data Knowl Eng 2007, 60:109–125. https://doi.org/10.1016/j.datak.2005.11.006.
Mali, S, Meshram, BB. Focused Web crawler with revisit policy. In: Proceedings of the International Conference of the Work. Emerg. Trends Technol.—ICWET ’11, New York, NY, ACM Press, 2011, 474. https://doi.org/10.1145/1980022.1980125.
Mangaravite, V, Assis, GT, Ferreira, AA. Improving the efficiency of a genre‐aware approach to focused crawling based on link context. In: 2012 Eighth Lat. Am. Web Congr., IEEE, North Latin American, 2012, 17–23. https://doi.org/10.1109/LA-WEB.2012.24.
Menczer, F, Pant, G, Srinivasan, P, Ruiz, ME. Evaluating topic‐driven Web crawlers. In: Proceedings of the 24th Annu. Int. ACM SIGIR Conf. Res. Dev. Inf. Retr.—Special Interest Group on Information Retrieval ’01, New York, NY, ACM Press, 2001, 241–249. https://doi.org/10.1145/383952.383995.
Meusel, R, Mika, P, Blanco, R. Focused crawling for structured data. In: Proceedings of the 23rd ACM International Conference on Information Knowledge Management—CIKM ’14, New York, NY, ACM Press, 2014, 1039–1048. https://doi.org/10.1145/2661829.2661902.
Kc, M, Hagenbuchner, M, Tsoi, AC. Quality information retrieval for the world wide Web. In: 2008 IEEE/WIC/ACM International Conf. Web Intell. Intell. Agent Technol., IEEE, 2008, 655–661. https://doi.org/10.1109/WIIAT.2008.378.
Navrat, P, Jastrzembska, L, Jelinek, T. Bee hive at work: story tracking case study. In: 2009 IEEE/WIC/ACM International Jt. Conf. Web Intell. Intell. Agent Technol., IEEE, 2009, 117–120. https://doi.org/10.1109/WI-IAT.2009.244.
Neunerdt, M, Niermann, M, Mathar, R, Trevisan, B. Focused crawling for building Web comment corpora. In: 2013 IEEE 10th Consum. Commun. Netw. Conf., IEEE, 2013, 685–688. https://doi.org/10.1109/CCNC.2013.6488526.
Nhan, NQ, Son, VT, Binh, HTT, Khanh, TD. Crawl topical vietnamese Web pages using genetic algorithm. In: 2010 Second International Conference of the Knowl. Syst. Eng., IEEE, 2010, 217–223. https://doi.org/10.1109/KSE.2010.25.
Ning, H, Wu, H, He, Z, Tan, Y. Focused crawler URL analysis model based on improved genetic algorithm. In: 2011 IEEE Int. Conf. Mechatronics Autom., 2011, 2159–2164. https://doi.org/10.1109/ICMA.2011.5986315.
Özel, SA. A web page classification system based on a genetic algorithm using tagged‐terms as features. Expert Syst Appl 2011, 38:3407–3415. https://doi.org/10.1016/j.eswa.2010.08.126.
Özmutlu, HC, Özmutlu, S. An architecture for SCS: a specialized Web crawler on the topic of security. Proceedings of the Am Soc Inf Sci Technol 2005, 41:317–326. https://doi.org/10.1002/meet.1450410138.
Pant, G, Srinivasan, P. Learning to crawl: comparing classification schemes. ACM Trans Inf Syst 2005, 23:430–462. https://doi.org/10.1145/1095872.1095875.
Peng, L, Wen‐Da, T. A focused Web crawler face stock information of financial field. In: 2010 IEEE Int. Conf. Intell. Comput. Intell. Syst., IEEE, 2010, 512–516. https://doi.org/10.1109/ICICISYS.2010.5658277.
Peng, T, Liu, L. Focused crawling enhanced by CBP–SLC. Knowledge‐Based Syst 2013, 51:15–26. https://doi.org/10.1016/j.knosys.2013.06.008.
Peng, T, Zhang, C, Zuo, W. Tunneling enhanced by web page content block partition for focused crawling. Concurr Comput Pract Exp 2008, 20:61–74. https://doi.org/10.1002/cpe.1211.
Pesaranghader, A, Pesaranghader, A, Mustapha, N, Sharef, NM. Improving multi‐term topics focused crawling by introducing term Frequency‐Information Content (TF‐IC) measure. In: 2013 Int. Conf. Res. Innov. Inf. Syst., IEEE, 2013, 102–106. https://doi.org/10.1109/ICRIIS.2013.6716693.
Priyatam, PN, Vaddepally, SR, Varma, V. Domain specific search in indian languages. In: Proceedings of the First Work. Inf. Knowl. Manag. Dev. Reg.—IKM4DR ’12, New York, NY, ACM Press, 2012, 23. https://doi.org/10.1145/2389776.2389782.
Putra, WE, Akbar, S. Focused crawling using dictionary algorithm with breadth first and by page length methods for Javanese and Sundanese corpus construction. Procedia Technol 2013, 11:870–876. https://doi.org/10.1016/j.protcy.2013.12.270.
Qin, J, Chen, H. Using genetic algorithm in building domain‐specific collections: an experiment in the nanotechnology domain. In: Proceedings of the 38th Annu. Hawaii International Conference of the Syst. Sci., IEEE, 2005, 102b–102b. https://doi.org/10.1109/HICSS.2005.659.
Qin, J, Zhou, Y, Chau, M. Building domain‐specific Web collections for scientific digital libraries: a meta‐search enhanced focused crawling method. In: Proceedings of the 2004 Jt. ACM/IEEE Conf. Digit. Libr., 2004, 135–141. https://doi.org/10.1109/JCDL.2004.1336110.
Radu, I, Rebedea, T. A focused crawler for Romanian words discovery. In: 2014 RoEduNet Conf. 13th Ed. Netw. Educ. Res. Jt. Event RENAM 8th Conf., IEEE, 2014, 1–6. https://doi.org/10.1109/RoEduNet-RENAM.2014.6955323.
Ravakhah, M, Kamyar, M. Semantic similarity based focused crawling. In: 2009 First International Conference of the Comput. Intell. Commun. Syst. Networks, IEEE, 2009, 448–453. https://doi.org/10.1109/CICSYN.2009.92.
Rocco, D, Caverlee, J, Liu, L, Critchlow, T. Domain‐specific Web service discovery with service class descriptions. In: IEEE Int. Conf. Web Serv., IEEE, 2005, 1–8. https://doi.org/10.1109/ICWS.2005.49.
Safran, MS, Althagafi, A, Che, D. Improving relevance prediction for focused Web crawlers. In: 2012 IEEE/ACIS 11th International Conference of the Comput. Inf. Sci., IEEE, 2012, 161–166. https://doi.org/10.1109/ICIS.2012.61.
Samarawickrama, S, Jayaratne, L. Automatic text classification and focused crawling. In: 2011 Sixth Int. Conf. Digit. Inf. Manag., IEEE, 2011, 143–148. https://doi.org/10.1109/ICDIM.2011.6093329.
Schuh, G, Brakling Apfel, AK. Identification of requirements for focused crawlers in technology intelligence. In: Portland International Conference on Management of Engineering & Technology (PICMET), 2014, 2918–2923.
Selamat, A, Ahmadi‐Abkenari, F. Application of clickstream analysis as Web page importance metric in parallel crawlers. In: 2010 Int. Symp. Inf. Technol., IEEE, 2010, 1–6. https://doi.org/10.1109/ITSIM.2010.5561354.
Selamat, A, Ahmadi‐Abkenari, F. Architecture for a parallel focused crawler for clickstream analysis. In: Intelligent Information and Database Systems. Nguyen, NT, Kim, C‐G, Janiak, A, eds. Berlin and Heidelberg: Springer; 2011, 27–35. https://doi.org/10.1007/978-3-642-20039-7_3.
Shchekotykhin, K, Jannach, D, Friedrich, G. xCrawl: a high‐recall crawling method for Web mining. In: 2008 Eighth IEEE International Conference of the Data Min., IEEE, 2008, 550–559. https://doi.org/10.1109/ICDM.2008.121.
Yang, S‐Y. A focused crawler with ontology‐supported website models for information agents. Expert Syst Appl 2010, 37:5381–5389. https://doi.org/10.1016/j.eswa.2010.01.018.
Shokouhi, M, Chubak, P, Raeesy, Z. Enhancing focused crawling with genetic algorithms. In: International Conference of the Inf. Technol. Coding Comput.—Vol. II, IEEE, 2005, Vol. 2, 503–508. https://doi.org/10.1109/ITCC.2005.145.
Sirisha Gadiraju, NVG, Krishna Chaitanya, R, Padma Raju, G. Effect of feature selection method on the performance of focused crawlers—a case study on traditional and accelerated focused crawlers. In: 2010 Int. Conf. Netw. Inf. Technol., IEEE, 2010, 482–487. https://doi.org/10.1109/ICNIT.2010.5508468.
Sizov, S, Siersdorfer, S, Theobald, M, Weikum, G. The BINGO! focused crawler: from bookmarks to archetypes. In: Proceedings of the 18th International Conference of the Data Eng., IEEE Comput. Soc, 2002, 337–338. https://doi.org/10.1109/ICDE.2002.994746.
Su, C, Gao, Y, Yang, J, Luo, B. An efficient adaptive focused crawler based on ontology learning. In: Fifth International Conference of the Hybrid Intell. Syst., IEEE, 2005, 6 pp. https://doi.org/10.1109/ICHIS.2005.19.
Sun, Y, Jin, P, Yue, L. A framework of a hybrid focused Web crawler. In: 2008 Second International Conference of the Futur. Gener. Commun. Netw. Symp., IEEE, 2008, 50–53. https://doi.org/10.1109/FGCNS.2008.73.
Tang, TT, Hawking, D, Craswell, N, Griffiths, K. Focused crawling for both topical relevance and quality of medical information. In: Proceedings of the 14th ACM International Conference on Information Knowledge Management—CIKM ’05, New York, NY, ACM Press, 2005, 147. https://doi.org/10.1145/1099554.1099583.
Thukral, A, Mendiratta, V, Behl, A, Banati, H, Bedi, P. FCHC: a social semantic focused crawler. In: Advances in Computing and Communications. Abraham, A, Mauri, JL, Buford, JF, Suzuki, J, Thampi, SM, eds. Berlin and Heidelberg: Springer; 2011, 273–283. https://doi.org/10.1007/978-3-642-22714-1_29.
Tsay, J‐J, Shih, C‐Y, Wu, B‐L. AuToCrawler: an integrated system for automatic topical crawler. In: Fourth Annu. ACIS International Conference of the Comput. Inf. Sci., IEEE, 2005, 462–467. https://doi.org/10.1109/ICIS.2005.33.
Uzun, E, Serdar Güner, E, Kılıçaslan, Y, Yerlikaya, T, Agun, HV. An effective and efficient Web content extractor for optimizing the crawling process. Softw Pract Exp 2014, 44:1181–1199. https://doi.org/10.1002/spe.2195.
Wang, H, Wang, X, Wang, Y, Bhattacharjee, A, Basireddy, SK, Cherian, A. Preliminary study on design and development of a journal focused crawler system using EBD methodology: Part I; Design task and environment analysis. In: Proceedings of the 2014 Int. Conf. Innov. Des. Manuf., IEEE, 2014, 59–64. https://doi.org/10.1109/IDAM.2014.6912671.
Wang, H, Wang, X, Wang, Y, Bhattacharjee, A, Basireddy, SK, Cherian, A. A preliminary study on design and development of a journal focused crawler system using EBD methodology. Part II—conflict identification and solution generation. In: Proceedings of the 2014 Int. Conf. Innov. Des. Manuf., 2014, 123–128.
Wang, N. Design and implementation of a crawling system in shopping search engine. In: 2009 Second Int. Work. Comput. Sci. Eng., IEEE, 2009, 212–216. https://doi.org/10.1109/WCSE.2009.798.
Wang, W, Chen, X, Zou, Y, Wang, H, Dai, Z. A focused crawler based on naive bayes classifier. In: 2010 Third Int. Symp. Intell. Inf. Technol. Secur. Informatics, IEEE, 2010, 517–521. https://doi.org/10.1109/IITSI.2010.30.
Wei, B, Liu, J, Ma, J, Zheng, Q, Zhang, W, Feng, B. DFT‐extractor: a system to extract domain‐specific faceted taxonomies from Wikipedia. In: Proceedings of the 22nd Int. Conf. World Wide Web Companion, International World Wide Web Conferences Steering Committee, Switzerland, Republic and Canton of Geneva, 2013, 277–280. http://dl.acm.org/citation.cfm?id=2487788.2487922 (accessed March 18, 2015).
Wu, Y, Shou, L, Hu, T, Chen, G. Query triggered crawling strategy: build a time sensitive vertical search engine. In: 2008 Int. Conf. Cyberworlds, IEEE, 2008, 422–427. https://doi.org/10.1109/CW.2008.35.
Wu, Z, Wu, J, Khabsa, M, Williams, K, Chen, H, Huang, W, et al., Towards building a scholarly big data platform: challenges, lessons and opportunities. In: IEEE/ACM Jt. Conf. Digit. Libr., IEEE, 2014, 117–126. https://doi.org/10.1109/JCDL.2014.6970157.
Xi, S, Sun, F, Wang, J. A cognitive crawler using structure pattern for incremental crawling and content extraction. In: Intergovernmental Panel on Climate Change (Ed.), 9th IEEE International Conference on Cognitive Informatics & Cognitive Computing. Cambridge: IEEE; 2010, 238–244. https://doi.org/10.1109/COGINF.2010.5599733.
Xiang, L, Meng, X. A data mining approach to topic‐specific Web resource discovery. In: 2009 Second International Conference of the Intell. Comput. Technol. Autom., IEEE, 2009, 595–599. https://doi.org/10.1109/ICICTA.2009.378.
Xin, C, Yong, Z, Fuyan, Z, Changbao, N. Architecture design of subject‐oriented Web crawler. In: 2013 Fourth International Conference of the Intell. Syst. Des. Eng. Appl., IEEE, 2013, 174–177. https://doi.org/10.1109/ISDEA.2013.444.
Xu, L, Eli, S, Xu, H. A method of focused crawling for software components. In: Proceedings of the 2011 Int. Conf. Transp. Mech. Electr. Eng., IEEE, 2011, 1560–1563. https://doi.org/10.1109/TMEE.2011.6199506.
Xu, Q, Zuo, W. First‐order focused crawling. In: Proceedings of the 16th International Conference of the World Wide Web—WWW ’07, New York, NY, ACM Press, 2007, 1159. https://doi.org/10.1145/1242572.1242744.
Ya‐jun, D, Yang, X, Zhan‐shen, L, Dong‐mei, Q. Discussion on interest spider’s algorithm of search engine. In: Proceedings of the 2004 IEEE International Conference of Information Reuse and Integration (IRI 2004), IEEE, 2004, 588–593. https://doi.org/10.1109/IRI.2004.1431525.
Yang, J, Kang, J, Choi, J. A focused crawler with document segmentation. In: International Conference on Intelligent Data Engineering and Automated Learning, 2005, 94–101. https://doi.org/10.1007/11508069_13.
Yang, S‐Y. An ontological website models‐supported search agent for Web services. Expert Syst Appl 2008, 35:2056–2073. https://doi.org/10.1016/j.eswa.2007.09.024.
Yifeng, C, Hengkai, Z, Xiaoqing, Y, Wanggen, W. Research of theme crawling strategy based on genetic algorithm. In: IET Int. Commun. Conf. Wirel. Mob. Comput. (CCWMC 2009), IET, 2009, 472–475. https://doi.org/10.1049/cp.2009.1993.
Ying, L, Zhou, X, Yuan, J, Huang, Y. A novel focused crawler based on breadcrumb navigation. Adv Swarm Intell 2012, 7332:264–271. https://doi.org/10.1007/978‐3‐642‐31020‐1_31.
Yuan, F, Yin, C, Liu, J. Improvement of pagerank for focused crawler. Proceedings of the Eighth ACIS International Conference of the Softw. Eng. Artif. Intell. Netw. Parallel Distributed Comput. 2007, Vol. 02. 3, 797–802. https://doi.org/10.1109/SNPD.2007.314.
Yuan, F, Yin, C, Liu, J, Zhang, Y. An integrated crawling strategy for domain‐specific resource discovery. 2007 Third Int. IEEE Conf. Signal‐Image Technol. Internet‐Based Syst., 2007, 329–336. https://doi.org/10.1109/SITIS.2007.70.
Zhang, X, Li, Z, Hu, C. Adaptive focused crawler based on tunneling and link analysis. In: 2009 11th Int. Conf. Adv. Commun. Technol., 2009, 03, 2225–2230.
Zhang, Z, Nasraoui, O, Van Zwol, R. Exploiting tags and social profiles to improve focused crawling. In: 2009 IEEE/WIC/ACM International Jt. Conf. Web Intell. Intell. Agent Technol., IEEE, 2009, 136–139. https://doi.org/10.1109/WI‐IAT.2009.27.
Zheng, H, Kang, B, Kim, H. An ontology‐based approach to learnable focused crawling. Inf Sci 2008, 178:4512–4522. https://doi.org/10.1016/j.ins.2008.07.030.
Zheng, S. Genetic and ant algorithms based focused crawler design. In: 2011 Second International Conference of the Innov. Bio‐Inspired Comput. Appl., IEEE, 2011, 374–378. https://doi.org/10.1109/IBICA.2011.98.
Zheng, X, Zhou, T, Yu, Z, Chen, D. URL rule based focused crawler. In: 2008 IEEE International Conference of the e‐Business Engineering, IEEE, 2008, 147–154. https://doi.org/10.1109/ICEBE.2008.61.
Zhou, B, Xiao, B, Lin, Z, Zhang, C. A distributed vertical crawler using crawling‐period based strategy. Proceedings of the 2010 2nd International Conference of the Futur. Comput. Commun. ICFCC 2010, 2010, 1, 306–311. https://doi.org/10.1109/ICFCC.2010.5497780.
Zhu, Q. An algorithm OFC for the focused Web crawler. Proceedings of the Sixth International Conference of the Mach. Learn. Cybern. ICMLC 2007, 2007, 7, 4059–4063. https://doi.org/10.1109/ICMLC.2007.4370856.
Zhuang, Z, Wagle, R, Giles, CL. What’s there and what’s not? In: Proceedings of the 5th ACM/IEEE‐CS Joint conference on digital library JCDL ’05, New York, NY, ACM Press, 2005, 301. https://doi.org/10.1145/1065385.1065455.
Zunino, R, Bisio, F, Peretti, C, Surlinelli, R, Scillia, E, Ottaviano, A, et al. An analyst‐adaptive approach to focused crawlers. In: Proceedings of the 2013 IEEE/ACM International Advances in Social Networks Analysis and Mining—ASONAM ’13, New York, NY, ACM Press, 2013, 1073–1077. https://doi.org/10.1145/2492517.2500328.
Diligenti, M, Coetzee, FM, Lawrence, S, Giles, CL, Gori, M. Focused crawling using context graphs. 26th Int. Conf. Very Large Databases, 2000, 527–534.
Batsakis, S, Petrakis, EGM, Milios, E. Improving the performance of focused Web crawlers. Data Knowl Eng 2009, 68:1001–1013. https://doi.org/10.1016/j.datak.2009.04.002.
Maimunah, S, Widyantoro, DH, Kuspriyanto, Sastramihardja, HS. Co‐citation & co‐reference concepts to control focused crawler exploration. In: Proceedings of the 2011 Int. Conf. Electr. Eng. Informatics, IEEE, 2011, 1–7. https://doi.org/10.1109/ICEEI.2011.6021677.
Pappas, N, Katsimpras, G, Stamatatos, E. An agent‐based focused crawling framework for topic‐ and genre‐related Web document discovery. In: 2012 IEEE 24th International Conference of the Tools with Artif. Intell., IEEE, 2012, 508–515. https://doi.org/10.1109/ICTAI.2012.75.
Cho, J, Garcia‐Molina, H. Parallel crawlers. In: Proceedings of the Elev. International Conference of the World Wide Web—WWW ’02, New York, NY, ACM Press, 2002, 124. https://doi.org/10.1145/511446.511464.
Akilandeswari, J, Gopalan, NP. A novel design of hidden Web crawler using reinforcement learning based agents. In: Adv. Parallel Process. Technol., Berlin and Heidelberg, Springer, 2007, 433–440. https://doi.org/10.1007/978‐3‐540‐76837‐1_47.
Álvarez, M, Raposo, J, Pan, A, Cacheda, F, Bellas, F, Carneiro, V. DeepBot: a focused crawler for accessing hidden Web content. In: ACM International Conference of the Proceeding Ser. Vol. 236, New York, NY, ACM, 2007, 18. https://doi.org/10.1145/1278380.1278385.
An, YJ, Geller, J, Wu, Y‐T, Chun, SA. Automatic generation of ontology from the deep Web. In: 18th International Conference of the Database Expert Syst. Appl. (DEXA 2007), IEEE, 2007, 470–474. https://doi.org/10.1109/DEXA.2007.43.
An, YJ, Geller, J, Wu, Y‐T, Chun, SA. Semantic deep Web: automatic attribute extraction from the deep Web data sources. In: Proceedings of the 2007 ACM Symp. Appl. Comput.—SAC ’07, New York, NY, ACM, 2007, 1667. https://doi.org/10.1145/1244002.1244355.
An, YJ, Chun, SA, Huang, K, Geller, J. Assessment for ontology‐supported deep Web search. In: 2008 10th IEEE Conf. E‐Commerce Technol. Fifth IEEE Conf. Enterp. Comput. E‐Commerce E‐Services, IEEE, 2008, 382–388. https://doi.org/10.1109/CECandEEE.2008.117.
Arya, KVV, Vadlamudi, BR. An ontology‐based topical crawling algorithm for accessing deep Web content. In: 2012 Third Int. Conf. Comput. Commun. Technol., 2012, 1–6. https://doi.org/10.1109/ICCCT.2012.10.
Barbosa, L, Freire, J. An adaptive crawler for locating hidden‐Web entry points. In: Proceedings of the 16th International Conference of the World Wide Web—WWW ’07, New York, NY, ACM Press, 2007, 441. https://doi.org/10.1145/1242572.1242632.
Bergholz, A, Chidlovskii, B. Crawling for domain‐specific hidden Web resources. In: Proceedings of the 7th International Conference of the Prop. Appl. Dielectr. Mater. (Cat. No.03CH37417), IEEE Comput. Soc, 2003, 125–133. https://doi.org/10.1109/WISE.2003.1254476.
Chandramouli, A, Gauch, S. A co‐operative Web services paradigm for supporting crawlers. In: Large Scale Semant. Access to Content (Text, Image, Video, Sound), LE CENTRE DE HAUTES ETUDES INTERNATIONALES D’INFORMATIQUE DOCUMENTAIRE, Paris, 2007, 475–489. https://doi.org/10.1.1.106.2411.
Cho, J, Garcia‐Molina, H, Haveliwala, T, Lam, W, Paepcke, A, Raghavan, S, et al. Stanford WebBase components and applications. ACM Trans Internet Technol 2006, 6:153–186. https://doi.org/10.1145/1149121.1149124.
El‐desoky, AI, Abd El‐Gwad, AO, Okasha, ME. Exploiting ontology for retrieving data behind searchable Web forms. In: 2009 Int. Conf. Netw. Media Converg., IEEE, 2009, 97–102. https://doi.org/10.1109/ICNM.2009.4907197.
El‐Desouky, AI, Ali, HA, El‐Ghamrawy, SM. A new framework for domain‐specific hidden Web crawling based on data extraction techniques. In: 2006 ITI 4th International Conference of the Inf. Commun. Technol., IEEE, 2006, 1–1. https://doi.org/10.1109/ITICT.2006.358295.
El‐desouky, A, Ali, H, El‐ghamrawy, S. An automatic label extraction technique for domain‐specific hidden Web crawling (LEHW). In: 2006 Int. Conf. Comput. Eng. Syst., IEEE, 2006, 454–459. https://doi.org/10.1109/ICCES.2006.320490.
Fontes, ADC, Silva, FS. SmartCrawl: a new strategy for the exploration of the hidden Web. In: Proceedings of the 6th Annu. ACM International Work. Web Inf. Data Manag.—WIDM ’04, New York, NY, ACM Press, 2004, 9. https://doi.org/10.1145/1031453.1031457.
Ipeirotis, PG, Agichtein, E, Jain, P, Gravano, L. Towards a query optimizer for text‐centric tasks. ACM Trans Database Syst 2007, 32:21–es. https://doi.org/10.1145/1292609.1292611.
Ipeirotis, PG, Gravano, L, Sahami, M. Probe, count, and classify. In: Proceedings of the 2001 ACM SIGMOD International Conference of the Manag. Data—SIGMOD ’01, New York, NY, ACM Press, 2001, 67–78. https://doi.org/10.1145/375663.375671.
Li, H, Guo, M, Cai, L, Yang, Y. An incremental update strategy in deep Web. In: 2010 Sixth Int. Conf. Nat. Comput., IEEE, 2010, 131–134. https://doi.org/10.1109/ICNC.2010.5583330.
Liang, H, Ren, F, Zuo, W. The preliminary process of modeling in deep Web information fusion system. In: 2009 Int. Forum Inf. Technol. Appl., IEEE, 2009, 723–726. https://doi.org/10.1109/IFITA.2009.27.
Liang, H, Zuo, W, Ren, F, Sun, C. Accessing deep Web using automatic query translation technique. In: 2008 Fifth International Conference of the Fuzzy Syst. Knowl. Discov., IEEE, 2008, 267–271. https://doi.org/10.1109/FSKD.2008.18.
Liang, H, Zuo, W, Ren, F, Wang, J. Translating query for deep Web using ontology. In: 2008 International Conference of the Comput. Sci. Softw. Eng., IEEE, 2008, 427–430. https://doi.org/10.1109/CSSE.2008.630.
Liu, X, Maly, K, Zubair, M, Nelson, ML. DP9: an OAI gateway service for Web crawlers. In: Proceedings of the Second ACM/IEEE‐CS Jt. Conf. Digit. Libr.—JCDL ’02, New York, NY, ACM Press, 2002, 283. https://doi.org/10.1145/544220.544284.
Ma, W, Chen, X, Shang, W. Advanced deep Web crawler based on dom. In: 2012 Fifth Int. Jt. Conf. Comput. Sci. Optim., IEEE, 2012, 605–609. https://doi.org/10.1109/CSO.2012.138.
Madaan, R, Dixit, A, Sharma, AK, Bhatia, KK. A framework for domain specific incremental hidden Web crawler. Int J Comput Sci Eng 2010, 02:753–758.
Mesbah, A, van Deursen, A, Lenselink, S. Crawling Ajax‐based Web applications through dynamic analysis of user interface state changes. ACM Trans Web 2012, 6:1–30. https://doi.org/10.1145/2109205.2109208.
Moraes, MC, Heuser, CA, Moreira, VP, Barbosa, D. Prequery discovery of domain‐specific query forms: a survey. IEEE Trans Knowl Data Eng 2013, 25:1830–1848. https://doi.org/10.1109/TKDE.2012.111.
Mundluru, D, Xia, X. Experiences in crawling deep Web in the context of local search. In: Proceeding 2nd Int. Work. Geogr. Inf. Retr.—GIR ’08, New York, NY, ACM Press, 2008, 35. https://doi.org/10.1145/1460007.1460016.
Myllymaki, J. Effective Web data extraction with standard XML technologies. Comput Networks 2002, 39:635–644. https://doi.org/10.1016/S1389‐1286(02)00214‐1.
Nguyen, H, Kang, EY, Freire, J. Automatically extracting form labels. In: 2008 IEEE 24th International Conference of the Data Eng., IEEE, 2008, 1498–1500. https://doi.org/10.1109/ICDE.2008.4497602.
Nguyen, H, Nguyen, T, Freire, J. Learning to extract form labels. Proc VLDB Endow 2008, 1:684–694. https://doi.org/10.14778/1453856.1453931.
Nguyen, TH, Nguyen, H, Freire, J. PruSM: a prudent schema matching approach for Web forms. In: Proceedings of the 19th ACM International Conference on Information Knowledge Management—CIKM ’10, New York, NY, ACM Press, 2010, 1385. https://doi.org/10.1145/1871437.1871627.
Peisu, X, Ke, T, Qinzhen, H. A framework of deep Web crawler. In: 2008 27th Chinese Control Conf., IEEE, 2008, 582–586. https://doi.org/10.1109/CHICC.2008.4604881.
Rajaraman, A. Kosmix: high‐performance topic exploration using the deep Web. Proc VLDB Endow 2009, 2:1524–1529. https://doi.org/10.14778/1687553.1687581.
Singh, L, Sharma, DK. An approach for accessing data from hidden Web using intelligent agent technology. In: 2013 3rd IEEE Int. Adv. Comput. Conf., IEEE, 2013, 800–805. https://doi.org/10.1109/IAdCC.2013.6514329.
Singh, L, Sharma, DK. An architecture for extracting information from hidden Web databases using intelligent agent technology through reinforcement learning. In: 2013 IEEE conference on Information & Communication Technologies, IEEE, 2013, 292–297. https://doi.org/10.1109/CICT.2013.6558108.
Taylan, D, Poyraz, M, Akyokus, S, Ganiz, MC. Intelligent focused crawler: learning which links to crawl. In: 2011 Int. Symp. Innov. Intell. Syst. Appl., IEEE, 2011, 504–508. https://doi.org/10.1109/INISTA.2011.5946150.
Furche, T, Gottlob, G, Grasso, G, Guo, X, Orsi, G, Schallhart, C. The ontological key: automatically understanding and integrating forms to access the deep Web. Very Large Databases Journal (VLDB) 2013, 22:615–640. https://doi.org/10.1007/s00778‐013‐0323‐0.
Wang, X, Wang, L, Wei, G, Zhang, D, Yang, Y. Hidden Web crawling for SQL injection detection. In: 2010 3rd IEEE Int. Conf. Broadband Netw. Multimed. Technol., IEEE, 2010, 14–18. https://doi.org/10.1109/ICBNMT.2010.5704860.
Wang, Z, Hu, R, Hu, J. Research of a traffic advisory system based on deep web. In: 2009 International Conference of the Commun. Softw. Networks, IEEE, 2009, 537–540. https://doi.org/10.1109/ICCSN.2009.64.
Yan, Z, Li, Q, Dong, Y, Ding, Y. An ontology‐based integration of Web query interfaces for house search. In: 2008 Int. Conf. Inf. Autom., IEEE, 2008, 190–194. https://doi.org/10.1109/ICINFA.2008.4607994.
Yu, H, Guo, J, Yu, Z, Xian, Y, Yan, X. A novel method for extracting entity data from deep Web precisely. In: 26th Chinese Control Decis. Conf. (2014 CCDC), 2014, 5049–5053. https://doi.org/10.1109/CCDC.2014.6853078.
Ntoulas, A, Zerfos, P, Cho, J. Downloading textual hidden Web content through keyword queries. In: Proceedings of the 5th ACM/IEEE‐CS Jt. Conf. Digit. Libr. (JCDL ’05), 2005, 100–109. https://doi.org/10.1145/1065385.1065407.
Zhang, Z, Dong, G, Peng, Z, Yan, Z. A framework for incremental deep Web crawler based on URL classification. In: Gong, Z, Luo, X, Chen, J, Lei, J, Wang, FL, eds. International Journal of Computational Science and Engineering. Berlin and Heidelberg: Springer; 2011, 302–310. https://doi.org/10.1007/978‐3‐642‐23982‐3_37.
Huang, Q, Li, Q, Li, H, Yan, Z. An approach to incremental deep Web crawling based on incremental harvest model. Procedia Eng 2012, 29:1081–1087. https://doi.org/10.1016/j.proeng.2012.01.093.
Raghavan, S, Garcia‐Molina, H. Crawling the Hidden Web. San Francisco, CA: Morgan Kaufmann Publishers Inc.; 2001. http://ilpubs.stanford.edu:8090/456/.
Aggarwal, CC. Collaborative crawling: mining user experiences for topical resource discovery. In: Proceedings of the Eighth ACM SIGKDD International Conference of the Knowl. Discov. Data Min.—KDD ’02, New York, NY, ACM Press, 2002, 423. https://doi.org/10.1145/775047.775108.
Aggarwal, CC, Al‐Garawi, F, Yu, PS. On the design of a learning crawler for topical resource discovery. ACM Trans Inf Syst 2001, 19:286–309. https://doi.org/10.1145/502115.502119.
Baykan, E, Henzinger, M, Marian, L, Weber, I. Purely URL‐based topic classification. In: Proceedings of the 18th International Conference of the World Wide Web—WWW ’09, New York, NY, ACM Press, 2009, 1109–1110. https://doi.org/10.1145/1526709.1526880.
Can, AB, Baykal, N. MedicoPort: a medical search engine for all. Comput Methods Programs Biomed 2007, 86:73–86. https://doi.org/10.1016/j.cmpb.2007.01.007.
Chung, C, Clarke, CLA. Topic‐oriented collaborative crawling. In: Proceedings of the Elev. Int. Conference on Information Knowledge Management—CIKM ’02, New York, NY, ACM Press, 2002, 34. https://doi.org/10.1145/584792.584802.
Davison, BD. Topical locality in the Web. In: Proceedings of the 23rd Annu. Int. ACM SIGIR Conf. Res. Dev. Inf. Retr.—SIGIR ’00, New York, NY, ACM Press, 2000, 272–279. https://doi.org/10.1145/345508.345597.
Greenwood, M, Nenadic, G. Lexical profiling of existing web directories to support fine‐grained topic‐focused Web crawling. In: Proceedings of the 2008 BCS IRSG Conf. Corpus Profiling, Swinton, British Computer Society, 2008. http://dl.acm.org/citation.cfm?id=2227976.2227982.
Hsu, C‐C, Wu, F. Topic‐specific crawling on the Web with the measurements of the relevancy context graph. Inf Syst 2006, 31:232–246. https://doi.org/10.1016/j.is.2005.02.007.
Huifu, Z, Yaping, Z, Ping, L, Xiaolan, Z. Research and implementation on topic crawler of rotating machinery fault knowledge. In: Proceedings of the 2011 Int. Conf. Comput. Sci. Netw. Technol., IEEE, 2011, 1464–1467. https://doi.org/10.1109/ICCSNT.2011.6182242.
Luo, L, Wang, R, Huang, X, Chen, Z. A novel shark‐search algorithm for theme crawler. In: WISM’12 Proceedings of the 2012 international conference on Web Information Systems and Mining, 2013, 603–609. https://doi.org/10.1007/978‐3‐642‐33469‐6_75.
Menczer, F, Pant, G, Srinivasan, P. Topical Web crawlers: evaluating adaptive algorithms. ACM Trans Internet Technol 2004, 4:378–419. https://doi.org/10.1145/1031114.1031117.
Mouton, A, Marteau, P. Exploiting routing information encoded into backlinks to improve topical crawling. In: 2009 International Conference of the Soft Comput. Pattern Recognit., IEEE, 2009, 659–664. https://doi.org/10.1109/SoCPaR.2009.129.
Mukherjea, S. WTMS: a system for collecting and analyzing topic‐specific Web information. Comput Networks 2000, 33:457–471. https://doi.org/10.1016/S1389‐1286(00)00035‐9.
Noh, S, Choi, Y, Seo, H, Choi, K, Jung, G. An Intelligent Topic‐Specific Crawler Using Degree of Relevance. In: Yang, ZR, Yin, H, Everson, RM, eds. 3177 LNCS. Berlin and Heidelberg: Springer; 2004, 491–498. https://doi.org/10.1007/978‐3‐540‐28651‐6_72.
Pant, G, Srinivasan, P. Link contexts in classifier‐guided topical crawlers. IEEE Trans Knowl Data Eng 2006, 18:107–122. https://doi.org/10.1109/TKDE.2006.12.
Pant, G, Tsioutsiouliklis, K, Johnson, J, Giles, CL. Panorama: extending digital libraries with topical crawlers. In: Proceedings of the 2004 Jt. ACM/IEEE Conf. Digit. Libr. 2004, 2004, 142–150. https://doi.org/10.1109/JCDL.2004.1336111.
Peng, Q, Du, Y, Hai, Y, Chen, S, Gao, Z. Topic‐Specific Crawling on the Web with Concept Context Graph Based on FCA. In: 2009 International Conference of the Manag. Serv. Sci., IEEE, 2009, 1–4. https://doi.org/10.1109/ICMSS.2009.5302301.
Pesaranghader, A, Mustapha, N, Pesaranghader, A. Applying semantic similarity measures to enhance topic‐specific Web crawling. In: 2013 13th Int. Conf. Intelligent Syst. Des. Appl., IEEE, New York, NY, 2013, 205–212. https://doi.org/10.1109/ISDA.2013.6920736.
Qian, R, Zhang, K, Zhao, G. A topic‐specific Web crawler based on content and structure mining. In: Proceedings of the 2013 3rd Int. Conf. Comput. Sci. Netw. Technol., IEEE, 2013, 458–461. https://doi.org/10.1109/ICCSNT.2013.6967153.
Rungsawang, A, Angkawattanawit, N. Learnable topic‐specific Web crawler. J Netw Comput Appl 2005, 28:97–114. https://doi.org/10.1016/j.jnca.2004.01.001.
Saha, S, Murthy, CA, Pal, SK. Rough set based ensemble prediction for topic specific Web crawling. In: 2009 Seventh International Conference of the Adv. Pattern Recognit., IEEE, 2009, 153–156. https://doi.org/10.1109/ICAPR.2009.17.
Vikas, O, Chiluka, NJ, Ray, PK, Meena, G, Meshram, AK, Gupta, A, et al. WebMiner—anatomy of super peer based incremental topic‐specific Web crawler. In: Sixth Int. Conf. Netw., IEEE, 2007, 32–32. https://doi.org/10.1109/ICN.2007.104.
Wei‐jiang, L, Hua‐suo, R, Kun, H, Jia, L. A new algorithm of blog‐oriented crawler. In: 2009 Int. Forum Comput. Sci. Appl., IEEE, 2009, 428–431. https://doi.org/10.1109/IFCSTA.2009.110.
Wei‐jiang, L, Hua‐suo, R, Tie‐jun, Z, Wen‐mao, Z. A new algorithm of topical crawler. In: 2009 Second Int. Work. Comput. Sci. Eng., IEEE, 2009, 443–446. https://doi.org/10.1109/WCSE.2009.706.
Yang, Y, Du, Y, Sun, J, Hai, Y. A Topic‐specific web crawler with concept similarity context graph based on FCA. In: Huang, D‐S, Wunsch, DC, Levine, DS, Jo, K‐H, eds. Advanced Intelligent Computing Theories and Applications with Aspects of Contemporary Intelligent Computing Techniques. Berlin and Heidelberg: Springer; 2008, 840–847. https://doi.org/10.1007/978‐3‐540‐85984‐0_101.
Yang, Y, Du, Y, Hai, Y, Gao, Z. A topic‐specific Web crawler with web page hierarchy based on HTML dom‐tree. In: 2009 Asia‐Pacific Conf. Inf. Process., IEEE, 2009, 420–423. https://doi.org/10.1109/APCIP.2009.110.
Zhang, H, Lu, J. SCTWC: an online semi‐supervised clustering approach to topical Web crawlers. Appl Soft Comput 2010, 10:490–495. https://doi.org/10.1016/j.asoc.2009.08.017.
Zhang, W, Xu, B, Lu, H. Web page’s blocks based topical crawler. Proceedings of the 4th IEEE International Symposium on Service‐Oriented System Engineering. SOSE 2008, 2008, 44–49. https://doi.org/10.1109/SOSE.2008.10.
Zhang, Y‐H, Zhang, F. Research on new algorithm of topic‐oriented crawler and duplicated web pages detection. In: 8th International Conference of the Intell. Comput. Theor. Appl., 2012, 35–42. https://doi.org/10.1007/978‐3‐642‐31576‐3_5.
Zhao, M, Zhu, P, He, T. An intelligent topic Web crawler based on DTB. In: 2010 International Conference of the Web Inf. Syst. Min., IEEE, 2010, 84–86. https://doi.org/10.1109/WISM.2010.155.
Zong, X, Shen, Y, Liao, X. Improvement of HITS for topic‐specific Web crawler. In: Advances in Intelligent Computing. Huang, D‐S, Zhang, X‐P, Huang, G‐B, eds. Berlin Heidelberg: Springer; 2005, 524–532. https://doi.org/10.1007/11538059_55.
Bergmark, D. Collection synthesis. In: Proceedings of the Second ACM/IEEE‐CS Jt. Conf. Digit. Libr.—JCDL ’02, New York, NY, ACM Press, 2002, 253. https://doi.org/10.1145/544220.544275.
Chen, L, Li, Z, Yu, Z, Han, G. Classifier‐guided topical crawler: a novel method of automatically labeling the positive URLs. In: 2009 Fifth International Conference of the Semant. Knowl. Grid, IEEE, 2009, 270–273. https://doi.org/10.1109/SKG.2009.60.
Xu, Y, Ai‐na, S, Zhan‐kun, T. Topical crawler based on multi‐level vector space model and optimized hyperlink chosen strategy. In: 9th IEEE Int. Conf. Cogn. Informatics, IEEE, 2010, 430–435. https://doi.org/10.1109/COGINF.2010.5599702.
Dixit, A, Sharma, AK. Security system for migrating crawlers. In: 2011 International Conference of the Comput. Intell. Commun. Networks, IEEE, 2011, 667–671. https://doi.org/10.1109/CICN.2011.145.
Gupta, A, Dixit, A, Sharma, AK. Prospective terms based architecture for migrating crawler. In: 2012 Fourth International Conference of the Comput. Intell. Commun. Networks, IEEE, 2012, 915–919. https://doi.org/10.1109/CICN.2012.168.
Kausar, MA, Nasar, M, Singh, SK. Maintaining the repository of search engine freshness using mobile crawler. In: 2013 Annu. International Conference of the Emerg. Res. Areas 2013 Int. Conf. Microelectron. Commun. Renew. Energy, IEEE, 2013, 1–6. https://doi.org/10.1109/AICERA‐ICMiCR.2013.6575995.
Miller, RC, Bharat, K. SPHINX: a framework for creating personal, site‐specific Web crawlers. Comput Networks ISDN Syst 1998, 30:119–130. https://doi.org/10.1016/S0169‐7552(98)00064‐6.
Pandey, S, Mishra, RB. Intelligent Web mining model to enhance knowledge discovery on the Web. In: 2006 Seventh International Conference of the Parallel Distrib. Comput. Appl. Technol., IEEE, 2006, 339–343. https://doi.org/10.1109/PDCAT.2006.74.
Upadhyay, V, Balwan, J, Shankar, G, Amritpal, A. Security approach for mobile agent based crawler. In: Advances in Computer Science, Engineering & Applications. Wyld, DC, Zizka, J, Nagamalai, D, eds. Berlin and Heidelberg: Springer; 2012, 119–123. https://doi.org/10.1007/978‐3‐642‐30111‐7_12.
Wang, Y, Du, Y, Chen, S. The understanding between two agent crawlers based on domain ontology. In: 2009 Int. Conf. Comput. Intell. Nat. Comput., IEEE, 2009, 47–50. https://doi.org/10.1109/CINC.2009.204.
Singhal, N, Agarwal, RP, Dixit, A, Sharma, AK. Information retrieval from the Web and application of migrating crawler. In: 2011 International Conference of the Comput. Intell. Commun. Networks, IEEE, 2011, 476–480. https://doi.org/10.1109/CICN.2011.99.
Bal, S, Nath, R. A novel approach to filter non‐modified pages at remote site without downloading during crawling. In: 2009 Int. Conf. Adv. Recent Technol. Commun. Comput., IEEE, 2009, 165–168. https://doi.org/10.1109/ARTCom.2009.11.
Nath, R, Bal, S. A novel mobile crawler system based on filtering off non‐modified pages for reducing load on the network. Int Arab J Inf Technol 2011, 8:272–279.
Pahal, N. Security on mobile agent based crawler (SMABC). Int J Comput Appl 2010, 1:5–11.
Gao, Q, Xiao, B, Lin, Z, Chen, X, Zhou, B. A high‐precision forum crawler based on vertical crawling. In: 2009 IEEE Int. Conf. Netw. Infrastruct. Digit. Content, IEEE, 2009, 362–367. https://doi.org/10.1109/ICNIDC.2009.5360990.
Sachan, A, Lim, W‐Y, Thing, VLL. A generalized links and text properties based forum crawler. In: Proceedings of 2012 IEEE/WIC/ACM International Conference on Web Intelligent Agent Technology, Washington, DC, IEEE Computer Society, 2012, 01, 113–120. http://dl.acm.org/citation.cfm?id=2457524.2457671.
Heydon, A, Najork, M. Mercator: a scalable, extensible Web crawler. World Wide Web 1999, 2:219–229. https://doi.org/10.1023/A:1019213109274.
Chen, R, Desai, BC, Zhou, C. CINDI robot: an intelligent Web crawler based on multi‐level inspection. Proc. Int. Database Eng. Appl. Symp. IDEAS, 2007, 93–101. https://doi.org/10.1109/IDEAS.2007.4318093.
Cleverdon, CW, Keen, M. Aslib Cranfield research project‐factors determining the performance of indexing systems. Volume 2, Test results. Technical Report, 1966.
van Rijsbergen, CJ. Foundation of evaluation. Journal of Documentation 1974, 30:365–373.
Kausar, A. Web crawler: a review. Int J Comput Appl 2013, 63:31–36.