Search Engine and Web-related Publications from Frank McCown's Ph.D. work at Old Dominion University and later work at Harding University

Ramakrishna, M., and Zobel, J. Performance in Practice of String Hashing Functions

Moffat, A., and Zobel, J. Self-Indexing Inverted Files for Fast Text Retrieval

Zobel, J., and Moffat, A. Inverted files for text search engines

Williams, H., Zobel, J., and Bahle, D. Fast phrase querying with combined indexes

Heinz, S., and Zobel, J. Efficient Single-Pass Index Construction for Text Databases

Dean, J., and Ghemawat, S. MapReduce: Simplified Data Processing on Large Clusters

Barroso, L.A., Dean, J., and Hölzle, U. Web Search for a Planet: the Google Cluster Architecture

Ferragina, P., and Venturini, R. The Compressed Permuterm Index

Incremental Encoding article from Wikipedia

Chapter 19 Supplements

Rabin Fingerprint article from Wikipedia

Broder, A., Glassman, S., Manasse, M., and Zweig, G. Syntactic Clustering of the Web

Broder, A. Presentation on Algorithms for Near-Duplicate Documents

Schonfeld, U., Keidar, I., and Bar-Yossef, Z. DUST (Different URLs Similar Text)

Vassilvitskii, S. Duplicate Detection on the Web

Gulli, A. C++ Implementation of Broder's Shingles

Adamic, L., and Huberman, B. Zipf's Law and the Internet

Baykan, E., Henzinger, M., Keller, S., de Castelberg, S., and Kinzler, M. A Comparison of Techniques for Sampling Web Pages

Henzinger, M. Finding Near-Duplicate Web Pages: A Large-Scale Evaluation of Algorithms

Chapter 20 Supplements

Menczer, F. Web Crawling

Pant, G., Srinivasan, P., and Menczer, F. Multi-threaded crawlers in Java

Heydon, A., and Najork, M. Mercator: A Scalable, Extensible Web Crawler

A Standard for Robot Exclusion

Bharat, K., Broder, A., Henzinger, M., Kumar, P., and Venkatasubramanian, S. The Connectivity Server: Fast Access to Linkage Information on the Web

Randall, K., Stata, R., Wickremesinghe, R., and Wiener, J. The Link Database: Fast Access to Graphs of the Web

Boldi, P., and Vigna, S. The WebGraph Framework I: Compression Techniques

Boldi, P., and Vigna, S. Codes for the World-Wide Web

Baeza-Yates, R., Castillo, C., and Efthimiadis, E. Characterization of National Web Domains

Chapter 21 Supplements

Scientist Finds PageRank-Type Algorithm from the 1940s

How Google Ranks Tweets

McBryan, O. GENVL and WWWW: Tools for Taming the Web

Wikipedia article about Albert-László Barabási

Barabási, A.-L., Albert, R., and Jeong, H. Diameter of the World-Wide Web

Geller, N. On the Citation Influence Methodology of Pinski and Narin (link no longer available, even via the Internet Archive)

Pinski, G., and Narin, F. Citation Influence for Journal Aggregates of Scientific Publications: Theory, with Application to the Literature of Physics

Garfield, E. Citation Indexes to Science: A New Dimension in Documentation through Association of Ideas

Brin, S., and Page, L. The Anatomy of a Large-Scale Hypertextual Web Search Engine

Langville, A. Presentation about the PageRank textbook

Moler, C. The World's Largest Matrix Computation

Wu, F., and Huberman, B. Persistence and Success in the Attention Economy (showing Zipf-like phenomena in YouTube)

Greenwald, A., and Wicks, J. QuickRank: A Recursive Ranking Algorithm. In Proceedings of the 1st International Workshop on Computational Social Choice (Dec 2006), pp. 220-233.

Pandurangan, G., Raghavan, P., and Upfal, E. Using PageRank to characterize web structure. Internet Mathematics 2 (2005), 217-236.

Baeza-Yates, R., and Castillo, C. Crawling the Infinite Web: Five Levels are Enough

Enhancing Web Search by Promoting Multiple Search Engine Use

BrowseRank: letting web users vote for page importance

  • Y. Sun, Z. Zhuang, I. Councill, C.L. Giles, "Determining Bias to Search Engines from Robots.txt," Proceedings of IEEE/WIC/ACM International Conference on Web Intelligence (WI 2007), 149-155, 2007.
  • B. Sun, Q. Tan, P. Mitra, C.L. Giles, "Extraction and Search of Chemical Formulae in Text Documents on the Web," Proceedings of the 16th International World Wide Web Conference (WWW 2007), 251-260, 2007. (Nominated for best student paper award.)