Search Engines: Information Retrieval in Practice (PDF)

The primary target audience for the book is undergraduates in computer science. Accordingly, it is written in language that anyone with a basic understanding of computers and the Web can follow, although certain sections may require a background in probability and statistics. The book is organized in a strongly top-down fashion, with an evident desire to cover and categorize every concept under a heading, which occasionally makes the placement of concepts in the hierarchy and the sequencing of sections debatable.

Some of these issues are pointed out in the detailed discussion of the chapters below. The book is composed of 11 chapters. The first two are introductory, providing an overview of key concepts and issues in search engine design.

Chapters three through eight, where basic building blocks of a search engine are explored, form the core of the book. The final three chapters can be seen as supplementary chapters or extensions as they do not fully fit into the theme of the book.

The rest of this review iterates over the chapters of the book in sequence, summarizing their content.

The book starts with Chapter 1, "The Big Issues", a short chapter that tries to establish a connection between information retrieval and search engines. The "big issues" in information retrieval are summarized as relevance, evaluation, and information needs.

For search engines, these issues are extended to include performance, data incorporation, scalability, adaptability, and specific problems. This chapter provides a sufficiently well-written overview of search engines. Chapter 2, "Architecture of a Search Engine", goes into more depth on the issues raised in the previous chapter and discusses search engines from an architectural point of view. A basic but general architecture is described, in which the functions of the search engine are divided into two processes: the indexing process (further divided into text acquisition, text transformation, and index creation) and the query process (further divided into user interaction, ranking, and evaluation).

More than 20 different search engine components are classified under these headings. Although this chapter provides very good coverage of relevant topics, the classification of topics is occasionally problematic e.

Perhaps it would have been better to replace the "performance optimization" and "distribution" headings with a single "search performance" heading and to discuss performance issues at different granularities, such as the node, cluster, and data-center levels. As a side note, in this chapter the topics under each heading are listed in boxes in the page margins. This is a very useful convention which, unfortunately, is not followed in succeeding chapters.

Issues related to the discovery, acquisition, conversion, and storage of text are discussed in Chapter 3, "Crawls and Feeds". The topics covered also include character encoding, duplicate document detection, and identification of document structure. Interestingly, only 15 pages are dedicated to Web crawling, which is probably one of the most important components of a search engine, and almost the entire "The Web Crawler" section is devoted to the politeness issue, which should have its own subsection.

Consequently, some important concepts are either omitted e. Moreover, the level of detail varies widely. For example, while the discussion of politeness remains at the level of "angry site owners", the freshness discussion involves integrals over Poisson distributions.
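To make the contrast concrete, here is a minimal sketch of the kind of calculation the freshness discussion involves. It assumes that page changes arrive as a Poisson process with rate `change_rate`, so that the expected age of a copy `t` time units after it was crawled is the integral of `change_rate * exp(-change_rate * x) * (t - x)` over `[0, t]`. This formulation and the function names are illustrative assumptions, not text quoted from the book.

```python
import math


def expected_age(change_rate: float, t: float) -> float:
    """Closed form of the integral of rate*exp(-rate*x)*(t - x) for x in [0, t]."""
    if change_rate <= 0:
        return 0.0  # a page that never changes is never stale
    return t - (1.0 - math.exp(-change_rate * t)) / change_rate


def expected_age_numeric(change_rate: float, t: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the same integral, as a sanity check."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += change_rate * math.exp(-change_rate * x) * (t - x)
    return total * h


if __name__ == "__main__":
    # Hypothetical page that changes about once a week, revisited after 7 days.
    print(expected_age(1 / 7, 7.0))
    print(expected_age_numeric(1 / 7, 7.0))
```

The closed form and the numerical integral should agree closely, which is a quick way to check the algebra behind such a formula.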

Sections 3. Chapter 4, "Processing Text", is the strongest chapter in the book together with Chapter 5, but excluding Sections 5. The three main topics of the chapter are text statistics, document parsing, and link analysis. Several interesting text processing problems are investigated from a statistical perspective. These sample problems are well selected and could be very motivating for students. Document parsing covers standard issues such as tokenization, stopword elimination, stemming, and phrase identification.
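For readers unfamiliar with these parsing steps, the following is a minimal sketch of tokenization, stopword elimination, and stemming chained together. The stopword list is a tiny illustrative one and `crude_stem` is a toy suffix stripper invented for this example; it is not the Porter stemmer or any method taken from the book.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "are", "in", "of", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop very common words that carry little topical content."""
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token: str) -> str:
    """Strip a few common suffixes, keeping at least three characters."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def process(text: str) -> list[str]:
    """Run the three steps in order: tokenize, remove stopwords, stem."""
    return [crude_stem(t) for t in remove_stopwords(tokenize(text))]


if __name__ == "__main__":
    print(process("The crawlers are indexing pages of the Web"))
    # ['crawler', 'index', 'page', 'web']
```

A real system would use a more careful tokenizer and an established stemmer; the point here is only the order of the steps.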

The section on link analysis is mainly dedicated to PageRank and, to a lesser extent, discusses link spam. Other topics included in this chapter are text processing in non-English languages and information extraction.
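As a reminder of what PageRank computes, here is a minimal power-iteration sketch on a toy link graph. The damping factor of 0.85, the fixed iteration count, the uniform handling of pages without outgoing links, and the three-page graph are all assumptions chosen for illustration; they are not taken from the book.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Iteratively estimate PageRank for a graph given as page -> outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical pages
    for page, score in sorted(pagerank(toy_graph).items()):
        print(page, round(score, 4))
```

In this toy graph, page C is linked from both A and B and so ends up ranked above B, which is linked only from A.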

Description: For introductory information retrieval courses at the undergraduate and graduate level in computer science, information science, and computer engineering departments.

Many of the programming exercises require the use, modification, and extension of Galago components. Practical treatment of the IR field gives aspiring search engineers the understanding and tools they need to evaluate, compare, and modify search engines. Broad, yet concise, coverage of only the most important issues and techniques in search engines includes the underlying IR and mathematical models to reinforce key concepts. Instructor supplements include lecture slides (in PDF and PPT format), solutions to select end-of-chapter problems, test collections for exercises, and the Galago search engine.

Relevance Ranking for Vertical Search Engines. In plain, uncomplicated language, and using detailed examples to explain the key concepts, models, and algorithms in vertical search ranking, Relevance Ranking for Vertical Search Engines teaches readers how to manipulate ranking algorithms to achieve better results in real-world applications.

This reference book for professionals covers concepts and theories from the fundamental to the advanced, such as relevance, query intention, location-based relevance ranking, and cross-property ranking. It covers the most recent developments in vertical search ranking applications, such as freshness-based relevance theory for new search applications, location-based relevance theory for local search applications, and cross-property ranking theory for applications

Introduction to Video Search Engines. The evolution of technology has set the stage for the rapid growth of the video Web: broadband Internet access is ubiquitous, and streaming media protocols, systems, and encoding standards are mature.


