Search Engine Interface and Indexing Problem.

Scenario:

Using Internet search engines to find relevant answers to a specific query, or to mine data, has been a frustrating experience of information overload for many Internet users.  Most of the links a search engine returns point to Webpages whose contents are irrelevant to the question the user had in mind when posing the query.  To date, search engine companies are either just breaking even or making losses because of the limited revenue generated by the current Internet advertising business model, and they continue to look for a solution to these pressing problems.

Solution:

The new W6H search engine interface and indexing architecture.

Potential Clients:

The search engine development community, which is racing for functionality and popularity, and Fortune 1000 media and publishing corporations currently seeking an effective Internet portal strategy.

This solution offers substantive material for proposals and recommendations to the top search engines (Excite, Infoseek, Lycos, Yahoo!) for solving their foremost problem, delivering “query-results relevance”, and to media and publishing corporations seeking to establish more than an Internet presence through effective Internet portal strategies.

Objective:

Increase the ratio of relevant information to junk information in the results returned by a search engine for a search query.

Description of Solution:

Search W6H is an intelligent context-based search engine interface and indexing architecture.

W6H stands for What, Where, Who, Which, When, Why and How.

The W6H indexing approach is particularly suited to search engines serving the Internet masses, such as Excite, Infoseek, Lycos and Yahoo!  The approach is intended to give a significant competitive advantage to search engines that adopt it.

According to a general survey, when people use a search engine, most of the time they already have a particular question in mind and wish to look on the Internet for answers to it.

We shall make use of the following observation:

All search queries can be broken down into questions that can be classified into seven categories.

Each basic query is then a question of one of the following:

What, Where, Who, Which, When, Why and How.

(To verify this observation, try to see if you can think of an exception or a counterexample).
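The classification above can be sketched in a few lines. This is only an illustration, assuming the question is typed as an English sentence and that its first interrogative word decides the category; the function name classify_question is hypothetical and not part of the W6H proposal.

```python
from typing import Optional

# The seven W6H categories named in the text.
W6H = ("what", "where", "who", "which", "when", "why", "how")


def classify_question(question: str) -> Optional[str]:
    """Return the first W6H word found in the question, or None if there is none."""
    for raw in question.lower().split():
        word = raw.strip("?.,!\"'")  # drop surrounding punctuation
        if word in W6H:
            return word
    return None


print(classify_question("When is the Independence Day?"))
print(classify_question("How did it happen?"))
```

A bare keyword query such as "US Independence Day" contains no interrogative word, so the sketch returns None for it; in the W6H interface the user supplies the category explicitly instead.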

For example, a college student wants to learn more about the history of US Independence Day via the Web.

Coincidentally, a journalist is writing an online news article about US Independence Day that includes a brief historical account of it.  The journalist, being the author of the Webpage, can think of questions readers might ask that the Webpage can answer, and then establishes the contexts for the keywords the author wishes to have indexed, in the following format:

“US Independence Day”

  • When        → indexed to the keyword “July 4” that appears on the Webpage

        → answer to the possible search query of “When is the Independence Day?”

  • What        → indexed to the entire Webpage

        → answer to the possible search query of “What is US Independence Day?”

  • Who        → indexed to the keywords “Thomas Jefferson” or a portrait image that appears on the Webpage

        → answer to the possible search query of “Who were the key people involved?”

  • How        → indexed to the section with the history of how independence was won.

        → answer to the possible search query of “How did it happen?”

The questions of Where, Which and Why of US Independence Day may not be meaningful or may not be covered by the Web page and so may be left out.
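One way to picture the author's annotations is as index entries keyed by the key phrase and its context. The sketch below is an assumption made for illustration: the ContextEntry record, the add_annotations helper, the example.com URL and the free-text target descriptions are all hypothetical, since the proposal does not specify a storage format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ContextEntry:
    url: str     # page that carries the annotation
    target: str  # what on the page answers the question (keyword, section, whole page)


# Index key: (normalized key phrase, W6H context)
Index = Dict[Tuple[str, str], List[ContextEntry]]


def add_annotations(index: Index, url: str, phrase: str, contexts: Dict[str, str]) -> None:
    """Record one entry per context the author chose to annotate."""
    for context, target in contexts.items():
        key = (phrase.lower(), context.lower())
        index.setdefault(key, []).append(ContextEntry(url, target))


index: Index = {}
add_annotations(
    index,
    "https://example.com/independence-day",  # hypothetical page URL
    "US Independence Day",
    {
        "When": 'keyword "July 4"',
        "What": "entire page",
        "Who": 'keyword "Thomas Jefferson" or portrait image',
        "How": "history section",
    },
)
```

Contexts the author leaves out, such as Where, Which and Why here, simply have no entry in the index.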

When the college student keys in the search query keywords “US Independence Day” and clicks on one of the When, What, Who and How buttons, the search engine is able to return relevant and meaningful search results answering the question the student has in mind, and the Web pages linked from the search engine results will be scrolled to the indexed keyword, block or image.
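The lookup on the search side might then look like the following sketch. It assumes, purely for illustration, that each indexed keyword or block carries an anchor id so the browser can scroll to it through a URL fragment; the INDEX contents, the anchor ids and the search function are hypothetical.

```python
from typing import Dict, List, Tuple

# (key phrase, context) -> list of (page URL, anchor id on that page).
# An empty anchor id means the whole page is the answer.
INDEX: Dict[Tuple[str, str], List[Tuple[str, str]]] = {
    ("us independence day", "when"): [("https://example.com/july4", "july-4")],
    ("us independence day", "how"): [("https://example.com/july4", "history")],
    ("us independence day", "what"): [("https://example.com/july4", "")],
}


def search(keywords: str, button: str) -> List[str]:
    """Return result links for the typed keywords and the clicked W6H button."""
    hits = INDEX.get((keywords.strip().lower(), button.lower()), [])
    # Append the anchor as a URL fragment so the page opens at the indexed spot.
    return [f"{url}#{anchor}" if anchor else url for url, anchor in hits]


print(search("US Independence Day", "When"))
```

Clicking a button for a context that no page has annotated, such as Why in this example, yields an empty result list rather than unrelated keyword matches.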

Thus, instead of just indexing the occurrence of certain keywords on a Web page, we have an indexing mechanism that attaches a context to a keyword or key phrase and links it to its answer on the page.  The answer may be another keyword, a particular block of text or images, an entire section, or even the entire Web page itself.  This lets the search engine directly deliver meaningful search results for a given search query.

The "How" context is most useful for Web pages of operating manuals, guides to various activities (sports/games), and general instructions.

The "Why" context is ...
