Search Suggestions from Previously Submitted Searcher Queries

I came across an interesting Search Engine Land post last week. It inspired me to search and see if I could find a patent that might be related to it from Google:

Google is suggesting searches based on users’ recent activity

I tried to reproduce the search suggestions that were being shown to the author of the Search Engine Land article, but Google would not return those to me. Google may be experimenting with a limited number of searchers rather than showing those results to everyone. I did find a patent discussing search suggestions that were similar.

When Google shows a search suggestion about something you may have searched for in the past, that predicted suggestion is likely related to a patent I’ve written about before, Autocompletion using previously submitted query data.

I wrote about that patent being updated in a continuation patent in How Google Predicts Autocomplete Query Suggestions is Updated, but I hadn’t provided many details there about how it works.

There are some interesting parts about how search suggestions are identified and ranked, which inspired me to write this post.

Search Suggestions Based on Previously Submitted Query Data

The description of this patent starts off by telling us that it is about: “using previously submitted query data to anticipate a user’s search request.”

That suggests Google has a long memory, and that it remembers a lot about what someone has searched for.

This patent description also includes a lot of the assumptions that search engineers make about searchers (often an interesting reason to read through patents). Here are some from this patent that are worth thinking about:

Internet search engines aim to identify documents or other items that are relevant to a user’s needs and to present the documents or items in a manner that is most useful to the user. Such activity often involves a fair amount of mind-reading–inferring from various clues what the user wants. Certain clues may be user-specific. For example, the knowledge that a user is making a request from a mobile device, and knowledge of the location of the device, can result in much better search results for such a user.

Clues about a user’s needs may also be more general. For example, search results can have elevated importance, or inferred relevance, if a number of other search results link to them. If the linking results are themselves highly relevant, then the linked-to results may have particularly high relevance. Such an approach to determining relevance may be premised on the assumption that, if authors of web pages felt that another web site was relevant enough to be linked to, then web searchers would also find the site to be particularly relevant. In short, the web authors “vote up” the relevance of the sites.

Other various inputs may be used instead of, or in addition to, such techniques for determining and ranking search results. For example, user reactions to particular search results or search result lists may be gauged, so that results on which users often click will receive a higher ranking. The general assumption under such an approach is that searching users are often the best judges of relevance, so that if they select a particular search result, it is likely to be relevant, or at least more relevant than the presented alternatives.

A Summary of the Search Suggestions Process Based on Previously Submitted Queries

Like most patents, the Description for this one starts out with a summary section that provides an overview of how the process defined in the patent works. It is followed by a “Detailed Description” section that goes into more depth and provides details about how search at Google works, and how specific aspects of search at Google power this search suggestion process. So read the summary first to see how search suggestions might be provided based upon queries that have been searched for previously, and then read on for the more detailed explanation, which goes way beyond autocomplete.

In the summary section of the description for the patent, we are told about how the patent may address those assumptions:

According to the patent, anticipating user search requests can involve certain methods for processing query information. Those include:

Receiving query information at a server system, with a portion of a query from a searcher
Obtaining a set of predicted queries relevant to the portion of the searcher’s query, based on that portion of the query and on data indicative of searcher behavior relative to previously submitted queries
Providing the set of predicted queries to the searcher
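
The three steps above can be sketched in a few lines of Python. This is only an illustration, assuming an in-memory list of past queries and simple prefix matching; the patent does not specify either, and the names handle_partial_query and previous_queries are hypothetical.

```python
def handle_partial_query(partial_query, previous_queries, limit=5):
    """Receive a partial query, obtain predicted queries, and return them.

    Assumption: a prediction is any previously submitted query that starts
    with the typed text. The patent leaves the matching method open.
    """
    prefix = partial_query.strip().lower()
    if not prefix:
        return []
    predictions = []
    for query in previous_queries:
        normalized = query.strip().lower()
        # Keep each matching query once, in the order it was first seen.
        if normalized.startswith(prefix) and normalized not in predictions:
            predictions.append(normalized)
    return predictions[:limit]


# Hypothetical history of previously submitted queries.
previous_queries = ["patent search", "pasta recipes", "Patent lawyer", "patent search"]
suggestions = handle_partial_query("pat", previous_queries)
print(suggestions)
```

The sketch returns matches in the order they were first seen; ordering by ranking criteria is a separate step in the patent.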

The patent also points out additional features of the process, such as ordering the set of predicted queries based upon ranking criteria when obtaining them.

Those ranking criteria may be based upon the data indicative of the searcher’s behavior relative to previously submitted queries.

Data about the searcher’s behavior regarding those previously submitted queries may include:

Click data
Location-specific data
Language-specific data
Other similar types of data

The patent points out the following as advantages of following the process described in the patent:

A search assistant receives query information from a search requestor, prior to a searcher completely inputting the query.

Information associated with previous user (or users) searches (such as click data associated with search results) is collected. From the query information and the previous search information, a set of predicted queries is produced and provided to the search requestor for presentation.

The patent can be found at:

Autocompletion using previously submitted query data
Inventors: Michael Herscovici, Dan Guez, and Hyung-Jin Kim
Assignee: Google Inc.
US Patent: 9,740,780
Granted: August 22, 2017
Filed: December 1, 2014

Abstract

A computer-implemented method for processing query information includes receiving query information at a server system. The query information includes a portion of a query from a search requestor. The method also includes obtaining a set of predicted queries relevant to the portion of the search requestor query based upon the portion of the query from the search requestor and data indicative of search requestor behavior relative to previously submitted queries. The method also includes providing the set of predicted queries to the search requestor.

Analysis of Ranking and Selection of Search Suggestions Based Upon Previous Query Data

The “Detailed Description” section of this search suggestions patent provides some insightful analysis about search at Google.

Relevance and Backlinks and a Rank Modifying Engine Lead to Ranking For Many Results at Google

This patent points out some of how search works at Google. It tells us that:

The purpose of the process in the patent is to “improve the relevance of results obtained from submitting search queries.”
It describes the ranking of documents for a query as something that can be “performed using traditional techniques for determining an information retrieval (IR) score for indexed documents in view of a given query.” The relevance of a particular document with respect to a query term may be determined by a technique such as looking at the general level of back-links to a document that contain matches for a search term, which may be used to infer the document’s relevance. As the patent tells us:

In particular, if a document is linked to (e.g., is the target of a hyperlink) by many other relevant documents (e.g., documents that also contain matches for the search terms), it can be inferred that the target document is particularly relevant. This inference can be made because the authors of the pointing documents presumably point, for the most part, to other documents that are relevant to their audience.

We are given more details about some results being even more relevant than ones with backlinks. We are told that:

If the pointing documents are in turn the targets of links from other relevant documents, they can be considered more relevant, and the first document can be considered particularly relevant because it is the target of relevant (or even highly relevant) documents. Such a technique may be the determinant of a document’s relevance or one of multiple determinants. The technique is exemplified in some systems that treat a link from one web page to another as an indication of quality for the latter page, so that the page with the most such quality indicators is rated higher than others. Appropriate techniques can also be used to identify and eliminate attempts to cast false votes so as to artificially drive up the relevance of a page.

There is another step that could potentially make some results even more relevant, which involves what is referred to as a rank modifier engine:

To further improve such traditional document ranking techniques, the ranking engine can receive an additional signal from a rank modifier engine to assist in determining an appropriate ranking for the documents. The rank modifier engine provides one or more prior models, or one or more measures of relevance for the documents based on one or more prior models, which can be used by the ranking engine to improve the search results’ ranking provided to the user. In general, a prior model represents a background probability of document result selection given the values of multiple selected features, as described further below. The rank modifier engine can perform one or more of the operations described below to generate the one or more prior models, or the one or more measures of relevance based on one or more prior models.

This is a more detailed description of ranking than we normally see from Google. The section above references a Rank Modifier Engine that will be described in more detail further down this post.

Indexing, Scoring, Ranking, and Rank Modifier Engine

The information retrieval system from this patent includes a number of different components:

Indexing engine
Scoring engine
Ranking engine
Rank modifier engine

The indexing engine can function as described in the section above for the indexing engine.

Scoring Engine

In addition, a scoring engine may provide scores for document results based on many different features including:

Content-based features that link a query to document results
Query-independent features that generally indicate the quality of document results

Content-based features include aspects of document format, such as query matches to title or anchor text in an HTML (HyperText Markup Language) page.

The query-independent features can include aspects of document cross-referencing, such as a rank of the document or the domain.

Moreover, the particular functions used by the scoring engine can be tuned, to adjust the various feature contributions to the final IR score, using automatic or semi-automatic processes.
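
One way to picture a tunable scoring function is as a weighted sum of feature values. The sketch below is only an assumption for illustration: the patent does not say the score is a weighted sum, and the feature names and weights are hypothetical.

```python
def ir_score(features, weights):
    """Combine feature values into one IR score as a weighted sum (assumed form)."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical weights; the patent only says feature contributions can be tuned.
weights = {"title_match": 2.0, "anchor_text_match": 1.5, "document_rank": 1.0}

doc_features = {
    "title_match": 1.0,        # content-based: query terms found in the title
    "anchor_text_match": 0.0,  # content-based: query terms found in anchor text
    "document_rank": 0.5,      # query-independent: quality of the document
}
score = ir_score(doc_features, weights)
print(score)
```

Tuning, in this picture, would mean adjusting the values in weights so that the final scores better match what searchers find useful.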

Ranking Engine

A ranking engine can produce a ranking of document search results for display to a searcher based on IR scores received from the scoring engine and possibly one or more signals from the rank modifier engine.
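
As a rough sketch of that combination, the code below multiplies each IR score by an optional signal before sorting. The multiplication is an assumption; the patent does not say how the rank modifier signal is combined with the IR score, and rank_documents is a hypothetical name.

```python
def rank_documents(ir_scores, modifier_signals=None):
    """Order documents by IR score, adjusted by an optional modifier signal.

    Assumption: the signal is a multiplier, and documents without a signal
    keep their IR score unchanged (multiplier of 1.0).
    """
    modifier_signals = modifier_signals or {}

    def final_score(doc):
        return ir_scores[doc] * modifier_signals.get(doc, 1.0)

    return sorted(ir_scores, key=final_score, reverse=True)


ir_scores = {"doc_a": 2.5, "doc_b": 2.0, "doc_c": 1.0}
signals = {"doc_b": 1.5}  # hypothetical boost from the rank modifier engine
ranking = rank_documents(ir_scores, signals)
print(ranking)
```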

A tracking component may be used to record information about individual searcher selections of the search results presented in the ranking. The patent describes how selections may be tracked using JavaScript, a proxy system, or a toolbar plug-in:

For example, the tracking component can be embedded JavaScript code included in a web page ranking that identifies user selections (clicks) of individual document results and also identifies when the user returns to the results page, thus indicating the amount of time the user spent viewing the selected document result. In other implementations, the tracking component can be a proxy system through which user selections of the document results are routed, or the tracking component can include pre-installed software at the client (e.g., a toolbar plug-in to the client’s operating system). Other implementations are also possible, such as by using a feature of a web browser that allows a tag/directive to be included in a page, which requests the browser to connect back to the server with message(s) regarding link(s) clicked by the user.

That selection information may also be logged, which could capture for each selection:

the query (Q)
the document (D)
the time (T) on the document
the language (L) employed by the user
the country (C) where the user is likely located (e.g., based on the server used to access the IR system).
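
A logged selection with those five fields could look something like the record below. The field names, the list used as a log, and the example values are all hypothetical; the patent names the fields (Q, D, T, L, C) but not a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionRecord:
    query: str                  # Q: the query
    document: str               # D: the selected document
    seconds_on_document: float  # T: time spent on the document
    language: str               # L: language employed by the user
    country: str                # C: country where the user is likely located


selection_log = []  # stand-in for the result selection log


def log_selection(record):
    """Append one selection to the log."""
    selection_log.append(record)


log_selection(
    SelectionRecord("patent search", "https://example.com/patents", 42.0, "en", "US")
)
print(selection_log[0])
```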

Other information may also be recorded about a searcher’s interactions with presented rankings:

Negative information, such as the fact that a document result was presented to a user, but was not clicked
Position(s) of click(s) in the user interface
IR scores of clicked results
IR scores of all results shown before the clicked result
Titles and snippets shown to the user before the clicked result
The user’s cookie
Cookie age
IP (Internet Protocol) address
User agent of the browser
Etc

More information may also be recorded for use in building a prior model (as described later in this post).

Rank Modifier Engine

Similar information (e.g., IR scores, position, etc.) may be recorded for an entire session, or multiple sessions of a searcher, including possibly recording it for every click that occurs both before and after a current click.

Information that is stored in the result selection logs may be used by the rank modifier engine to generate one or more signals to the ranking engine.

Information stored in the search results selection logs along with the information collected by the tracking component may also be accessible by a search assistant, which is also a component of the information retrieval system.

Along with receiving information from these components, the search assistant could also monitor a user’s entry of a search query.

On receiving a partial search query, the query along with the information (e.g., click data) from the tracking component and the results selection log(s) may be used to predict a searcher’s contemplated complete query.

Based on this information, predictions may be ordered according to one or more ranking criteria before being presented to assist the user in completing the query.

Presentation of a Search Suggestion

As a searcher enters a search query, the searcher’s input is monitored.

Before the searcher signals that they have completed entering the search query, a portion of the query is sent to the search engine.

Data such as click data (or other types of previously collected information) may also be sent with the query portion.

The portion of the query sent may be:

A few characters
A search term
More than one search term
Any other combination of characters and terms

The search engine receives the partial query and the data (e.g., click data) for processing and makes predictions as to the searcher’s contemplated complete query.

Relevant information may be retrieved for processing with the received partial query to produce search suggestion predictions.

Predictions may be ordered according to one or more ranking criteria.

For example, queries that have been submitted at a higher frequency may be ordered before queries submitted at lower frequencies.
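
Ordering by submission frequency can be sketched with a simple counter. This is a minimal illustration, assuming a flat list of submitted queries; order_by_frequency is a hypothetical name, and the alphabetical tie-break is an added assumption to keep the output stable.

```python
from collections import Counter


def order_by_frequency(predictions, submitted_queries):
    """Put predictions that were submitted more often first."""
    counts = Counter(submitted_queries)
    # Higher counts first; ties broken alphabetically (an assumption).
    return sorted(predictions, key=lambda query: (-counts[query], query))


# Hypothetical log of submitted queries.
submitted = ["patent search"] * 3 + ["patent lawyer"] + ["patent office"] * 2
ordered = order_by_frequency(
    ["patent lawyer", "patent office", "patent search"], submitted
)
print(ordered)
```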

The search engine may also use various types of information for ranking and ordering predicted queries as search suggestions.

Information about previously entered search queries may be used to make ordered predictions.

Previous queries may include search queries associated with the same user, another user, or from a community of users.

If one of the predicted queries is what the searcher intended as the desired query, the searcher may select that predicted query and proceed without having to finish entering the desired query.

Alternatively, if the predicted queries do not reflect what the searcher had in mind, then the searcher can continue entering the desired search query, which could trigger one or more other sets of search suggestions.

Ranking User Submitted Previous Queries as Search Suggestions

The patent tells us that a few different processes may be used in ranking and ordering predicted search queries:

Predicted search queries may be ordered in accordance with a frequency of submission by a community of users
Time constraints may also be used, with search queries ordered in accordance with the last time/date value that the query was submitted
Personalization information or community information may be used, such as information about subjects, concepts, or categories of information that are of interest to the user (from prior search or browsing information)
Personalization may also come from a group that the searcher is associated with or belongs to (as a member or an employee)
Predicted search queries may be ordered according to a first ranking criterion, such as predefined popularity criteria, and then possibly reordered if any of them match the user’s personalization information, to place the matching predicted search queries at or closer to the top of the ordered set
Information provided by the tracking component and the result selection log(s) might be used for ranking and ordering the predicted search queries, such as click data, language-specific data, and country-specific data
Processed click data (e.g., aggregated click data for a given query) could be used for ranking and ordering predicted search queries. For each query, a score may be calculated by summing click data (e.g., weighted clicks, etc.) on documents associated with the query, and predicted queries may be ordered based upon the score (e.g., higher values representing better)
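
Two of the approaches in that list, scoring queries by summed click data and then reordering for personalization, can be sketched together. Everything named here is hypothetical: the click weights, the substring test for a personalization match, and the function names are assumptions, not details from the patent.

```python
def score_queries_by_clicks(click_weights):
    """Sum weighted clicks over the documents associated with each query."""
    return {
        query: sum(doc_clicks.values())
        for query, doc_clicks in click_weights.items()
    }


def reorder_for_user(ordered_queries, user_interests):
    """Move predictions that mention a user interest ahead of the rest."""

    def matches(query):
        return any(interest in query for interest in user_interests)

    # sorted() is stable, so the earlier order is kept within each group.
    return sorted(ordered_queries, key=lambda query: not matches(query))


# Hypothetical weighted clicks per query and document.
click_weights = {
    "jaguar car": {"doc1": 3.0, "doc2": 1.5},
    "jaguar animal": {"doc3": 2.0},
    "jaguar football": {"doc4": 0.5},
}
scores = score_queries_by_clicks(click_weights)
by_score = sorted(scores, key=scores.get, reverse=True)
personalized = reorder_for_user(by_score, {"animal"})
print(by_score)
print(personalized)
```

Here a searcher whose history suggests an interest in animals would see the matching prediction moved to the top, while the other predictions keep their click-based order.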

An Information Model Based On Previously Submitted Query Data to Obtain Search Suggestion Predictions

This model can be used to predict what query data might satisfy a searcher the most by looking at long click information. A timer can be used to track how long a user views or “dwells” on a document.

The amount of time is referred to as “click data”.

A longer time spent dwelling on a document would be termed a “long click”, and can indicate that a user found the document to be relevant for their query.

A brief period viewing a document would be termed a “short click”, and can be interpreted as a lack of document relevance.

Click data is a count of each click type (e.g., long, medium, short) for a particular query and document combination.

This aggregated click data from model queries for a given document can be used to create a quality of result statistic for that document to enhance a ranking of that document.

The quality of result statistic can be a weighted average of the count of long clicks for a given document and query.
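
To make the click types and the statistic concrete, here is a small sketch. The thresholds for short and long clicks and the per-type weights are assumptions chosen for illustration; the patent does not give specific values.

```python
from collections import Counter

SHORT_MAX_SECONDS = 10  # assumed: under 10 seconds is a short click
LONG_MIN_SECONDS = 60   # assumed: 60 seconds or more is a long click
CLICK_WEIGHTS = {"short": 0.0, "medium": 0.5, "long": 1.0}  # assumed weights


def classify_click(dwell_seconds):
    """Label a click as short, medium, or long from its dwell time."""
    if dwell_seconds < SHORT_MAX_SECONDS:
        return "short"
    if dwell_seconds >= LONG_MIN_SECONDS:
        return "long"
    return "medium"


def quality_of_result(dwell_times):
    """Weighted average of click types for one query and document pair."""
    counts = Counter(classify_click(seconds) for seconds in dwell_times)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    weighted = sum(CLICK_WEIGHTS[kind] * n for kind, n in counts.items())
    return weighted / total


# Hypothetical dwell times, in seconds, for one query and document.
quality = quality_of_result([5, 30, 90, 120])
print(quality)
```

With one short, one medium, and two long clicks, the statistic is (0.0 + 0.5 + 2.0) / 4.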

This description from the patent tells us about how click data might be stored in tuples:

A search engine (e.g., the search engine) or other processes may create a record in the model for documents that are selected by users in response to a query or a partial query. Each record within the model (herein referred to as a tuple: <document, query, data>) is at least a combination of a query submitted by users, a document reference selected by users in response to that query, and an aggregation of click data for all users that select the document reference in response to the query. The aggregate click data can be viewed as an indication of document relevance. In various implementations, model data can be location-specific (e.g. country, state, etc) or language-specific. For example, a country-specific tuple would include the country from where the user query originated from in whereas a language-specific tuple would include the language of the user query. Other extensions of model data are possible.
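
The tuple described in that passage can be pictured as a dictionary keyed by document and query, with aggregated click counts as the data. The sketch below is an assumption about one possible layout; record_click, the click-type labels, and the optional country key are hypothetical.

```python
from collections import defaultdict

# Maps (document, query) or (document, query, country) to aggregated click data.
model = defaultdict(lambda: {"long": 0, "medium": 0, "short": 0})


def record_click(document, query, click_type, country=None):
    """Add one click to the aggregate for a tuple.

    When a country is given, the click is stored under a country-specific
    tuple, mirroring the location-specific extension the patent mentions.
    """
    key = (document, query) if country is None else (document, query, country)
    model[key][click_type] += 1


record_click("doc1", "patent search", "long")
record_click("doc1", "patent search", "long")
record_click("doc1", "patent search", "short")
record_click("doc1", "patent search", "long", country="GB")
print(dict(model))
```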

The model may also include post-click behavior that has been tracked by the tracking component.

This patent does include a lot of information about how Google might use click tracking data when ranking search suggestion predictions. It tells us about the data that could be collected about clicks:

The information gathered for each click can include:

(1) the query (Q) the user entered,
(2) the document result (D) the user clicked on,
(3) the time (T) on the document,
(4) the interface language (L) (which can be given by the user),
(5) the country (C) of the user (which can be identified by the host that they use, such as www-store-co-uk to indicate the United Kingdom), and
(6) additional aspects of the user and session.

The time (T) can be measured as the time between the initial click through to the document result until the time the user comes back to the main page and clicks on another document result. Moreover, an assessment can be made about the time (T) regarding whether this time indicates a longer view of the document result or a shorter view of the document result, since longer views are generally indicative of quality for the clicked through result. This assessment about the time (T) can further be made in conjunction with various weighting techniques.
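
That way of measuring the time (T) can be sketched from a list of click timestamps. The event format and the name dwell_times are hypothetical; the sketch assumes clicks are recorded in order, with timestamps in seconds.

```python
def dwell_times(click_events):
    """Estimate time on each document from consecutive result clicks.

    click_events is a list of (timestamp_seconds, document) in click order.
    T for a document is the gap until the next result click, so the last
    click in a session has no measurable time under this approach.
    """
    times = []
    for (start, document), (next_start, _) in zip(click_events, click_events[1:]):
        times.append((document, next_start - start))
    return times


# Hypothetical session: three result clicks at 0, 75, and 83 seconds.
events = [(0, "doc1"), (75, "doc2"), (83, "doc3")]
measured = dwell_times(events)
print(measured)
```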

Beyond Long Clicks

We are also told that document views from the selections can be weighted based on viewing length information to produce weighted views of the document result.

So, rather than simply distinguishing long clicks from short clicks, a wider range of click through viewing times can be included in the assessment of result quality, where longer viewing times in the range are given more weight than shorter viewing times.
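
A weighting across a range of viewing times might look like the sketch below. The linear shape and the 120-second cap are assumptions made for illustration; the patent only says that longer viewing times in the range are given more weight.

```python
def view_weight(seconds, cap_seconds=120.0):
    """Weight a view by its length, rising linearly up to a cap (assumed shape)."""
    clamped = min(max(seconds, 0.0), cap_seconds)
    return clamped / cap_seconds


def weighted_views(view_seconds):
    """Sum the weights of all views of a document result."""
    return sum(view_weight(seconds) for seconds in view_seconds)


# Hypothetical viewing times in seconds: 0.25 + 0.5 + 1.0 in weighted views.
total = weighted_views([30, 60, 240])
print(total)
```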

Predicted Search Suggestions

Google will sometimes display autocomplete search suggestions based upon data from previous queries, drawn from a searcher’s own search history or from the history of someone the searcher may be associated with, such as a fellow member of an organization or a co-worker.

While results related to those previous queries were ranked based upon such things as relevance and backlinks, the search suggestions may favor queries whose results earned long clicks, meaning searchers spent a long time viewing them.

So under this patent, the autocomplete predictions most likely to meet a searcher’s informational needs are queries whose results are remembered as producing long clicks and long viewing times.
