I came across an interesting Search Engine Land post last week. It inspired me to search and see if I could find a patent that might be related to it from Google:
Google is suggesting searches based on users’ recent activity
I tried to reproduce the search suggestions that were being shown to the author of the Search Engine Land article, but Google would not return those to me. Google may be experimenting with a limited number of searchers rather than showing those results to everyone. I did find a patent discussing similar search suggestions.
When Google shows a search suggestion about something you may have searched for in the past, that predicted suggestion is likely related to a patent I’ve written about before, Autocompletion using previously submitted query data.
I wrote about that patent being updated in a continuation patent in the post How Google Predicts Autocomplete Query Suggestions is Updated, but I didn’t provide many details there about how it works.
There are some interesting parts about how search suggestions are identified and ranked, which inspired me to write this post.
The description of this patent starts by telling us that it is about: “using previously submitted query data to anticipate a user’s search request.”
That suggests Google has a long memory, and that it remembers a lot about what someone might search for.
This patent description also includes a lot of the assumptions that search engineers make about searchers (often an interesting reason to read through patents). Here are some from this patent that are worth thinking about:
Internet search engines aim to identify documents or other items that are relevant to a user’s needs and to present the documents or items in a manner that is most useful to the user. Such activity often involves a fair amount of mind-reading–inferring from various clues what the user wants. Certain clues may be user-specific. For example, the knowledge that a user is requesting a mobile device, and knowledge of the location of the device, can result in much better search results for such a user.
Clues about a user’s needs may also be more general. For example, search results can have elevated importance, or inferred relevance, if several other search results link to them. If the linking results are themselves highly relevant, then the linked-to results may have particularly high relevance. Such an approach to determining relevance may be premised on the assumption that, if authors of web pages felt that another web site was relevant enough to be linked to, then web searchers would also find the site to be particularly relevant. In short, the web authors “vote up” the relevance of the sites.
Other various inputs may be used instead of, or in addition to, such techniques for determining and ranking search results. For example, user reactions to particular search results or search result lists may be gauged, so that results on which users often click will receive a higher ranking. The general assumption under such an approach is that searching users are often the best judges of relevance, so that if they select a particular search result, it is likely to be relevant, or at least more relevant than the presented alternatives.
Like most patents, the Description for this one starts with a summary section that provides an overview of how the process defined in the patent works. It is followed by a “Detailed Description” section that goes into more depth and provides details about how search at Google works, and how specific aspects of search at Google power this search suggestion process. So read about how search suggestions might be provided based upon user queries that have been searched for previously, and then read for the more detailed explanation, which goes way beyond autocomplete.
In the summary section of the description for the patent, we are told about how the patent may address those assumptions:
When anticipating user search requests, the algorithm in this patent can involve certain methods for processing query information. Those include:
The patent also points out additional features of the process, such as ordering the set of predicted queries based upon ranking criteria when those predicted queries are obtained.
Those ranking criteria may be based upon data indicative of a searcher’s behavior relative to previously submitted queries.
Data about the searcher’s behavior regarding those previously submitted queries may include:
The patent points out the following as advantages of following the process described in the patent:
A search assistant receives query information from a search requestor before the searcher has finished entering the query.
Information associated with previous searches by a user (or users), such as click data associated with search results, is collected. From the query information and the previous search information, a set of predicted queries is produced and provided to the search requestor for presentation.
The patent can be found at:
Autocompletion using previously submitted query data
Inventors: Michael Herscovici, Dan Guez, and Hyung-Jin Kim
Assignee: Google Inc.
US Patent: 9,740,780
Granted: August 22, 2017
Filed: December 1, 2014
A computer-implemented method for processing query information includes receiving query information at a server system. The query information includes a portion of a query from a search requestor. The method also includes obtaining a set of predicted queries relevant to the portion of the search requestor query based upon the portion of the query from the search requestor and data indicative of search requestor behavior relative to previously submitted queries. The method also includes providing the set of predicted queries to the search requestor.
The “Detailed Description” section of this search suggestions patent provides some insightful analysis about search at Google.
This patent explains some of how search works at Google. It tells us that:
In particular, if a document is linked to (e.g., is the target of a hyperlink) by many other relevant documents (e.g., documents that also contain matches for the search terms), it can be inferred that the target document is particularly relevant. This inference can be made because the authors of the pointing documents presumably point, for the most part, to other documents that are relevant to their audience.
If the pointing documents are in turn the targets of links from other relevant documents, they can be considered more relevant, and the first document can be considered particularly relevant because it is the target of relevant (or even highly relevant) documents. Such a technique may be the determinant of a document’s relevance or one of multiple determinants. The technique is exemplified in some systems that treat a link from one web page to another as an indication of quality for the latter page so that the page with the most such quality indicators is rated higher than others. Appropriate techniques can also be used to identify and eliminate attempts to cast false votes to artificially drive up the relevance of a page.
To further improve such traditional document ranking techniques, the ranking engine can receive an additional signal from a rank modifier engine to assist in determining an appropriate ranking for the documents. The rank modifier engine provides one or more prior models, or one or more measures of relevance for the documents based on one or more prior models, which can be used by the ranking engine to improve the search results’ ranking provided to the user. In general, a prior model represents a background probability of document result selection given the values of multiple selected features, as described further below. The rank modifier engine can perform one or more of the operations described below to generate the one or more prior models, or the one or more measures of relevance based on one or more prior models.
This is a more detailed description of ranking than we normally see from Google. The section above references a Rank Modifier Engine that will be described in more detail further down this post.
The information retrieval system from this patent includes a number of different components:
The indexing engine can function as described in the section above.
In addition, a scoring engine may provide scores for document results based on many different features including:
Content-based features include aspects of document format, such as query matches to a title or anchor text in an HTML (HyperText Markup Language) page.
The query-independent features can include aspects of document cross-referencing, such as a rank of the document or the domain.
Moreover, the particular functions used by the scoring engine can be tuned, to adjust the various feature contributions to the final IR score, using automatic or semi-automatic processes.
A ranking engine can produce a ranking of document search results for display to a searcher based on IR scores received from the scoring engine and possibly one or more signals from the rank modifier engine.
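To make that step a little more concrete, here is a minimal Python sketch of a ranking engine that adjusts IR scores with an optional signal from a rank modifier. The function name `rank_documents`, the multiplicative combination, and the sample scores are all assumptions for illustration; the patent does not say how the two inputs are combined.

```python
from typing import Dict, List, Tuple


def rank_documents(
    ir_scores: Dict[str, float],
    modifier_signals: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Order documents by IR score, adjusted by an optional rank-modifier signal."""
    ranked = []
    for doc, ir_score in ir_scores.items():
        # Assumption: a document with no signal keeps its IR score (factor 1.0).
        boost = modifier_signals.get(doc, 1.0)
        ranked.append((doc, ir_score * boost))
    # Highest adjusted score first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores: doc_b gets a boost from the rank modifier.
results = rank_documents(
    {"doc_a": 0.80, "doc_b": 0.75, "doc_c": 0.40},
    {"doc_b": 1.2},
)
```

In this made-up example, doc_b starts below doc_a on IR score alone but moves ahead once the modifier signal is applied.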
That selection information may also be logged, which could capture for each selection:
Other information may also be recorded about a searcher’s interactions with presented rankings:
More information can be recorded for building a prior model, as described later in this post.
Similar information (e.g., IR scores, position, etc.) may be recorded for an entire session, or multiple sessions of a searcher, including possibly recording it for every click that occurs both before and after a current click.
Information that is stored in the result selection logs may be used by the rank modifier engine to generate one or more signals to the ranking engine.
Information stored in the search results selection logs along with the information collected by the tracking component may also be accessible by a search assistant, which is also a component of the information retrieval system.
Along with receiving information from these components, the search assistant could also monitor a user’s entry of a search query.
On receiving a partial search query, the query along with the information (e.g., click data) from the tracking component and the results selection log(s) may be used to predict a searcher’s contemplated complete query.
Based on this information, predictions may be ordered according to one or more ranking criteria before being presented to assist the user in completing the query.
As a searcher enters a search query, the searcher’s input is monitored.
Before the searcher signals that they have completed entering the search query, a portion of the query is sent to the search engine.
Also, data such as click data (or other types of previously collected information) may be sent with the query portion.
The portion of the query sent may be:
The search engine receives the partial query and the data (e.g., click data) for processing and makes predictions as to the searcher’s contemplated complete query.
Relevant information may be retrieved for processing with the received partial query to produce search suggestion predictions.
Predictions may be ordered according to one or more ranking criteria.
So, queries that have been submitted at a higher frequency may be ordered before queries submitted at lower frequencies.
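A frequency-based ordering like that can be sketched in a few lines of Python. This is only an illustration under simple assumptions: predictions are previous queries that start with the partial query, and the function name `predict_queries`, the `limit` parameter, and the sample history are hypothetical rather than taken from the patent.

```python
from collections import Counter
from typing import Iterable, List


def predict_queries(
    partial: str, previous_queries: Iterable[str], limit: int = 5
) -> List[str]:
    """Return previous queries that start with `partial`, most frequent first."""
    counts = Counter(q.strip().lower() for q in previous_queries)
    prefix = partial.strip().lower()
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Higher submission frequency first; ties are broken alphabetically.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


# Hypothetical query history.
history = [
    "hotels in paris", "hotels in rome", "hotels in paris",
    "hot yoga", "hotels in paris", "hot yoga",
]
suggestions = predict_queries("hot", history)
```

Here "hotels in paris" was submitted three times, so it would be offered ahead of "hot yoga" (twice) and "hotels in rome" (once).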
The search engine may also use various types of information for ranking and ordering predicted queries as search suggestions.
Information about previously entered search queries may be used to make ordered predictions.
Previous queries may include search queries associated with the same user, another user, or from a community of users.
If one of the predicted queries is what the searcher intended as the desired query, the searcher may select that predicted query and proceed without having to finish entering the desired query.
Alternatively, if the predicted queries do not reflect what the searcher had in mind, then the searcher can continue entering the desired search query, which could trigger one or more other sets of search suggestions.
The patent tells us that a few different processes may be used in ranking and ordering predicted search queries:
This model can be used to predict what query data might satisfy a searcher the most by looking at long click information. A timer can be used to track how long a user views or “dwells” on a document.
The amount of time is referred to as “click data”.
A longer time spent dwelling on a document would be termed a “long click”, and can indicate that a user found the document to be relevant for their query.
A brief period viewing a document would be termed a “short click”, and can be interpreted as a lack of document relevance.
Click data is a count of each click type (e.g., long, medium, short) for a particular query and document combination.
This aggregated click data from model queries for a given document can be used to create a quality of result statistic for that document to enhance a ranking of that document.
The quality of result statistic can be a weighted average of the count of long clicks for a given document and query.
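One way to picture such a statistic is as a weighted average over the click-type counts for a query and document pair. The sketch below is an assumption-heavy illustration: the weights in `CLICK_WEIGHTS` and the function name `quality_of_result` are made up, and the patent does not give a specific formula.

```python
from typing import Dict

# Assumed weights per click type; the patent does not publish actual values.
CLICK_WEIGHTS = {"long": 1.0, "medium": 0.5, "short": 0.1}


def quality_of_result(click_counts: Dict[str, int]) -> float:
    """Weighted average of click counts for one query and document pair."""
    total = sum(click_counts.values())
    if total == 0:
        return 0.0  # No clicks recorded, so no evidence either way.
    weighted = sum(
        CLICK_WEIGHTS.get(kind, 0.0) * count
        for kind, count in click_counts.items()
    )
    return weighted / total


# Hypothetical counts: mostly long clicks, so the statistic is fairly high.
score = quality_of_result({"long": 6, "medium": 2, "short": 2})
```

A document that mostly attracts long clicks for a query ends up with a value near 1.0, while one that mostly attracts short clicks ends up near the short-click weight.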
This description from the patent tells us about how click data might be stored in tuples:
A search engine (e.g., the search engine) or other processes may create a record in the model for documents that are selected by users in response to a query or a partial query. Each record within the model (herein referred to as a tuple) is at least a combination of a query submitted by users, a document reference selected by users in response to that query, and an aggregation of click data for all users that select the document reference in response to the query. The aggregate click data can be viewed as an indication of document relevance. In various implementations, model data can be location-specific (e.g. country, state, etc) or language-specific. For example, a country-specific tuple would include the country where the user query originated, whereas a language-specific tuple would include the language of the user query. Other extensions of model data are possible.
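The record described in that passage could be modeled along these lines. This is a hedged sketch rather than Google's data structure: the class names `ClickTuple` and `ClickModel`, the `record_click` method, and the example URL are hypothetical, and the optional country and language fields stand in for the location-specific and language-specific extensions the patent mentions.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ClickTuple:
    """One model record: a query, a selected document, and aggregated clicks."""
    query: str
    document: str
    clicks: Counter = field(default_factory=Counter)
    country: Optional[str] = None   # Present in a country-specific tuple.
    language: Optional[str] = None  # Present in a language-specific tuple.


class ClickModel:
    """Aggregates click data across users, keyed by query and document."""

    def __init__(self) -> None:
        self._records: Dict[
            Tuple[str, str, Optional[str], Optional[str]], ClickTuple
        ] = {}

    def record_click(
        self,
        query: str,
        document: str,
        click_type: str,
        country: Optional[str] = None,
        language: Optional[str] = None,
    ) -> ClickTuple:
        key = (query, document, country, language)
        record = self._records.get(key)
        if record is None:
            record = ClickTuple(query, document, country=country, language=language)
            self._records[key] = record
        record.clicks[click_type] += 1  # Aggregate by click type.
        return record


model = ClickModel()
model.record_click("hot yoga", "example.com/yoga", "long", country="GB")
record = model.record_click("hot yoga", "example.com/yoga", "short", country="GB")
```

Both clicks land in the same record because they share a query, document, and country, which is the aggregation across users that the passage describes.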
The model may also include Post-click behavior that has been tracked by the tracking component.
This patent does include a lot of information about how Google might use click tracking data when ranking search suggestion predictions. It tells us about the data that could be collected about clicks:
The information gathered for each click can include:
(1) the query (Q) the user entered,
(2) the document result (D) the user clicked on,
(3) the time (T) on the document,
(4) the interface language (L) (which can be given by the user),
(5) the country (C) of the user (which can be identified by the host that they use, such as www-store-co-uk to indicate the United Kingdom), and
(6) additional aspects of the user and session.
The time (T) can be measured as the time between the initial click through to the document result until the time the user comes back to the main page and clicks on another document result. Moreover, an assessment can be made about the time (T) regarding whether this time indicates a longer view of the document result or a shorter view of the document result since longer views are generally indicative of quality for the clicked-through result. This assessment about the time (T) can further be made in conjunction with various weighting techniques.
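Measuring the time (T) that way can be sketched as the gap between consecutive clicks on the same results page. The function name `dwell_times`, the timestamp format (seconds), and the sample session are assumptions for illustration only.

```python
from typing import List, Optional, Tuple


def dwell_times(
    clicks: List[Tuple[str, float]]
) -> List[Tuple[str, Optional[float]]]:
    """Estimate time on each document from chronologically ordered clicks.

    Each click is (document, timestamp in seconds). The time on a document is
    the gap until the next click on the results page.
    """
    times: List[Tuple[str, Optional[float]]] = []
    for i, (doc, clicked_at) in enumerate(clicks):
        if i + 1 < len(clicks):
            next_clicked_at = clicks[i + 1][1]
            times.append((doc, next_clicked_at - clicked_at))
        else:
            # The last click has no following click, so T is unknown here.
            times.append((doc, None))
    return times


# Hypothetical session: three results clicked at 0s, 12s, and 95s.
session = [("doc_a", 0.0), ("doc_b", 12.0), ("doc_c", 95.0)]
times = dwell_times(session)
```

In this made-up session, doc_a was viewed for about 12 seconds and doc_b for about 83 seconds, while the final click has no measurable time under this simple approach.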
We are also told that document views from the selections can be weighted based on viewing length information to produce weighted views of the document result.
So, rather than simply distinguishing long clicks from short clicks, a wider range of click through viewing times can be included in the assessment of result quality, where longer viewing times in the range are given more weight than shorter viewing times.
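A graded weighting like that might look something like the following. The linear weight, the 120-second cap, and the function names `view_weight` and `weighted_views` are assumptions chosen to keep the example simple; the patent does not specify a weighting function.

```python
from typing import Iterable


def view_weight(seconds: float, cap: float = 120.0) -> float:
    """Assumed weighting: grows linearly with viewing time, capped at 1.0."""
    if seconds <= 0:
        return 0.0
    return min(seconds, cap) / cap


def weighted_views(viewing_times: Iterable[float]) -> float:
    """Sum of weighted views for one document result."""
    return sum(view_weight(t) for t in viewing_times)


# Hypothetical viewing times in seconds for one document result.
total = weighted_views([6.0, 60.0, 300.0])
```

Three raw views count as roughly one and a half weighted views here, because the six-second view contributes very little and the five-minute view contributes the maximum.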
Google will sometimes display autocomplete search suggestions based upon data from previous queries, drawn from a searcher’s own search history or from the history of someone the searcher may be associated with, such as a fellow member of an organization or a co-worker.
While results for those previous queries were ranked based upon such things as relevance and backlinks, the search suggestions may include queries whose results received long clicks and long viewing times.
So under this patent, the autocomplete suggestions that best meet a searcher’s informational needs may be queries whose results are remembered as having earned long clicks and long viewing times.