Advertising Papers

Advertising papers [Listed by years](uncompleted)

More than 100 related papers listed in this pages. Most of them are from WWW, SIGIR, KDD, WSDM and CIKM.

Note that this paper list might be incomplete. If your paper is not listed, please let us know paullzn@gmail.com.

   2010  

WWW 2010

  • A Characterization of Online Search Behavior
    Author:
    Ravi Kumar, ravikumar@yahoo-inc.com
    Abstract:
    In this paper we undertake a large-scale study of online user search behavior based on search and toolbar logs. We identify three types of search: web, multimedia, and item. Together, we show that these different flavors represent almost 10% of all online pageviews, and indirectly result in over 21% of all pageviews. We study search queries themselves, and show that more than half of them contain direct references to some type of structured object; we characterize the types of objects that occur in these queries. We then revisit the relationship search and navigation specifically in the context of e-commerce, and consider how search aids users in online shopping tasks.
  • Beyond Position Bias- Examining Result Attractiveness as   a Source of Presentation Bias in Clickthrough Data
    Author: Yisong Tue, yyue@cs.cornell.edu
    Abstract: In this paper, we examine result summary attractiveness as a potential source of presentation bias. This study distinguishes itself from prior work by aiming to detect systematic biases in click behavior due to attractive summaries inflating perceived relevance. Our experiments conducted on the Google web search engine show substantial evidence of presentation bias in clicks towards results with more attractive titles.
  • Clustering Query Refinements by User Intent
    Author: Eldar Sadikov, eldar@cs.stanford.edu
    Abstract:
    The paper introduce a new method to solve the query suggestions problem by combining document click and session co-occurrence information by performing multiple random walks on a Markov graph that approximates user search behavior. The motivation is to be represent distinct information that the user may need, and improve related-query suggestions across user sessions.
  • Exploiting Query Reformulations for Web Search Result Diversification
    Author:  Rodrygo L. T. Santos, rodrygo@dcs.gla.ac.uk
    Abstract:
    When aWeb user’s underlying information need is not clearly specified from the initial query, an effective approach is to diversify the results retrieved for this query. In this paper, we introduce a novel probabilistic framework for Web search result diversification, which explicitly accounts for the various aspects associated to an underspecified query. In particular, we diversify a document ranking by estimating how well a given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as a whole.

  • Inferring Search Behaviors Using Partially Observable Markov Model with Duration (POMD)
    Author: Yin He, Samuel.HY@gmail.com
    Abstract:
    This paper presents Partially Observable Markov model with Duration (POMD), a statistical method that addresses the challenge of understanding sophisticated user behaviors from the search log in which some user actions, such as reading and skipping search results, cannot be observed and recorded. POMD utilizes not only the positional but also the temporal information of the clicks in the log. In this work, they treat the user engagements with a search engine as a Markov process, and model the unobservable engagements as hidden states.

 

WSDM 2010

  • Adaptive Weighing Designs for Keyword Value Computation
    Author: John W. Byers, byers@cs.bu.edu
    Abstract:
    We introduce the channelization problem: how do we adaptively assign keywords to channels over the course of multiple days to quickly obtain accurate VPC estimates of all keywords? We relate this problem to classical results in weighing design, devise new adaptive algorithms for this problem, and quantify the performance of these algorithms experimentally. Our results demonstrate that adaptive weighing designs that exploit statistics of term frequency, variability in VPCs across keywords, and flexible channel assignments over time provide the best estimators of keyword VPCs.
  • An Optimization Framework for Query Recommendation
    Author: Aris Anagnostopoulos,  aris@cs.brown.edu
    Abstract:
    In this paper, we present a formal treatment of the problem of query recommendation. In our framework we model the querying behavior of users by a probabilistic reformulation graph, or query-flow graph [Boldi et al. CIKM 2008]. A sequence of queries submitted by a user can be seen as
    a path on this graph. Assigning score values to queries allows us to define suitable utility functions and to consider the expected utility achieved by a reformulation path on the query-flow graph. Providing recommendations can be seen as adding shortcuts in the query-flow graph that “nudge” the
    reformulation paths of users, in such a way that users are more likely to follow paths with larger expected utility.


  • A Novel Click Model and Its Applications to Online Advertising
    Author: Zeyuan Allen Zhu, zhuzeyuan@hotmail.com
    Abstract:
    Recent advances in click model have positioned it as an attractive method for representing user preferences in web search and online advertising. Yet, most of the existing works focus on training the click model for individual queries, and cannot accurately model the tail queries due to the lack of training data. Simultaneously, most of the existing works consider the query, url and position, neglecting some other important attributes in click log data, such as the local time. Obviously, the click through rate is different between daytime and midnight. In this paper, we propose a novel click model based on Bayesian network, which is capable of modeling the tail queries because it builds the click model on attribute values, with those values being shared across queries. We called our work General Click Model (GCM) as we found that most of the existing works can be special cases of GCM by assigning different parameters. Experimental results on a large-scale commercial advertisement dataset show that GCM can significantly and consistently lead to better results as compared to the state-of-the-art works.
     
  • Automatic Generation of Bid Phrases for Online Advertising
    Author: Sujith Ravi, sravi@isi.edu
    Abstract:
    Our study aims towards the automatic construction of online ad campaigns: given a landing page, we propose several algorithmic methods to generate bid phrases suitable for the given input. Such phrases must be both relevant (that is, reflect the content of the page) and well-formed (that is, likely to be used as queries to a Web search engine). To this end, we use a two phase approach. First, candidate bid phrases are generated by a number of methods, including a (monolingual) translation model capable of generating phrases not contained within the text of the input as well as previously “unseen” phrases. Second, the candidates are ranked in a probabilistic framework using both the translation model, which favors relevant phrases, as well as a bid phrase language model, which favors well-formed phrases.

  • Beyond DCG- User Behavior as a Predictor of a Successful Search
    Author: Ahmed Hassan, hassanam@umich.edu
    Abstract:
    Web search engines are traditionally evaluated in terms of the relevance of web pages to individual queries. However, relevance of web pages does not tell the complete picture, since an individual query may represent only a piece of the user’s information need and users may have different information needs underlying the same queries. We address the problem of predicting user search goal success by modeling user behavior. We show empirically that user behavior alone can give an accurate picture of the success of the user’s web search goals, without considering the relevance of the documents displayed.
  • Improving Ad Relevance in Sponsored Search
    Author:
    Dustin Hillard, dhillard@yahoo-inc.com
    Abstract:
    We describe a machine learning approach for predicting sponsored search ad relevance. Our baseline model incorporates basic features of text overlap and we then extend the model to learn from past user clicks on advertisements. We present a novel approach using translation models to learn user click propensity from sparse click logs. Our relevance predictions are then applied to multiple sponsored search applications in both oine editorial evaluations and live online user tests. The predicted relevance score is used to improve the quality of the search page in three areas: ltering low quality ads, more accurate ranking for ads, and optimized page placement of ads to reduce prominent placement of low relevance ads. We show signi ficant gains across all three tasks.
  • Large Scale Query Log Analysis of Re-Finding
    Author: Sarah K. Tyler, skt@soe.ucsc.edu
    Abstract: Although Web search engines are targeted towards helping people find new information, people regularly use them to re-find Web pages they have seen before. Researchers have noted the existence of this phenomenon, but relatively little is understood about how re-finding behavior differs from the finding of new information. This paper dives deeply into the differences via analysis of three large-scale data sources: 1) query differs from the finding of new information. This paper dives deeply into the differences via analysis of three large-scale data sources: 1) query logs (queries, clicks, result impressions), 2) Web browsing logs (URL visits), and 3) a daily Web crawl (page content).
  • Personalized Click Prediction in Sponsored Search
    Author: Haibin cheng, hcheng@yahoo-inc.com
    Abstract: The objective of this paper is to present a framework for the personalization of click models in sponsored search. We develop user-specific and demographic-based features that reflect the click behavior of individuals and groups. The features are based on observations of search and click behaviors of a large number of users of a commercial search engine. We add these features to a baseline non-personalized click model and perform experiments on offline test sets derived from user logs as well as on live traffic. Our results demonstrate that the personalized models significantly improve the accuracy of click prediction.
  • Precomputing Search Features for Fast and Accurate Query Classification
    Author: Venkatesh Ganti, vganti@microsoft.com
    Abstract:
    In this paper, we propose a new class of features that realizes the benefit of search-based features without high latency. These leverage cooccurrence between the query keywords and tags applied to documents in search results, resulting in a significant boost to web query classification accuracy. By pre-computing the tag incidence for a suitably chosen set of keyword-combinations, we are able to generate the features online with low latency and memory requirements. We evaluate the accuracy of our approach using a large corpus of real web queries in the context of commercial search.
  • Query Reformulation Using Anchor Text
    Author: Van Dang, vdang@cs.umass.edu
    Abstract:
    Query reformulation techniques based on query logs have been studied as a method of capturing user intent and improving retrieval effectiveness. The evaluation of these techniques has primarily, however, focused on proprietary query logs and selected samples of queries. In this paper, we suggest that anchor text, which is readily available, can be an effective substitute for a query log and study the effectiveness of a range of query reformulation techniques (including log-based stemming, substitution, and expansion) using standard TREC collections. Our results show that log based query reformulation techniques are indeed effective with standard collections, but expansion is a much safer form of query modification than word substitution. We also show that using anchor text as a simulated query log is as least as effective as a real log for these techniques.

   2009   

WWW 2009

  • Advertising Keyword Generation Using Active Learning
    Author:
    Hao Wu, haowu@zju.edu.cn
    Abstract:
    This paper proposes an efficient relevance feedback based interactive model for keyword generation in sponsored search advertising. We formulate the ranking of relevant terms as a supervised learning problem and suggest new terms for the seed by leveraging user relevance feedback information. Active learning is employed to select the most informative samples from a set of candidate terms for user labeling. Experiments show our approach improves the relevance of generated terms significantly with little user effort required.
  • A Dynamic Bayesian Network Click Model for Web Search Ranking
    Author: Olivier Chapelle, chap@yahoo-inc.com
    Abstract:
    As with any application of machine learning, web search ranking requires labeled data. The labels usually come in the form of relevance assessments made by editors. Click logs can also provide an important source of implicit feedback and can be used as a cheap proxy for editorial labels. The main difficulty however comes from the so called position bias — urls appearing in lower positions are less likely to be clicked even if they are relevant. In this paper, we propose a Dynamic Bayesian Network which aims at providing us with unbiased estimation of the relevance from the click logs. Experiments show that the proposed click model outperforms other existing click models in predicting both click-through rate and relevance.
  • An Axiomatic Approach for Result Diversification
    Author: Sreenivas Gollapudi, sreenig@microsoft.com
    Abstract:
    Understanding user intent is key to designing an effective ranking system in a search engine. In the absence of any explicit knowledge of user intent, search engines want to diversify results to improve user satisfaction. In such a setting, the probability ranking principle-based approach of presenting the most relevant results on top can be sub-optimal, and hence the search engine would like to trade-off relevance for diversity in the results. In analogy to prior work on ranking and clustering systems, we use the axiomatic approach to characterize and design diversification systems. We develop a set of natural axioms that a diversification system is expected to satisfy, and show that no diversification function can satisfy all the axioms simultaneously.
  • A Probabilistic Model Based Approach for Blended Search
    Author: Ning Liu, ningl@microsoft.com
    Abstract:
    In this paper, we propose to model the blended search problem by assuming conditional dependencies among queries, VSEs and search results. The probability distributions of this model are learned from search engine query log through unigram language model. Our experimental exploration shows that, (1) a large number of queries in generic Web search have vertical search intentions; and (2) our proposed algorithm can effectively blend vertical search results into generic Web search, which can improve the Mean Average Precision (MAP) by as much as 16% compared to traditional Web search without blending.
  • A Search-based Method for Forecasting Ad Impression in Contextual Advertising
    Author: Xuerui Wang, marcusf@yahoo-inc.com
    Abstract:
    In this paper, we address the problem of forecasting the number of impressions for new or changed ads in the system. Producing such forecasts, even within large margins of error, is quite challenging: 1) ad selection in contextual advertising is a complicated process based on tens or even hundreds of page and ad features; 2) the publishers’ content and traffic vary over time; and 3) the scale of the problem is daunting: over a course of a week it involves billions of impressions, hundreds of millions of distinct pages, hundreds of millions of ads, and varying bids of other competing advertisers. We tackle these complexities by simulating the presence of a given ad with its associated bid over weeks of historical data.
     
  • Click Chain Model in Web Search
    Author: Fan Guo, fanguo@cs.cmu.edu
    Abstract:
    It is commonly believed that web search click logs are a gold mine for search business, because they reflect users’ preference over web documents presented by the search engine. Click models provide a principled approach to inferring user-perceived relevance of web documents, which can be leveraged in numerous applications in search businesses.  We present the click chain model (CCM), which is based on a solid, Bayesian framework. It is both scalable and incremental, perfectly meeting the computational challenges imposed by the voluminous click logs that constantly grow.
     
  • Competitive Analysis from Click-Through Log
    Author: Gang Wang, gawa@microsoft.com
    Abstract:
    Existing keyword suggestion tools from various search engine companies could automatically suggest keywords related to the advertisers’ products or services, counting in simple statistics of the keywords, such as search volume, cost per click (CPC), etc. However, the nature of the generalized Second Price Auction suggests that better understanding the competitors’ keyword selection and bidding strategies better helps to win the auction, other than only relying on general search statistics. In this paper, we propose a novel keyword suggestion strategy, called Competitive Analysis, to explore the keyword based competition relationships among advertisers and eventually help advertisers to build campaigns with better performance.
  • General Auction Mechanism for Search Advertising
    Author: Gagan Aggarwal, gagana@google.com
    Abstract:
    In sponsored search, a number of advertising slots is available on a search results page, and have to be allocated among a set of advertisers competing to display an ad on the page. In this paper, we model advertising auctions in terms of an assignment model with linear utilities, extended with bidder and item specific maximum and minimum prices. Auction mechanisms like the commonly used GSP or the well-known Vickrey-Clarke-Groves (VCG) can be interpreted as simply computing a bidder-optimal stable matching in this model, for a suitably defined set of bidder preferences, but our model includes much richer bidders and preferences. Our main technical contributions are the existence of bidder-optimal matchings and strategyproofness of the resulting mechanism, and are proved by induction on the progress of the matching algorithm.
  • How Much Can Behavioral Targeting Help Online Advertising
    Author: Jun Yan, junyan@microsoft.com
    Abstract:
    Behavioral Targeting (BT) is a technique used by online advertisers to increase the effectiveness of their campaigns, and is playing an increasingly important role in the online advertising market. I this paper, we provide an empirical study on the click-through log of advertisements collected from a commercial search engine. From the experiment results over a period of seven days, we draw three important conclusions: (1) Users who clicked the same ad will truly have similar behaviors on the Web; (2) Click-Through Rate (CTR) of an ad can be averagely improved as high as 670% by properly segmenting users for behavioral targeted advertising in a sponsored search; (3) Using short term user behaviors to represent users is more effective than using long term user behaviors for BT.
  • Hybrid Keyword Search Auctions
    Author: Ashish Goel, ashishg@stanford.edu
    Abstract:
    Search auctions have become a dominant source of revenue generation on the Internet. Such auctions have typically used per-click bidding and pricing. We propose the use of hybrid auctions where an advertiser can make a per-impression as well as a per-click bid, and the auctioneer then chooses one of the two as the pricing mechanism. We assume that the advertiser and the auctioneer both have separate beliefs (called priors) on the click-probability of an advertisement. We first prove that the hybrid auction is truthful, assuming that the advertisers are risk-neutral. We then show that this auction is superior to the existing per-click auction in multiple ways. As Internet commerce matures, we need more sophisticated pricing models to exploit all the information held by each of the participants. We believe that hybrid auctions could be an important step in this direction.
  • Towards Context-Aware Search by Learning a Very Large Variable Length Hidden Markov Model from Search Logs
    Author: Huanhuan Cao, caohuan@ustc.edu.cn
    Abstract:
    Capturing the context of a user’s query from the previous queries and clicks in the same session may help understand the user’s information need. A context-aware approach to document re-ranking, query suggestion, and URL recommendation may improve users’ search experience substantially. In this paper, we propose a general approach to context-aware search. To capture contexts of queries, we learn a variable length Hidden Markov Model (vlHMM) from search sessions extracted from log data. We develop a strategy for parameter initialization in vlHMM learning which can greatly reduce the number of parameters to be estimated in practice. We also devise a method for distributed vlHMM learning under the map-reduce model.
  • Spatio-Temporal Models for Estimating Click-through Rate
    Author: Deepak Agarwal, dagarwal@yahoo-inc.com
    Abstract:
    We propose novel spatio-temporal models to estimate click-through rates in the context of content recommendation. We track article CTR at a fixed location over time through a dynamic Gamma-Poisson model and combine information from correlated locations through dynamic linear regressions, significantly improving on per-location model. Our models adjust for user fatigue through an exponential tilt to the first-view CTR (probability of click on first article exposure) that is based only on user-specific repeat-exposure features. We illustrate our approach on data obtained from a module (Today Module) published regularly on Yahoo! Front Page and demonstrate significant improvement over commonly used baseline methods.
  • Predicting Click Through Rate for Job Listings
    Author: Manish Gupta, gmanish@yahoo-inc.com
    Abstract: Click Through Rate (CTR) is an important metric for ad systems, job portals, recommendation systems. CTR impacts publisher’s revenue, advertiser’s bid amounts in “pay for performance” business models. We learn regression models using features of the job, optional click history of job, features of “related” jobs. We show that our models predict CTR much better than predicting avg. CTR for all job listings, even in absence of the click history for the job listing.
  • Matchbox-Large Scale Online Bayesian Recommendations
    Author: David Stern, dstern@microsoft.com
    Abstract:
    We present a probabilistic model for generating personalised recommendations of items to users of a web service. The Matchbox system makes use of content information in the form of user and item meta data in combination with collaborative filtering information from previous user behavior in order to predict the value of an item for a user. Users and items are represented by feature vectors which are mapped into a low-dimensional ‘trait space’ in which similarity is measured in terms of inner products. The model can be trained from different types of feedback in order to learn user-item preferences.
  • Query Clustering using Click-Through Graph
    Author: Jeonghee Yi, Jeonghee@yahoo-inc.com
    Abstract:
    In this paper we describe a problem of discovering query clustersfrom a click-through graph of web search logs. The graph consists of a set of web search queries, a set of pages selected for the queries, and a set of directed edges that connects a query node and a page node  clicked by a user for the query. The proposed method extracts all maximal bipartite cliques (bicliques) from a click-through graph and compute  an equivalence set of queries (i.e., a query cluster) from the maximal bicliques. A cluster of queries is formed from the queries in a biclique. We  present a scalable algorithm that enumerates all maximal bicliques from the click-through graph. We have conducted experiments on Yahoo web  search queries and the result is promising.
  • Towards Intent-Driven Bidterm Suggestion
    Author: William Chang, wchang@isi.edu
    Abstract:
    In online advertising, pervasive in commercial search engines, advertisers typically bid on few terms, and the scarcity of data makes ad matching difficult. Suggesting additional bidterms can significantly improve ad clickability and conversion rates. In this paper, we present a large-scale bidterm suggestion system that models an advertiser’s intent and finds new bidterms consistent with that intent. Preliminary experiments show that our system significantly increases the coverage of a state of the art production system used at Yahoo while maintaining comparable precision.

 

SIGIR 2009

  • Adaptation of Offline Vertical Selection Predictions in the Presence of User Feedback
    Author: Fernando Diaz, diazf@yahoo-inc.com
    Abstract: Web search results often integrate content from specialized corpora known as verticals. Given a query, one important aspect of aggregated search is the selection of relevant verticals from a set of candidate verticals. One drawback to previous approaches to vertical selection is that methods have not explicitly modeled user feedback. However, production search systems often record a variety of feedback information. In this paper, we present algorithms for vertical selection which adapt to user feedback. We evaluate algorithms using a novel simulator which models performance of a vertical selector situated in realistic query traffic.
  • Context-Aware Query Classification
    Author: Huanhuan Cao, caohuan@ustc.edu.cn
    Abstract:
    Understanding users’ search intent expressed through their search queries is crucial to Web search and online advertisement. Web query classification (QC) has been widely studied for this purpose. In this paper, we incorporate context information into the problem of query classification by using conditional random field (CRF) models. In our approach, we use neighboring queries and their corresponding clicked URLs (Web pages) in search sessions as the context information. We perform extensive experiments on real world search logs and validate the effectiveness and efficiency of our approach. We show that we can improve the F1 score by 52% as compared to other state-of-the-art baselines.
  • Context Transfer in Search Advertising
    Author: Hila Becker, hila@cs.columbia.edu
    Abstract:
    We define and study the process of context transfer in search advertising, which is the transition of a user from the context of Web search to the context of the landing page that follows an ad-click. We conclude that in the vast majority of cases, the user is shown one of three types of pages, which can be accurately distinguished using automatic text classification.
  • Click-Through Prediction for News Queries
    Author: Arnd Christian Konig, chrisko@microsoft.com
    Abstract:
    In this paper, we consider the problem of estimating the click-through rate for dedicated news search results. For queries for which news results have been displayed repeatedly before, the click-through rate can be tracked online; however, the key challenge for which previously unseen queries to display news results remains. In this paper we propose a supervised model that offers accurate prediction of news click-through rates and satisfies the requirement of adapting quickly to emerging news events.
  • Efficient Query Expansion for Advertisement Search
    Author: Haofen Wang, whfcarter@sjtu.edu.cn
    Abstract:
    In this paper, we propose an efficient ad search solution relying on a block-based index able to tackle the issues associated with query expansion. Our index structure places clusters of similar bid phrases in corresponding blocks with their associated ads. It reduces the number of merge operations significantly during query expansion and allows sequential scans rather than random accesses, saving I/O costs. We adopt flexible block sizes according to the clustering results of bid phrases to further optimize the index structure for efficient ad search. The pre-computation of such clusters is achieved through an agglomerative iterative clustering algorithm. Finally, we adapt the spreading activation mechanism to return the top-k relevant ads, improving search precision.
     
  • Good Abandonment in Mobile and PC Internet Search
    Author: Jane Li, janeli@google.com
    Abstract:
    Query abandonment by search engine users is generally considered to be a negative signal. In this paper, we explore the concept of good abandonment. We define a good abandonment as an abandoned query for which the user’s information need was successfully addressed by the search results page, with no need to click on a result or refine the query. We present an analysis of abandoned internet search queries across two modalities (PC and mobile) in three locales. The goal is to approximate the prevalence of good abandonment, and to identify types of information needs that may lead to good abandonment, across different locales and modalities. Our findings imply that it is a mistake to uniformly consider query abandonment as a negative signal. Further, there is a potential opportunity for search engines to drive additional good abandonment, especially for mobile search users, by improving search features and result snippets.
  • Learning to Recommend with Social Trust Ensemble
    Author: Hao Ma, hma@cse.cuhk.edu.hk
    Abstract:
    As an indispensable technique in the field of Information Filtering, Recommender System has been well studied and developed both in academia and in industry recently. Aiming at modeling recommender systems more accurately and realistically, we propose a novel probabilistic factor analysis framework, which naturally fuses the users’ tastes and their trusted friends’ favors together. In this framework, we coin the term Social Trust Ensemble to represent the formulation of the social trust restrictions on the recommender systems. The complexity analysis indicates that our approach can be applied to very large datasets since it scales linearly with the number of observations, while the experimental results show that our method performs better than the state-of-the-art approaches.
  • Multiple Approaches to Analysing Query Diversity
    Author: Paul Clough, p.d.clough@sheffield.ac.uk
    Abstract:
    In this paper we examine user queries with respect to diversity: providing a mix of results across different interpretations. Using two query log analysis techniques (click entropy and reformulated queries), 14.9 million queries from the Microsoft Live Search log were analysed. We found that a broad range of query types may benefit from diversification. Additionally, although there is a correlation between word ambiguity and the need for  diversity, the range of results users may wish to see for an ambiguous query stretches well beyond traditional notions of word sense.
  • Optimizing Search Engine Revenue in Sponsored Search
    Author: Yunzhang Zhu, v-yuazhu@microsoft.com
    Abstract:
    In this paper, we address a new revenue optimization problem and aim to answer the question: how to construct a ranking model that can deliver high quality ads to the user as well as maximize search engine revenue? We introduce two novel methods from different machine learning perspectives, and both of them take the revenue component into careful considerations. The algorithms are built upon the click-through log data with real ad clicks and impressions. The extensively experimental results verify the proposed algorithm that can produce more revenue than other methods as well as avoid losing relevance accuracy. To provide deep insight into the importance of each feature to search engine revenue, we extract twelve basic features from four categories. The experimental study provides a feature ranking list according to the revenue benefit of each feature.
  • Smoothing Clickthrough Data for Web Search Ranking
    Author: Jianfeng Gao, jfgao@microsoft.com
    Abstract:
    Incorporating features extracted from clickthrough data (called clickthrough features) has been demonstrated to significantly improve the performance of ranking models for Web search applications. Such benefits, however, are severely limited by the data sparseness problem, i.e., many queries and documents have no or very few clicks. The ranker thus cannot rely strongly on clickthrough features for document ranking. This paper presents two smoothing methods to expand clickthrough data: query clustering via Random Walk on click graphs and a discounting method inspired by the Good-Turing estimator. Experimental results show that the ranking models trained on smoothed clickthrough features consistently outperform those trained on unsmoothed features. This study demonstrates both the importance and the benefits of dealing with the sparseness problem in clickthrough data.
  • When More is Less-The Paradox of Choice in Search Engine Use
    Author: Antti Oulasvirta, Helsinki University of Technology TKK
    Abstract:
    In numerous everyday domains, it has been demonstrated that increasing the number of options beyond a handful can lead to paralysis and poor choice and decrease satisfaction with the choice. Were this so-called paradox of choice to hold in search engine use, it would mean that increasing recall can actually work counter to user satisfaction if it implies choice from a more exten-ive set of result items. The existence of this effect was demonstrated in an experiment where users (N=24) were shown a search scenario and a query and were required to choose the best result item within 30 seconds. Having to choose from six results yielded both higher subjective satisfaction with the choice and greater confidence in its correctness than when there were 24 items on the results page. We discuss this finding in the wider context of “choice architecture”—that is, how result presentation affects choice and satisfaction.

 

KDD 2009

  • Large-Scale Behavioral Targeting
    Author:
    Ye Chen, yechen1@ebay.com
    Abstract:
    Behavioral targeting (BT) leverages historical user behavior to select the ads most relevant to users to display. The state-of-the-art of BT derives a linear Poisson regression model from fine-grained user behavioral data and predicts click-through rate (CTR) from user history. We designed and implemented a highly scalable and efficient solution to BT using Hadoop MapReduce framework. With our parallel algorithm and the resulting system, we can build above 450 BT-category models from the entire Yahoo’s user base within one day, the scale that one can not even imagine with prior systems. Moreover, our approach has yielded 20% CTR lift over the existing production system by leveraging the  well-grounded probabilistic model fitted from a much larger training dataset.

  • Predicting Bounce Rates in Sponsored Search Advertisements
    Author: D. Sculley, dsculley@google.com
    Abstract:
    This paper explores an important and relatively unstudied quality measure of a sponsored search advertisement: bounce rate. The bounce rate of an ad can be informally defined as the fraction of users who click on the ad but almost immediately move on to other tasks. A high bounce rate can lead to poor advertiser return on investment, and suggests search engine users may be having a poor experience following the click. In this paper, we first provide quantitative analysis showing that bounce rate is an effective measure of user satisfaction. We then address the question, can we predict bounce rate by analyzing the features of the advertisement? An affirmative answer would allow advertisersand search engines to predict the effectiveness and quality of advertisements before they are shown. We propose solutions to this problem involving large-scale learning methods that leverage features drawn from ad creatives in addition to their keywords and landing pages.

  • PSkip-estimating relevance ranking quality from web search clickthrough data
    Author: Kuansan Wan, Microsoft Corporation
    Abstract:
    In this article, we report our efforts in mining the information encoded as clickthrough data in the server logs to evaluate and monitor the relevance ranking quality of a commercial web search engine. We describe a metric called pSkip that aims to quantify the ranking quality by estimating the probability of users encountering non relevant results that cost them the efforts to read and skip. A search engine with a lower pSkip is regarded as having a better ranking quality. A key design goal of pSkip is to integrate the findings from two sets of user studies that utilize eye-tracking devices to track users’ browsing patterns on the search result pages, and that use specially instrumented browsers to actively solicit users’ explicit judgments on their search activities.

WSDM 2009

  • Analysis of long queries in a large scale search log
    Author:
    Michael Bendersky, bemike@cs.umass.edu
    Abstract:
    We propose to use the search log to study long queries, in order to understand the types of information needs that are behind them, and to design techniques to improve search effectiveness when they are used. In this paper we analyze the long queries in the search log with the aim of identifying the characteristics of the most commonly occurring types of queries, and the issues involved with using them effectively in a search engine. In addition, we propose a simple yet effective method for evaluating the performance of the queries in the search log using a combination  of the click data in the search log with the existing TREC corpora.
  • Discovering and Using Groups to Improve Personalized Search
    Author:
    Jaime Teevan, teevan@microsoft.com
    Abstract:
    Personalized Web search takes advantage of information about an individual to identify the most relevant results for that person. To better understand whether groups of people can be used to benefit personalized search, we explore the similarity of query selection, desktop information, and explicit relevance judgments across people grouped in different ways. The groupings we explore fall along two dimensions: the longevity of the group members’ relationship, and how explicitly the group is formed. We find that some groupings provide valuable insight into what members consider relevant to queries related to the group focus, but that it can be difficult to identify  valuable groups implicitly. Building on these findings, we explore an algorithm to "groupize" (versus "personalize") Web search results that leads to a significant improvement in result ranking on group-relevant queries.
     
  • Diversifying Search Results
    Author: Rakesh Agrawal, rakesha@microsoft.com
    Abstract:
    We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information, and that both queries and documents may belong to more than one category according to this taxonomy. We present a systematic approach to diversifying  results that aims to minimize the risk of dissatisfaction of the average user. We propose an algorithm that well approximates this objective in  general, and is provably optimal for a natural special case. Furthermore, we generalize several classical IR metrics, including NDCG, MRR, and MAP, to explicitly account for the value of diversification.
  • Efficient Multiple-Click Models in Web Search
    Author: Fan Guo, fanguo@cs.cmu.edu
    Abstract:
    Many tasks that leverage web search users’ implicit feed-back rely on a proper and unbiased interpretation of user clicks. Previous eye-tracking experiments and studies on explaining position-bias of user clicks provide a spectrum of hypotheses and models on how an average user examines and possibly clicks web documents returned by a search engine with respect to the submitted query. In this paper, we attempt to close the gap between previous work, which studied how to model a single click, and the reality that multiple clicks on web documents in a single result page are not uncommon. Specifically, we present two multiple-click models: the independent click model (ICM) which is reformulated from previous work, and the dependent click model (DCM) which takes into consideration dependencies between multiple clicks.
  • Tailoring Click Models to User Goals
    Author: Fan Guo, fanguo@cs.cmu.edu
    Abstract:
    Click models provide a principled way of understanding user interaction with web search results in a query session and a statistical tool for leveraging search engine click logs to analyze and improve user experience. In this paper, we present how to tailor click models to user goals in web search through query term classification. We demonstrate that better predicative power could be achieved by fitting two click models for navigational queries and informational queries respectively, as evidenced by the likelihood and perplexity evaluation results on a subset of the MSN 2006 RFP data which consists of 121,179 distinct query terms and over 2.8 million query sessions.
  • Usefulness of Quality Click-through Data for Training
    Author: Craig Macdonald, craigm@dcs.gla.ac.uk
    Abstract:
    Modern Information Retrieval (IR) systems often employdocument weighting models with many parameters that require to be appropriately set for effective retrieval performance. To obtain these parameter settings, quality training is usually required, where assessors have manually labelled the relevance of retrieved items for many queries. In this work, we examine the usefulness of high-quality click-through data for training an IR system, on searching the .gov vertical domain of the Web. We find that, compared to training using relevance judgements created using human assessors, the click-through trained settings are as good and occasionally better.

 

CIKM 2009

  • Translating Relevance Scores to Probabilities for Contextual Advertising
    Author:
    Deepak Agarwal, dagarwal@yahoo-inc.com
    Abstract:
    Information retrieval systems conventionally assess docu-ent relevance using the bag of words model. Consequently, relevance scores of documents retrieved for different queries are often difficult to compare, as they are computed on different (or even disjoint) sets of textual features. Many tasks, such as federation of search results or global thresholding of relevance scores, require that scores be globally  comparable. To achieve this aim, we propose methods for non-monotonic transformation of relevance scores into probabilities for a contextual advertising selection engine that uses a vector space model. The calibration of the raw scores is based on historical click data.
     

NIPS 2009

  • Factor Modeling for Advertisement Targeting
    Author:
    Ye Chen, yechen1@ebay.com
    Abstract:
    We adapt a probabilistic latent variable model, namely GaP (Gamma-Poisson) [6],to ad targeting in the contexts of sponsored search (SS) and behaviorally targeted (BT) display advertising. We also approach the important problem of ad positional bias by formulating a one-latent-dimension GaP factorization. Learning from click-through data is intrinsically large scale, even more so for ads. We scale up the algorithm to terabytes of real-world SS and BT data that contains hundreds of millions of users and hundreds of thousands of features, by leveraging the scalability characteristics of the algorithm and the inherent structure of the problem including data sparsity and locality. Finally, we report the experimental results using Yahoo’s vast datasets, and show that our approach substantially outperform the state-of-the-art methods in prediction accuracy.
  • Online Learning of Assignments
    Author:
    Matthew Streeter, mstreeter@google.com
    Abstract:
    Which ads should we display in sponsored search in order to maximize our revenue? How should we dynamically rank information sources to maximize the value of the ranking? These applications exhibit strong diminishing returns: Redundancy decreases the marginal utility of each ad or information source. We show that these and other problems can be formalized as repeatedly selecting an assignment of items to positions to maximize a sequence of monotone submodular functions that arrive one by one. We present an efficient algorithm for this general problem and analyze it in the no-regret model. We empirically evaluate our algorithm on two real-world online optimization problems on the web: ad allocation with submodular utilities, and dynamically ranking blogs to detect information cascades.

 

IJCAI 2009

  • Efficient Incremental Search for Moving Target Search
    Author:
    Abstract:
  • Methodology for Designing Reasonably Expressive Mechanisms with Application to Ad Auctions
    Author:
    Abstract:

OTHERS

  • A COLLABORATIVE FILTERING APPROACH TO SPONSORED SEARCH_technical report_2009.pdf
  • Get more Clicks_ira2009.pdf
  • Identification of ambiguous queries in web search_IPM09.pdf
  • Online story scheduling in web advertising_SODA09.pdf
  • Optimal Online Assignment with Forecasts09.pdf
  • Organizing Web Sites Using Search Engine Queries_2009.pdf
  • Position Effects in Web Search Click Behavior_WSSP09.pdf
  • Predicting Click Rates by Consistent Bipartite Spectral Graph Model_ADMA09.pdf
  • Survey and evaluation of query intent detection methods
    Author:
    Abstract:

  2008  

WWW 2008

  • Predicting Ads' Click Through Rate with Decision Rules
    Author:
    Krzysztof Dembczynski, kdembczynski@pc.put.poznan.pl
    Abstract:
    Paid advertisements displayed alongside search results constitute a major source of income for search companies.  In this context, an ad’s quality can be measured by the probability of it being clicked assuming it was noticed by the user (click-through rate, CTR). There are two problems here. First, the real CTR is heavily affected by the ad’s position, not only by its content. Second, it is not possible to directly calculate CTR for new ads. In this paper we build upon a large data set with real ad clicks and impressions, acquired thanks to Microsoft’s Beyond Search program. First, we describe the data set and anomalies found inside it. We then address the problem of computing CTR for existing ads using maximum likelihood estimation. Finally, we present an algorithm learning an ensemble of decision rules that can be used for predicting the CTR for unseen ads and giving recommendations  to improve ads’ quality.
  • Online learning from click data for sponsored search
    Author: Massimiliano Ciaramita, massi@yahoo-inc.com
    Abstract:
    Sponsored search is one of the enabling technologies for today’s Web search engines. It corresponds to matching and showing ads related to the user query on the search engine results page. Users are likely to click on topically related ads and the advertisers pay only when a user clicks on their ad. Hence, it is important to be able to predict if an ad is likely to be clicked, and maximize the number of clicks. We investigate the sponsored search problem from a machine learning perspective with respect to three main sub-problems: how to use click data for training and evaluation, which learning framework is more suitable for the task, and which features are useful for existing models. Furthermore, we find that online multilayer perceptron  learning, based on a small set of features representing content similarity of different kinds, significantly outperforms an information retrieval baseline and  other learning models, providing a suitable framework for the sponsored search task.
  • Externalities in Online Advertising
    Author: Arpita Ghosh, arpita@yahoo-inc.com
    Abstract:
    Most models for online advertising assume that an advertiser’s value from winning an ad auction, which depends on the clickthrough rate or conversion rate of the advertisement, is independent of other advertisements served along-side it in the same session. This ignores an important externality effect: as the advertising audience has a limited attention span, a high-quality ad on a page can detract attention from other ads on the same page. In this paper, we introduce the problem of modeling externalities in online advertising, and study the winner determination problem in these models. Our models are based on choice models on the audience side. We show that in the most general case, the winner determination problem is hard even to approximate. Furthermore, we show that there are some interesting special cases, such as the case where the audience preferences are single peaked, where the problem can be solved exactly in polynomial time.
  • Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising
    Author: Hamid Nazerzadeh, hamidnz@stanford.edu
    Abstract:
    We study the Cost-Per-Action or Cost-Per-Acquisition (CPA) charging scheme in online advertising. In this scheme, instead of paying per click, the advertisers pay only when a user takes a specific action (e.g. fills out a form) or completes a transaction on their websites. We focus on designing efficient and incentive compatible mechanisms that use this charging scheme. We describe a mechanism based on a sampling-based learning algorithm that under suitable assumptions is asymptotically individually rational, asymptotically Bayesian incentive compatible and asymptotically ex-ante efficient. In particular, we demonstrate our mechanism for the case where the utility functions of the advertisers are independent and identically-distributed random variables as well as the case where they evolve like independent reflected Brownian motions.
  • Contextual Advertising by Combining Relevance with Click Feedback
    Author: Deepayan Chakrabarti, deepay@yahoo-inc.com
    Abstract:
    Contextual advertising supports much of the Web’s ecosystem today. User experience and revenue (shared by the site publisher ad the ad network) depend on the relevance of the displayed ads to the page content. In this paper we show how this match can be improved significantly by augmenting the ad-page scoring function with extra parameters from a logistic regression model on the words in the pages and ads. A key property of the proposed model is that it can be mapped to standard cosine similarity matching and is suitable for efficient and scalable implementation over inverted indexes. The model parameter values are learnt from logs containing ad impressions and clicks, with shrinkage estimators being used to combat sparsity. To scale our computations to train on an extremely large training corpus consisting of several gigabytes of data, we parallelize our fitting algorithm in a Hadoop framework.
  • Analyzing Search Engine Advertising- Firm Behavior and Cross-Selling in Electronic Markets
    Author: Anindya Ghose, aghose@stern.nyu.edu
    Abstract:
    The phenomenon of sponsored search advertising is gaining ground as the largest source of revenues for search engines. Firms across different industries have are beginning to adopt this as the primary form of online advertising. This process works on an auction mechanism in which advertisers bid for different keywords, and final rank for a given keyword is allocated by the search engine. Based on the model and estimates from prior work, we conduct a number of policy simulations in order to investigate to what extent an advertiser can benefit from bidding optimally for its keywords. Further, we build a Hierarchical Bayesian modeling framework to explore the potential for cross-selling or spillovers effects from a given keyword advertisement across multiple product categories, and estimate the model using Markov Chain Monte Carlo (MCMC) methods.
  • A Combinatorial Allocation Mechanism With Penalties For Banner Advertising
    Author: Uriel Feige, uriel.feige@weizmann.ac.il
    Abstract:
    Most current banner advertising is sold through negotiation thereby incurring large transaction costs and possibly suboptimal allocations. We propose a new automated system for selling banner advertising. In this system, each advertiser specifies a collection of host webpages which are relevant to his product, a desired total quantity of impressions on these pages, and a maximum per-impression price. The system selects a subset of advertisers as winners and maps each winner to a set of impressions on pages within his desired collection. The distinguishing feature of our system as opposed to current combinatorial allocation mechanisms is that, mimicking the current negotiation system, we guarantee that winners receive at least as many  advertising opportunities as they requested or else receive ample compensation in the form of a monetary payment by the host. Such guarantees are essential in markets like banner advertising where a major goal of the advertising campaign is developing brand recognition.

 

SIGIR 2008

  • Optimizing relevance and revenue in ad search_ a query substitution approach
    Author: Filip Radlinski, filip@cs.cornell.edu
    Abstract:
    The primary business model behind Web search is based on textual advertising, where contextually relevant ads are displayed alongside search results. We address the problem of selecting these ads so that they are both relevant to the queries and profitable to the search engine, showing that optimizing ad relevance and revenue is not equivalent. Selecting the best ads that satisfy these constraints also naturally incurs high computational costs, and time constraints can lead to reduced relevance and profitability. We propose a novel two-stage approach, which conducts most of the analysis ahead of time. An offline preprocessing phase leverages additional knowledge that is impractical to use in real time, and rewrites frequent queries in a way that subsequently facilitates fast and accurate online matching.

 

  • A user browsing model to predict search engine click data from past observations
    Author: Georges Dupret, gdupret@yahoo-inc.com
    Abstract:
    Search engine click logs provide an invaluable source of releance information but this information is biased because we ignore which documents from the result list the users have actually seen before and after they clicked. Otherwise, we could estimate document relevance by simple counting. In this paper, we propose a set of assumptions on user browsing behavior that allows the estimation of the probability that a document is seen, thereby providing an unbiased estimate of document relevance. To train, test and compare our model to the best alternatives described in the Literature, we gather a large set of real data and proceed to an extensive cross-validation experiment.

 

WSDM 2008

  • Entropy of Search Logs-How Hard is Search-With Personalization-With Backoff
    Author: Qiaozhu Mei, qmei2@uiuc.edu
    Abstract:
    Entropy is a powerful tool for sizing challenges and opportunities. How hard is search? How hard are query suggestion mechanisms like auto-complete? How much does personalization help? All these difficult questions can be answered by estimation of entropy from search logs. What is the potential opportunity for personalization? In  this paper, we propose a new way to personalize search, personalization with backoff. If we have relevant data for a particular user, we should use it. But if we don’t, back off to larger and larger classes of similar users. As a proof of concept, we use the first few bytes of the IP address to define classes. The coefficients of each backoff class are estimated with an EM algorithm. Ideally, classes would be defined by market segments, demographics and surrogate variables such as time and geography.
  • An Empirical Analysis of Sponsored Search Performance in Search Engine Advertising
    Author: Anindya Ghose, aghose@stern.nyu.edu
    Abstract:
    Despite the growth of search advertising, we have little understanding of how consumers respond to contextual and sponsored search advertising on the Internet. Using a unique panel dataset of several hundred keywords collected from a large nationwide retailer that advertises on Google, we empirically model the relationship between different metrics such as click-through rates, conversion rates, bid prices and keyword ranks. Our paper proposes a novel framework and data to better understand what drives these differences. We use a Hierarchical Bayesian  modeling framework and estimate the model using Markov Chain Monte Carlo (MCMC) methods. We empirically estimate the impact of keyword  attributes on consumer search and purchase behavior as well as on firms’ decision-making behavior on bid prices and ranks.
  • An Experimental Comparison of Click Position-Bias Models
    Author: Nick Craswell, nickcr@microsoft.com
    Abstract:
    Search engine click logs provide an invaluable source of relevance information, but this information is biased. A key source of bias is presentation order: the probability of click is influenced by a document’s position in the results page. This paper focuses on explaining that bias,  modelling how probability of click depends on position. We propose four simple hypotheses about how position bias might arise. We carry out a  large data-gathering effort, where we perturb the ranking of a major search engine, to see how clicks are affected. We then explore which of the  four hypotheses best explains the real-world position effects, and compare these to a simple logistic regression model.
  • Advertising Keyword Suggestion Based On Concept Hierarchy
    Author: Yifan Chen, ivan@apex.sjtu.edu.cn
    Abstract:
    The increasing growth of the World Wide Web constantly enlarges the revenue generated by search engine advertising. Advertisers bid on keywords associated with their products to display their ads on the search result pages. Keyword suggestion methods are proposed to fill the gap  between the keywords chosen by advertisers and the popular queries, through finding new relevant keywords according to some statistical information (for example, the keyword co-occurrence). However, there is little effort taking semantic information, such as concept hierarchy, into account. In this  paper, we propose a novel keyword suggestion method that fully exploits the semantic knowledge among concept hierarchy. Given a keyword, we first match it with some relevant concepts. Then the relevant concepts are used with their hierarchy to fertilize the meanings of the keywords. Finally new keywords are suggested according to the concept information rather than the statistical co-occurrence of the keyword itself.

CIKM 2008

  • Search Advertising using Web Relevance Feedback
    Author: Andrei Z. Broder, broder@yahoo-inc.com
    Abstract:
    Traditionally, matching of ads to queries employed standard information retrieval techniques using the bag of words approach. Here we propose to go beyond the bag of words, and augment both queries and ads with additional knowledge-rich features. We use Web search results initially returned for the query to create a pool of relevant documents. Classifying these documents with respect to an external taxonomy and identifying salient named entities give rise to two new feature types. Empirical evaluation based on over 9,000 query-ad pairwise judgments confirms that using  augmented queries produces highly relevant ads. Our methodology also relaxes the requirement for each ad to explicitly specify the exhaustive list of queries (“bid phrases”) that can trigger it.
  • To Swing or not to Swing Learning when (not) to Advertise
    Author: Andrei Broder, Yahoo! Research
    Abstract:
    When no ads are relevant to the user’s interests, then showing irrelevant ads should be avoided since they annoy the user and produce no economic benefit. In this paper we pose a decision problem “whether to swing”, that is, whether or not to show any of the ads for the incoming request. We propose two methods for addressing this problem, a simple thresholding approach and a machine learning approach, which collectively analyzes the set of candidate ads augmented with external knowledge. Our experimental evaluation, based on over 28,000 editorial judgments, shows that we are able to predict,  with high accuracy, when to “swing” for both content match and sponsored search advertising.

 

AAAI 2008

  • An Expressive Auction Design for Online Display Advertising
    Author: Sebastien Lahaie, Yahoo! Research
    Abstract:
    We propose an expressive auction design that allows advertisers to specify the kinds of demographics and websites they wish to target within an advertising network. The design allows the network to differentiate impressions according to relevant attributes (e.g., geographic location of the user, topic of the webpage). Advertisers can place bids for different kinds of impressions according to their attributes, and can also specify volume constraints to control exposure. The novelty of the design is a bidding language that admits scalable allocation and pricing algorithms. We discuss the incentive properties of different pricing approaches. We also propose a bidder feedback mechanism to mitigate the complexity of expressive bidding.
  • Expressive Banner Ad Auctions and Model-Based Online Optimization for Clearing
    Author: Craig Boutilier, cebly@cs.toronto.edu
    Abstract:
    We present the design of a banner advertising auction which is considerably more expressive than current designs. We describe a general model of expressive ad contracts/bidding and an allocation model that can be executed in real time through the assignment of fractions of relevant ad channels to specific advertiser contracts. The uncertainty in channel supply and demand is addressed by the formulation of a stochastic combinatorial optimization problem for channel allocation that is rerun periodically. We solve this in two different ways: fast deterministic optimization with respect to expectations; and a novel online sample-based stochastic optimization method—that can be applied to continuous decision spaces—which exploits the deterministic optimization as a black box. Experiments demonstrate the importance of  expressive bidding and the value of stochastic optimization.

 

OTHERS

  • Transfer Learning by Distribution Matching for Targeted Advertising
    Author: Steffen Bickel, bickel@cs.uni-potsdan.de
    Abstract: We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small –possibly even empty – labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemo- graphic features for users of web portals, based on the content which they have accessed. Here, questionnaires offered to a portion of each portal’s users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with few or no training data and improves the overall prediction accuracy.

  • Algorithmic methods for sponsored search advertising
    Author:
    Jon Feldman and S. Muthukrishnan
    Abstract:
    Modern commercial Internet search engines display advertisements along side the search results in response to user queries. Such sponsored search relies on market mechanisms to elicit prices for these advertisements, making use of an auction among advertisers who bid in order to have their ads shown for specific keywords. We present an overview of the current systems for such auctions and also describe the underlying game-theoretic aspects. The game involves three parties—advertisers, the search engine, and search users—and we present example research directions that emphasize the role of each. The algorithms for bidding and pricing in these games use techniques from three mathematical areas: mechanism design, optimization, and statistical estimation. Finally, we present some challenges in sponsored search advertising.

  • Optimal Ad Ranking for Profit Maximization
    Author:
    Raju Balakrishnan,  rajub@asu.edu
    Abstract:
    Despite the enormous commercial importance of on-line advertisements (ads), there has been little work done to clarify the basis for ranking and displaying them. Most existing methods rank ads as if the user views each of them in isolation. We will consider a more realistic user model that induces three mutual influences between displayed ads: (i) positional bias (for viewing ads placed higher up) (ii) similar ad fatigue (which reduces interest in an ad when similar ads have already been displayed above it) and (iii) browsing impatience (which accounts for the user abandoning the ad viewing based on the ads already seen). We will show that in general, when the inter-ad similarity is taken into account, optimal ranking is NP-hard. Ignoring inter-ad similarity, we state and prove the optimal ranking function for sorting the ads that is sensitive to the other two factors. We will show that the known ad ranking strategies correspond to restricted special cases of our ranking function.

  • Simrank++ Query Rewriting through Link Analysis of the Click Graph
    Author:
    Ioannis Antonellis, antonell@cs.stanford.edu
    Abstract:
    We focus on the problem of query rewriting for sponsored search. We base rewrites on a historical click graph that records the ads that have been clicked on in response to past user queries. Given a query q, we first consider Simrank as a way to identify queries similar to q, i.e., queries  whose ads a user may be interested in. We argue that Simrank fails to properly identify query similarities in our application, and we present two enhanced versions of Simrank: one that exploits weights on click graph edges and another that exploits “evidence.” We experimentally evaluate our new  schemes against Simrank, using actual click graphs and queries from Yahoo!, and using a variety of metrics. Our results show that the enhanced  methods can yield more and better query rewrites.

  2007  

  • A Largescale Evaluation and Analysis of Personalized Search Strategies
  • Click Entropy---A Large-scale Evaluation and Analysis of Personalizaed Search Strategies
  • Comparing Click Logs and Editorial Labels for Training Query Rewriting
  • Identifying Ambiguous Queries in Web Search
  • Predicting Clicks - Estimating the Click-Through Rate for New Ads
    Author:
    Matthew Richardson, mattri@microsoft.com
    Abstract:
    Search engine advertising has become a significant element of the Web browsing experience. Choosing the right ads for the query and the order in which they are displayed greatly affects the probability that a user will see and click on each ad. This ranking has a strong impact on the revenue the search engine receives from the ads. Further, showing the user an ad that they prefer to click on improves user satisfaction. For these reasons, it is important to be able to accurately estimate the click-through rate of ads in the system. For ads that have been displayed repeatedly, this is empirically measurable, but for new ads, other means must be used. We show that we can use features of ads, terms, and advertisers to learn a model that accurately predicts the click-though rate for new ads. We also show that using our model improves the convergence and performance of an advertising system. As a result, our model increases both revenue and user satisfaction.
  • AdaRank-A Boosting Algorithm for Information Retrieval
  • Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks
  • On score distributions and relevance

  2006  

  • Adapting ranking SVM to document retrieval-SIGIR06.pdf
  • Learning to Advertise-SIGIR06.pdf
  • Learning to Advertise-SIGIR06.ppt
  • Learning user interaction models for predicting web search result preferences_SIGIR06.pdf
  • Adapting ranking SVM to document retrieval_SIGIR2006.pdf
  • lambdarank_Learning to Rank with Nonsmooth Cost Functions_NIPS06.pdf
  • On the relationship between click rate and relevance for search engines-dmie06.pdf
  • Predicting click-through rate using keyword clusters_EC06.pdf
     

  Previous  

  • Implementing Sponsored Search in Web Search Engines_ Computational Evaluation of Alternative Mechanisms - INFORMS journal on computing05.pdf
  • RankNet_Learning to Rank using Gradient Descent_ICML05.pdf
  • Toward the Next Generation of Recommender Systems A Survey of the State-of-the-Art and Possible Extensions_2005.pdf
  • Boosting support vector machines for text classification through parameter-free threshold relaxation_CIKM03.pdf
  • SimRank A Measure of Structural-Context Similarity_KDD02.pdf
  • SimRank A Measure of Structural-Context Similarity_KDD02.ppt
  • The Score-Distributional Threshold Optimization for Adaptive Binary Classification Tasks_SIGIR01.pdf
  • The use of MMR, diversity-based reranking for reordering documents and producing summaries_sigir98.pdf