In this work we consider the problem of searching in a database of objects with different popularities. Contrary to traditional databases, users do not submit queries that are subsequently matched to objects. Instead, a search for a target object is conducted in several phases. In each phase, a user is presented with a list of database objects. The user selects from this list the object that is most similar to the target she wishes to find. A new list is then presented to the user, and the above process is repeated. When the target item is among the items presented to the user, she retrieves this item and the search terminates. The above problem and has several important real-life applications, for instance navigating through a database of pictures of humans, such as Flickr or Picasa, in which subjects are photographed under a variety of non-idealized conditions.
The use of interactive methods (i.e., that incorporate human feedback) for content search has a long history in literature. Arguably, the first oracle considered to model such methods is the so-called membership oracle, which allows the search mechanism to ask a user questions of the form “does the target belong to set A”. It is well known that to find a target t one needs to submit at least H(μ) queries, on average, to the oracle where η is the popularity distribution on objects. Moreover, there exists an algorithm (Huffman coding) that finds the target with only H(μ)+1 queries on average. Having access to a membership oracle however is a strong assumption, as humans may not necessarily be able to answer queries of the above type for any object set A. Moreover, the large number of possible sets makes the cost of designing optimal querying strategies over large datasets prohibitive.
One way to address the above constraints is through a graphical model, i.e., each query set must conform to the constraints imposed by a graph . For instance, items (with potentially different popularity) can be associated with vertices and each query is any set of nodes that is path connected. In this context, as a special case the optimum algorithm for finding an object on a complete graph reduces to that of Huffman coding.
Intuitively, the efficiency of an algorithm that can find a target by using the membership oracle with graphical constraints should depend on the entropy of the objects as well as the charactersitics of the graph. We would like to understand this depencency.
Programming skills and basic notions of information theory
Ruediger Urbanke, office INR 116 [email protected]ch