Preference learning (or preference elicitation) is a critical problem in many scientific fields, such as decision theory [1,2], economics [3,4], logistics [ref], and databases [5]. Researchers often model a user's preferences as the solution to an optimization problem that maximizes some utility function. In practice, however, the utility function is not given a priori; we only have access to a finite set of historical user choices. The passive preference learning problem, that is, how to learn a user's preferences from her historical choice data, has therefore gained considerable attention in recent years.

When dealing with preference learning, it is often assumed that the user's preference over the values of each attribute is independent of the values of the other attributes. However, this assumption does not hold in many real-world scenarios. For example, as shown in Fig. 1 for a clothes shopping problem, one might choose the color of her shoes depending on the color of the dress she will buy, i.e., her preference over shoe color is conditioned on the dress. More formally, the preferences induced by the user's behavior are intrinsically related to \textit{conditional preferential independence}, a key notion in multi-attribute decision theory [20].

Conditional preference networks (CP-nets) have been proposed for such problems [4] and have received a great deal of attention because they offer a compact and natural representation of ordinal preferences in multi-attribute domains [8-12, 17-19, 22]. Briefly, a CP-net (Fig. 1) is a directed graph whose nodes correspond to the attributes of the alternatives and whose edges correspond to the dependencies between attributes. Each node is annotated with a conditional preference table that describes the preferences over that particular attribute given the values of its parents (Chapter 3).
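To make the structure concrete, the following is a minimal Python sketch of a CP-net as a set of parent lists plus conditional preference tables. The class name `CPNet`, the method `best_value`, and the two-attribute dress/shoes example with its colors are illustrative assumptions, not the representation used later in this work.

```python
from dataclasses import dataclass


@dataclass
class CPNet:
    # parents[a]: tuple of attributes that a's preferences depend on
    # (one incoming edge per parent in the directed graph).
    parents: dict
    # cpt[a][parent_values]: values of a, most preferred first, for that
    # assignment to a's parents (the conditional preference table).
    cpt: dict

    def best_value(self, attr, outcome):
        """Most preferred value of attr given the parent values in outcome."""
        context = tuple(outcome[p] for p in self.parents[attr])
        return self.cpt[attr][context][0]


# Hypothetical example: the preferred shoe color depends on the dress color.
net = CPNet(
    parents={"dress": (), "shoes": ("dress",)},
    cpt={
        "dress": {(): ["black", "white"]},
        "shoes": {
            ("black",): ["black", "white"],
            ("white",): ["white", "black"],
        },
    },
)

print(net.best_value("shoes", {"dress": "white"}))  # white
```

Storing each table row as an ordered list keeps the sketch usable for non-binary attributes as well, although only binary attributes are shown here.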

It is sometimes claimed that CP-nets are `easy to elicit' [16]. That is, we first explain CP-nets to the user and then ask her to write down the CP-net that best describes her decision-making process [18,30]. However, it has been shown that, when facing actual choices, people often act differently from what they previously described as their preferences [39,40,97,103]. As an example, Kamishima and Akaho [53] point out that when customers were asked to rank ten sushi items and then later to assign rating scores to the same items, in 68\% of the cases the ordering implied by the ratings did not agree with the ranking elicited directly only minutes before. Motivated by such findings, several CP-net learning algorithms that rely on users' choice data have been developed. Some algorithms work on historical choice data [23,64], a process known as passive learning. Others actively offer solutions in an attempt to learn the users' preferences as they choose [23,29,47,58]. The work in this paper falls into the category of passive learning, in which the learner takes the user's recorded choices and fits a CP-net model to the observed data. Formally, we collect a set of samples $S = \{o_i \succ o'_i\}$, where $o_i \succ o'_i$ means that the user strictly prefers outcome $o_i$ to outcome $o'_i$, and then find a model $N$ that best describes $S$. Such a set of samples may be gathered, for instance, by observing online users' choices.
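One simple way to quantify how well a candidate model $N$ describes $S$ is sketched below in Python, under a strong assumption: only swap pairs (outcomes that differ in exactly one attribute) are scored, because for those the verdict can be read directly from a single conditional preference table. The dict-based representation, the function names `entails_swap` and `agreement`, and the example data are hypothetical; this is not presented as the objective used in this paper.

```python
def entails_swap(parents, cpt, better, worse):
    """True/False if the model agrees/disagrees with better > worse.

    Returns None when the two outcomes are not a swap pair, i.e. they do
    not differ in exactly one attribute.
    """
    diff = [a for a in better if better[a] != worse[a]]
    if len(diff) != 1:
        return None
    x = diff[0]
    # The parents of x have the same values in both outcomes.
    context = tuple(better[p] for p in parents[x])
    order = cpt[x][context]  # values of x, most preferred first
    return order.index(better[x]) < order.index(worse[x])


def agreement(parents, cpt, samples):
    """Fraction of swap pairs in samples that the model agrees with."""
    verdicts = [entails_swap(parents, cpt, o, o2) for o, o2 in samples]
    scored = [v for v in verdicts if v is not None]
    return sum(scored) / len(scored) if scored else 0.0


# Hypothetical model: shoe color preference depends on dress color.
parents = {"dress": (), "shoes": ("dress",)}
cpt = {
    "dress": {(): ["black", "white"]},
    "shoes": {("black",): ["black", "white"],
              ("white",): ["white", "black"]},
}
samples = [
    ({"dress": "black", "shoes": "black"}, {"dress": "black", "shoes": "white"}),
    ({"dress": "white", "shoes": "black"}, {"dress": "white", "shoes": "white"}),
    ({"dress": "black", "shoes": "black"}, {"dress": "white", "shoes": "white"}),
]
print(agreement(parents, cpt, samples))  # 0.5
```

The third sample differs in two attributes and is skipped by this sketch; scoring such pairs requires a general dominance test, which is considerably more expensive.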

Table 1 shows the number of binary CP-nets with up to 7 nodes, i.e., with outcomes consisting of up to 7 attributes [A250110]. The values make it evident that, even for a small number of attributes, finding the best CP-net is not a trivial task because of the huge size of the search space. \textbf{[TODO: mention NP-completeness here.]} To the best of our knowledge, no existing approach performs well on problems with more than 7 attributes; such approaches are therefore impractical for real-world problems, in which the alternatives usually consist of tens or even hundreds of attributes.

Another problem that arises when learning preferences from human subjects is that the choice data set $S$ may contain noise or comparisons that are ultimately inconsistent. Noise results from the observation of the users' behavior, whereas inconsistency results from the randomness of that behavior; that is, the transitive closure of the data set may contain a cycle in which some outcome $o$ is seen to be preferred to itself. The objective of most CP-net learning techniques is to learn (i.e., rebuild) a CP-net that describes the whole data set [ref]. However, since $S$ is usually not clean, there is usually no CP-net that is consistent with every example in $S$. This fact motivated us to frame the CP-net learning problem as an optimization problem, that is, to identify a model that maximizes some objective function $f$ with respect to the choice data set.
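The kind of inconsistency described above can be detected by treating each sample $o \succ o'$ as a directed edge and searching for a cycle. The following Python sketch does this with a depth-first search; the function name `find_cycle` and the encoding of outcomes as hashable values are assumptions made for illustration.

```python
from collections import defaultdict


def find_cycle(samples):
    """Return a list of outcomes forming a preference cycle, or None.

    samples: iterable of (better, worse) pairs of hashable outcomes.
    """
    graph = defaultdict(list)
    for better, worse in samples:
        graph[better].append(worse)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = defaultdict(int)
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                # nxt is already on the current path: a cycle closes here.
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        colour[node] = BLACK
        path.pop()
        return None

    for start in list(graph):
        if colour[start] == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


print(find_cycle([("a", "b"), ("b", "c")]))              # None
print(find_cycle([("a", "b"), ("b", "c"), ("c", "a")]))  # ['a', 'b', 'c', 'a']
```

The recursive search is adequate for a sketch; a very large data set would call for an iterative traversal to avoid Python's recursion limit.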

In this work, we use a Genetic Algorithm (GA) as the optimization technique. A GA is an optimization algorithm inspired by the mechanisms of natural selection and natural genetics. GAs can work without any a priori knowledge of the problem domain and have received growing interest for solving complex combinatorial optimization problems, especially because of their scalability compared with deterministic algorithms [1]. We investigate the feasibility of using a GA to solve the passive CP-net learning problem.
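For readers unfamiliar with GAs, the generic loop can be sketched in a few lines of Python. This is a textbook-style illustration over bit strings, not the algorithm developed in this paper: the function name `genetic_search`, the operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism), and all parameter values are assumptions, and the toy fitness simply counts ones. In CP-net learning, the chromosome would instead encode a candidate network and the fitness would be the objective $f$; that encoding is not specified here.

```python
import random


def genetic_search(fitness, n_bits, pop_size=30, generations=60,
                   mutation_rate=0.02, seed=0):
    """Maximize fitness over bit lists of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]

    def pick(scored):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        best = max(scored, key=lambda s: s[0])[1]
        nxt = [best[:]]  # elitism: the best individual always survives
        while len(nxt) < pop_size:
            p1, p2 = pick(scored), pick(scored)
            cut = rng.randrange(1, n_bits)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)


# Toy usage: maximize the number of ones in a 20-bit string.
best = genetic_search(sum, n_bits=20)
print(sum(best), best)
```

Because of elitism, the best fitness in the population never decreases from one generation to the next, which makes the sketch easy to sanity-check.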