
Research Abstracts 2006
Learning Seemingly Unrelated Tasks with Regularized Manifolds

Giorgos Zacharia & Tomaso Poggio

The Problem

In this work we investigate how to incorporate data available from seemingly unrelated tasks to improve task-specific learning models.

Motivation

Recent work in multitask learning (Evgeniou et al. 2005; Chapelle and Harchaoui 2005; Girosi 2003) has shown that data from related tasks can be used effectively by regularized learning algorithms, with nonlinear loss functions that penalize errors on the aggregate data less than errors on the individual data. The same approach can apply to different regularized algorithms, including SVMs, Regularized Least Squares Classification (RLSC) (Rifkin 2002), and Regularized Logistic Regression (Minka 2001). In this work we extend these approaches with the graph Laplacian transformation, to show how to pose the same problem as a special case of semi-supervised learning with regularized manifolds (Belkin and Niyogi 2004).

Previous Work

In previous work (Evgeniou, Boussios, and Zacharia 2005) we introduced a combined-classifier approach that allows us to exploit information from the aggregate data set. The weighted aggregate information (with the weight estimated through cross-validation) improved the individual-specific models. Our previous work focused on applications of user-preference modeling, and was evaluated on a widely available toy-dataset generator (Toubia et al. 2004).

Approach

Regularized Laplacian algorithms (Belkin and Niyogi 2004) have been used successfully in other semi-supervised learning settings. The loss function of Laplacian RLSC penalizes the weighted deviation of the estimated function f for instances that fall close to each other in the geodesic space of a manifold (i.e., pairs connected with high weight in the manifold's adjacency graph). The manifold is estimated on both the labeled and the unlabeled data. In our problem setting, we already have an estimate for the labels of the additional instances, namely the choices another individual made in the particular instance.
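The standard Laplacian RLSC solve that this approach builds on can be sketched as follows. This is a minimal illustration, not the authors' implementation: the kernel bandwidth, regularization values, and a fully connected heat-kernel graph are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Gaussian kernel matrix between rows of X and rows of Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def graph_laplacian(X, sigma=1.0):
    # Dense heat-kernel adjacency over all points; L = D - W
    W = rbf_kernel(X, X, sigma)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def lap_rlsc_fit(X, y, l, gamma_A=1e-3, gamma_I=1e-2, sigma=1.0):
    """Laplacian RLSC in the Belkin-Niyogi style: the first l rows of X
    are labeled (y has length l); the remaining rows are unlabeled.
    Returns kernel-expansion coefficients alpha over all n = l + u points."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    L = graph_laplacian(X, sigma)
    J = np.zeros((n, n))
    J[:l, :l] = np.eye(l)          # indicator selecting the labeled points
    Y = np.zeros(n)
    Y[:l] = y
    # Closed-form solution of the squared-loss objective with ambient
    # (gamma_A) and intrinsic/manifold (gamma_I) regularization
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n ** 2) * (L @ K)
    return np.linalg.solve(A, Y)

def lap_rlsc_predict(alpha, X_train, X_test, sigma=1.0):
    # f(x) = sum_j alpha_j k(x, x_j)
    return rbf_kernel(X_test, X_train, sigma) @ alpha
```

In this sketch the additional points enter only through the manifold term; the modification described next would instead keep their actual labels and down-weight their loss contribution (e.g., replacing the 0/1 indicator in `J` with weight 1 for the individual's own data and a cross-validated weight μ for the other individuals' data).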
Therefore, we modify the Laplacian RLSC formulation introduced by Belkin and Niyogi (2004) to use the actual labels for the additional instances, and we again penalize the seemingly unrelated information with a penalty term estimated by cross-validation on the training set.

Progress

We evaluate our approach on the publicly available preference data provided by Sawtooth Software (Sawtooth). The dataset includes data from 100 individuals, with 10 metric instances of products with five attributes (the users provide metric ratings for each product configuration). We transform the problem into choice-based comparisons by creating the vectors of differences of the instances, and labeling each comparison with "+1" or "−1" for a winning or losing comparison, respectively. We subsample l=10 comparisons per individual and u={10, 20, 30, 50, 100} comparisons from the other 99 individuals. We report the results in Table 1 below.

Table 1: Results of Laplacian RLSC experiments with 10 instances of individual-specific data, u instances of seemingly unrelated data, and weight μ on the loss contributed by the seemingly unrelated data.
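The transformation from metric ratings to choice-based comparisons described above can be sketched as follows; the function name and tie-handling are illustrative assumptions, not the authors' code.

```python
import numpy as np
from itertools import combinations

def ratings_to_comparisons(products, ratings):
    """Turn metric ratings of product profiles into choice-based pairs:
    for each pair (i, j), the feature vector is the difference of the two
    profiles, labeled +1 if product i is rated higher than product j and
    -1 if lower (ties, which carry no preference, are dropped)."""
    X, y = [], []
    for i, j in combinations(range(len(products)), 2):
        if ratings[i] == ratings[j]:
            continue
        X.append(products[i] - products[j])
        y.append(1.0 if ratings[i] > ratings[j] else -1.0)
    return np.array(X), np.array(y)
```

With 10 rated profiles per individual this yields up to 45 comparison vectors, from which the l=10 individual-specific instances can be subsampled.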
As we can see from the results, the optimal μ=0.2 does not appear to depend on the amount of additional data, and adding more data from other users does not seem to affect performance significantly.

Future

We will further investigate the Laplacian approach with similar algorithms such as Laplacian SVMs, and work on a mathematical understanding of the interaction between the intrinsic regularization penalty and the penalty term we introduced for the seemingly unrelated data. We will also apply the same approach to other applications of seemingly unrelated data, such as the Inner London Education Authority examination data.

References:

[1] Theodoros Evgeniou, Charles Micchelli, and Massimiliano Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6, pp. 615-637, 2005.
[2] Olivier Chapelle and Zaid Harchaoui. A machine learning approach to conjoint analysis. Advances in Neural Information Processing Systems 17, pp. 257-264, MIT Press, Cambridge, MA, USA, 2005.
[3] Federico Girosi. Demographic Forecasting. PhD thesis, Harvard University, Cambridge, MA, USA, 2003.
[4] Ryan Rifkin. Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning. PhD thesis, MIT, Cambridge, MA, USA, 2002.
[5] Tom Minka. Algorithms for maximum-likelihood logistic regression. Technical Report 758, Department of Statistics, Carnegie Mellon University, 2001.
[6] Mikhail Belkin and Partha Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning, 56 (Special Issue on Clustering), pp. 209-239, 2004.
[7] Theodoros Evgeniou, Constantinos Boussios, and Giorgos Zacharia. "Generalized Robust Conjoint Estimation." Marketing Science, Vol. 24, No. 3, pp. 415-429, 2005.
[8] Olivier Toubia, John Hauser, and Duncan Simester. "Polyhedral Methods for Adaptive Choice-Based Conjoint Analysis." Journal of Marketing Research, Vol. XLI, pp. 116-131, 2004.
[9] Sawtooth Software, Inc. HB-Reg: Hierarchical Bayes Regression. URL: http://www.sawtoothsoftware.com/hbreg.shtml

