Is multicollinearity a problem for conjoint analysis?

Multicollinearity is the presence of a linear relationship between two or more variables in a dataset. Such correlation between independent variables distorts the probabilities, scores, and p-values estimated for the parameters, and can lead to incorrect findings about how the explanatory variables relate to the dependent variable. Let’s imagine the following data:


We can code this table and obtain this design:


This design describes an experiment about car preferences. We are using the data from Example experiment 2: Preferences in cars, which is available in your experiment list. As shown in the image below, the sum of the values in the Kea Rocketta and Ladina Klubnika columns equals the values in the Engine motor column, which creates a collinearity problem in the data.

Collinearity 2
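The relationship between these columns can also be checked mechanically. The following is a minimal Python sketch: the rows are hypothetical and only mimic the pattern described above (they are not the data from the example experiment), and `is_exact_combination` is a helper name introduced here for illustration.

```python
# Hypothetical dummy-coded alternatives (illustrative values only).
# Each row has a 0/1 indicator per brand level plus the "Engine motor" column.
rows = [
    {"Kea Rocketta": 1, "Ladina Klubnika": 0, "Landrange Hoover": 0,
     "Maruda Maru II": 0, "Engine motor": 1},
    {"Kea Rocketta": 0, "Ladina Klubnika": 1, "Landrange Hoover": 0,
     "Maruda Maru II": 0, "Engine motor": 1},
    {"Kea Rocketta": 0, "Ladina Klubnika": 0, "Landrange Hoover": 1,
     "Maruda Maru II": 0, "Engine motor": 0},
    {"Kea Rocketta": 0, "Ladina Klubnika": 0, "Landrange Hoover": 0,
     "Maruda Maru II": 1, "Engine motor": 0},
]


def is_exact_combination(rows, target, sources):
    """Return True if, in every row, the target column equals
    the sum of the source columns (an exact linear relationship)."""
    return all(row[target] == sum(row[s] for s in sources) for row in rows)


collinear = is_exact_combination(
    rows, "Engine motor", ["Kea Rocketta", "Ladina Klubnika"]
)
print("Engine motor = Kea Rocketta + Ladina Klubnika:", collinear)
```

When the check returns `True`, one column carries no information beyond the others, so a regression cannot separate their effects.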

All conjoint analysis experiments launched on the platform use relative preference data to compute part-worth utilities. The main advantage of this practice is that a simulated scenario closer to reality provides more accurate measures of potential customers' attitudes towards a product. However, under this design, multicollinearity problems can arise.

A common practice is to remove one level per attribute. For the design above, we can drop, for example, Kea Rocketta from the brand attribute. Note that the data for the removed levels remain in the dataset, and their part-worth utilities will still be calculated.

Collinearity fixed
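The effect of dropping one level per attribute can be illustrated by comparing the rank of the design matrix before and after. This is a sketch under stated assumptions, not the platform's implementation: the second attribute and its levels (`Petrol`, `Electric`) are hypothetical, the design is a simple full factorial without an intercept column, and `matrix_rank` and `dummy_code` are helper names introduced here.

```python
from fractions import Fraction

BRANDS = ["Kea Rocketta", "Ladina Klubnika", "Landrange Hoover", "Maruda Maru II"]
ENGINES = ["Petrol", "Electric"]  # hypothetical second attribute


def matrix_rank(matrix):
    """Rank via Gaussian elimination using exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def dummy_code(brand, engine, brands, engines):
    """One 0/1 indicator column per listed level (no intercept column)."""
    return [int(brand == b) for b in brands] + [int(engine == e) for e in engines]


profiles = [(b, e) for b in BRANDS for e in ENGINES]

# Full coding: every level of every attribute gets its own column.
full = [dummy_code(b, e, BRANDS, ENGINES) for b, e in profiles]
# Reduced coding: drop the first level of each attribute.
reduced = [dummy_code(b, e, BRANDS[1:], ENGINES[1:]) for b, e in profiles]

full_rank = matrix_rank(full)
reduced_rank = matrix_rank(reduced)

print("full:", full_rank, "of", len(full[0]), "columns")
print("reduced:", reduced_rank, "of", len(reduced[0]), "columns")
```

In the full coding, the brand columns and the engine columns each sum to 1 in every row, so the rank falls short of the number of columns. After one level per attribute is dropped, the rank equals the number of columns and the coefficients can be estimated.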

When this step is done for each attribute, we can estimate coefficients for each level in the experiment through regressions and our HB model. The levels removed to avoid collinearity can then be retrieved as the sum of the other levels. For example, for brand: Landrange Hoover = 2.1, Ladina Klubnika = 0.9, and Maruda Maru II = 0.5, so the relative preference score for Kea Rocketta will be 3.5.

In conclusion, multicollinearity is not a problem for conjoint analysis when the researcher drops one level per attribute, estimates the parameters for the new reduced design, and then retrieves relative scores for deleted levels (this is done automatically by