An extended variable inclusion and shrinkage algorithm for correlated variables
The problem of variable selection for linear regression in a high-dimensional model is considered. A new method, called Extended-VISA (Ext-VISA), is proposed to simultaneously select variables and encourage a grouping effect, whereby strongly correlated predictors tend to enter or leave the model together. Moreover, Ext-VISA can select a sparse model while avoiding the overshrinkage typical of Lasso-type estimators. It combines the idea of the VISA algorithm, which avoids the overshrinkage of regression coefficients, with that of Lasso-type estimators based on an ℓ1+ℓ2 penalty, which overcome the Lasso's limited grouping effect in high dimensions. Because it is based on a modified VISA algorithm, it is also computationally efficient. Three interesting special cases of Ext-VISA are examined. The first is Smooth-VISA (SVISA), in which the variation among successive regression coefficients is low. The second is VISA-Net (VNET), in which the correlations between predictors are taken into account. The third is Laplacian-VISA (LVISA), in which the predictors are measured on an undirected graph. A theoretical sparsity inequality for Ext-VISA is established. A detailed simulation study in low- and high-dimensional settings illustrates the advantages of the new approach relative to several competing methods. Finally, we apply VNET, SVISA and LVISA to a GC-retention data set.
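As a minimal sketch of the grouping effect that motivates the ℓ1+ℓ2 penalty (this is a generic elastic-net illustration with scikit-learn, not the authors' Ext-VISA algorithm; the data, penalty weights, and seed are illustrative assumptions), one can compare a pure ℓ1 fit with an ℓ1+ℓ2 fit on two nearly identical predictors:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
n = 100
z = rng.normal(size=n)

# x1 and x2 are strongly correlated copies of the same latent signal;
# x3 is irrelevant noise (illustrative toy data, not from the paper).
x1 = z + 0.01 * rng.normal(size=n)
x2 = z + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3.0 * z + 0.5 * rng.normal(size=n)

# Pure l1 penalty: tends to pick one of the correlated pair arbitrarily.
lasso = Lasso(alpha=0.1).fit(X, y)

# l1 + l2 penalty: the l2 term encourages the correlated pair to share
# weight, so both coefficients enter the model with similar values.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

print("lasso coefficients:", lasso.coef_)
print("elastic-net coefficients:", enet.coef_)
```

With strongly correlated predictors, the elastic-net coefficients of `x1` and `x2` stay close to each other, which is the grouping behaviour the abstract attributes to ℓ1+ℓ2 penalties.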