Title: | Robust Mixture Regression |
---|---|
Description: | Finite mixture models are a popular technique for modelling unobserved heterogeneity and for approximating general distribution functions in a semi-parametric way. They are used in many areas, such as astronomy, biology, economics, marketing, and medicine. This package implements popular robust mixture regression methods: flexmix, finite mixture models and latent class regression; CTLERob, Component-wise adaptive Trimming Likelihood Estimation based mixture regression; mixbi, mixture regression based on bi-square estimation; mixL, mixture regression based on the Laplacian distribution; mixt, mixture regression based on the t-distribution; TLE, Trimmed Likelihood Estimation based mixture regression. For details of the algorithms, see the references: Chun Yu, Weixin Yao, Kun Chen (2017) <doi:10.1002/cjs.11310>. Neykov N, Filzmoser P, Dimova R et al. (2007) <doi:10.1016/j.csda.2006.12.024>. Bai X, Yao W, Boyer JE (2012) <doi:10.1016/j.csda.2012.01.016>. Wennan Chang, Xinyu Zhou, Yong Zang, Chi Zhang, Sha Cao (2020) <arXiv:2005.11599>. |
Authors: | Sha Cao [aut, cph, ths], Wennan Chang [aut, cre], Chi Zhang [aut, ctb, ths] |
Maintainer: | Wennan Chang <[email protected]> |
License: | GPL |
Version: | 1.1.0 |
Built: | 2025-03-09 11:22:07 UTC |
Source: | https://github.com/changwn/robmixreg |
Tukey's bisquare family of functions.
biscalew(t)
t |
Numerical input, usually residuals. |
A bisquare weight for scale.
Tukey's bisquare family of functions.
bisquare(t, k = 4.685)
t |
Numerical input, usually residuals. |
k |
A constant tuning parameter, default is 4.685. |
A bi-square weight for mean.
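A minimal base-R sketch of the textbook Tukey bisquare weight may help here. The name `bisquare_weight` is hypothetical, and the formula is the standard one, which is not necessarily the exact form used inside the package; only the default `k = 4.685` is taken from the usage above.

```r
# Textbook Tukey bisquare weight (assumption: not the package's own code).
# Weight is (1 - (t/k)^2)^2 for |t| <= k and 0 beyond the cutoff k.
bisquare_weight <- function(t, k = 4.685) {
  u <- t / k
  ifelse(abs(u) <= 1, (1 - u^2)^2, 0)
}

resid <- c(-10, -2, 0, 2, 10)   # hypothetical standardized residuals
w <- bisquare_weight(resid)
print(round(w, 3))
```

Residuals beyond the cutoff receive zero weight, so gross outliers do not influence a weighted fit at all.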
Plot the coefficient matrix.
blockMap(rrr)
rrr |
The result from CSMR function |
A list containing the data used to generate the variables in the real-data application.
CCLE_data
A list whose length is 2:
Gene expression dataset.
AUCC score.
A list containing the data used to generate the variables in the real-data application.
colon_data
A list whose length is 3:
A string containing the name of the binding protein and epigenetic regulator.
The gene expression profile of CREB3L1.
The methylation profile of cg16012690 on 299 colon adenocarcinoma patients.
x2
y2
x1
y1
The plot wrapper function.
compPlot(type = "rlr", x, y, nc, inds_in, res)
type |
A character string selecting which type of plot to generate. |
x |
The independent variables |
y |
The external variable |
nc |
The number of components |
inds_in |
A vector indicating the outlier samples. |
res |
The result object returned by MLM function. |
Compute the row space using SVD.
Compute_Rbase_SVD(bulk_data, tg_R1_lists_selected)
bulk_data |
The bulk data. |
tg_R1_lists_selected |
A list of the marker genes for several cell types. |
A matrix whose rows span the row space defined by the cell-type-specific marker genes.
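As a generic illustration of obtaining a row-space basis from an SVD, the base-R sketch below uses the right singular vectors of a small random matrix. The matrix `bulk` and its dimensions are hypothetical, and this is not a reproduction of what `Compute_Rbase_SVD` does with the marker gene lists.

```r
# Assumption: a toy 4 x 10 matrix stands in for marker-gene rows of bulk data.
set.seed(3)
bulk <- matrix(rnorm(4 * 10), nrow = 4)

sv <- svd(bulk)
r <- sum(sv$d > 1e-8 * sv$d[1])               # numerical rank
base <- t(sv$v[, seq_len(r), drop = FALSE])   # r x 10 matrix with orthonormal rows

# Every row of `bulk` lies in the span of the rows of `base`,
# so projecting onto that span reproduces the matrix.
reconstructed <- bulk %*% t(base) %*% base
print(isTRUE(all.equal(reconstructed, bulk)))
```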
The main function of the RBSL algorithm.
CSMR(x, y, nit, nc, max_iter)
x |
The matrix |
y |
The external supervised variable. |
nit |
xxx? |
nc |
The component number in the mixture model. |
max_iter |
The maximum iteration number. |
A list consisting of the coefficients, the clustering membership, the data x, the external variable y, and the y predicted by the regression model.
Perform the RBSL algorithm once.
CSMR_one(x, y, nit = 1, nc, max_iter)
x |
The matrix |
y |
The external supervised variable. |
nit |
xxx? |
nc |
The component number in the mixture model. |
max_iter |
The maximum iteration number. |
A list consisting of the coefficients, the clustering membership, the data x, the external variable y, and the y predicted by the regression model.
The predict function of the CSMR algorithm.
CSMR_predict(CSMR_coffs, CSMR.model, xnew, ynew, singleMode = F)
CSMR_coffs |
The coefficient matrix. |
CSMR.model |
The trained model. |
xnew |
x variable. |
ynew |
y variable. |
singleMode |
A parameter to set the component to one. |
A list consisting of the coefficients, the clustering membership, the data x, the external variable y, and the y predicted by the regression model.
The train function of the CSMR algorithm.
CSMR_train(x, y, nit, nc, max_iter)
x |
The matrix |
y |
The external supervised variable. |
nit |
xxx |
nc |
The component number in the mixture model. |
max_iter |
The maximum iteration number. |
A list consisting of the coefficients, the clustering membership, the data x, the external variable y, and the y predicted by the regression model.
CTLERob performs robust linear regression with a high breakdown point and high efficiency in each mixing component and adaptively removes the outlier samples.
CTLERob(formula, data, nit = 20, nc = 2, rlr_method = "ltsReg")
## S4 method for signature 'formula,ANY,ANY,numeric'
CTLERob(formula, data, nit = 20, nc = 2, rlr_method = "ltsReg")
formula |
A symbolic description of the model to be fit. |
data |
A data frame containing the predictor and response variables, where the last column is the response variable. |
nit |
Number of iterations. |
nc |
Number of mixture components. |
rlr_method |
The regression method; the default is 'ltsReg'. |
Laplace distribution.
denLp(rr, sig)
rr |
Shift from the location parameter |
sig |
Scale parameter. |
Laplace density.
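For orientation, a base-R sketch of a Laplace density in its common textbook parameterization is given below. The name `laplace_density` is hypothetical, and the scale convention (density `exp(-|r|/b) / (2b)`) is an assumption; `denLp` may scale `sig` differently, for example so that it equals the standard deviation.

```r
# Textbook Laplace density with scale b = sig (assumed parameterization).
laplace_density <- function(rr, sig) {
  exp(-abs(rr) / sig) / (2 * sig)
}

# Sanity check: the density is symmetric, so twice the mass on [0, Inf)
# should be close to 1.
half_mass <- integrate(laplace_density, lower = 0, upper = Inf, sig = 1.5)$value
total_mass <- 2 * half_mass
print(total_mass)
```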
Detect outlier observations from a vector.
DeOut(daData, method)
daData |
A numerical vector. |
method |
Choose from '3sigma','hampel' and 'boxplot'. |
Indices of the outlier observations.
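The three rule names can be illustrated with their usual textbook definitions. The sketch below is an assumption about what each label means (mean ± 3 SD, median ± 3 MAD, and the 1.5 × IQR boxplot fences); `detect_outliers` is a hypothetical name and the thresholds inside `DeOut` may differ.

```r
# Hypothetical helper using common definitions of the three rules.
detect_outliers <- function(x, method = c("3sigma", "hampel", "boxplot")) {
  method <- match.arg(method)
  if (method == "3sigma") {
    lower <- mean(x) - 3 * sd(x)
    upper <- mean(x) + 3 * sd(x)
  } else if (method == "hampel") {
    # mad() already includes the 1.4826 consistency constant
    lower <- median(x) - 3 * mad(x)
    upper <- median(x) + 3 * mad(x)
  } else {
    q <- unname(quantile(x, c(0.25, 0.75)))
    lower <- q[1] - 1.5 * (q[2] - q[1])
    upper <- q[2] + 1.5 * (q[2] - q[1])
  }
  which(x < lower | x > upper)
}

x <- c(10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25)
out_sigma  <- detect_outliers(x, "3sigma")
out_hampel <- detect_outliers(x, "hampel")
out_box    <- detect_outliers(x, "boxplot")
```

On this small sample the median-based and quartile-based rules flag the last value, while the 3-sigma rule flags nothing, because the outlier itself inflates the standard deviation.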
Mixture regression based on MLE can be unstable when unequal variances are assumed. flexmix is run multiple times to stabilize the results.
flexmix_2(formula, data1, k, mprior)
formula |
A symbolic description of the model to be fit. |
data1 |
A data frame containing the predictor and response variables, where the last column is the response variable. |
k |
Number of mixture components. |
mprior |
A number in (0,1) that specifies the minimum proportion of samples in each mixing component. |
An S4 object of class flexmix.
A dataset generated from a Gaussian distribution in the RobMixReg package.
gaussData
A data frame with 100 rows and 3 variables:
x variable
y variable
cluster information
lars variant for LSA.
lars.lsa(Sigma0, b0, intercept, n, type = c("lasso", "lar"), eps = .Machine$double.eps, max.steps)
Sigma0 |
The parameter. |
b0 |
The intercept of the regression line. |
intercept |
Logical; whether to include an intercept. |
n |
The number of data points. |
type |
Regression option; choose from "lasso" or "lar". |
eps |
The convergence threshold; the default is the machine epsilon. |
max.steps |
The maximum number of steps before stopping. |
object.
Reference Wang, H. and Leng, C. (2006) and Efron et al. (2004).
S3 method for class 'mixtureReg'. However, it doesn't return a 'logLik' object. For simplicity, it returns a 'numeric' value.
logLik_mixtureReg(mixtureModel)
mixtureModel |
mixtureReg object, typically result from 'mixtureReg()'. |
Return a numeric value of log likelihood.
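To make the returned quantity concrete, the sketch below evaluates the log-likelihood of a Gaussian mixture regression from fitted means, standard deviations and mixing probabilities. `mixture_loglik` and its arguments are hypothetical; this shows the standard formula under the assumption of Gaussian components, not the internals of `logLik_mixtureReg`.

```r
# Hypothetical helper (assumes length(y) > 1).
# y: responses; mu: n x K matrix of component means;
# sigma, lambda: length-K vectors of standard deviations and mixing probabilities.
mixture_loglik <- function(y, mu, sigma, lambda) {
  dens <- vapply(
    seq_along(lambda),
    function(k) lambda[k] * dnorm(y, mean = mu[, k], sd = sigma[k]),
    numeric(length(y))
  )
  sum(log(rowSums(dens)))   # sum over observations of log mixture density
}

set.seed(11)
y <- rnorm(30)
mu <- cbind(rep(0, 30), rep(0, 30))   # two identical components
ll_mix <- mixture_loglik(y, mu, sigma = c(1, 1), lambda = c(0.3, 0.7))
ll_single <- sum(dnorm(y, log = TRUE))
```

With two identical components the mixture collapses to a single Gaussian, so `ll_mix` and `ll_single` agree; this is a quick way to check the helper.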
Least squares approximation. This version: Oct 19, 2006.
lsa(obj)
obj |
lm/glm/coxph or other object. |
beta.ols: the MLE estimate ; beta.bic: the LSA-BIC estimate ; beta.aic: the LSA-AIC estimate.
Reference Wang, H. and Leng, C. (2006) and Efron et al. (2004).
An EM-type parameter estimation that replaces the least squares estimation in the M-step with a robust criterion.
mixlinrb_bi(formula, data, nc = 2, nit = 200)
## S4 method for signature 'formula,ANY,numeric,numeric'
mixlinrb_bi(formula, data, nc = 2, nit = 20)
formula |
A symbolic description of the model to be fit. |
data |
A data frame containing the predictor and response variables, where the last column is the response variable. |
nc |
Number of mixture components. |
nit |
Number of iterations for the bisquare method. |
Estimated coefficients of all components.
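The idea of replacing the least squares M-step with a robust criterion can be sketched for a single component as iteratively reweighted least squares with bisquare weights. Everything here is an assumption for illustration: `irls_bisquare` and `bisq` are hypothetical names, `post` stands in for the posterior membership weights an E-step would supply, and scaling residuals by the MAD is one common choice rather than the package's documented behaviour.

```r
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 1 + 2 * x + rnorm(50, sd = 0.5)
y[c(5, 25, 45)] <- y[c(5, 25, 45)] + 30   # three gross outliers

# Textbook bisquare weight on standardized residuals.
bisq <- function(t, k = 4.685) ifelse(abs(t) <= k, (1 - (t / k)^2)^2, 0)

irls_bisquare <- function(x, y, post = rep(1, length(y)), n_iter = 20) {
  fit <- lm(y ~ x, weights = post)        # ordinary weighted start
  for (i in seq_len(n_iter)) {
    r <- residuals(fit)                   # unweighted residuals for all points
    s <- mad(r)                           # robust residual scale
    w <- post * bisq(r / s)               # membership weight times robustness weight
    fit <- lm(y ~ x, weights = w)
  }
  coef(fit)
}

ols_coef <- coef(lm(y ~ x))
robust_coef <- irls_bisquare(x, y)
print(rbind(ols = ols_coef, robust = robust_coef))
```

In a full EM loop, this weighted fit would be run once per component inside each M-step, with `post` updated by the E-step.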
An EM-type parameter estimation that replaces the least squares estimation in the M-step with a robust criterion.
mixlinrb_bione(formula, data, nc = 2)
formula |
A symbolic description of the model to be fit. |
data |
A data frame containing the predictor and response variables, where the last column is the response variable. |
nc |
Number of mixture components. |
Estimated coefficients of all components.
mixLp estimates the mixture regression parameters robustly using bisquare function based on multiple initial values. The solution is found by the modal solution.
mixLp(formula, data, nc=2, nit=200)
## S4 method for signature 'formula,ANY,numeric,numeric'
mixLp(formula, data, nc = 2, nit = 20)
formula |
A symbolic description of the model to be fit. |
data |
A data frame containing the predictor and response variables, where the last column is the response variable. |
nc |
Number of mixture components. |
nit |
Number of iterations |
Estimated coefficients of all components.
library("RobMixReg")
formula01 = as.formula("y~x")
x = (gaussData$x); y = as.numeric(gaussData$y)
example_data01 = data.frame(x, y)
res = mixLp(formula01, example_data01, nc = 2, nit = 20)
Robust mixture regression assuming that the error terms follow a Laplace distribution.
mixLp_one(formula, data, nc = 2)
formula |
A symbolic description of the model to be fit. |
data |
A data frame containing the predictor and response variables, where the last column is the response variable. |
nc |
Number of mixture components. |
Estimated coefficients of all components.
The main function in this package.
mixtureReg(regData, formulaList, xName = NULL, yName = NULL, mixingProb = c("Constant", "loess"), initialWList = NULL, epsilon = 1e-08, max_iter = 10000, max_restart = 15, min_lambda = 0.01, min_sigmaRatio = 0.1, silently = TRUE)
regData |
data frame used in fitting model. |
formulaList |
a list of the regression components that need to be estimated. |
xName |
character; Name used to pick x variable from data. |
yName |
character; Name used to pick y variable from data. |
mixingProb |
character; Specify how the mixing probabilities are estimated in the M step. "Constant" specifies a constant mixing probabilities; "loess" specifies predictor dependent mixing probabilities obtained by loess smoothing. |
initialWList |
a list of weights guesses (provided by user). Typically this is not used, unless the user has a good initial guess. |
epsilon |
a small value that the function treats as zero. The value is used to determine matrix singularity and convergence. |
max_iter |
the maximum number of iterations. |
max_restart |
the maximum number of restarts before giving up. |
min_lambda |
a value used to ensure estimated mixing probabilities (lambda's) are not too close to zero. |
min_sigmaRatio |
a value used to prevent the estimated variances of any regression component from collapsing to zero. |
silently |
a switch to turn off the screen printout. |
A class 'mixtureReg' object.
The mixtureReg package was developed by Tianxia Zhou on GitHub. All rights reserved by Tianxia Zhou.
The main function of mining the latent relationship among variables.
MLM(ml.method = "rlr", rmr.method = "cat", b.formulaList = list(formula(y ~ x), formula(y ~ 1)), formula = y ~ x, nit = 1, nc = 2, x = NULL, y = NULL, max_iter = 50, tRatio = 0.05)
ml.method |
Selects one of the four methods described in the vignette. |
rmr.method |
The option to select the robust mixture regression method. |
b.formulaList |
Case b requires the user to provide the formula list, which enables flexible mixture regression. |
formula |
The linear relationship between two variables. |
nit |
Number of iterations for CTLE, mixbi, mixLp. |
nc |
Number of mixture components. |
x |
The matrix x of the high dimension situation. |
y |
The external outcome variable. |
max_iter |
Maximum iteration for TLE method. |
tRatio |
The ratio of the outliers in the TLE robust mixture regression method. |
Main result object.
Model selection function for low dimension data.
MLM_bic(ml.method = "rlr", x, y, nc = 1, formulaList = NULL, K = 2)
ml.method |
The parameter to choose the fitted model for calculating the BIC |
x |
x variable. |
y |
y variable. |
nc |
The component number for low dimensional feature |
formulaList |
The list of target formulas. |
K |
The component number for high dimensional feature |
BIC value.
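For reference, the textbook BIC is `-2 * logLik + p * log(n)`, where `p` counts the estimated parameters and `n` the observations. The sketch below applies it to two ordinary `lm` fits on the built-in `cars` data; `bic_manual` is a hypothetical helper, and how `MLM_bic` counts parameters for a mixture model is not shown here.

```r
# Two candidate models on a built-in dataset (illustrative only).
fit_slope <- lm(dist ~ speed, data = cars)
fit_null  <- lm(dist ~ 1, data = cars)

bic_manual <- function(fit) {
  ll <- logLik(fit)
  # attr(ll, "df") counts regression coefficients plus the residual variance
  -2 * as.numeric(ll) + attr(ll, "df") * log(nobs(fit))
}

bic_values <- c(slope = bic_manual(fit_slope), null = bic_manual(fit_null))
best <- names(which.min(bic_values))   # smaller BIC is preferred
print(bic_values)
```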
Five-fold cross-validation function for high-dimensional data.
MLM_cv(x = NULL, y = NULL, nit = 1, nc = 2, max_iter = 50)
x |
x variable. |
y |
y variable. |
nit |
Iteration number. |
nc |
The number of components. |
max_iter |
Maximum iteration. |
The correlation between y and y_hat based on five fold cross validation.
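The returned quantity, the correlation between observed and out-of-fold predicted responses, can be sketched with a plain linear model. The simulated data and the use of `lm` in place of the mixture model are assumptions made to keep the example self-contained.

```r
set.seed(42)
n <- 100
x <- matrix(rnorm(n * 3), n, 3)
y <- as.numeric(x %*% c(1.5, -2, 0.5) + rnorm(n, sd = 0.5))
dat <- data.frame(y = y, x)                 # predictor columns become X1, X2, X3

folds <- sample(rep(1:5, length.out = n))   # random assignment to five folds
y_hat <- numeric(n)
for (f in 1:5) {
  test <- folds == f
  fit <- lm(y ~ ., data = dat[!test, ])     # train on the other four folds
  y_hat[test] <- predict(fit, newdata = dat[test, ])
}

cv_cor <- cor(y, y_hat)                     # correlation of y with out-of-fold predictions
print(cv_cor)
```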
Rearrange X and Y coordinates before calling "lines()" function.
orderedLines(x, y, ...)
x |
X coordinate vectors of points to join. |
y |
Y coordinate vectors of points to join. |
... |
Further graphical parameters. |
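A minimal sketch of the reordering idea: sort the points by x before joining them, so the line does not zigzag. `ordered_lines` is a hypothetical stand-in written for illustration, not the package function; it also returns the sorted coordinates invisibly so the result can be inspected.

```r
ordered_lines <- function(x, y, ...) {
  ord <- order(x)                 # permutation that sorts x
  lines(x[ord], y[ord], ...)      # join points from left to right
  invisible(list(x = x[ord], y = y[ord]))
}

pdf(NULL)                         # null device so the sketch runs without a display
x <- c(3, 1, 2)
y <- c(9, 1, 4)
plot(x, y)
sorted <- ordered_lines(x, y, col = "red")
invisible(dev.off())
```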
CTLERob performs robust linear regression with a high breakdown point and high efficiency in each mixing component and adaptively removes the outlier samples.
plot_CTLE(formula, data, nc = 2, inds_in)
## S4 method for signature 'formula,ANY,numeric'
plot_CTLE(formula, data, nc = 2, inds_in)
formula |
A symbolic description of the model to be fit. |
data |
A data frame containing the predictor and response variables, where the last column is the response variable. |
nc |
Number of mixture components. |
inds_in |
The index of the point which belongs to the current regression line. |
S3 plot method for class 'mixtureReg'.
plot_mixtureReg(mixtureModel, which = 1:2, xName = NULL, yName = NULL, xlab = NULL, ylab = NULL, ...)
mixtureModel |
mixtureReg object, typically result from 'mixtureReg()'. |
which |
numeric; choose which plot to display. '1' gives a plot of fit; '2' gives a plot of mixing probability. |
xName |
character; Name used to pick x variable from data. |
yName |
character; Name used to pick y variable from data. |
xlab |
character; label that should be put on the x axis. |
ylab |
character; label that should be put on the y axis. |
... |
Further graphical parameters. |
Feed in a list of mixtureReg models and get an overlaid plot.
plot_mixtureRegList(mixtureRegList, xName = NULL, yName = NULL, ...)
mixtureRegList |
a list of multiple mixtureReg objects. |
xName |
character; Name used to pick x variable from data. |
yName |
character; Name used to pick y variable from data. |
... |
Further graphical parameters. |
Adaptive lasso.
Rec_Lm(XX, yy)
XX |
The independent variable. |
yy |
The dependent variable. |
A list consisting of the indices of the selected variables and the coefficients for all variables.
The main function of Robust Mixture Regression using five methods.
rmr(lr.method = "flexmix", formula = NULL, data = NULL, nc = 2, nit = 20, tRatio = 0.05, MaxIt = 200)
lr.method |
A robust mixture regression method to be used. Should be one of "flexmix", "TLE", "CTLERob", "mixbi","mixLp". |
formula |
A symbolic description of the model to be fit. |
data |
A data frame containing the predictor and response variables, where the last column is the response variable. |
nc |
Number of mixture components. |
nit |
Number of iterations for CTLE, mixbi, mixLp. |
tRatio |
Trimming proportion for TLE method. |
MaxIt |
Maximum iteration for TLE method. |
An S4 object about the regression result.
library(RobMixReg)
#library(robust)
library(flexmix)
library(robustbase)
library(MASS)
library(gtools)
# gaussData
x=(gaussData$x);y=as.numeric(gaussData$y);
formula01=as.formula("y~x")
example_data01=data.frame(x,y)
res_rmr = rmr(lr.method='flexmix', formula=formula01, data=example_data01)
res_rmr = rmr(lr.method='CTLERob', formula=formula01, data=example_data01)
Class RobMixReg
defines a robust mixture regression class as an S4 object.
inds_in
The indices of observations used in the parameter estimation.
indout
The indices of outlier samples, not used in the parameter estimation.
ctleclusters
The cluster membership of each observation.
compcoef
Regression coefficients for each component.
comppvals
Component p values.
compwww
The posterior of the clustering.
call
Call function.
Simulate high dimension data for RBSL algorithm validation.
simu_data_sparse(n, bet, pr, sigma)
n |
Patient number. |
bet |
The coefficient matrix. |
pr |
A vector of probability thresholds used to simulate sampling from a uniform distribution. |
sigma |
A vector of noise level. The length should be equal to the component number. |
A list consisting of x, y, and the true cluster labels.
The simulation function for low/high dimensional space.
simu_func(beta, sigma, alpha = NULL, n = 400)
beta |
The slope vector for low dimensional space or matrix for high dimensional space. |
sigma |
A vector whose k-th element is the standard deviation for the k-th regression component. |
alpha |
The parameter to control the number of outliers for low dimensional space. |
n |
The sample number for high dimensional data. |
A list object.
The simulation function for low dimensional space.
simu_low(beta, inter, alpha = NULL)
beta |
The slope vector. |
inter |
The intercept vector. |
alpha |
The parameter to control the number of outliers. |
A list object consists of the x variable in low dimensional space and the external y variable.
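To show what low-dimensional mixture regression data look like, the sketch below draws a cluster label for each sample and then generates y from that cluster's line. `simulate_mixture` and its `sigma` and `prob` arguments are hypothetical, and no outliers are added, so this is a simplified stand-in for `simu_low` rather than its actual procedure.

```r
set.seed(123)

# Hypothetical generator: one predictor, K regression lines, Gaussian noise.
simulate_mixture <- function(n, beta, inter, sigma, prob) {
  k <- sample(seq_along(beta), n, replace = TRUE, prob = prob)  # true cluster labels
  x <- runif(n, 0, 10)
  y <- inter[k] + beta[k] * x + rnorm(n, sd = sigma[k])
  data.frame(x = x, y = y, cluster = k)
}

sim <- simulate_mixture(200, beta = c(2, -1), inter = c(0, 10),
                        sigma = c(0.5, 0.5), prob = c(0.5, 0.5))

# Fitting each true cluster separately should recover the slopes.
slope1 <- unname(coef(lm(y ~ x, data = sim[sim$cluster == 1, ]))["x"])
slope2 <- unname(coef(lm(y ~ x, data = sim[sim$cluster == 2, ]))["x"])
print(c(slope1, slope2))
```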
A simulated dataset from the RobMixReg package. The data are two-dimensional, and the ground truth of the cluster information, including the outlier labels, is also provided.
simuData
A data frame with 500 rows and 5 variables:
X1 variable
X2 variable
y variable
cluster information
outlier indicator
The algorithm fits a mixture regression model after trimming a proportion of the observations, given by tRatio.
TLE(formula, data, nc = 2, tRatio, MaxIt = 200)
## S4 method for signature 'formula,ANY,numeric,numeric,numeric'
TLE(formula, data, nc = 2, tRatio, MaxIt = 200)
formula |
A symbolic description of the model to be fit. |
data |
A data frame containing the predictor and response variables, where the last column is the response variable. |
nc |
Number of mixture components. |
tRatio |
Trimming proportion. |
MaxIt |
Maximum iteration. |
An S4 object of RobMixReg class.
library("RobMixReg")
formula01 = as.formula("y~x")
x = (gaussData$x); y = as.numeric(gaussData$y)
example_data01 = data.frame(x, y)
res = TLE(formula01, example_data01, nc = 2, tRatio = 0.05, MaxIt = 200)
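The trimming idea can be sketched without the package for the simplest case of a single Gaussian regression component: fit, rank observations by their log-likelihood contribution, keep the best `1 - tRatio` fraction, and refit until the kept set stops changing. `trimmed_fit` is a hypothetical helper and the single-component simplification is an assumption; the `TLE` function works with a mixture, where the ranking would use the mixture density.

```r
set.seed(7)
n <- 100
x <- runif(n, 0, 10)
y <- 3 + 1.5 * x + rnorm(n, sd = 0.5)
y[1:5] <- y[1:5] + 20                        # contaminate 5% of the sample

trimmed_fit <- function(x, y, tRatio = 0.05, MaxIt = 200) {
  n <- length(y)
  h <- n - round(n * tRatio)                 # number of observations kept
  keep <- seq_len(n)                         # start from the full sample
  for (it in seq_len(MaxIt)) {
    fit <- lm(y ~ x, subset = keep)
    r <- y - predict(fit, newdata = data.frame(x = x))
    s <- sqrt(mean(r[keep]^2))               # scale estimated from kept points
    ll <- dnorm(r, sd = s, log = TRUE)       # per-observation log-likelihood
    new_keep <- sort(order(ll, decreasing = TRUE)[seq_len(h)])
    if (identical(new_keep, sort(keep))) break
    keep <- new_keep
  }
  list(coef = coef(fit), inds_in = keep, indout = setdiff(seq_len(n), keep))
}

res_trim <- trimmed_fit(x, y, tRatio = 0.05)
print(res_trim$coef)
print(res_trim$indout)
```

The element names `inds_in` and `indout` mirror the slot names of the RobMixReg class for readability only; the returned object here is a plain list.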