Speaker: Professor Jingjing Wu (吴静静)
Time: June 11, 3:00-4:00 PM
Venue: North Graduate Classroom, first floor, 英国威廉希尔公司 (William Hill UK)
Abstract:
Advances in microarray technology have greatly facilitated research on gene-expression-based classification of patient samples. In cancer research, for example, microarray gene expression data have been used for cancer or tumor classification. When a study focuses on only two classes, for example two different cancer types, we propose a two-sample semiparametric model for the distributions of gene expression levels in the two classes. To estimate the parameters, we consider both the maximum semiparametric likelihood estimate (MLE) and the minimum Hellinger distance estimate (MHDE). For each gene, a Wald statistic is constructed from either the MLE or the MHDE, and a significance test is then performed. We exploit the idea of a weighted sum of misclassification rates to develop a novel classification model that involves only the genes previously identified as significant. To assess the usefulness of the proposed method, we take a predictive approach: we apply it to the acute leukemia data of Golub et al. (1999), using the training set to build the classification model and the testing set to evaluate its accuracy.