Statistics Seminar
Unless otherwise noted, the Statistics Seminar will be held on Fridays @ 3:30 pm at FO 1.502, and it will be livestreamed through Microsoft Teams (click Online to join).
For questions about the Statistics Seminar, please contact Qiwei Li or Chuan-Fa Tang. The Statistics Seminar was organized by Yulia R. Gel in Spring 2014, Fall 2014, and Spring 2015. The archive of talks from the previous semester can be found here.
List of talks in Academic Year 2022-23
Date & Location  Speaker  Title 
Jan 21, 3:30 pm Online Recording 
Li Wang Mathematics, University of Texas at Arlington 
Probabilistic semi-supervised learning via sparse graph structure learning 
Jan 28, 3:30 pm Online Recording 
Yingbo Li

TBA 
Feb 11, 3:30 pm Online Recording 
Guanyu Hu Statistics, University of Missouri 
Bayesian heterogeneity learning for field goal attempts of professional basketball players 
Feb 18, 3:30 pm Online Recording 
Andrey Sarantsev Mathematics and Statistics, University of Nevada, Reno 
TBA 
Feb 25, 3:30 pm Online Recording 
MinJae Lee Population and Data Sciences, University of Texas Southwestern 
TBA 
Mar 04, 3:30 pm Online Recording 
Quan Zhou Statistics, Texas A&M University 
TBA 
Mar 25, 3:30 pm Online Recording 
Yulun Liu Population and Data Sciences, University of Texas Southwestern 
TBA 
Apr 1, 11:00 am TBA Online Recording 
Jeffrey Morris Biostatistics, University of Pennsylvania 
TBA 
Apr 8, 3:00 pm Online Recording 
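Hong Zhu Population and Data Sciences, University of Texas Southwestern 
Sample size considerations for matched-pair cluster randomization design with incomplete observations of binary outcomes 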
Apr 15, 3:00 pm Online Recording 
Zhangdong Liu BCM 
TBA 
Apr 22, 3:00 pm Online Recording 
Boxiang Wang Statistics and Actuarial Sciences, University of Iowa 
TBA 
Apr 29, 3:00 pm Online Recording 
Daniel Heitjan Statistics, Southern Methodist University 
Measuring sensitivity to nonignorable incompleteness 
List of talk abstracts in Academic Year 2022-23
Probabilistic semi-supervised learning via sparse graph structure learning
Li Wang (UTA), Jan 21, 2022
We present a probabilistic semi-supervised learning (SSL) framework based on sparse graph structure learning. Different from existing SSL methods with either a predefined weighted graph heuristically constructed from the input data or a learned graph based on the locally linear embedding assumption, the proposed SSL model is capable of learning a sparse weighted graph from the unlabeled high-dimensional data and a small amount of labeled data, as well as dealing with the noise of the input data. Our representation of the weighted graph is indirectly derived from a unified model of density estimation and pairwise distance preservation in terms of various distance measurements, where latent embeddings are assumed to be random variables following an unknown density function to be learned and pairwise distances are then calculated as the expectations over the density for the model robustness to the data noise. Moreover, the labeled data based on the same distance representations is leveraged to guide the estimated density for better class separation and sparse graph structure learning. A simple inference approach for the embeddings of unlabeled data based on point estimation and kernel representation is presented. Extensive experiments on various data sets show promising results in the setting of SSL compared with many existing methods, and significant improvements on small amounts of labeled data.
Bayesian heterogeneity learning for field goal attempts of professional basketball players
Guanyu Hu (University of Missouri), Jan 28, 2022
In this talk, I will introduce Bayesian learning approaches to analyze the underlying heterogeneity structure of field goal attempts among professional basketball players in the NBA. Generally, we propose a mixture of finite mixtures (MFM) model to capture the heterogeneity of field goal attempts among different players. Our proposed method can simultaneously estimate the number of groups and group configurations. An efficient Markov Chain Monte Carlo (MCMC) algorithm is developed for our proposed model. Simulation studies have been conducted to demonstrate its performance. Ultimately, our proposed learning approach is further illustrated in analyzing shot charts of selected players in the NBA’s 2017-2018 regular season.
Sample size considerations for matched-pair cluster randomization design with incomplete observations of binary outcomes
Hong Zhu (University of Texas Southwestern), Apr 8, 2022
Multiple public health and medical research studies have applied matched-pair cluster randomization design to the evaluation of the intervention and/or prevention effects. One of the most common and severe problems faced by researchers when conducting cluster randomized trials (CRTs) is incomplete observations, which are associated with various reasons causing the individuals to discontinue participating in the trials. Although statistical methods to remedy the problems of missing data have already been proposed, there are still methodological gaps in research concerning the determination of sample size in matched-pair CRTs with incomplete binary outcomes. One conventional method for adjusting for missing data in the sample size determination is to divide the sample size under complete data by the expected follow-up rate. However, such crude adjustment ignores the impact of the structure and strength of correlations regarding both outcome data and missing data mechanism. We propose a closed-form sample size formula for matched-pair CRTs with incomplete binary outcomes, which appropriately accounts for different missing patterns and magnitudes as well as the effects of matching and clustering on the outcome and missing data. The generalized estimating equation (GEE) approach treats incomplete observations as missing data in a marginal logistic regression model, which flexibly accommodates various types of intraclass correlation, missing patterns, and missing proportions. In the presence of missing data, the proposed GEE sample size method provides higher accuracy as compared with the conventional method. We assess the performance of the proposed method by simulations, and apply the proposed method to design a real-world matched-pair CRT to examine the effect of a team-based approach on controlling blood pressure (BP).
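For readers unfamiliar with the conventional adjustment named in this abstract (dividing the complete-data sample size by the expected follow-up rate), the arithmetic can be sketched as below. This is only an illustration of that crude baseline, not the speaker's closed-form GEE formula; the function name and the example numbers are hypothetical.

```python
import math


def inflate_for_missingness(n_complete: int, follow_up_rate: float) -> int:
    """Conventional (crude) adjustment: divide the sample size required
    under complete data by the expected follow-up rate, rounding up.

    Hypothetical helper for illustration only; it ignores the correlation
    structure of the outcomes and of the missing-data mechanism.
    """
    if not 0.0 < follow_up_rate <= 1.0:
        raise ValueError("follow_up_rate must be in (0, 1]")
    return math.ceil(n_complete / follow_up_rate)


# Assumed example: 200 subjects needed with complete data, 85% expected follow-up.
adjusted = inflate_for_missingness(200, 0.85)
print(adjusted)
```

With these assumed inputs the adjusted size is 200 / 0.85 rounded up to the next whole subject.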
Measuring sensitivity to nonignorable incompleteness
Daniel Heitjan (Southern Methodist University), Apr 29, 2022
Statisticians have long recognized the potential biasing effects of nonignorable missing data mechanisms. For example, if, say, larger observations are more likely to be missing or censored, then standard estimates such as the sample mean of the observed data (when some subjects are missing) or the Kaplan-Meier curve (when some subjects are censored) are invalid. Unfortunately, methods that attempt to estimate or test the degree of nonignorability are unsatisfactory, thanks to conceptual and numerical difficulties associated with nonignorable modeling. How then shall we handle such data sets? My idea is to embed the reference ignorable model (under which the standard analysis is valid) in a nonignorable model (under which the standard analysis is potentially invalid) in which a nonignorability parameter represents the degree of departure from missing at random. I then conduct a sensitivity analysis to evaluate the dependence of the MLE of the parameter of interest (as a function of the nonignorability parameter) on the degree of nonignorability. If it takes a large value of the nonignorability parameter to substantially affect the estimate of interest, we judge the standard analysis to be insensitive. In this talk, I describe an approach to such a sensitivity analysis based on the index of local sensitivity to nonignorability (ISNI) statistic. An R package is available to conduct this analysis for the univariate GLM and a range of models for clustered or longitudinal data. I will demonstrate applications in live-data examples.
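The biasing effect described at the start of this abstract can be illustrated with a small simulation. The sketch below is a toy example, not the ISNI method or the R package mentioned in the talk: it assumes standard normal outcomes and a logistic missingness model in which a hypothetical parameter gamma controls how strongly the chance of being missing depends on the outcome itself (gamma = 0 corresponds to data missing completely at random).

```python
import math
import random


def observed_mean_bias(gamma: float, n: int = 100_000, seed: int = 0) -> float:
    """Return (mean of observed values) - (mean of all values).

    Assumed toy model: y ~ N(0, 1), and each y is missing with probability
    1 / (1 + exp(-gamma * y)), so larger values are more likely to be
    missing when gamma > 0.
    """
    rng = random.Random(seed)
    total_all = 0.0
    total_obs = 0.0
    n_obs = 0
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        total_all += y
        p_missing = 1.0 / (1.0 + math.exp(-gamma * y))
        if rng.random() >= p_missing:  # value is observed
            total_obs += y
            n_obs += 1
    return total_obs / n_obs - total_all / n


# Evaluate the bias over a few assumed values of the nonignorability parameter.
for gamma in (0.0, 0.5, 1.0):
    print(gamma, round(observed_mean_bias(gamma), 3))
```

Under this assumed model the observed-data mean stays close to the full-data mean when gamma is zero and drifts downward as gamma grows, which is the kind of dependence a sensitivity analysis is meant to quantify.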