Mathematical Sciences

School of Natural Sciences & Mathematics

Statistics Seminar

Unless otherwise noted, the Statistics Seminar is held on Fridays at 3:30 pm in FO 1.502 and is live-streamed through Microsoft Teams (click Online to join).

For questions about the Statistics Seminar, please contact Qiwei Li or Chuan-Fa Tang. The Statistics Seminar was organized by Yulia R. Gel in Spring 2014, Fall 2014, and Spring 2015. The archive of talks from the previous semester can be found here.

List of talks in Academic Year 2022-23

Date & Location Speaker Title

Jan 21, 3:30 pm

FO 1.502

Online

Recording

Li Wang

Mathematics,

University of Texas at Arlington

Probabilistic semi-supervised learning via sparse graph structure learning

Jan 28, 3:30 pm

FO 1.502

Online

Recording

Yingbo Li

 

TBA

Feb 11, 3:30 pm

FO 1.502

Online

Recording

Guanyu Hu

Statistics,

University of Missouri

Bayesian heterogeneity learning for field goal attempts of professional basketball players

Feb 18, 3:30 pm

FO 1.502

Online

Recording

Andrey Sarantsev

Mathematics and Statistics,

University of Nevada, Reno

TBA

Feb 25, 3:30 pm

FO 1.502

Online

Recording

MinJae Lee

Population and Data Sciences, 

University of Texas Southwestern

TBA

Mar 04, 3:30 pm

FO 1.502

Online

Recording

Quan Zhou

Statistics,

Texas A&M University

TBA

Mar 25, 3:30 pm

FO 1.502

Online

Recording

Yulun Liu

Population and Data Sciences, 

University of Texas Southwestern

TBA

Apr 1, 11:00 am

TBA

Online

Recording

Jeffrey Morris

Biostatistics,

University of Pennsylvania

TBA

Apr 8, 3:00 pm

FO 1.502

Online

Recording

Hong Zhu

Population and Data Sciences, 

University of Texas Southwestern

Sample size considerations for matched-pair cluster randomization design with incomplete observations of binary outcomes

Apr 15, 3:00 pm

FO 1.502

Online

Recording

Zhandong Liu

BCM

TBA

Apr 22, 3:00 pm

FO 1.502

Online

Recording

Boxiang Wang

Statistics and Actuarial Sciences,

University of Iowa

TBA

Apr 29, 3:00 pm

FO 1.502

Online

Recording

Daniel Heitjan

Statistics,

Southern Methodist University

Measuring sensitivity to nonignorable incompleteness

 

List of talk abstracts in Academic Year 2022-23

Probabilistic semi-supervised learning via sparse graph structure learning

Li Wang (UTA), Jan 21, 2022

We present a probabilistic semi-supervised learning (SSL) framework based on sparse graph structure learning. Unlike existing SSL methods, which use either a predefined weighted graph heuristically constructed from the input data or a graph learned under the locally linear embedding assumption, the proposed SSL model can learn a sparse weighted graph from unlabeled high-dimensional data and a small amount of labeled data, and it can handle noise in the input data. Our representation of the weighted graph is indirectly derived from a unified model of density estimation and pairwise distance preservation in terms of various distance measurements, where latent embeddings are assumed to be random variables following an unknown density function to be learned, and pairwise distances are calculated as expectations over that density to make the model robust to data noise. Moreover, the labeled data, based on the same distance representations, are leveraged to guide the estimated density toward better class separation and sparse graph structure learning. A simple inference approach for the embeddings of unlabeled data, based on point estimation and kernel representation, is presented. Extensive experiments on various data sets show promising results in the SSL setting compared with many existing methods, and significant improvements when only small amounts of labeled data are available.

 

Bayesian heterogeneity learning for field goal attempts of professional basketball players

Guanyu Hu (University of Missouri), Jan 28, 2022

In this talk, I will introduce Bayesian learning approaches to analyze the underlying heterogeneity structure of field goal attempts among professional basketball players in the NBA. We propose a mixture of finite mixtures (MFM) model to capture the heterogeneity of field goal attempts among different players. Our proposed method can simultaneously estimate the number of groups and the group configurations. An efficient Markov chain Monte Carlo (MCMC) algorithm is developed for the proposed model, and simulation studies demonstrate its performance. Finally, the proposed learning approach is illustrated by analyzing shot charts of selected players in the NBA’s 2017-2018 regular season.

 

Sample size considerations for matched-pair cluster randomization design with incomplete observations of binary outcomes

Hong Zhu (University of Texas Southwestern), Apr 8, 2022

Multiple public health and medical research studies have applied matched-pair cluster randomization design to the evaluation of the intervention and/or prevention effects. One of the most common and severe problems faced by researchers when conducting cluster randomized trials (CRTs) is incomplete observations, which are associated with various reasons causing the individuals to discontinue participating in the trials. Although statistical methods to remedy the problems of missing data have already been proposed, there are still methodological gaps in research concerning the determination of sample size in matched-pair CRTs with incomplete binary outcomes. One conventional method for adjusting for missing data in the sample size determination is to divide the sample size under complete data by the expected follow-up rate. However, such crude adjustment ignores the impact of the structure and strength of correlations regarding both outcome data and missing data mechanism. We propose a closed-form sample size formula for matched-pair CRTs with incomplete binary outcomes, which appropriately accounts for different missing patterns and magnitudes as well as the effects of matching and clustering on the outcome and missing data. The generalized estimating equation (GEE) approach treats incomplete observations as missing data in a marginal logistic regression model, which flexibly accommodates various types of intraclass correlation, missing patterns, and missing proportions. In the presence of missing data, the proposed GEE sample size method provides higher accuracy as compared with the conventional method. We assess the performance of the proposed method by simulations, and apply the proposed method to design a real-world matched-pair CRT to examine the effect of a team-based approach on controlling blood pressure (BP).
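The conventional adjustment mentioned in the abstract above (dividing the complete-data sample size by the expected follow-up rate) can be sketched in a few lines. This is only an illustration of that crude baseline, not the closed-form GEE formula proposed in the talk; the function name `inflate_for_dropout` and the numbers used are hypothetical.

```python
import math


def inflate_for_dropout(n_complete: int, follow_up_rate: float) -> int:
    """Conventional adjustment: divide the sample size required under
    complete data by the expected follow-up rate and round up.

    Hypothetical helper for illustration. It ignores the correlation
    structure of the outcomes and of the missing-data mechanism, which is
    the limitation the abstract points out.
    """
    if not 0.0 < follow_up_rate <= 1.0:
        raise ValueError("follow_up_rate must be in (0, 1]")
    return math.ceil(n_complete / follow_up_rate)


# Hypothetical inputs: 30 units needed under complete data, 75% follow-up.
print(inflate_for_dropout(30, 0.75))
```

Because the adjustment depends only on the overall follow-up rate, two designs with very different intraclass correlations or missing patterns would receive the same inflated sample size.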

 

Measuring sensitivity to nonignorable incompleteness

Daniel Heitjan (Southern Methodist University), Apr 29, 2022

Statisticians have long recognized the potential biasing effects of nonignorable missing data mechanisms. For example, if, say, larger observations are more likely to be missing or censored, then standard estimates such as the sample mean of the observed data (when some subjects are missing) or the Kaplan-Meier curve (when some subjects are censored) are invalid. Unfortunately, methods that attempt to estimate or test the degree of nonignorability are unsatisfactory, thanks to conceptual and numerical difficulties associated with nonignorable modeling. How then shall we handle such data sets? My idea is to embed the reference ignorable model (under which the standard analysis is valid) in a nonignorable model (under which the standard analysis is potentially invalid) in which a nonignorability parameter represents the degree of departure from missing at random. I then conduct a sensitivity analysis to evaluate the dependence of the MLE of the parameter of interest (as a function of the nonignorability parameter) on the degree of nonignorability. If it takes a large value of the nonignorability parameter to substantially affect the estimate of interest, we judge the standard analysis to be insensitive. In this talk, I describe an approach to such a sensitivity analysis based on the index of local sensitivity to nonignorability (ISNI) statistic. An R package is available to conduct this analysis for the univariate GLM and a range of models for clustered or longitudinal data. I will demonstrate applications in live-data examples.
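The biasing effect described at the start of the abstract above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the ISNI statistic and not the R package mentioned in the abstract: the outcome is standard normal, and the probability that a value is missing follows a logistic function of the value itself, with a hypothetical nonignorability parameter `gamma` (`gamma = 0` means the missingness does not depend on the outcome).

```python
import math
import random


def simulate_observed_mean(gamma: float, n: int = 20000, seed: int = 0):
    """Return (full-data mean, observed-data mean) for simulated outcomes.

    Assumed missingness model (illustrative only):
        P(missing | y) = 1 / (1 + exp(-gamma * y))
    so with gamma > 0, larger observations are more likely to be missing.
    """
    rng = random.Random(seed)
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Keep y when a uniform draw is at least its missingness probability.
    observed = [
        y for y in ys
        if rng.random() >= 1.0 / (1.0 + math.exp(-gamma * y))
    ]
    return sum(ys) / n, sum(observed) / len(observed)


# Compare the naive observed-data mean across degrees of nonignorability.
for gamma in (0.0, 0.5, 1.0):
    full_mean, observed_mean = simulate_observed_mean(gamma)
    print(f"gamma={gamma}: full={full_mean:.3f}, observed={observed_mean:.3f}")
```

Tracking how the observed-data mean moves as `gamma` grows is a crude version of the sensitivity question posed in the abstract: how large must the departure from missing at random be before the standard estimate changes substantially?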