
statistics

Ghost Data

As natural as real data, ghost data is everywhere: it is simply data that you cannot see. We need to learn how to handle it, how to model with it, and how to put it to work. Some examples of ghost data are (see Sall, 2017):

  1. Virtual data: it isn’t there until you look at it;

  2. Missing data: there is a slot to hold a value, but the slot is empty;

  3. Pretend data: data that is made up;

  4. Highly sparse data: data whose absence implies a value near zero; and

  5. Simulation data: data to answer “what if.”

Absence of evidence (or data) is not evidence of absence; in fact, it can itself be evidence of something. The idea of ghost data can also be extended to other existing areas, such as hidden Markov chains, two-stage least squares estimation, optimization via simulation, partition models, and topological data, to name just a few.

Three movies will be used for illustration in this talk: (1) “The Sixth Sense” (Bruce Willis): I can see things that you cannot see; (2) “Sherlock Holmes” (Robert Downey Jr.): the absence of expected facts; and (3) “Edge of Tomorrow” (Tom Cruise): how to speed up your learning. It will be helpful if you watch these movies before coming to my talk. This research is at an early stage, and any feedback from you is deeply appreciated. Much of the basic idea was strongly influenced by John Sall (JMP-SAS).

 

Dr. Dennis K. J. Lin is a Distinguished Professor in the Department of Statistics at Purdue University. Prior to his current position, he was a University Distinguished Professor of Supply Chain Management and Statistics at Penn State. His research interests are quality assurance, industrial statistics, data mining, and response surface methodology. He has published nearly 300 SCI/SSCI papers in a wide variety of journals. He currently serves or has served as an associate editor for more than 10 professional journals and was a co-editor for Applied Stochastic Models in Business and Industry. Dr. Lin is an elected fellow of ASA, IMS, ASQ, and RSS, an elected member of ISI, and a lifetime member of ICSA. He is an honorary chair professor at various universities, including a Chang-Jiang Scholar at Renmin University of China, Fudan University, and National Taiwan Normal University. His recent awards include the Youden Address (ASQ, 2010), the Shewell Award (ASQ, 2010), the Don Owen Award (ASA, 2011), the Loutit Address (SSC, 2011), the Hunter Award (ASQ, 2014), the Shewhart Medal (ASQ, 2015), the SPES Award (ASA, 2016), the Chow Yuan-Shin Award (2019), and the Deming Lecturer Award (JSM, 2020). His most recent honor is the Outstanding Alumni Award from National Tsing Hua University (Taiwan, 2022).

Date:
-
Location:
MDS 220

Bayesian Regression for Group Testing Data

Abstract: Group testing involves pooling individual specimens (e.g., blood, urine, or swabs) and testing the pools for the presence of a disease. When individual covariate information is available (e.g., age, gender, or number of sexual partners), a common goal is to relate an individual's true disease status to the covariates in a regression model. Estimating this relationship is a nonstandard problem in group testing because true individual statuses are not observed and all testing responses (on pools and on individuals) are subject to misclassification arising from assay error. Previous regression methods for group testing data can be inefficient because they are restricted to using only initial pool responses and/or they make potentially unrealistic assumptions regarding the assay accuracy probabilities. To overcome these limitations, we propose a general Bayesian regression framework for modeling group testing data. The novelty of our approach is that it can be easily implemented with data from any group testing protocol. Furthermore, our approach simultaneously estimates the assay accuracy probabilities (along with the covariate effects) and can even be applied in screening situations where multiple assays are used. We apply our methods to group testing data collected in Iowa as part of statewide screening efforts for chlamydia.
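To make the misclassification structure concrete, here is a minimal sketch of how a regression model can be linked to pool responses. It is not the speakers' Bayesian framework; it only shows the basic building block under stated assumptions: a logistic model for each individual's probability of being truly positive, independent individual statuses, and a hypothetical assay sensitivity `se` and specificity `sp` that do not depend on pool size (no dilution effect). It uses only initial pool responses, which is the simpler setting the abstract says earlier methods were limited to. The function names are illustrative.

```python
import math
from typing import Sequence

def individual_prob(x: Sequence[float], beta: Sequence[float]) -> float:
    """Logistic model: P(individual is truly positive | covariates x)."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-eta))

def pool_positive_prob(covariates, beta, se: float, sp: float) -> float:
    """P(pool tests positive) under an imperfect assay.

    Assumes independent individual statuses and that sensitivity (se)
    and specificity (sp) do not depend on pool size.
    """
    # A pool is truly negative only if every member is truly negative.
    prob_all_negative = 1.0
    for x in covariates:
        prob_all_negative *= 1.0 - individual_prob(x, beta)
    prob_truly_positive = 1.0 - prob_all_negative
    # Misclassification: true positives detected with prob se,
    # true negatives falsely flagged with prob (1 - sp).
    return se * prob_truly_positive + (1.0 - sp) * prob_all_negative

def pool_log_likelihood(pools, responses, beta, se: float, sp: float) -> float:
    """Log-likelihood of observed pool responses (1 = positive, 0 = negative)."""
    total = 0.0
    for covariates, z in zip(pools, responses):
        p = pool_positive_prob(covariates, beta, se, sp)
        total += math.log(p) if z == 1 else math.log(1.0 - p)
    return total

# Illustrative (made-up) inputs: covariate rows are [intercept, age].
pools = [[[1.0, 25.0], [1.0, 40.0]], [[1.0, 30.0]]]
responses = [1, 0]
print(pool_log_likelihood(pools, responses, beta=[-3.0, 0.02], se=0.95, sp=0.98))
```

In a Bayesian treatment, one would typically place priors on `beta`, `se`, and `sp` and sample from the posterior (for example with MCMC), often by treating the unobserved individual statuses as latent variables so that retest responses on individuals can also enter the likelihood.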

Date:
-
Location:
MDS 220

Statistics Tutoring Center

 

The Statistics Tutoring Center (TC) provides free tutoring for students enrolled in STA 210 and STA 296. The tutors are graduate students in statistics who are currently teaching or assisting in these classes. 

The TC offers both online and in-person hours. In-person hours are held in Room 333 of the Multidisciplinary Science Center (MDS). Online hours are held in a Canvas conference (BigBlueButton) in a dedicated Canvas shell.
