Online inference in high-dimensional regression with streaming clustered data
Abstract
The rapidly growing volume and velocity of data pose new challenges for clustered data analysis, since an ever-increasing amount of data cannot be stored in memory. This paper develops an online method for estimation and inference of the unknown parameters in linear mixed-effects models with high-dimensional streaming data. Instead of re-accessing the entire raw data, we update the estimators using only the current batch of new data and summary statistics obtained from historical data. To this end, we adopt a quasi-likelihood approach that is applicable in the high-dimensional setting and eases the computational burden. We establish estimation consistency and asymptotic normality for the proposed online estimators, which provides theoretical support for real-time decisions with streaming data. Extensive simulation studies are conducted to evaluate the effectiveness of the proposed method. We further apply the method to two real datasets: the Communities and Crime dataset and the ABIDE dataset.