We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

stat.ME

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Statistics > Methodology

Title: Linear regression and its inference on noisy network-linked data

Abstract: Linear regression on a set of observations linked by a network has been an essential tool in modeling the relationship between response and covariates with additional network data. Despite its wide range of applications in many areas, such as in social sciences and health-related research, the problem has not been well-studied in statistics so far. Previous methods either lack inference tools or rely on restrictive assumptions on social effects and usually assume that networks are observed without errors, which is unrealistic in many problems. This paper proposes a linear regression model with nonparametric network effects. The model does not assume that the relational data or network structure is exactly observed; thus, the method can be provably robust to a certain network perturbation level. A set of asymptotic inference results is established under a general requirement of the network observational errors, and the robustness of this method is studied in the specific setting when the errors come from random network models. We discover a phase-transition phenomenon of the inference validity concerning the network density when no prior knowledge of the network model is available while also showing a significant improvement achieved by knowing the network model. As a by-product of this analysis, we derive a rate-optimal concentration bound for random subspace projection that may be of independent interest. Extensive simulation studies are conducted to verify these theoretical results and demonstrate the advantage of the proposed method over existing work in terms of accuracy and computational efficiency under different data-generating models. The method is then applied to middle school students' network data to study the effectiveness of educational workshops in reducing school conflicts.
Subjects: Methodology (stat.ME)
Cite as: arXiv:2007.00803 [stat.ME]
  (or arXiv:2007.00803v3 [stat.ME] for this version)

Submission history

From: Tianxi Li [view email]
[v1] Wed, 1 Jul 2020 23:01:22 GMT (1272kb,D)
[v2] Tue, 9 Mar 2021 15:54:32 GMT (3161kb,D)
[v3] Fri, 5 Nov 2021 20:14:00 GMT (5314kb,D)

Link back to: arXiv, form interface, contact.