
Selected articles in statistics for astronomy & physics
Two types of articles are presented here: broad reviews or discussions of statistical issues relevant to the observational physical sciences, and examples of recent statistical advances that illustrate the state of the art. Papers from both the mathematical statistics and the physical science literature are included. Although not required, strong preference is given to articles whose full text is freely available online. Articles are selected by the Center for Astrostatistics Board and Associates; please forward additional selections to Eric Feigelson (edf@astro.psu.edu).
 Comparison of Bayesian and frequentist approaches
 Bayesians, frequentists, and scientists
by Bradley Efron. A brief, readable and fascinating view of 21st-century applications of statistics to modern science that combine frequentist and Bayesian approaches. Topics include the bootstrap, empirical Bayes and the false discovery rate. 2005 ASA Presidential Address. B. Efron (2005), JASA 100, 1.
 Bayesians, frequentists, and physicists
A similar article oriented towards particle physics. Topics include Feldman-Cousins bounds, model selection, James-Stein estimation and empirical Bayes. Appears in PhyStat 2003.
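James-Stein estimation is the classic example of the empirical Bayes shrinkage these two articles discuss. The sketch below is a minimal illustration, assuming p >= 4 independent measurements x_i ~ N(theta_i, sigma^2) with sigma known, and using the positive-part variant that shrinks toward the grand mean; the function name is illustrative.

```python
import statistics

def james_stein(x, sigma):
    """Positive-part James-Stein estimates of the means theta_i, shrunk toward the grand mean.

    Assumes x_i ~ N(theta_i, sigma**2), independent, sigma known, and len(x) >= 4.
    """
    p = len(x)
    if p < 4:
        raise ValueError("shrinkage toward the grand mean needs at least 4 measurements")
    xbar = statistics.fmean(x)
    s = sum((xi - xbar) ** 2 for xi in x)     # spread of the measurements about their mean
    if s == 0.0:
        return [xbar] * p                      # identical measurements: nothing to shrink
    shrink = max(0.0, 1.0 - (p - 3) * sigma ** 2 / s)   # positive-part shrinkage factor
    return [xbar + shrink * (xi - xbar) for xi in x]

print(james_stein([1.0, 2.0, 3.0, 4.0, 5.0], sigma=1.0))
```

Here the spread about the mean is 10, so the factor is 1 - 2/10 = 0.8 and the estimates are pulled toward 3, to approximately [1.4, 2.2, 3.0, 3.8, 4.6].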
 Bayesian reasoning vs. conventional statistics in high energy physics
by G. D'Agostini. Another valuable discussion comparing frequentist and Bayesian methods in the physical sciences. Talk at the MaxEnt98 conference.
 Confidence intervals and limits with small signals
 A unified approach to the classical statistical analysis of small signals
by Gary J. Feldman & Robert D. Cousins. Seminal study in particle physics on the construction of confidence intervals and limits when the signal is small or zero. A bibliography of related papers is also available. Phys. Rev. D 57, 3873-3889 (1998).
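The unified approach ranks the possible counts n by the likelihood ratio R(n) = P(n | mu + b) / P(n | mu_best + b), with mu_best = max(0, n - b), and fills each acceptance region in that order until it holds the required probability. The sketch below is a brute-force version for a Poisson signal mu with known background b. The grid step, the grid range mu_max and the function names are arbitrary choices made for illustration.

```python
import math

def poisson_pmf(n, lam):
    """P(n | lam) for a Poisson distribution, evaluated in log space."""
    if lam == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def acceptance_set(mu, b, cl, n_max):
    """Counts n accepted at signal mu, added in order of decreasing likelihood ratio."""
    ranked = []
    for n in range(n_max + 1):
        p = poisson_pmf(n, mu + b)
        mu_best = max(0.0, n - b)            # best physically allowed signal for this n
        r = p / poisson_pmf(n, mu_best + b)
        ranked.append((r, n, p))
    ranked.sort(reverse=True)                # largest likelihood ratio first
    accepted, total = set(), 0.0
    for r, n, p in ranked:
        if total >= cl:
            break
        accepted.add(n)
        total += p
    return accepted

def feldman_cousins(n_obs, b, cl=0.90, mu_max=20.0, step=0.01):
    """(mu_lo, mu_hi): extremes of the grid values whose acceptance set contains n_obs."""
    n_max = int(mu_max + b + 10.0 * math.sqrt(mu_max + b) + 20)
    covered = [i * step for i in range(int(round(mu_max / step)) + 1)
               if n_obs in acceptance_set(i * step, b, cl, n_max)]
    return min(covered), max(covered)

print(feldman_cousins(0, 0.0))   # 90% CL interval for zero observed counts, no background
```

For n = 0 and b = 0 this gives an interval from 0 to about 2.44, in agreement (to grid precision) with the value tabulated by Feldman & Cousins. Because the acceptance regions are built on a discrete count variable, the intervals slightly overcover.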
 Lectures on statistics and numerical methods in HEP
by Frank Porter & Roger Barlow. Lectures on statistics for high-energy physics, given to the SLAC Users Organization in 2000.
 A Fully Bayesian Computation of Upper Limits for Poisson Processes
by Luc Demortier. Detailed treatment (2004).
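As a baseline, the simplest Bayesian upper limit uses a flat prior on the signal s >= 0 with a known background b. Demortier's treatment goes further, for example by handling uncertain nuisance parameters, so the sketch below covers only this simple case. With n observed counts, the posterior tail probability is P(S > s | n) = Q(b + s) / Q(b), where Q(x) = sum_{k=0..n} e^(-x) x^k / k! is the Poisson cumulative probability. The limit is then found by bisection. The function names are illustrative.

```python
import math

def poisson_cdf(n, x):
    """Q(x) = P(N <= n) for N ~ Poisson(x)."""
    term, total = math.exp(-x), 0.0
    for k in range(n + 1):
        if k > 0:
            term *= x / k                   # e^-x x^k / k! built up recursively
        total += term
    return total

def bayes_upper_limit(n, b, cl=0.90, tol=1e-10):
    """Flat-prior Bayesian upper limit on a Poisson signal s >= 0 with known background b."""
    q_b = poisson_cdf(n, b)
    tail = lambda s: poisson_cdf(n, b + s) / q_b   # posterior P(S > s | n)
    lo, hi = 0.0, 1.0
    while tail(hi) > 1.0 - cl:              # widen the bracket until it contains the limit
        hi *= 2.0
    while hi - lo > tol:                    # bisection on the monotone tail probability
        mid = 0.5 * (lo + hi)
        if tail(mid) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(bayes_upper_limit(0, 0.0), bayes_upper_limit(1, 0.0))
```

For n = 0 the tail probability reduces to e^(-s) whatever the value of b, so the 90% limit is ln 10 (about 2.30). For n = 1 and b = 0 it is about 3.89.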
 Image processing
 Bayesian restoration of digital images employing Markov chain Monte Carlo
by K. P. N. Murthy. Invited review (2005).
 Morphological classification of galaxies by shapelet decomposition in the Sloan Digital Sky Survey. II. Multiwavelength classification
by B. C. Kelly and T. A. McKay. Principal components analysis of shapelet coefficients (after inclination correction), followed by a normal mixture model, leads to a classification of galaxy morphologies. Astron. J. 129, 1287-1310 (2005).
 Bootstrap resampling as a tool for radio-interferometric imaging fidelity assessment
by Athol Kemball and Adam Martinsek. Model-based and subsample bootstrap methods are examined as tests of the fidelity of features in radio interferometric images. Astron. J. 129, 1760 (2005).
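The model-based and subsample variants examined here build on the ordinary nonparametric bootstrap: recompute a statistic on many resamples drawn with replacement, then use the spread of the replicates as an uncertainty estimate. The sketch below shows only that ordinary bootstrap with a simple percentile interval. It is not the authors' imaging procedure, and the flux values and function names are hypothetical.

```python
import random
import statistics

def bootstrap(data, stat=statistics.median, n_boot=2000, seed=0):
    """Ordinary nonparametric bootstrap: `stat` recomputed on resamples with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(n_boot)]

def percentile_interval(replicates, level=0.90):
    """Percentile bootstrap interval read off the sorted replicates."""
    reps = sorted(replicates)
    alpha = (1.0 - level) / 2.0
    lo = reps[int(alpha * len(reps))]
    hi = reps[int((1.0 - alpha) * len(reps)) - 1]
    return lo, hi

# Hypothetical repeated flux measurements of one image feature (illustrative numbers only)
fluxes = [4.1, 3.8, 5.2, 4.7, 3.9, 4.4, 6.0, 4.0, 4.6, 5.1]
reps = bootstrap(fluxes)
print(statistics.stdev(reps), percentile_interval(reps))
```

The median is used as the default statistic because it is robust to the occasional outlying measurement. Any function of the data can be passed as `stat`.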
 Multiscale likelihood analysis and complexity penalized estimation
by Eric D. Kolaczyk and Robert D. Nowak. A mathematical framework is presented for applying multiscale models (e.g. wavelet decomposition) to count (Poisson) and categorical (binomial) data as well as Gaussian data. The data-based likelihood is subjected to a multiscale factorization; in the Poisson case, this involves a recursive partitioning. Application to photon-counting images is envisioned. Annals of Statistics 32, 500-527 (2004).
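The Poisson factorization rests on a standard identity. For independent Poisson counts x_i with intensities lambda_i, the joint likelihood equals a Poisson term for the total count multiplied by binomial terms, one at each node of a recursive dyadic partition, describing how that node's counts split between its two children. The sketch below checks this identity numerically. It illustrates the likelihood factorization only, not the authors' complexity-penalized estimator, and the counts and intensities are made up.

```python
import math

def log_poisson(x, lam):
    """log P(x | lam) for a Poisson count."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def log_binomial(k, n, p):
    """log P(k | n, p) for a binomial split."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def multiscale_loglik(counts, rates):
    """Poisson term for the total plus binomial terms over a recursive dyadic partition."""
    def splits(c, r):
        if len(c) == 1:
            return 0.0
        h = len(c) // 2
        p_left = sum(r[:h]) / sum(r)          # share of the intensity in the left child
        return (log_binomial(sum(c[:h]), sum(c), p_left)
                + splits(c[:h], r[:h]) + splits(c[h:], r[h:]))
    return log_poisson(sum(counts), sum(rates)) + splits(counts, rates)

counts = [3, 0, 5, 2, 7, 1, 0, 4]                  # photon counts in 8 pixels (made up)
rates = [2.5, 1.0, 4.0, 2.0, 6.0, 1.5, 0.5, 3.0]   # assumed strictly positive intensities
direct = sum(log_poisson(x, lam) for x, lam in zip(counts, rates))
print(direct, multiscale_loglik(counts, rates))    # the two agree up to rounding
```

Writing the likelihood this way decouples the scales: each binomial split probability can be estimated, or penalized, separately. This is what makes the multiscale approach convenient for photon-counting data.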
 Massive data sets
 Class discovery in galaxy classification
by David Bazell and David J. Miller. Application of neural network mixture models to the star/galaxy classification problem. Astrophys. J. 618, 723-732 (2005).
 Statistical challenges with massive data sets in particle physics
by Bruce Knuteson & Paul Padley. A review of particle physics problems for statisticians. Journal of Computational & Graphical Statistics (2003).
 Bayesian methodology
 Significance in gamma-ray astronomy: the Li & Ma problem in Bayesian statistics
by S. Gillessen and H. L. Harney. The significance of a gamma-ray source detection in a Poisson data set with high background is examined in a Bayesian context. Astron. & Astrophys. 430, 355-362 (2005).
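For comparison, the classical frequentist answer to this problem is the likelihood-ratio significance of Li & Ma (1983, their eq. 17). It uses N_on counts from the source region, N_off counts from a background region, and alpha, the ratio of on-source to off-source exposure. The sketch below implements that formula. It is not the Bayesian treatment of Gillessen & Harney, and the function name and sign convention are illustrative choices.

```python
import math

def li_ma_significance(n_on, n_off, alpha):
    """Li & Ma (1983) eq. 17 significance; assumes n_on > 0, n_off > 0 and alpha > 0."""
    total = n_on + n_off
    term_on = n_on * math.log((1.0 + alpha) / alpha * n_on / total)
    term_off = n_off * math.log((1.0 + alpha) * n_off / total)
    s = math.sqrt(2.0) * math.sqrt(max(0.0, term_on + term_off))  # guard tiny negative rounding
    # Sign convention (an assumption here): positive for an excess over alpha * n_off
    return s if n_on >= alpha * n_off else -s

print(li_ma_significance(150, 100, 1.0))
```

When N_on equals alpha * N_off exactly, both logarithms vanish and the significance is zero. For large counts, the result is close to the simple estimate (N_on - alpha N_off) / sqrt(alpha (N_on + N_off)).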
 Formal rules for selecting prior distributions: A review and annotated bibliography
by Robert E. Kass and Larry Wasserman. Discussion of noninformative (nonsubjective) priors used in Bayesian inference, with emphasis on Jeffreys's rules. J. Am. Stat. Assoc. 91, 1343 (1996).
 Reviews
and research on Bayesian inference in astrophysics
by Thomas Loredo. Several substantial lectures and
research
articles from 1989 to 2003 on the principles and prospects for Bayesian
methods in astronomy. Applications include Gaussian & Poisson
problems (e.g. the neutrinos from SN 1987A), gamma ray bursts,
periodograms for time series, spatial analysis of cosmic microwave
background radiation, adaptive experimental design, and computational
techniques.
Poisson processes
 Equivalence theory for density estimation, Poisson processes and Gaussian white noise with drift
by Lawrence D. Brown et al. This very mathematical paper is an example of current theoretical work on Poisson processes, which are often encountered in astronomical and physical observations. Annals of Statistics 32, 2074-2097 (2004).
 Multivariate analysis
 Analysis of Variance: Why it is more important than ever
by Andrew Gelman. Discussion of ANOVA, a classical multivariate technique that structures regression coefficients into batches to improve prediction, from the perspectives of exploratory data analysis, linear modeling, and hierarchical Bayesian regression. Annals of Statistics 33, 1 (2005).
 Least angle regression
by Bradley Efron et al. Discusses computationally efficient methods for model selection in least-squares multiple regression, e.g. predicting redshift from a large database of properties of extragalactic objects. Least angle regression is compared with the Lasso, boosting, and traditional stepwise regression techniques. Annals of Statistics 32, 407-451 (2004), with commentaries.
Nonparametric statistics
 Spectral classification technique for X-ray sources: Quartile analysis
by Jaesub Hong et al. Uses the median and the ratio of quartiles of the detected photon energies to characterize the spectra of faint CCD X-ray sources. Astrophys. J. 614, 508-517 (2004).
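A minimal sketch of the quartile idea follows. It assumes the photon energies are restricted to an analysis band [E_lo, E_hi] and that each quantile is normalized as Q_x = (E_x - E_lo) / (E_hi - E_lo). The band limits, this normalization and the function name are assumptions made for illustration, so consult the paper for the exact definitions and for the quantile diagram built from these quantities.

```python
import statistics

def quartile_summary(energies, e_lo, e_hi):
    """Normalized median Q50 and quartile ratio Q25/Q75 of photon energies in [e_lo, e_hi]."""
    band = [e for e in energies if e_lo <= e <= e_hi]
    if len(band) < 2:
        raise ValueError("need at least two photons in the band")
    e25, e50, e75 = statistics.quantiles(band, n=4)   # the three quartile energies
    q25, q50, q75 = ((e - e_lo) / (e_hi - e_lo) for e in (e25, e50, e75))
    return q50, q25 / q75

# Illustrative photon energies (keV) in an assumed 0-10 keV band
print(quartile_summary([1, 2, 3, 4, 5, 6, 7, 8, 9], 0.0, 10.0))
```

Quantiles can be computed from a handful of photons without binning the spectrum, which makes them well suited to faint sources.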
 Time series analysis
 Time series analysis in astronomy: Limits and potentialities
 Wavelet-based estimation with multiple sampling rates
by Peter Hall and Spiridon Penev. This paper is an example of recent studies of the statistical properties of wavelets. When a nonstationary signal in noise is sampled at discrete times, information may be lost as the strength and structure of the signal increase. An algorithm is presented that switches adaptively between sampling rates based on the high-frequency wavelet terms. Annals of Statistics 32, 1933-1956 (2004).
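The paper's adaptive rule is not reproduced here. As a loose illustration of what "high-frequency wavelet terms" means, the hypothetical helper below computes finest-scale Haar detail coefficients of a regularly sampled series. It then flags the sample pairs whose coefficient magnitude exceeds a user-chosen threshold, that is, the places where high-frequency structure suggests a faster sampling rate might be warranted. The threshold and the flagging rule are assumptions for illustration only.

```python
import math

def haar_details(x):
    """Finest-scale Haar detail coefficients d_k = (x[2k] - x[2k+1]) / sqrt(2)."""
    if len(x) % 2:
        raise ValueError("need an even number of samples")
    return [(x[2 * k] - x[2 * k + 1]) / math.sqrt(2.0) for k in range(len(x) // 2)]

def flag_fast_sampling(x, threshold):
    """Hypothetical rule: True where |d_k| exceeds the threshold, i.e. where the pair
    (x[2k], x[2k+1]) shows strong high-frequency structure."""
    return [abs(d) > threshold for d in haar_details(x)]

# Illustrative series: flat first half, rapid oscillation in the second half
signal = [0.0] * 16 + [0.0, 5.0] * 8
flags = flag_fast_sampling(signal, threshold=1.0)
print(flags)
```

In noisy data, the threshold would normally be tied to an estimate of the noise level rather than fixed by hand.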
 Multiscale likelihood analysis and complexity penalized estimation
by Eric D. Kolaczyk and Robert D. Nowak. A mathematical framework is presented for applying multiscale models (e.g. wavelet decomposition) to count (Poisson) and categorical (binomial) data as well as Gaussian data. The data-based likelihood is subjected to a multiscale factorization; in the Poisson case, this involves a recursive partitioning. Application to photon-counting images is envisioned. Annals of Statistics 32, 500-527 (2004).
 A selective overview of nonparametric methods in financial econometrics
by Jianqing Fan. This review gives insight into recent progress in modeling correlated stochastic time series such as those seen in stock prices, gamma-ray bursts, accretion binaries, or BL Lac objects. The procedures model nonstationary autoregressive processes with heteroscedasticity (i.e. where the variance of the fluctuations changes with time). See also the recent monograph by Fan & Yao, Nonlinear Time Series: Nonparametric and Parametric Methods (2003).
Model selection & goodness-of-fit
 A tutorial introduction to the minimum description length principle
by Peter Grünwald. MDL is a recent approach to inference that addresses the model selection problem by balancing goodness-of-fit against model complexity, thus avoiding overfitting with too many parameters. The mathematics is based on Kolmogorov complexity, information theory and data compression, and the result is related to penalized likelihood criteria (AIC, BIC, RIC). See also the MDL research Web site.
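A crude two-part version of MDL fits in a few lines. The code length of the data is -log2 of the maximized likelihood, and each free parameter costs about (1/2) log2 n bits. This crude version coincides with BIC expressed in bits. The sketch below compares a fair-coin model, which has no free parameters, with a Bernoulli model whose success probability is fitted. The function names are illustrative, and Grünwald's tutorial also covers refined MDL, which this sketch does not implement.

```python
import math

def code_length_fair(bits):
    """Fair-coin model: each binary outcome costs exactly one bit; no parameters to encode."""
    return float(len(bits))

def code_length_bernoulli(bits):
    """Two-part code: -log2 max likelihood plus (1/2) log2 n bits for the fitted p."""
    n, k = len(bits), sum(bits)
    p = k / n
    nll = 0.0
    if k > 0:
        nll -= k * math.log2(p)
    if k < n:
        nll -= (n - k) * math.log2(1.0 - p)
    return nll + 0.5 * math.log2(n)

def select_model(bits):
    """Choose the model with the shorter total description length."""
    return "fair" if code_length_fair(bits) <= code_length_bernoulli(bits) else "bernoulli"

print(select_model([0, 1] * 50), select_model([1] * 90 + [0] * 10))
```

For a balanced sequence, fitting p buys no compression, so the extra parameter cost makes the fair coin shorter. For a strongly biased sequence, the fitted model wins despite its parameter cost.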
 Spatial point processes
 Estimating the J function without edge correction
by Adrian Baddeley et al. The J function combines the empty-space function (~ the astronomers' void probability function) and the nearest-neighbour distance distribution function (~ 2-point correlation) of a spatial point process (e.g. the distribution of galaxies in space). This study proposes a Monte Carlo test of the importance of weighting for edge effects (~ survey boundaries). (1997)
 Multivariate clustering
 A robust method for cluster analysis
by Maria T. Gallegos and Gunter Ritter. A treatment of multivariate clustering when outliers are present. A subset of the observations is partitioned into clusters using a maximum-likelihood estimator, chosen so that the pooled sum-of-squares-and-products matrix has minimum determinant. Annals of Statistics 33, 347-380 (2005).
 Model-based clustering, discriminant analysis, and density estimation
by Chris Fraley and Adrian E. Raftery. A review of recent methods for discriminating groups in multivariate datasets using mixture and classification models. J. Amer. Statist. Assoc. 97, 611-631 (2002).
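Model-based clustering usually fits a finite normal mixture with the EM algorithm. The sketch below is a minimal one-dimensional, two-component version. The starting values (means at the data extremes, equal weights), the fixed iteration count and the simulated data are arbitrary choices. Full tools, such as the authors' mclust software, also choose the number of components and the covariance structure.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_two_normals(data, n_iter=100):
    """Fit a two-component 1-D normal mixture by EM; returns (weights, means, variances)."""
    n = len(data)
    mean = sum(data) / n
    v0 = sum((x - mean) ** 2 for x in data) / n
    w, mu, var = [0.5, 0.5], [min(data), max(data)], [v0, v0]   # crude starting values
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component for each point
        resp = []
        for x in data:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p] if s > 0.0 else [0.5, 0.5])
        # M-step: responsibility-weighted maximum-likelihood updates
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(1e-6, sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk)
    return w, mu, var

# Illustrative data: 300 points from N(0, 1) and 200 points from N(8, 1)
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(8.0, 1.0) for _ in range(200)]
weights, means, variances = em_two_normals(data)
print(weights, means, variances)
```

Each point's responsibilities give a soft cluster assignment. Assigning each point to its most responsible component turns the fitted mixture into a classification.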
 False Discovery Rate method
 Multiple Comparison Procedures
Hochberg, Y. and Tamhane, A. (Wiley, 1987)
 Controlling the False Discovery Rate in astrophysical data analysis
Miller, C. J. et al. (PiCA collaboration) Astron. J. 122, 3492-3505 (2001).
 A stochastic process approach to False Discovery Rates
Genovese, C. and Wasserman, L. Annals of Statistics 32, 1035-1061 (2004).
 Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses
Meinshausen, N. and Rice, J. (2005)
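The basic FDR-controlling rule discussed in these papers is the Benjamini-Hochberg step-up procedure, which is short enough to state in code. Sort the m p-values, find the largest rank k with p_(k) <= k q / m, and reject the k hypotheses with the smallest p-values. The sketch below assumes independent tests, and the function name is illustrative.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices of hypotheses rejected by the BH step-up rule at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by increasing p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank            # step-up: keep the largest rank that passes
    return sorted(order[:k_max])

print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))
```

Because the rule steps up from the largest passing rank, a p-value can be rejected even if it fails its own threshold, as long as a larger p-value passes. For example, with q = 0.05, the p-values [0.03, 0.04] are both rejected.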
