When designing a sampling survey, constraints are usually set on the desired precision of one or more target estimates (the Y’s). If a sampling frame containing auxiliary information on each unit (the X’s) is available, a stratified sample design can be adopted. For any given stratification of the frame, the multivariate problem of the best allocation of units to strata can be solved by minimizing a cost function subject to the precision constraints (or, conversely, by maximizing the precision of the estimates under a given budget). The remaining problem is to determine the best stratification of the frame, i.e., the one that minimizes the overall cost of the sample needed to satisfy the precision constraints.

The X’s can be categorical or continuous; continuous ones can be transformed into categorical ones. The most detailed stratification is given by the Cartesian product of the X’s (the atomic strata). One way to determine the best stratification is to explore exhaustively all the partitions that can be derived from the set of atomic strata, evaluating each one by the cost of the sample required to satisfy the precision constraints. This is unaffordable in practice, because the number of possible partitions grows extremely fast with the number of atomic strata. An alternative is to explore the space of partitions with an algorithm that is particularly suited to such large search spaces: the genetic algorithm. The R package SamplingStrata uses a genetic algorithm to determine the best stratification of a population frame, i.e., the one that ensures the minimum sample cost needed to satisfy the precision constraints, in the multivariate and multi-domain case.
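The number of distinct ways to partition k atomic strata is the Bell number B_k, so the size of the exhaustive search can be computed directly. The short Python sketch below is only an illustration (SamplingStrata itself is an R package, and `bell_number` is a hypothetical helper, not part of it). It builds the Bell triangle, in which each row starts with the last entry of the previous row and each further entry is the sum of its left neighbor and the entry above that neighbor.

```python
def bell_number(n: int) -> int:
    """Number of partitions of a set of n elements (Bell number B_n),
    computed with the Bell triangle."""
    if n == 0:
        return 1
    row = [1]  # first row of the triangle
    for _ in range(n - 1):
        # Each new row starts with the last entry of the previous row
        new_row = [row[-1]]
        for value in row:
            # Next entry = left neighbor + entry above that neighbor
            new_row.append(new_row[-1] + value)
        row = new_row
    # The last entry of row n is B_n
    return row[-1]


for k in (5, 10, 15, 20):
    print(k, bell_number(k))
```

With only 10 atomic strata there are already 115,975 candidate stratifications, and Bell numbers grow faster than any exponential, which is why a heuristic search is used instead of complete enumeration.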
The optimization of the sampling design starts by making the sampling frame available, defining the target estimates of the survey and establishing the precision constraints on them. It is then possible to determine the best stratification and the optimal allocation. Finally, we proceed with the selection of the sample.
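Once a stratification is fixed, the allocation step has a closed form in the univariate case. As a minimal sketch, assuming one target variable, stratified simple random sampling without replacement, and a precision constraint expressed as a maximum coefficient of variation (CV) of the estimated total Y, Neyman allocation gives the smallest total sample size n = (Σ N_h S_h)² / (V₀ + Σ N_h S_h²), where V₀ = (CV · Y)² is the largest admissible variance. The Python functions below (hypothetical names, not the SamplingStrata API) compute n, the allocation and, as a check, the CV achieved. SamplingStrata handles the multivariate, multi-domain case, which this sketch does not attempt.

```python
import math
from typing import List, Sequence


def min_sample_size(N: Sequence[int], mean: Sequence[float], sd: Sequence[float],
                    cv_target: float) -> float:
    """Smallest total n for which the stratified estimator of the population
    total reaches CV <= cv_target under Neyman allocation (one target variable,
    stratified simple random sampling without replacement)."""
    total = sum(Nh * mh for Nh, mh in zip(N, mean))
    v0 = (cv_target * total) ** 2                     # largest admissible variance
    a = sum(Nh * sh for Nh, sh in zip(N, sd))         # sum of N_h * S_h
    b = sum(Nh * sh ** 2 for Nh, sh in zip(N, sd))    # sum of N_h * S_h^2
    return a ** 2 / (v0 + b)


def neyman_allocation(N: Sequence[int], sd: Sequence[float], n: float) -> List[float]:
    """Split n across strata proportionally to N_h * S_h."""
    a = sum(Nh * sh for Nh, sh in zip(N, sd))
    return [n * Nh * sh / a for Nh, sh in zip(N, sd)]


def stratified_cv(N, mean, sd, n_alloc) -> float:
    """CV of the estimated total: sqrt(sum N_h^2 (1 - n_h/N_h) S_h^2 / n_h) / Y."""
    total = sum(Nh * mh for Nh, mh in zip(N, mean))
    var = sum(Nh ** 2 * (1 - nh / Nh) * sh ** 2 / nh
              for Nh, sh, nh in zip(N, sd, n_alloc))
    return math.sqrt(var) / total


# Two illustrative strata with a 2% CV target on the estimated total
N, mean, sd = [1000, 500], [10.0, 50.0], [4.0, 20.0]
n = min_sample_size(N, mean, sd, cv_target=0.02)
alloc = neyman_allocation(N, sd, n)
print(math.ceil(n), [round(x, 1) for x in alloc], round(stratified_cv(N, mean, sd, alloc), 4))
```

Rounding n and each n_h up to integers shifts the achieved CV slightly, and the sketch does not cap n_h at N_h, which a real allocation procedure must do.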
More formally, the steps required are:
1. analysis of the frame data: identification of available auxiliary information;
2. manipulation of auxiliary information: continuous auxiliary variables must be transformed into categorical form;
3. construction of atomic strata: on the basis of the categorical auxiliary variables available in the sampling frame, a set of strata can be constructed by calculating the Cartesian product of the values of all the auxiliary variables;
4. characterization of each atomic stratum with information on the target variables: to optimize both the strata and the allocation of sampling units to strata, we need information on the distributions of the target variables (means and standard deviations) in each atomic stratum;
5. choice of the precision constraints for each target estimate, possibly differentiated by domain;
6. optimization of stratification and determination of required sample size and allocation in order to satisfy precision constraints on target estimates;
7. analysis of the resulting optimized strata;
8. association of new labels with the sampling frame units, each label indicating the new stratum resulting from the optimal aggregation of the atomic strata;
9. selection of units from the sampling frame with a stratified random sample selection scheme;
10. evaluation of the found optimal solution in terms of expected precision and bias.
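Step 6 is where the genetic algorithm comes in. In the spirit of the approach described by Ballin and Barcaroli (2013), a candidate stratification can be encoded as a vector holding one label per atomic stratum: atomic strata sharing a label are merged, and the fitness of the vector is the sample size needed to meet the precision constraint. The Python sketch below illustrates this idea under loudly simplifying assumptions: a single target variable, univariate Neyman allocation instead of the multivariate (Bethel-type) allocation used by the package, population (divisor-N) standard deviations for pooling, and a hypothetical cap `max_strata` on the number of strata. The function names, genetic operators (tournament selection, uniform crossover, random-reset mutation, elitism) and parameter values are illustrative choices, not the SamplingStrata implementation.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def required_n(labels: Sequence[int], N: Sequence[int], mean: Sequence[float],
               sd: Sequence[float], cv_target: float) -> float:
    """Minimum total n (Neyman allocation, one target variable) for the strata
    obtained by merging atomic strata that share a label.
    Assumption: sd holds population (divisor-N) standard deviations."""
    groups: Dict[int, List[float]] = {}
    for lab, n_i, m_i, s_i in zip(labels, N, mean, sd):
        g = groups.setdefault(lab, [0.0, 0.0, 0.0])  # [size, sum y, sum y^2]
        g[0] += n_i
        g[1] += n_i * m_i
        g[2] += n_i * (s_i ** 2 + m_i ** 2)
    total = sum(g[1] for g in groups.values())
    a = b = 0.0
    for size, sy, syy in groups.values():
        var = max(syy / size - (sy / size) ** 2, 0.0)  # pooled variance of merged stratum
        a += size * math.sqrt(var)
        b += size * var
    return a ** 2 / ((cv_target * total) ** 2 + b)


def genetic_stratification(N, mean, sd, cv_target, max_strata=4, pop_size=30,
                           generations=100, mutation_rate=0.1, seed=1
                           ) -> Tuple[List[int], float]:
    """Search label vectors (one gene per atomic stratum) minimizing required_n."""
    rng = random.Random(seed)
    k = len(N)

    def cost(chrom: List[int]) -> float:
        return required_n(chrom, N, mean, sd, cv_target)

    # Random initial population, plus the unstratified solution (all labels 0)
    population = [[rng.randrange(max_strata) for _ in range(k)]
                  for _ in range(pop_size - 1)]
    population.append([0] * k)

    def tournament() -> List[int]:
        # Binary tournament: the cheaper of two random individuals wins
        a, b = rng.sample(population, 2)
        return a if cost(a) <= cost(b) else b

    best = min(population, key=cost)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best solution always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover, then per-gene mutation to a random label
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            child = [rng.randrange(max_strata) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = min(population, key=cost)
    return best, cost(best)


# Eight illustrative atomic strata, merged into at most 4 strata, 1% CV target
N = [100] * 8
mean = [1, 2, 5, 10, 20, 40, 80, 160]
sd = [0.5, 1, 2, 3, 5, 10, 20, 40]
labels, n = genetic_stratification(N, mean, sd, cv_target=0.01)
print(labels, math.ceil(n))
```

Because the unstratified solution is seeded into the initial population and the best individual is always carried over, the returned sample size can never be worse than that of the single-stratum design.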
Licence: GPL-2 | GPL-3
GSBPM code: 2.4 Design frame and sample; 4.1 Create frame and select sample
Keywords: optimal stratification, sample design, sample allocation, genetic algorithm
Contact: Marco Ballin
R (version ≥ 2.15.0)
Copyright 2016 Istat
Licensed under the GNU General Public License (GPL), version 2 or subsequent. You may not use this work except in compliance with the Licence. You may obtain a copy of the Licence at: http://www.gnu.org/licenses/licenses.en.html. Unless required by applicable law or agreed to in writing, software distributed under the Licence is distributed on an “AS IS” basis, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
Istat assumes no responsibility for results arising from any use of the tool that is inconsistent with the methodological guidance contained in the available documentation.
TECHNICAL AND METHODOLOGICAL DOCUMENTATION
- Optimization of sampling strata with the SamplingStrata package
- Spatial sampling with SamplingStrata
- Use of models in SamplingStrata
Barcaroli G. 2014. SamplingStrata: An R Package for the Optimization of Stratified Sampling. Journal of Statistical Software, 61(4):1-24.
Ballin M., Barcaroli G. 2013. Joint determination of optimal stratification and sample allocation using genetic algorithm. Survey Methodology, 39(2):369-393.