
# SamplingStrata


## Description

When designing a sample survey, constraints are usually set on the desired precision of one or more target estimates (the Y’s). If a sampling frame is available that contains auxiliary information on each unit (the X’s), a stratified sample design can be adopted. For any given stratification of the frame, the multivariate problem of the best allocation of units to strata can be solved by minimizing a cost function subject to precision constraints (or, conversely, by maximizing the precision of the estimates under a given budget).

The remaining problem is to determine the best stratification of the frame, i.e., the one that ensures the overall minimal cost of the sample needed to satisfy the precision constraints. The X’s can be categorical or continuous; continuous ones can be transformed into categorical ones. The most detailed stratification is given by the Cartesian product of the X’s (the atomic strata).

One way to determine the best stratification is to explore exhaustively the set of all possible partitions of the set of atomic strata, evaluating each one by calculating the corresponding cost in terms of the sample required to satisfy the precision constraints. This is unaffordable in practical situations, where the number of possible partitions can be very high. Another way is to explore the space of partitions with an algorithm that is particularly suitable in such situations: the genetic algorithm. The R package SamplingStrata, based on a genetic algorithm, determines the best stratification for a population frame, i.e., the one that ensures the minimum sample cost needed to satisfy the precision constraints, in the multivariate and multi-domain case.
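To give a sense of why exhaustive exploration is unaffordable: the number of ways to partition a set of k atomic strata is the Bell number B_k. The base R sketch below computes it with the Bell triangle. The function name `bell_number` is illustrative only and is not part of SamplingStrata.

```r
# Number of partitions of a set of n elements (Bell number B_n),
# computed row by row with the Bell triangle.
# Illustrative helper, not a SamplingStrata function.
bell_number <- function(n) {
  stopifnot(n >= 1)
  row <- 1                                # first row of the triangle: B_1 = 1
  if (n > 1) {
    for (i in 2:n) {
      new_row <- numeric(i)
      new_row[1] <- row[length(row)]      # each row starts with the last entry of the previous row
      for (j in 2:i) {
        new_row[j] <- new_row[j - 1] + row[j - 1]
      }
      row <- new_row
    }
  }
  row[length(row)]                        # the last entry of row n is B_n
}

sapply(c(5, 10), bell_number)             # B_5 = 52, B_10 = 115975
```

Even a frame with only a few dozen atomic strata therefore has far too many candidate stratifications to evaluate one by one, which is the motivation for a heuristic search.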

The optimization of the sampling design starts by making the sampling frame available, defining the target estimates of the survey and establishing the precision constraints on them. It is then possible to determine the best stratification and the optimal allocation. Finally, we proceed with the selection of the sample.
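For a fixed stratification into H strata, the allocation problem mentioned above can be written in a standard form for stratified simple random sampling. The notation here is assumed for illustration and is not taken from this page: n_h and N_h are the sample and population sizes in stratum h, c_h the unit cost, C_0 a fixed cost, S_{h,g} the standard deviation of target variable g in stratum h, and U_g the maximum coefficient of variation allowed for the estimated total of variable g. The package documentation should be consulted for the exact form it uses.

```latex
\begin{aligned}
\min_{n_1,\dots,n_H}\quad & C(n_1,\dots,n_H) = C_0 + \sum_{h=1}^{H} c_h\, n_h \\
\text{subject to}\quad & CV(\hat{Y}_g) = \frac{\sqrt{\operatorname{Var}(\hat{Y}_g)}}{Y_g} \le U_g,
  \qquad g = 1,\dots,G, \\
\text{where}\quad & \operatorname{Var}(\hat{Y}_g) = \sum_{h=1}^{H} N_h^2
  \left(1 - \frac{n_h}{N_h}\right) \frac{S_{h,g}^2}{n_h}.
\end{aligned}
```

In the multi-domain case, one such set of constraints applies within each domain of interest.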

Formalizing, these are the required steps:

1. analysis of the frame data: identification of the available auxiliary information;

2. manipulation of the auxiliary information: continuous auxiliary variables must be transformed into categorical form;

3. construction of the atomic strata: on the basis of the categorical auxiliary variables available in the sampling frame, a set of strata is constructed by calculating the Cartesian product of the values of all the auxiliary variables;

4. characterization of each atomic stratum with the information related to the target variables: in order to optimize both the strata and the allocation of sampling units to strata, information is needed on the distributions of the target variables (means and standard deviations);

5. choice of the precision constraints for each target estimate, possibly differentiated by domain;

6. optimization of the stratification and determination of the required sample size and allocation in order to satisfy the precision constraints on the target estimates;

7. analysis of the resulting optimized strata;

8. association of new labels to the sampling frame units, each indicating the new stratum resulting from the optimal aggregation of the atomic strata;

9. selection of units from the sampling frame with a stratified random sample selection scheme;

10. evaluation of the optimal solution found, in terms of expected precision and bias.
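Steps 2 to 4 and step 9 can be sketched in base R on a toy frame. Everything below is an assumption made for illustration: the simulated data, the variable names, the tercile cut points and the 10% proportional allocation. SamplingStrata provides its own functions for these steps and for the optimization itself (see the reference manual), and this sketch does not reproduce them.

```r
set.seed(1)

# Hypothetical toy frame: 200 units, two auxiliary variables and one target variable
frame <- data.frame(
  id       = 1:200,
  x_size   = rlnorm(200, meanlog = 3, sdlog = 1),                 # continuous X
  x_region = sample(c("north", "south"), 200, replace = TRUE),    # categorical X
  stringsAsFactors = FALSE
)
frame$y <- 2 * frame$x_size + rnorm(200, sd = 5)                  # target Y

# Step 2: turn the continuous X into three classes (tercile cut points, an arbitrary choice)
breaks <- quantile(frame$x_size, probs = c(0, 1/3, 2/3, 1))
frame$x_size_class <- cut(frame$x_size, breaks = breaks,
                          include.lowest = TRUE,
                          labels = c("small", "medium", "large"))

# Step 3: atomic strata = Cartesian product of the categorical X's
frame$atomic <- interaction(frame$x_size_class, frame$x_region,
                            sep = "*", drop = TRUE)

# Step 4: population size, mean and standard deviation of Y in each atomic stratum
atomic_strata <- do.call(rbind, lapply(split(frame, frame$atomic), function(d) {
  data.frame(stratum = as.character(d$atomic[1]),
             N       = nrow(d),
             mean_y  = mean(d$y),
             sd_y    = sd(d$y))
}))
print(atomic_strata)

# Step 9 (illustrative): simple random sampling without replacement in each stratum,
# using a hypothetical 10% proportional allocation rather than an optimized one
counts <- vapply(split(frame$id, frame$atomic), length, integer(1))
n_h <- pmax(round(0.10 * counts), 1)
sampled_ids <- unlist(lapply(names(n_h), function(s) {
  ids <- frame$id[frame$atomic == s]
  ids[sample.int(length(ids), n_h[[s]])]
}))
```

In the actual workflow, the allocation used in step 9 is the one produced by the optimization in step 6, applied to the optimized strata rather than to the atomic ones.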

## Information

Status: validated

Author: Istat

Licence: GPL-2 | GPL-3

GSBPM code: 2.4 Design frame and sample; 4.1 Create frame and select sample

Programming language: R

Keywords: optimal stratification, sample design, sample allocation, genetic algorithm

Contact: name: Marco Ballin; email: ballin@istat.it

## Software and documentation

SOFTWARE DEPENDENCIES

R (version ≥ 2.15.0)

Licensed under the GNU General Public License (GPL), version 2 or subsequent. You may not use this work except in compliance with the Licence. You may obtain a copy of the Licence at: http://www.gnu.org/licenses/licenses.en.html. Unless required by applicable law or agreed to in writing, software distributed under the Licence is distributed on an “AS IS” basis, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

DISCLAIMER

Istat assumes no responsibility for the results arising from use of the instrument that is inconsistent with the methodological guidance contained in the documentation available.

Release date: 10/02/2023

INSTALLATION
Install the downloaded package from within R as follows:
> install.packages(path_to_file, repos = NULL)
where the character string path_to_file is the path to the .zip or .tar.gz file you downloaded.
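As a hedged illustration of the call above, the following sketch assumes the source archive was saved in a `Downloads` folder under the home directory. The folder and file name are hypothetical and should be replaced with the actual location of the downloaded file.

```r
# Hypothetical location of the downloaded archive; adjust to your own path
path_to_file <- file.path(path.expand("~"), "Downloads", "SamplingStrata_1.5-4.tar.gz")

if (file.exists(path_to_file)) {
  # repos = NULL installs from the local file; type = "source" is for a .tar.gz archive
  install.packages(path_to_file, repos = NULL, type = "source")
} else {
  message("Archive not found at: ", path_to_file)
}

# Check whether the package can now be loaded
installed <- requireNamespace("SamplingStrata", quietly = TRUE)
```

Note that with `repos = NULL`, `install.packages` does not fetch dependencies automatically, so any packages that SamplingStrata depends on need to be installed beforehand.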

TECHNICAL AND METHODOLOGICAL DOCUMENTATION

Reference manual – SamplingStrata v. 1.5-4

Vignettes:

OTHER DOCUMENTATION

SamplingStrata website

Barcaroli, G. 2014. “SamplingStrata: An R Package for the Optimization of Stratified Sampling”. Journal of Statistical Software, Volume 61, Issue 4: 1-24.

Ballin, M., and G. Barcaroli. 2013. “Joint determination of optimal stratification and sample allocation using genetic algorithm”. Survey Methodology, Volume 39, N. 2: 369-393.

Last edit: 10 February 2023