Create frame and select sample

The activities related to creating the frame and selecting the sample correspond to sub-process 4.1 “Create frame and select sample” of the GSBPM.

Creating the frame consists of building the list (archive) of units belonging to the target population. Selecting the sample consists of identifying the sample units according to the sampling scheme.

For a given iteration of the collection, the frame is created and the sample is selected according to the specifications defined in sub-process 2.4 “Design frame and sample”.
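As an illustration only, the two steps can be sketched for the simplest possible sampling scheme, simple random sampling without replacement; the scheme actually used in a given survey is the one specified in sub-process 2.4. The `Unit` record, the register extract and the target-population rule (units with at least one employee) are hypothetical. Fixing the random seed makes the selection reproducible for a given iteration of the collection.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    # Hypothetical register record.
    unit_id: str
    region: str
    employees: int


def build_frame(units, belongs_to_target):
    """Keep only the units that satisfy the target-population definition."""
    return [u for u in units if belongs_to_target(u)]


def select_srswor(frame, n, seed):
    """Simple random sampling without replacement from the frame.

    A fixed seed makes the selection reproducible.
    """
    if n > len(frame):
        raise ValueError("sample size exceeds frame size")
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical register extract and target-population rule.
register = [
    Unit("U1", "North", 12),
    Unit("U2", "South", 3),
    Unit("U3", "North", 45),
    Unit("U4", "Centre", 0),
    Unit("U5", "South", 27),
]
frame = build_frame(register, lambda u: u.employees > 0)
sample = select_srswor(frame, n=2, seed=2018)
print([u.unit_id for u in sample])
```

Keeping frame construction and sample selection as separate functions mirrors the two activities described above: the frame can be checked on its own before any unit is drawn.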

Data collection

Data collection follows a fairly complex set of activities aimed at designing the survey questionnaire used to observe the different aspects of the phenomena to be measured.

In the GSBPM, these activities are represented by a set of sub-processes belonging to different phases of the model, from Phase 1 to Phase 4, as described below.

Phase 1 “Specify needs”: Sub-processes 1.1 to 1.5 are involved in this phase. They are used to identify the survey needs and to translate them into concepts that must be understandable to respondents. In addition, concepts should be easily measurable and therefore convertible into statistical variables, which will be designed in the following Phase 2. At this stage, it is very important to check whether existing data sources (for example, administrative data) could meet some of the survey needs, in order to reduce the number of variables to be collected. In this way, both cost and response burden can be reduced.

Phase 2 “Design”: Once Phase 1 is over, the activities described in sub-processes 2.1, 2.2 and 2.3 can be carried out. The survey variables specified in Phase 1 are used to design the tabulation plan (also useful for the Dissemination phase), which also makes it possible to define derived variables and any statistical classifications to be used in the collection phase. At this point, the creation of the questionnaire can start: statistical variables can be “translated” into survey questions, and relations among variables and the variables’ characteristics into questionnaire paths and/or checking rules. The institutional metadata system should be taken into account when designing survey variables, in order to reuse existing definitions or to update the system with new ones. This fosters national and international standardisation and the reuse of any “element” from previous or similar surveys.
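To make the idea of questionnaire paths more concrete, a minimal sketch follows, assuming the questionnaire is represented as a table of questions, each linked to the statistical variable it measures and to a routing rule that chooses the next question from the answers already given. The variables `EMPLOYED`, `HOURS` and `AGE`, the question wording and the `path` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Answers = Dict[str, object]


@dataclass
class Question:
    qid: str
    variable: str  # statistical variable measured by the question
    wording: str
    # Routing rule: given the answers so far, return the id of the
    # next question, or None when the questionnaire ends.
    next_q: Callable[[Answers], Optional[str]]


# Hypothetical questionnaire: Q2 is asked only to employed respondents.
QUESTIONS = {
    "Q1": Question("Q1", "EMPLOYED", "Did you work last week? (yes/no)",
                   lambda a: "Q2" if a["EMPLOYED"] == "yes" else "Q3"),
    "Q2": Question("Q2", "HOURS", "How many hours did you work?",
                   lambda a: "Q3"),
    "Q3": Question("Q3", "AGE", "How old are you?",
                   lambda a: None),
}


def path(answers: Answers, start: str = "Q1") -> List[str]:
    """Return the sequence of question ids followed for a set of answers.

    Each routing rule only reads variables answered earlier in the path.
    """
    visited = []
    qid = start
    while qid is not None:
        visited.append(qid)
        qid = QUESTIONS[qid].next_q(answers)
    return visited


print(path({"EMPLOYED": "yes", "HOURS": 38, "AGE": 41}))  # ['Q1', 'Q2', 'Q3']
print(path({"EMPLOYED": "no", "AGE": 67}))                # ['Q1', 'Q3']
```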

Variable design (sub-process 2.2) should be carried out in parallel with sub-process 2.3 “Design Collection”, which determines the most appropriate collection method(s) and instrument(s). This is because variable design and, therefore, the structure and wording of questions strictly depend on the mode used to collect data. The same holds for the design of checking rules when computer assisted techniques are used (for example CATI, CAPI and CAWI, described under Phase 3). With these modes, data can be validated while they are being collected. It is therefore advisable to design checking rules so as to resolve as many inconsistencies as possible (the major or most frequent ones) while preserving the fluency of the interview. In other words, a balance between data quality and respondent burden should be struck: too many checking rules can increase response burden and raise total (unit) or partial (item) non-response rates.

Phase 3 “Build”: Sub-process 3.1 “Build collection instruments” describes the activities needed to build the collection instruments to be used during the next Phase 4 “Collect”. The collection instrument is generated or built according to the design specifications produced during the “Design” phase. Data collection may use one or more modes to receive the data, e.g., personal or telephone interviews, or paper, electronic or web questionnaires. Collection instruments may also be data extraction routines used to gather data from existing data sources. In the latter case, data exchange standards such as EDI (Electronic Data Interchange) or XBRL (eXtensible Business Reporting Language) can be used to exchange information between Istat and reporting units (enterprises or public organisations that provide data in the form of administrative data sets or registers). This sub-process also includes preparing and testing the contents and functioning of the instrument, e.g., testing the questions in a questionnaire. It is recommended to consider connecting the collection instruments directly to the statistical metadata system, so that metadata can be captured more easily in the collection phase. Linking metadata and data at the point of capture can save work in later phases, such as Dissemination. Capturing the metrics of data collection (paradata) is also an important consideration in this sub-process.
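As a hedged example of paradata capture, the sketch below logs a timestamped event whenever a question is shown or answered and derives the time spent on each question. The `ParadataRecorder` class and the event names `shown` and `answered` are assumptions made for illustration; the clock is injectable so that the example runs deterministically.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ParadataRecorder:
    """Log (question id, event, timestamp) triples during an interview."""
    clock: Callable[[], float] = time.monotonic
    events: List[Tuple[str, str, float]] = field(default_factory=list)

    def log(self, qid: str, event: str) -> None:
        self.events.append((qid, event, self.clock()))

    def time_per_question(self) -> Dict[str, float]:
        """Total seconds from each 'shown' event to the next 'answered' event
        for the same question (summed if the question is revisited)."""
        shown: Dict[str, float] = {}
        durations: Dict[str, float] = {}
        for qid, event, ts in self.events:
            if event == "shown":
                shown[qid] = ts
            elif event == "answered" and qid in shown:
                durations[qid] = durations.get(qid, 0.0) + ts - shown.pop(qid)
        return durations


# Usage with a fake clock instead of real time.
ticks = iter([0.0, 4.5, 5.0, 12.0])
rec = ParadataRecorder(clock=lambda: next(ticks))
rec.log("Q1", "shown")
rec.log("Q1", "answered")
rec.log("Q2", "shown")
rec.log("Q2", "answered")
print(rec.time_per_question())  # {'Q1': 4.5, 'Q2': 7.0}
```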

Computer assisted techniques play an increasingly important role in the data collection phase. The main feature of these modes, CADI (Computer Assisted Data Input), CAPI (Computer Assisted Personal Interviewing), CATI (Computer Assisted Telephone Interviewing) and CAWI (Computer Assisted Web Interviewing), is the possibility of performing editing during data collection, so that only valid data are collected. CADI differs from the other techniques because its checking rules are used only to limit keying errors or to support revision during the data entry of paper questionnaires.

Another distinctive feature of all computer assisted techniques except CADI is the customisation of the electronic questionnaire: question wording can be personalised according to the respondent’s characteristics (name, gender), to answers given to previous questions, or to information already available (for example, from a previous wave of the survey). In this way, the questionnaire appears friendlier and respondents’ cooperation is enhanced. How many checking rules, and of what type, can be implemented in the electronic questionnaire depends on the technique. The main differences are between CATI/CAPI and CAWI:

  • CATI and CAPI are interviewer-administered, so a greater number of checks can be used than with CAWI. Moreover, since the interviewer is well trained on both the technical aspects and the survey content, a greater number of blocking checks can also be used, which require the inconsistency to be resolved before the interview can proceed;
  • CAWI, by contrast, is self-administered, so fewer checks should be used. Moreover, checking rules should act as warnings rather than blocking checks: they should alert respondents to possible inconsistencies in the data while leaving them free to resolve the “error” or not before moving on to the next questions. This reduces the risk of respondents abandoning the questionnaire before completing it.
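The difference between blocking checks and warnings can be sketched as follows. This is a minimal illustration under stated assumptions: the rules, their messages and the `blocking`/`in_cawi` flags are hypothetical, and the sketch only distinguishes interviewer-administered modes (CATI, CAPI), where a failed blocking rule stops the interview, from CAWI, where a shorter rule set is applied and every failure is reported as a warning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Answers = Dict[str, object]


@dataclass(frozen=True)
class CheckingRule:
    name: str
    message: str
    is_consistent: Callable[[Answers], bool]  # True when the answers pass
    blocking: bool        # blocking in interviewer-administered modes
    in_cawi: bool = True  # kept in the shorter CAWI rule set?


# Hypothetical rules for illustration only.
RULES = [
    CheckingRule("hours_range", "Weekly hours must be between 1 and 98.",
                 lambda a: "HOURS" not in a or 1 <= a["HOURS"] <= 98,
                 blocking=True),
    CheckingRule("age_work", "A respondent under 15 reported working.",
                 lambda a: not (a.get("AGE", 99) < 15 and a.get("EMPLOYED") == "yes"),
                 blocking=True),
    CheckingRule("long_hours", "More than 60 weekly hours: please confirm.",
                 lambda a: a.get("HOURS", 0) <= 60,
                 blocking=False, in_cawi=False),
]

INTERVIEWER_MODES = {"CATI", "CAPI"}


def run_checks(answers: Answers, mode: str) -> Tuple[bool, List[str]]:
    """Return (can_proceed, messages) for the given collection mode.

    Any mode other than CATI/CAPI is treated like CAWI in this sketch.
    """
    active = [r for r in RULES if mode in INTERVIEWER_MODES or r.in_cawi]
    failed = [r for r in active if not r.is_consistent(answers)]
    messages = [r.message for r in failed]
    if mode in INTERVIEWER_MODES:
        # The interviewer must resolve blocking failures before continuing.
        return (not any(r.blocking for r in failed), messages)
    # Self-administered mode: failures are only warnings.
    return (True, messages)


answers = {"EMPLOYED": "yes", "AGE": 14, "HOURS": 70}
print(run_checks(answers, "CATI"))
print(run_checks(answers, "CAWI"))
```

Keeping the mode-dependent behaviour in `run_checks` rather than in the rules lets the same rule definitions serve several collection modes.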

It is important to recall here that, whatever technique is used, the set of checking rules should guarantee a balance between data quality and response burden.

It is also worth recalling that the questionnaire design phase has to take into account the technique chosen for data collection, so that the electronic questionnaire can be designed in the most suitable way. The electronic questionnaire should be thoroughly tested to check its compliance with technical specifications, as well as its usability and fluency.

Phase 4 “Collect”: Once the collection instruments have been built and tested, data collection can start. The activities described in sub-processes 4.2 “Set up collection”, 4.3 “Run collection” and 4.4 “Finalise collection” are involved, from interviewer training to storing the collected data in a suitable electronic environment for further processing (Phase 5 “Process”).

Last edit: 19 March 2018