CIRCE (Comprehensive Istat R Coding Environment)


CIRCE is an R-based software package for automatically coding textual variables according to official classifications.

It is generalised with respect to both the language and the classification used. CIRCE replaces ACTR v3, which Istat had used since 1998, because ACTR was no longer marketed or maintained by its producers and was not compatible with the software platform used at Istat (Windows 7, Windows Server 2008).

To avoid any loss in the quality of results, CIRCE implements the same matching algorithms as ACTR v3.

As an R package, CIRCE is portable across platforms without compilation, so a single package runs on both Windows and Linux. It can be used on Windows through a graphical user interface and on the web through a call to a web service.

CIRCE belongs to the category of weighting algorithms and supports three types of coding procedure:

automated coding, for sets of records (batch coding);
interactive coding, to analyse the coding results of a single record (a GUI is provided for coders);
web coding, a web service for coding single records. A web service for identifying the activity code (in Italian) is currently available through a dedicated page.

Regardless of the type of procedure, coding is performed in two consecutive steps:

1) standardisation of texts, called parsing;
2) matching of parsed texts.

Parsing is a sophisticated and fully customisable text-standardisation phase which currently provides 14 functions, such as character mapping, deletion of trivial words, definition of synonyms and suffix removal. Its aim is to remove grammatical and syntactic differences so that two different descriptions with the same semantic content become identical.
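Four of the kinds of function named above can be chained as in this Python sketch. It is not CIRCE's R code or API: the rule tables (`CHAR_MAP`, `TRIVIAL_WORDS`, `SYNONYMS`, `SUFFIXES`), the order in which they are applied, and the final step that sorts and de-duplicates words are illustrative assumptions standing in for rules a user would configure.

```python
import re
import unicodedata

# Hypothetical parsing rules (assumptions): in CIRCE such rules are user-defined.
CHAR_MAP = {"&": " and ", "-": " ", "/": " "}
TRIVIAL_WORDS = {"of", "the", "and", "for", "in"}
SYNONYMS = {"mfg": "manufacture", "manufacturing": "manufacture"}
SUFFIXES = ("ing", "s")


def strip_suffix(word: str) -> str:
    """Remove the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def parse(text: str) -> str:
    # Character mapping: drop accents, lower-case, map symbols, blank out the rest.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c)).lower()
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    words = text.split()
    # Deletion of trivial words.
    words = [w for w in words if w not in TRIVIAL_WORDS]
    # Synonyms: replace each word with its canonical form.
    words = [SYNONYMS.get(w, w) for w in words]
    # Suffix removal.
    words = [strip_suffix(w) for w in words]
    # Assumption: sort and de-duplicate so that word order does not matter.
    return " ".join(sorted(set(words)))
```

With these rules, "Manufacturing of Shoes", "Mfg. of shoes" and "shoe manufacture" all parse to `manufacture shoe`, which is what allows the matching step to treat them as the same description.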

The second step is matching: the parsed response is compared with the parsed descriptions in the informative base. If the search returns a perfect (direct) match, a unique code is assigned; otherwise, the software uses an algorithm to find the best partial matches, yielding an indirect match.
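The direct/indirect distinction can be illustrated with the following Python sketch, which takes text that has already been parsed. The source does not give the scoring formula used by ACTR or CIRCE, so the indirect step is an assumption: each word gets an inverse-frequency weight (rarer words in the informative base weigh more), a candidate's score is the weight of the shared words divided by the weight of all words in either text, and a hypothetical threshold `accept` decides whether the best candidate is assigned. The `REFERENCE` entries, the codes `C1` to `C4`, and the "multiple" outcome for ties are likewise invented for illustration.

```python
import math
from collections import Counter

# Hypothetical informative base: (parsed description, code) pairs, assumed to be
# the output of the parsing step. Codes C1..C4 are invented for illustration.
REFERENCE = [
    ("manufacture shoe", "C1"),
    ("repair shoe", "C2"),
    ("retail sale shoe", "C3"),
    ("manufacture furniture", "C4"),
]


def word_weights(reference):
    """Inverse-frequency weight per word (an assumption): rarer words weigh more."""
    n = len(reference)
    doc_freq = Counter(w for text, _ in reference for w in set(text.split()))
    return {w: math.log(1 + n / k) for w, k in doc_freq.items()}


def match(parsed_response, reference=REFERENCE, accept=0.5):
    """Return (outcome, code or codes, score) for one parsed response."""
    # Direct match: the parsed response is identical to a parsed description.
    direct = sorted({code for text, code in reference if text == parsed_response})
    if len(direct) == 1:
        return ("direct", direct[0], 1.0)
    if direct:
        return ("multiple", direct, 1.0)

    # Indirect match: weighted word overlap with every description.
    weights = word_weights(reference)
    unseen = math.log(1 + len(reference))  # weight for words absent from the base
    response = set(parsed_response.split())
    best_score, best_codes = 0.0, set()
    for text, code in reference:
        description = set(text.split())
        shared = sum(weights.get(w, unseen) for w in response & description)
        total = sum(weights.get(w, unseen) for w in response | description)
        score = shared / total if total else 0.0
        if score > best_score:
            best_score, best_codes = score, {code}
        elif score == best_score and score > 0:
            best_codes.add(code)

    # Hypothetical acceptance rule: assign only if the best score reaches `accept`.
    if best_score < accept:
        return ("none", None, best_score)
    if len(best_codes) > 1:
        return ("multiple", sorted(best_codes), best_score)
    return ("indirect", best_codes.pop(), best_score)
```

Here "repair shoe leather" has no direct match but scores about 0.60 against "repair shoe", so code C2 is assigned indirectly, while the single word "shoe" stays below the threshold and is left uncoded.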

CIRCE is developed by Istat, which makes it easier to add or change functionality in the standardisation (parsing) and matching steps.
Please note: at present, both the CIRCE user guide (Manuale Utente.pdf) and its GUI are in Italian. English versions will eventually be provided.

Status: validated

Author: Istat

Licence: EUPL-1.1

GSBPM code: 5.2. Classify and code

Programming language: R, VB.NET

Language of the GUI: IT

Keywords: automated coding, weighting coding algorithms

Contact:

name: Laura Capparucci
email: capparuc@istat.it

SOFTWARE DEPENDENCIES

– R (version ≥ 3.1.1).

– Windows (version ≥7).

– Microsoft .NET Framework 4 (only for the graphical user interface).

COPYRIGHT

Licensed under the European Union Public Licence (EUPL), version 1.0 or subsequent. You may not use this work except in compliance with the Licence. You may obtain a copy of the Licence at: http://ec.europa.eu/idabc/eupl.html. Unless required by applicable law or agreed to in writing, software distributed under the Licence is distributed on an “AS IS” basis, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Licence for the specific language governing permissions and limitations under the Licence.

DISCLAIMER

Istat assumes no responsibility for results arising from any use of the tool that is inconsistent with the methodological guidance in the available documentation.

DOWNLOAD

Release date: 28/07/2016

CIRCE version 1.0

TECHNICAL AND METHODOLOGICAL DOCUMENTATION

User manual – CIRCE v. 1.0

OTHER DOCUMENTATION

Macchia, S., P. Giovani, D. Perrone, M. Degortes, and L. Mazza. 2007. “Metodi e software per la codifica automatica dei dati”. Tecniche e strumenti, N. 4. Rome: Istat.

Cuccia, F., S. De Angelis, A. Laureti Palma, S. Macchia, S. Mastroluca, and D. Perrone. 2005. “La codifica delle variabili testuali nel 14° Censimento Generale della Popolazione”. Documenti, N. 1. Rome: Istat.

Macchia, S., and M. D’Orazio. 2001. “A system to monitor the quality of automated coding of textual answers to open questions”. Research in Official Statistics, Volume 4, N. 2: 5-19.

De Angelis, R., S. Macchia, and L. Mazza. 2000. “Applicazioni sperimentali della codifica automatica: analisi di qualità e confronto con la codifica manuale”. Quaderni di ricerca – Rivista di statistica Ufficiale, N. 1.
