RELAIS (REcord Linkage At IStat)

Description

RELAIS (REcord Linkage At IStat) is a toolkit providing a set of techniques for dealing with record linkage projects.

The purpose of record linkage is to identify records that refer to the same real-world entity, even when that entity is represented differently across data sources and even when unique identifiers are unavailable or affected by errors. In statistics, record linkage is needed for several applications, including: enriching the information stored in different data sets; de-duplicating data sets; improving the data quality of a source; estimating the size of a population with the capture-recapture method; and checking the confidentiality of public-use microdata. Record linkage is a complex process consisting of several phases that involve different knowledge areas, and several different techniques can be adopted for each phase. We believe that the choice of the most appropriate technique depends not only on the practitioner’s skill but, above all, on the specific application.

Moreover, in some applications there is no evidence for preferring a given method over the others, nor any evidence that different choices at a given linkage stage would lead to the same results. It is therefore reasonable to dynamically select the most appropriate technique for each phase and to combine the selected techniques into a record linkage work-flow for a given application. RELAIS is a toolkit built on these ideas.
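The idea of choosing one technique per phase and combining them into a work-flow can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not RELAIS code: the class `LinkageWorkflowSketch`, the `Person` fields, the two comparison functions and the thresholds are all hypothetical names and values chosen for the example. It shows a blocking phase and a comparison phase passed in as interchangeable parameters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Illustrative sketch only; none of these names come from RELAIS.
class LinkageWorkflowSketch {

    /** A record reduced to three illustrative fields. */
    static final class Person {
        final String id;
        final String surname;
        final int birthYear;

        Person(String id, String surname, int birthYear) {
            this.id = id;
            this.surname = surname;
            this.birthYear = birthYear;
        }
    }

    /** Blocking phase: group records by a key so that only records sharing the key are compared. */
    static Map<String, List<Person>> block(List<Person> data, Function<Person, String> key) {
        Map<String, List<Person>> blocks = new HashMap<>();
        for (Person p : data) {
            blocks.computeIfAbsent(key.apply(p), k -> new ArrayList<>()).add(p);
        }
        return blocks;
    }

    /** Comparison technique 1: exact equality, ignoring case. */
    static double exact(String a, String b) {
        return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
    }

    /** Comparison technique 2: similarity in [0, 1] derived from the Levenshtein edit distance. */
    static double editSimilarity(String a, String b) {
        String s = a.toLowerCase();
        String t = b.toLowerCase();
        int[][] d = new int[s.length() + 1][t.length() + 1];
        for (int i = 0; i <= s.length(); i++) d[i][0] = i;
        for (int j = 0; j <= t.length(); j++) d[0][j] = j;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
            }
        }
        int longest = Math.max(s.length(), t.length());
        return longest == 0 ? 1.0 : 1.0 - (double) d[s.length()][t.length()] / longest;
    }

    /** Runs the work-flow: block source B, compare within blocks, keep pairs scoring at least the threshold. */
    static List<String> link(List<Person> a, List<Person> b,
                             Function<Person, String> blockingKey,
                             BiFunction<String, String, Double> comparator,
                             double threshold) {
        Map<String, List<Person>> blocksB = block(b, blockingKey);
        List<String> matches = new ArrayList<>();
        for (Person pa : a) {
            for (Person pb : blocksB.getOrDefault(blockingKey.apply(pa), List.of())) {
                if (comparator.apply(pa.surname, pb.surname) >= threshold) {
                    matches.add(pa.id + "-" + pb.id);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Person> a = List.of(new Person("a1", "Rossi", 1980), new Person("a2", "Bianchi", 1975));
        List<Person> b = List.of(new Person("b1", "Rosi", 1980), new Person("b2", "Bianchi", 1975));
        Function<Person, String> byYear = p -> String.valueOf(p.birthYear);

        // Same blocking phase, two interchangeable comparison techniques.
        System.out.println(link(a, b, byYear, LinkageWorkflowSketch::exact, 1.0));           // [a2-b2]
        System.out.println(link(a, b, byYear, LinkageWorkflowSketch::editSimilarity, 0.75)); // [a1-b1, a2-b2]
    }
}
```

Swapping the comparison function changes the outcome on the same data: the misspelt surname "Rosi" is linked to "Rossi" only by the edit-distance comparator. This is the kind of application-specific choice the paragraph above describes.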

The principal features of RELAIS are:

  • It is designed and developed to allow the combination of different techniques for each of the record linkage phases, so that the resulting work-flow is built on the basis of application- and data-specific requirements.
  • It has been developed as an open source project, so several solutions already available for record linkage in the scientific community can be easily re-used. It is released under the EUPL license (European Union Public License).
  • It has been implemented in two languages based on different paradigms: Java, an object-oriented language, and R, a functional language. This choice reflects our belief that a record linkage process is composed of data manipulation techniques, for which Java is more appropriate, and of calculation-oriented techniques, for which R is preferable. The choice of Java and R is also in line with the open source philosophy of the RELAIS project.
  • It has been implemented on a relational database architecture; in particular, it is based on a MySQL environment, which is also in line with the open source philosophy of the RELAIS project.

The RELAIS project aims to make record linkage techniques easily accessible to non-expert users. To this end, the system has a GUI (Graphical User Interface) that, on the one hand, allows record linkage work-flows to be built with considerable flexibility and, on the other hand, checks the execution order of the provided techniques wherever precedence rules must be enforced.
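An execution-order check of this kind can be pictured as follows. This is a minimal sketch under stated assumptions: the phase names and the precedence rules in `REQUIRES` are hypothetical and are not taken from the RELAIS source or manual. The class `PrecedenceCheckSketch` only shows how a proposed sequence of phases might be validated against "must run before" rules.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Illustrative sketch only; phase names and rules are assumptions, not RELAIS internals.
class PrecedenceCheckSketch {

    // Hypothetical precedence rules: each phase lists the phases that must already have run.
    static final Map<String, List<String>> REQUIRES = Map.of(
        "blocking", List.of("read"),
        "comparison", List.of("blocking"),
        "decision", List.of("comparison"),
        "one-to-one reduction", List.of("decision"));

    /** Returns a message for the first violated rule, or an empty Optional if the order is valid. */
    static Optional<String> firstViolation(List<String> workflow) {
        Set<String> done = new HashSet<>();
        for (String phase : workflow) {
            for (String needed : REQUIRES.getOrDefault(phase, List.of())) {
                if (!done.contains(needed)) {
                    return Optional.of("'" + phase + "' requires '" + needed + "' to run first");
                }
            }
            done.add(phase);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A valid order: every phase finds its prerequisites already executed.
        System.out.println(firstViolation(List.of("read", "blocking", "comparison", "decision")));
        // An invalid order: the comparison phase is requested before any blocking phase.
        System.out.println(firstViolation(List.of("read", "comparison", "decision")));
    }
}
```

The first call prints `Optional.empty`. The second reports that 'comparison' requires 'blocking' to run first, which is the kind of feedback a GUI could give before executing a work-flow.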

Information

Status: validated
Author: Istat
Licence: EUPL-1.1
GSBPM code: 5.1 Integrate data
Programming language: R, Java
Language of the GUI: EN
Keywords: data integration, probabilistic record linkage, string comparators, blocking/sorting/indexing, deduplication, open source software
Contact: Luca Valentino
Email: luvalent@istat.it

Software and documentation

SOFTWARE DEPENDENCIES

Java SE Development Kit (version ≥ 13)

R (version ≥ 3.4.0)

R packages: ROI, ROI.plugin.clp, slam, RODBC

MySQL Server (version ≥ 5.0)

MySQL Connector/ODBC (version ≥ 5.0)

COPYRIGHT

Copyright 2015 Istat

Licensed under the European Union Public Licence (EUPL), version 1.1 or subsequent. You may not use this work except in compliance with the Licence. You may obtain a copy of the Licence at: http://ec.europa.eu/idabc/eupl.html. Unless required by applicable law or agreed to in writing, software distributed under the Licence is distributed on an “AS IS” basis, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Licence for the specific language governing permissions and limitations under the Licence.

DISCLAIMER

Istat assumes no responsibility for results arising from any use of the tool that is inconsistent with the methodological guidance contained in the available documentation.

DOWNLOAD

TECHNICAL AND METHODOLOGICAL DOCUMENTATION

User manual – RELAIS v. 3.1

OTHER DOCUMENTATION

Cibella N., G.L. Fernandez, M. Guigò, F. Hernandez, M. Scannapieco, L. Tosco, T. Tuoto. 2009. Sharing Solutions for Record Linkage: the RELAIS Software and the Italian and Spanish Experiences. In Proceedings of New Techniques and Technologies for Statistics (NTTS) Conference, Eurostat, Brussels, 18–20 February 2009.

Eurostat. 2009. Theory and practice of developing a record linkage software. In “Insights on Data Integration Methodologies. ESSnet-ISAD workshop, Vienna, 29–30 May 2008”. Methodologies and working papers, Eurostat.

Cibella N., M. Fortini, M. Scannapieco, L. Tosco, T. Tuoto. 2007. RELAIS: Don’t Get Lost in a Record Linkage Project. In Proceedings of the FCSM 2007 Conference, Federal Committee on Statistical Methodology, Arlington, 5–7 November 2007.

Fortini M., P.D. Falorsi, C. Vaccari, N. Cibella, T. Tuoto, M. Scannapieco, L. Tosco. 2006. Towards an Open Source Toolkit for Building Record Linkage Workflows. In Proceedings of the International Workshop on Information Quality in Information Systems (IQIS), Chicago, 30 June 2006.

Last edit: 10 March 2020