Data integration

In livestock genetic resource conservation, priorities are set by analysing several criteria simultaneously, each of which may contribute to long-term sustainable breeding conditions. This requires complementary data on population and evolutionary genetics, animal husbandry practices, and socio-economic and environmental conditions, usually over a broad geographic range. These sources and categories of data are often considered separately, although integrating them facilitates and optimizes the process of establishing priorities in the conservation of livestock genetic resources. The different types of information can be explored and compared on the basis of their geographic coordinates.


Data integration consists of combining data sets obtained from different sources and providing the user with a unified view. To this end, the data must be stored in specialized database software known as Geographical Information Systems (GIS), which can manage geographic information and perform tasks specific to geographic coordinates (computing distances or spatial buffers around objects of interest, determining inclusions or exclusions, etc.). Geographic databases can also perform elementary statistical operations.
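As a rough illustration of the coordinate-based tasks listed above, the following minimal sketch joins two data sets on a shared site name and then selects the sites that fall inside a circular buffer. It uses only the Python standard library rather than a real GIS, and every name and value in it (`herds`, `heterozygosity`, `altitude_m`, `market`, the 60 km radius) is a hypothetical placeholder, not data from any study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(sites, centre, radius_km):
    """Names of the sites lying inside a circular buffer around `centre`."""
    lat0, lon0 = centre
    return [name for name, (lat, lon) in sites.items()
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]


# Hypothetical sampling sites: name -> (latitude, longitude)
herds = {
    "herd_A": (46.52, 6.63),
    "herd_B": (46.95, 7.45),
    "herd_C": (45.46, 9.19),
}
# Hypothetical layers from two different sources, keyed by the same site names
heterozygosity = {"herd_A": 0.70, "herd_B": 0.66, "herd_C": 0.61}
altitude_m = {"herd_A": 450, "herd_B": 540, "herd_C": 120}

# Unified view: one record per site combining all layers
unified = {
    name: {
        "coords": herds[name],
        "heterozygosity": heterozygosity[name],
        "altitude_m": altitude_m[name],
    }
    for name in herds
}

# Inclusion test: which herds lie within 60 km of a hypothetical market?
market = (46.80, 7.15)
nearby = within_buffer(herds, market, 60.0)
```

A dedicated GIS would additionally handle map projections, polygon geometries and spatial indexing, which this sketch deliberately leaves out.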


GIS are specialized computer systems for the storage, retrieval, analysis, and display of large volumes of spatial data. They are designed to overlay complementary information and to study the relationships between the different information layers. A number of GIS software packages exist; some include statistical functions, while others can easily be connected to standard statistical software (Animal Genetics 41, Suppl. 1, 2010).

Exploratory spatial data analysis (ESDA) is a category of GIS tools that facilitates understanding of the geographic distribution of genetic diversity among livestock breeds, and of how that diversity varies with environmental parameters or socio-economic situations. The approach employs a variety of mostly graphical techniques to maximize insight into a data set: to uncover underlying structures, extract important variables, or detect outliers and anomalies. Instead of assuming a known model and checking whether the data conform to it, ESDA lets the data reveal their own underlying structure, guided by the successive rough hypotheses that researchers form as they explore.


Once separate categories of data have been integrated and can be compared statistically, a major challenge is to understand the relationships between the chosen variables. For that purpose, a) the right variables must be chosen to describe the system being considered, b) the dependent and independent variables must show sufficient variation, and c) spatial covariation between variables must be detected, either with simple analyses such as correlation or one-factor ANOVA, or with multivariate approaches.

The goal of multivariate approaches is to arrange objects or variables in relation to each other (ordination, scaling), to classify objects into groups (classification, clustering, prediction), or to test hypotheses about relationships between response and predictor variables.
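The two simpler analyses named above, correlation and one-factor ANOVA, can be sketched in a few lines of standard-library Python. This is only an illustrative sketch: the variables (`altitude_m`, `heterozygosity`, the three production systems) and all values are invented for the example, and in practice a statistical package would also report p-values and check the underlying assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_way_anova_f(groups):
    """F statistic of a one-factor ANOVA; `groups` is a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by the factor (between-group sum of squares)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Residual variation (within-group sum of squares)
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical data: altitude of sampling sites vs. observed heterozygosity
altitude_m = [200, 600, 1000, 1400, 1800]
heterozygosity = [0.70, 0.68, 0.65, 0.63, 0.60]
r = pearson_r(altitude_m, heterozygosity)

# Hypothetical data: heterozygosity grouped by production system
by_system = [
    [0.70, 0.68, 0.71],  # extensive
    [0.65, 0.66, 0.64],  # mixed
    [0.60, 0.61, 0.59],  # intensive
]
f_stat = one_way_anova_f(by_system)
```

A strongly negative `r` would suggest that diversity declines with altitude in this made-up data set, and a large `f_stat` that the production system accounts for much of the variation; neither result says anything about real populations.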

Multi-criteria analysis

Multi-criteria decision analysis combines the information from several criteria into a single integrated evaluation index. This is useful for supporting decision makers, who usually face several, often conflicting, evaluations, as in prioritization for conservation. The approach takes into account qualitative as well as quantitative aspects of the problem to be solved. It can be used to rank options, to identify a single preferred one, to shortlist a limited number of alternatives for subsequent evaluation, or simply to distinguish acceptable from unacceptable effects of the different options.
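One simple way to build such an index is a weighted sum of rescaled criteria. The sketch below assumes that method; it is only one of several multi-criteria techniques and is not necessarily the one intended here. The breeds, criteria, weights and scores are all hypothetical, and a qualitative criterion (`cultural_value`) is represented by an ordinal score from 1 to 5.

```python
def min_max(values, higher_is_better=True):
    """Rescale values to [0, 1]; invert when lower raw values are preferred."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # criterion does not discriminate
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def weighted_index(options, criteria):
    """Weighted-sum index per option.

    options:  name -> {criterion: raw value}
    criteria: criterion -> (weight, higher_is_better)
    """
    names = list(options)
    total_weight = sum(w for w, _ in criteria.values())
    scores = {name: 0.0 for name in names}
    for crit, (weight, higher_is_better) in criteria.items():
        scaled = min_max([options[n][crit] for n in names], higher_is_better)
        for name, s in zip(names, scaled):
            scores[name] += (weight / total_weight) * s
    return scores


# Hypothetical breeds and criteria (all values invented for illustration)
breeds = {
    "breed_X": {"extinction_risk": 0.8, "distinctiveness": 0.6, "cultural_value": 3},
    "breed_Y": {"extinction_risk": 0.3, "distinctiveness": 0.9, "cultural_value": 5},
    "breed_Z": {"extinction_risk": 0.5, "distinctiveness": 0.2, "cultural_value": 1},
}
criteria = {
    "extinction_risk": (0.5, True),   # higher risk -> higher priority
    "distinctiveness": (0.3, True),
    "cultural_value": (0.2, True),    # ordinal score for a qualitative aspect
}

scores = weighted_index(breeds, criteria)
ranking = sorted(scores, key=scores.get, reverse=True)  # highest priority first
```

The choice of weights drives the ranking, so in practice they would be agreed with stakeholders and the result checked for sensitivity to small changes in them.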