Model consolidates millions of health and environmental data points for epidemiological studies

A cohort study follows a large group of people and assesses, for example, the health effects of the risk factors to which they are exposed. In environmental matters, such studies help determine the impact on people of air pollution, landfills or climate change. Although designing them poses major challenges, analytical engineering offers insight into how to meet them in a more agile and effective way.

The potential of cohort studies rests on a rigorous methodology that avoids bias, errors in data collection and errors in the interpretation of results.

In this regard, Laura María Uribe Díaz, who holds a master's degree in Analytical Engineering from the Medellín Campus of the National University of Colombia (UNAL), explains that these requirements are among the great challenges such studies present. “The main one is that they require multiple sources of information, for example, the daily experiences of groups of individuals, clinical histories, data from environmental monitoring stations, etc.”

With her postgraduate thesis, the researcher developed a method that makes it possible to organize, consolidate and manipulate these millions of data points, so that epidemiological studies can easily record the information, organize it and ultimately present a conclusion.

The solution was found through information architecture (IA), a discipline that can be understood with the analogy of a house. “To build it, various materials are needed; in this case, that is all the information needed for the study (survey results, medical records, etc.). The challenge is to organize the house well, so that the end users (researchers) know how to find their way around it”, she said.

IA can also be thought of as building a database with different dimensions, like a 3D cube, where each face collects information from a different source.
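The cube analogy can be illustrated with a minimal Python sketch. The three axes used here (participant, date, source), the function names and all sample values are hypothetical, chosen only for illustration; they are not taken from the thesis model.

```python
# Hypothetical cube: each cell is addressed by (participant, date, source).
# Axis names and sample values are illustrative assumptions, not study data.
cube = {}

def put(participant, date, source, value):
    """Store one observation in the cell addressed by the three dimensions."""
    cube[(participant, date, source)] = value

def slice_by_source(source):
    """Return one 'face' of the cube: every cell that comes from one source."""
    return {
        (participant, date): value
        for (participant, date, src), value in cube.items()
        if src == source
    }

put("P001", "2023-05-01", "survey", {"hours_outdoors": 3})
put("P001", "2023-05-01", "clinical", {"diagnosis_code": "0104"})
put("P002", "2023-05-01", "survey", {"hours_outdoors": 1})

# Look only at the survey face of the cube.
survey_face = slice_by_source("survey")
print(sorted(survey_face))
```

Fixing one dimension (here, the source) returns a flat view of the data, which is roughly what a researcher would do when asking for "everything that came from the surveys".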

“For example, we have the clinical history. It is governed by legal foundations and standards; thus, a simple number like 0104 may mean chronic obstructive pulmonary disease (COPD). We take advantage of all of this to assemble the dimensions in a clear, consistent way and without duplicating data”, she mentioned.
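A minimal Python sketch of the idea of storing a standardized code once and referring to it everywhere else. Only the pairing of 0104 with COPD comes from the researcher's example; the record layout, participant identifiers and the `describe` helper are hypothetical.

```python
# Shared code list. Only the 0104 -> COPD pairing comes from the article;
# a real study would load the full standard it follows.
DIAGNOSES = {
    "0104": "chronic obstructive pulmonary disease (COPD)",
}

# Hypothetical clinical records: each stores only the code, so the
# description lives in a single place and is never duplicated.
records = [
    {"participant": "P001", "diagnosis_code": "0104"},
    {"participant": "P002", "diagnosis_code": "0104"},
]

def describe(record):
    """Resolve a record's diagnosis code against the shared code list."""
    code = record["diagnosis_code"]
    return DIAGNOSES.get(code, "unknown code " + code)

labels = [describe(r) for r in records]
print(labels)
```

If the wording of a description changes, it is corrected once in the code list and every record that uses the code picks up the change.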

Once all the dimensions are organized and the data are complete, the next step is to work out how to connect everything.

“For this we chose a relational model, which involves presenting the information in rows and columns, in a simple way, respecting the type of data (whether whole numbers, decimals or text). We made this choice because the model is more standardized and may be more accessible to researchers, who are not typically computer experts”, she explains.
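A rough sketch of what a relational model with typed columns and linked tables can look like. It uses SQLite through Python's standard `sqlite3` module as a stand-in, not the MySQL tooling used in the thesis, and every table name, column name and value is a hypothetical assumption made for illustration.

```python
import sqlite3

# In-memory database; the schema below is hypothetical, not the thesis model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participant (
    participant_id INTEGER PRIMARY KEY,
    neighborhood   TEXT NOT NULL
);
CREATE TABLE station_reading (
    reading_id   INTEGER PRIMARY KEY,
    neighborhood TEXT NOT NULL,
    reading_date TEXT NOT NULL,   -- ISO date stored as text
    pm25         REAL NOT NULL    -- decimal measurement (assumed pollutant)
);
CREATE TABLE clinical_visit (
    visit_id       INTEGER PRIMARY KEY,
    participant_id INTEGER NOT NULL REFERENCES participant(participant_id),
    visit_date     TEXT NOT NULL,
    diagnosis_code TEXT NOT NULL  -- kept as text so leading zeros survive
);
""")

# Illustrative rows only.
conn.execute("INSERT INTO participant VALUES (1, 'North')")
conn.execute("INSERT INTO station_reading VALUES (1, 'North', '2023-05-01', 35.5)")
conn.execute("INSERT INTO clinical_visit VALUES (1, 1, '2023-05-01', '0104')")

# Connect the sources: each visit is paired with the reading taken in the
# participant's neighborhood on the same date.
rows = conn.execute("""
SELECT p.participant_id, v.diagnosis_code, s.pm25
FROM clinical_visit AS v
JOIN participant AS p ON p.participant_id = v.participant_id
JOIN station_reading AS s
  ON s.neighborhood = p.neighborhood AND s.reading_date = v.visit_date
""").fetchall()
print(rows)
```

Each table keeps one kind of information in rows and columns with a declared type, and the joins express the connections between sources without copying data from one table into another.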

The researcher had to understand concepts both from her own field, analytical engineering, and from health, epidemiology and the environment. She also had to design dozens of maps and flowcharts to determine which information coming from, for example, surveys was useful and how to link it.

The final result was consolidated in MySQL Workbench, a freely accessible (open source) software tool. “However, as this is a ‘model’ of a house, the technology used is not very relevant. Even so, the connections we suggest should work.”

Although the proposed model was based on a specific environmental epidemiological study, the one being carried out in Medellín, it serves as a guide for future analyses related to health, the environment, or any topic that involves handling huge amounts of data. This makes it an opportunity for all kinds of researchers and decision makers.

