"If we want to make comparisons, the data needs to have the same basis"

Photo shows a press conference with ministers Marcos Pontes, of Science and Technology, and Henrique Mandetta, of Health. They are sitting at a table and Mandetta is speaking.
New data on the evolution of the coronavirus pandemic is released every day in the country

News about the coronavirus pandemic emerges constantly. Every day, new data on the number of confirmed Covid-19 cases and deaths caused by the disease are released in Brazil and other countries. Analyzing the data, drawing comparisons and reaching conclusions about the state of the pandemic in the country seems inevitable. But if the different data sources are not consistent with one another, or if it is unclear how the data are organized and disseminated, the result may be a picture that is more optimistic or more pessimistic than reality, in addition to harming the planning and execution of public health policies.

The warning comes from professors at Unicamp who work with data science and the analysis of numerical information in different research areas. They are using the unprecedented scenario created by the pandemic to reflect on how data about the virus and the disease have been collected and made available to researchers and the press. They also propose solutions to limit the effects of analyses and comparisons that could cause even more harm than the situation itself already does.

"People look at the results, but they don't question where they come from"

Accustomed to working with data obtained from various sources in the field of neuroscience, Rickson Mesquita, professor at the Gleb Wataghin Institute of Physics (IFGW), knows that the first thing a good analysis requires is knowing how the data were obtained: which methods were used and which area or period of time they refer to. In other words, it is not enough to read the data; you need to know the path and the conditions that turned them into relevant information.

It is precisely this awareness that led the professor to reflect on how the data released about the coronavirus have been handled, mainly the number of confirmed Covid-19 cases in Brazil and around the world. In a text published on his profile on the platform Medium, he argues that comparisons between the number of confirmed cases in Brazil and the situation in other countries require caution and may be somewhat unproductive. This is because the way the disease is diagnosed and reported varies from country to country, which can create false symmetries. "It doesn't make much sense to analyze case numbers if they are being reported in different ways; this creates even more uncertainty for people. When people try to compare data from different countries, the confusion becomes even greater," says Rickson.

In the professor's view, the number of deaths caused by Covid-19 is a parameter less prone to disagreement, since deaths are completed events. On that basis, he proposes other ways of comparing countries, such as the total number of deaths and the day-by-day evolution of deaths, which make it possible to see how the pandemic is behaving around the world. Rickson also criticizes other comparisons and relativizations, such as justifying a supposedly more positive situation in Brazil by the country's large territory. According to his text, the message to take from data that still show a low number of infections in a large population is that many people may yet contract the coronavirus and, therefore, precautions must be maintained.

Montage shows two graphs, one with the evolution of the number of deaths per day from the coronavirus and the other with the total number of deaths.
Graphs based on data from the European Centre for Disease Prevention and Control (ECDC) show the evolution of the number of deaths per day in different countries and the total number of deaths recorded
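The two measures the professor proposes, total deaths and deaths per day, are two views of the same series. As a rough illustration only, the sketch below derives per-day counts from cumulative totals. The country names and all numbers are invented for the example; they are not taken from the ECDC data or from the professor's text.

```python
def daily_from_cumulative(totals):
    """Convert a cumulative series (total so far) into per-day counts."""
    daily = []
    previous = 0
    for total in totals:
        daily.append(total - previous)
        previous = total
    return daily


# Invented cumulative death totals, one value per day (illustration only).
cumulative_deaths = {
    "Country A": [1, 3, 6, 10, 15],
    "Country B": [2, 2, 5, 9, 9],
}

# Per-day series for each country, comparable on the same basis.
daily_deaths = {
    name: daily_from_cumulative(series)
    for name, series in cumulative_deaths.items()
}

for name, series in daily_deaths.items():
    print(name, "total:", cumulative_deaths[name][-1], "per day:", series)
```

The comparison only holds if every source reports deaths in the same way and for the same dates, which is the point made in the text.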

Rickson says it is important for people to know the source of the data provided by official health bodies and by the media. According to him, this is a scenario in which information spreads quickly and can be shared by anyone, and it therefore demands responsibility and awareness of the effects on people's lives. "People need information to become aware of what is happening, but the information must be accurate. Social networks end up giving room to any analysis, among a population for whom mathematics can be a great difficulty, so when people see a number, they tend to accept it without question. This can be a problem," warns the professor.

Solution: create your own database

Working with data is also part of the daily life of Paula Dornhofer Costa, professor at Unicamp's Faculty of Electrical and Computer Engineering (FEEC). She is part of a research project that cross-references health data with climate variables, identifying how temperature, humidity and weather conditions can influence people's health. Professor Eliana Cotta de Faria, from the Faculty of Medical Sciences (FCM), and Ana Maria Heuminski de Ávila, researcher at the Center for Meteorological and Climate Research Applied to Agriculture (Cepagri), also take part in the project.

As with researchers from many other areas, the coronavirus pandemic caught the attention of the professor and her students, who began gathering the data provided by the Ministry of Health in CSV ("Comma-Separated Values") format, in which the information is laid out as a table and the values in each row are separated by commas. In this format, the data can be processed with specific software, which makes it easier to analyze them and draw conclusions.
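To show why the CSV format lends itself to automatic processing, here is a minimal sketch using Python's standard `csv` module. The column names (`date`, `state`, `confirmed_cases`), the state codes and every value are hypothetical; they do not reproduce the layout or content of the Ministry of Health files.

```python
import csv
import io

# Hypothetical CSV content: invented columns, state codes and numbers.
SAMPLE_CSV = """date,state,confirmed_cases
2020-03-16,AA,152
2020-03-16,BB,31
2020-03-17,AA,164
2020-03-17,BB,33
"""


def read_cases(csv_text):
    """Parse CSV text into a list of dicts with integer case counts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "date": row["date"],
            "state": row["state"],
            "confirmed_cases": int(row["confirmed_cases"]),
        }
        for row in reader
    ]


rows = read_cases(SAMPLE_CSV)

# Sum the state figures to get a total per date.
total_by_date = {}
for row in rows:
    total_by_date[row["date"]] = (
        total_by_date.get(row["date"], 0) + row["confirmed_cases"]
    )

print(total_by_date)
```

The same figures published inside the running text of a bulletin or in a PDF would first have to be extracted by hand or with error-prone text parsing.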

However, she says the warning sign came on March 18, when the Ministry stopped making the data available in this format, on the grounds that the dissemination system would be improved. "There was this data blackout, we were unable to access the spreadsheets, and the Ministry of Health began publishing daily confirmed cases in the states through bulletins in which the information comes in the body of the text, which is not ideal for those who want to do the processing automatically," reports the professor. The Ministry has since resumed publishing national and state data in CSV format on its website, but the concern about losing access again led the team to mobilize.

Unable to collect the data from a single source, the group began searching for information state by state. The conclusion was that there is a large discrepancy in the way each federative unit collects and disseminates its data. Because Covid-19 is a disease of compulsory notification, all confirmed cases do reach the Ministry of Health. The major concern, however, is the delay with which the data may arrive because of these differences, which can hinder the adoption of necessary measures. "Some states release information only via Twitter, others release epidemiological bulletins in PDF files, which are hard for a machine to read. For still others, the figures are released only via the G1 portal, and we can't even access the information provided directly by the state health department," explains Paula.

Photo shows a gloved hand holding boxes of tests used to detect Covid-19
According to Paula, discrepancies in data could delay the adoption of measures such as testing the population

The solution, intended not only for the group's own work but also to help other researchers, was to create a new unified database. In it, the team publishes data updated every day, showing not only the number of confirmed Covid-19 cases in the country and in each state, but also the historical series for each location. Paula says that, from these data, the team can already observe relevant aspects of how the pandemic is evolving across the states, such as a similarity between the numbers recorded in Ceará and the Federal District.
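The article does not describe how the team's database is built. Purely as an illustration of the general idea of unifying sources, the sketch below converts records from two hypothetical sources, each with its own field names and date format, into one schema and then assembles a historical series per state. The field names, the state codes `XX` and `YY`, and all numbers are invented.

```python
from collections import defaultdict

# Hypothetical source A: Portuguese field names, dates as DD/MM/YYYY.
source_a = [
    {"uf": "XX", "data": "21/03/2020", "casos": 84},
    {"uf": "XX", "data": "20/03/2020", "casos": 68},
]

# Hypothetical source B: English field names, dates as YYYY-MM-DD.
source_b = [
    {"state": "YY", "date": "2020-03-20", "confirmed": 87},
    {"state": "YY", "date": "2020-03-21", "confirmed": 100},
]


def normalize_a(record):
    """Map a source-A record to the common schema with an ISO date."""
    day, month, year = record["data"].split("/")
    return {
        "state": record["uf"],
        "date": f"{year}-{month}-{day}",
        "confirmed": record["casos"],
    }


def normalize_b(record):
    """Source B already matches the common schema; copy the fields."""
    return {
        "state": record["state"],
        "date": record["date"],
        "confirmed": record["confirmed"],
    }


def build_series(records):
    """Group records by state and sort each state's entries by date."""
    by_state = defaultdict(dict)
    for record in records:
        by_state[record["state"]][record["date"]] = record["confirmed"]
    return {state: sorted(days.items()) for state, days in by_state.items()}


unified = [normalize_a(r) for r in source_a] + [normalize_b(r) for r in source_b]
history = build_series(unified)

for state, series in history.items():
    print(state, series)
```

Once every record shares the same fields and date format, series from different states can be placed side by side and compared directly.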

The database can be accessed online, and its files are available to all interested parties. Based on this experience, the professor stresses that the pandemic is an opportunity to draw the attention of health authorities to the importance of optimizing and clarifying the data released about the disease. "Our information reporting processes are very poor. Brazil does not have a large unified system that allows information to flow quickly from municipalities to states and to the country as a whole. This is quite bad for making quick decisions, as is necessary in a pandemic like this," says Paula.
