News about the coronavirus pandemic emerges constantly. Every day, new data on the number of confirmed Covid-19 cases and deaths caused by the disease are released in Brazil and other countries. Analyzing the data, drawing comparisons and reaching conclusions about the state of the pandemic in the country seem inevitable. But if the different data sources are not consistent with one another, or if the way they are organized and disseminated is unclear, the result may be a picture that is more optimistic or more pessimistic than reality, in addition to harming the planning and execution of public health policies.
The conclusion comes from professors at Unicamp who work with data science and the analysis of numerical information in different research areas. They are using the unprecedented scenario created by the pandemic to reflect on how data about the virus and the disease have been collected and made available to researchers and the press. They also propose solutions to limit the effects of analyses and comparisons that could cause more harm than the situation itself already does.
"People look at results, but they don't question where they come from"
Accustomed to working with data obtained from various sources in the area of neuroscience, Rickson Mesquita, professor at the Gleb Wataghin Institute of Physics (IFGW), knows that the first thing needed for a good analysis is to know how the data were obtained, what methods were used, and which area or period of time they relate to. In other words, it is not enough to read the data: you need to know the path and the conditions that made them relevant information.
It is precisely this awareness that led the professor to reflect on how this has been done with the data released about the coronavirus, mainly the number of confirmed cases of Covid-19 in Brazil and around the world. In a text published on his profile on the platform Medium, he argues that comparisons between the number of confirmed cases in Brazil and the situation in other countries require caution and may be somewhat unproductive. This is because the way the disease is diagnosed and reported varies from country to country, which can generate false symmetries. "It doesn't make much sense to analyze case numbers if they are being reported in different ways. This creates even more uncertainty for people. When people try to compare data from different countries, the confusion becomes even greater", says Rickson.
In the professor's view, the number of deaths caused by Covid-19 is a parameter less open to disagreement, since deaths are completed events. On that basis, he proposes other ways of comparing the situation of countries, such as the total number of deaths and the day-by-day evolution of deaths, which make it possible to check how the pandemic is behaving around the world. Rickson also criticizes other comparisons and relativizations, such as claiming that Brazil is supposedly in a better situation because of the country's large territory. According to his text, the message to be drawn from data that still show a low number of infections in a large population is that many people may yet contract the coronavirus and, therefore, precautions must be maintained.
Rickson stresses that it is important for people to know the source of the data provided by official health bodies and by the media. According to him, in this scenario information is disseminated quickly and by anyone, which calls for responsibility and awareness of the effects on people's lives. "People need information to become aware of what is happening, but the information must be accurate. Social networks end up giving scope for any analysis among a population where mathematics can be a great difficulty, so when people see a number, they tend to accept it without question. This can be a problem", warns the professor.
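As a rough illustration of the day-by-day view of deaths mentioned above, the sketch below turns a cumulative death count into daily increments. The function name and the numbers are invented for illustration and do not come from the professor's text or from any official source.

```python
def daily_increments(cumulative):
    """Convert a cumulative series into day-by-day new counts.

    Assumes the series starts on the first day with a reported death,
    so the first increment is measured against zero.
    """
    increments = []
    previous = 0
    for total in cumulative:
        increments.append(total - previous)
        previous = total
    return increments


# Invented cumulative totals for five consecutive days (not real data).
cumulative_deaths = [1, 3, 6, 10, 18]
print(daily_increments(cumulative_deaths))  # [1, 2, 3, 4, 8]
```

Comparing such daily series between countries, rather than raw case totals, is one way to follow how the pandemic evolves in each place.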
Solution: create your own database
Working with data is also part of the daily life of Paula Dornhofer Costa, professor at Unicamp's Faculty of Electrical and Computer Engineering (FEEC). She is part of a research project that cross-references health data with climate variables to identify how temperature, humidity and weather conditions can influence people's health. The project also involves Professor Eliana Cotta de Faria, from the Faculty of Medical Sciences (FCM), and Ana Maria Heuminski de Ávila, researcher at the Center for Meteorological and Climate Research Applied to Agriculture (Cepagri).
As happened with researchers in other areas, the coronavirus pandemic caught the attention of the professor and her students, who began to gather the data provided by the Ministry of Health in CSV ("comma-separated values") format, in which the information is laid out in rows, as in a spreadsheet, with the values separated by commas. Data in this format can be processed with specific software, which makes it easier to analyze them and draw conclusions.
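To show why CSV is convenient for automatic processing, here is a minimal sketch that reads a small CSV sample with Python's standard `csv` module. The column names, the state codes "AA" and "BB", and all of the numbers are hypothetical. They do not reproduce the layout of the Ministry of Health's files.

```python
import csv
import io

# Hypothetical sample: invented column names, state codes and figures.
SAMPLE_CSV = """state,date,confirmed_cases
AA,2020-03-16,10
AA,2020-03-17,15
BB,2020-03-16,4
BB,2020-03-17,7
"""


def latest_cases_by_state(csv_text):
    """Return the most recent cumulative case count reported for each state."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        state, date = row["state"], row["date"]
        cases = int(row["confirmed_cases"])
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if state not in latest or date > latest[state][0]:
            latest[state] = (date, cases)
    return {state: cases for state, (_, cases) in latest.items()}


print(latest_cases_by_state(SAMPLE_CSV))  # {'AA': 15, 'BB': 7}
```

The same few lines would not work on a bulletin where the figures are embedded in running text, which is the difficulty the researchers describe.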
However, she says that the warning sign came on March 18, when the Ministry stopped making the data available in this format, on the grounds that the dissemination system would be improved. "There was this data blackout, we were unable to access the spreadsheets, and the Ministry of Health began publishing daily confirmed cases in the states through bulletins in which the information comes in the body of the text, which is not ideal for those who want to do the processing automatically", reports the professor. The Ministry has since resumed publishing national and state data in CSV format on its site, but the fear of losing access again led the team to mobilize.
Unable to collect data from a single source, the group began searching for information from each state. The conclusion was that there are large discrepancies in the way each federative unit collects and disseminates its data. Because Covid-19 is a compulsorily notifiable disease, all confirmed cases reach the Ministry of Health. The major concern, however, is how late the data may arrive because of these differences, which may hinder the adoption of necessary measures. "Some states release information only via Twitter, others release epidemiological bulletins in PDF files, which are difficult for a machine to read. For still others, the data are released only via the G1 portal, and we can't even access the information provided directly by the state health department", explains Paula.
The solution, intended not only for the group's own work but also to help other researchers, was to create a new unified database. In it, the team provides data updated every day, showing not only the number of confirmed Covid-19 cases in the country and in each state, but also the historical series for each location. Paula comments that, from this, they can already observe relevant aspects of the evolution of the pandemic across states, such as a similarity between the numbers recorded in Ceará and the Federal District.
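The idea of a unified database with one historical series per location can be sketched as follows. This is only an illustration under stated assumptions, not the team's actual pipeline. The record layout, the state codes "AA" and "BB", the numbers, and the rule for resolving conflicting reports are all hypothetical.

```python
from collections import defaultdict


def build_historical_series(records):
    """Group (state, date, confirmed) records into one sorted series per state."""
    series = defaultdict(dict)
    for state, date, confirmed in records:
        # If two sources report the same state and day, keep the larger figure.
        # This tie-breaking rule is an assumption made for the example.
        series[state][date] = max(confirmed, series[state].get(date, 0))
    # Sorting by ISO date string yields chronological order.
    return {state: sorted(by_date.items()) for state, by_date in series.items()}


# Invented records, as if gathered from different sources in arbitrary order.
records = [
    ("AA", "2020-03-20", 5),
    ("BB", "2020-03-20", 6),
    ("AA", "2020-03-19", 3),
    ("AA", "2020-03-20", 4),  # conflicting report for the same state and day
]
unified = build_historical_series(records)
print(unified["AA"])  # [('2020-03-19', 3), ('2020-03-20', 5)]
```

Once every state is stored in the same shape, series from different states can be compared directly.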
The database can be accessed through this link, and its files are available to all interested parties. Drawing on this experience, the professor highlights that the pandemic is an opportunity to draw the attention of health authorities to the importance of improving and clarifying the data released about the disease. "Our information reporting processes are very poor. Brazil does not have a large unified system that allows information to flow quickly from municipalities to states and the country as a whole. This is quite bad for making quick decisions, as is necessary in a pandemic like this", says Paula.