Data and regional press (2/3): The importance of questions and editorial choices

Jul 1, 2021

[This article is a translation of the original article published in March 2020]

Collecting new data, as seen in this article, is good. It is even better to ask yourself beforehand what kind of data you need to carry out an investigation, and how relevant it is.

First of all, this avoids wasted work (collecting data that is never used but makes collection and processing more cumbersome). It also avoids realizing along the way that you forgot to ask for information that is essential to the investigation.

However, each data journalist has his or her own style and approach.

What indicators should be used when creating a database?

The first thing to do is to think carefully about what you’re asking for and from whom. “You need to have a precise idea of the subject you are dealing with in order to set the right collection criteria”, explains Frédéric Sallet, journalist in charge of data and infographics at Sud Ouest. It is therefore important to do your homework beforehand.

Starting a collection will always be a bit time consuming. If you find out along the way that you are missing an important element to complete your article, you may have to abandon the investigation. There is also a risk of erroneous analysis.

Arnaud Wéry, who regularly conducts data journalism projects in L’Avenir’s weblab, agrees. He cites a project on the price of parking in the city center that was abandoned because the criteria for the items to be collected were not precise enough. “There was a risk of comparing apples and oranges because of the different zones and rates; if people are residents in the zone, they don’t pay the same rate… It was too complicated to get anything out of it, so we didn’t.”

To determine the indicators that will be needed for the investigation, journalists need to state precisely the main question they are trying to answer. There is never too much information, but there is often poorly structured data that needs a lot of cleanup.

Karen Bastien, co-founder of the agency WeDoData, thinks broadly when collecting data and always looks for the finest granularity. “If there are ten indicators on a subject, and I only need three, I still take all ten and see what I can automate.” This approach is a time investment up front, but it can open up related topics, different angles and ideas for cross-referencing databases. As for automation, it makes regular updates possible and keeps the newsroom very responsive.

Creating a barometer, the puzzle of indicators and their weighting

“Where is it nice to live?” A regular story in the national magazines, and a very interesting topic locally: one that gets reactions, generates audience and leads to interaction. Inspired by an article in Le Parisien, Aurore Malval of Nice Matin worked on a “homemade” quality-of-life barometer. To create this tool, the journalist chose eight themes to measure: transportation, environment/life style, safety, health, cost of living, education, sports/leisure and shops and services.

On the face of it, these are eight major criteria that are fairly consensual and obvious. Beforehand, the journalist surveyed Nice Matin readers and web users, asking them what was important to them in terms of quality of life. “This allowed me to weight the scores,” she says.

But as soon as you get into the details of the mechanics, everything gets complicated. What indicators should be used to measure safety, for example? The number of car thefts? Of two-wheeler thefts? Of physical assaults? Of caravan thefts? Of surveillance cameras? Of police officers and gendarmes per 1,000 inhabitants? And for the living environment, what should be chosen? Proximity to the sea? The area of green spaces? The motorway noise from the A8? Air quality (except that not all communes measure it, or they do not measure the same thing)? In short, it’s enough to make you tear your hair out.

“I chose the indicators according to what seemed relevant and allowed me to compare the communes,” says Aurore Malval.
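To make the weighting step concrete, here is a minimal sketch of a weighted barometer score. Everything in it is an assumption made for illustration: the 0-10 scores, the weights, the two communes and the `weighted_score` helper are invented and do not reproduce Nice Matin's data or method.

```python
# Hypothetical illustration: scores, weights and communes are invented.

def weighted_score(theme_scores, weights):
    """Weighted average of theme scores (each assumed to be on a 0-10 scale)."""
    total_weight = sum(weights[theme] for theme in theme_scores)
    weighted_sum = sum(theme_scores[theme] * weights[theme] for theme in theme_scores)
    return weighted_sum / total_weight

# Weights as they might come out of a reader survey (assumed values).
weights = {"transportation": 1, "safety": 4, "health": 3, "cost of living": 2}

commune_a = {"transportation": 9, "safety": 4, "health": 6, "cost of living": 7}
commune_b = {"transportation": 3, "safety": 8, "health": 7, "cost of living": 6}

# Plain averages, ignoring what readers said matters most.
plain_a = sum(commune_a.values()) / len(commune_a)
plain_b = sum(commune_b.values()) / len(commune_b)

# Scores once the survey weights are applied.
score_a = weighted_score(commune_a, weights)
score_b = weighted_score(commune_b, weights)

print(f"Plain average:    A={plain_a:.2f}  B={plain_b:.2f}")
print(f"Weighted average: A={score_a:.2f}  B={score_b:.2f}")
```

With these invented numbers, commune A leads on a plain average but commune B leads once safety and health are weighted more heavily, which is why the choice of weights is itself an editorial decision that needs explaining.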

Another factor to keep in mind is that seasonality affects the results. In concrete terms, the populations of coastal cities increase sharply in the summer with the arrival of tourists and second-home residents. This automatically affects the figures for many indicators, which is not the case, or much less so, in the hinterland municipalities. How can this be taken into account to create a relevant barometer? Again, this is a real question.
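The effect of seasonality on a per-capita indicator can be illustrated with a small calculation. This is only a sketch of one possible adjustment, not the approach used by Nice Matin; the theft count, both population figures and the `rate_per_1000` helper are hypothetical.

```python
# Hypothetical figures for one coastal commune; none come from the article.

def rate_per_1000(count, population):
    """Number of incidents per 1,000 people."""
    return count * 1000 / population

thefts = 300
resident_population = 50_000
# Assumed average number of people present over the year,
# tourists and second-home residents included.
average_present_population = 75_000

rate_residents = rate_per_1000(thefts, resident_population)
rate_present = rate_per_1000(thefts, average_present_population)

print(f"Per 1,000 residents:      {rate_residents:.1f}")
print(f"Per 1,000 people present: {rate_present:.1f}")
```

The same number of thefts gives a noticeably lower rate once the seasonal population is counted, so the denominator chosen changes how a coastal commune compares with a hinterland one.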

In the end, this work (determining the indicators, collecting the data and beginning the analysis) took the journalist about a month. She also modified her indicators along the way. The fifteen or so paid web articles and the six print pages linked to this ranking generated “good audiences” and numerous reactions. Questions about the methodology came mainly from the municipalities themselves.

Among the lessons to take from this experience, one that Karen Bastien also suggests: it is very useful to read other similar investigations, to dissect their methodologies and the choices made, and to learn about (and avoid) the mistakes colleagues may have made. Finally, a last piece of advice: you have to know when to stop! “Otherwise your ranking will never come out.”

In the case of an existing database

If the database already exists, questions also arise. At WeDoData, everyone pitches their work. It is at this moment, when the story takes shape, that the others ask questions: why did you take this socio-professional category (CSP) and not that one? Why didn’t you cross-reference this or that database? “The answer can be because I don’t have it, or because it doesn’t seem relevant for such and such a reason. Like at an editorial conference,” explains Karen Bastien.

In the project of automated articles on earthquakes, the Nice Matin team also had to determine criteria. At what magnitude should an article be triggered? A magnitude of 2 was chosen, mainly for volume reasons. In one day, there can be more than fifteen earthquakes with a magnitude of less than 1, for which the tremor is not felt.
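A trigger rule of this kind can be sketched in a few lines. The threshold of 2 comes from the text above; the event records, the `should_trigger_article` helper and the choice of "greater than or equal to" rather than "strictly greater than" are assumptions made for this illustration, not a description of Nice Matin's actual system.

```python
# The threshold of 2 comes from the article; everything else is hypothetical.
MAGNITUDE_THRESHOLD = 2.0

def should_trigger_article(magnitude, threshold=MAGNITUDE_THRESHOLD):
    """Return True when an event is strong enough to generate an article."""
    return magnitude >= threshold

# Invented events for one day: most are too weak to be felt.
events = [
    {"id": "ev1", "magnitude": 0.7},
    {"id": "ev2", "magnitude": 0.9},
    {"id": "ev3", "magnitude": 2.0},
    {"id": "ev4", "magnitude": 1.4},
    {"id": "ev5", "magnitude": 3.1},
]

to_publish = [e["id"] for e in events if should_trigger_article(e["magnitude"])]
print(to_publish)
```

Lowering the threshold to 1 would add `ev4` to the list, which shows how directly this single number controls the volume of automated articles.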

For their part, the journalists involved in the Bxl’Air Bot project had to choose which readings they wanted the system built by journalist and developer Laurence Dierickx to retrieve (the local authorities communicate on air quality in Brussels with smileys, but do not include all the existing sensors). Why take these sensors and not those, or why not all of them (knowing that some are placed in more polluted areas than others)?

The Bxl’Air Bot recorded air quality indices for a year. The journalists have compiled a dense dossier including comparisons over time and between municipalities.

Laurence Dierickx also highlights the questions that arise when collecting the data (since collection is done regularly): at what time should the information be collected? The sensors will not give the same information at 8am in the middle of a traffic jam as at 11am. Should the data be collected every day at the same time or not? There were real journalistic choices to be made.
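The impact of the collection time can be shown with a toy example. The hourly readings below are invented, higher values are assumed to mean worse air quality, and nothing here reflects how the Bxl’Air Bot actually samples its sensors.

```python
from statistics import mean

# Invented air quality index readings for one sensor on one day,
# keyed by hour of the day (higher is assumed to mean more pollution).
hourly_index = {8: 62, 11: 38, 14: 35, 18: 58, 22: 27}

reading_at_8am = hourly_index[8]      # collected during the morning rush hour
reading_at_11am = hourly_index[11]    # collected once traffic has eased
daily_mean = mean(hourly_index.values())

print(f"8am reading:  {reading_at_8am}")
print(f"11am reading: {reading_at_11am}")
print(f"Daily mean:   {daily_mean}")
```

Depending on whether the bot records the 8am value, the 11am value or a daily average, the same sensor tells three different stories, so the sampling rule belongs in the published methodology.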

The Stuttgarter Zeitung was not satisfied with the official sensors already in place because they do not cover the whole city. In this example, as in the one from Brussels, it is interesting to question the locations of these official sensors in relation to the daily road traffic.

Regaining the public’s trust

These journalistic choices must be explained clearly to internet users, to improve transparency towards readers and gain their trust. A media outlet must be able to justify its data criteria because they are editorial choices, just like the choice of an angle or a title for an article.

Moreover, specialized journalists (columnists) have an important role. Working with the data journalists, they question the database and point out what may be missing based on their knowledge of the subject, providing the context.

Working with researchers and experts

A regional general-interest media outlet, which rarely has columnists in all fields, can also rely on experts and researchers to validate the methodology. Moreover, they provide valuable assistance in understanding and correctly analyzing the data.

The Stuttgarter Zeitung, mentioned above, works on fine particles in partnership with a university laboratory. It also collaborates with Open Knowledge on the collection and structuring of the database (which is published as open data), as well as on its analysis. Owni relied on the expertise of members of the NGO France Libertés to validate the information sent by internet users in its investigation into the price of water.

Vincent Lastennet, from Le Télégramme, regularly consults INSEE statisticians. As for Julien Vinzent, journalist at MarsActu, he worked in collaboration with an academic on data about dilapidated buildings in Marseille.

For a collaboration to work, it is important to agree on the roadmap beforehand (availability, objectives of the collaboration, timetable, method…) to avoid frustrations and disappointments. University or association time is not the same as media time, and vice versa.

In data journalism, there is journalism. Yes, I know you can read, but it’s always important to remember that. Journalism. That means editorial choices, applied to the data collected, cross-referenced and analyzed. These choices reflect the editorial line of the media outlet, and they must be explained to the public.

This transparency, in my opinion, strengthens the trust of internet users. If, in addition, you work on service-oriented subjects and provide readers with a service, you create loyalty. And these two elements, trust and loyalty, pave the way to subscription or membership.

Anna Bateson, Chief Customer Officer of the Guardian, said at a conference in April 2019 that Panama Papers-type investigations are one of the levers that make people want to support the title and become members.

Data articles top the New York Times’ annual ranking of most-read articles.

In the French regional media that play this card, data articles have a very positive influence on the image of the title, giving it an image of high quality and earning very good feedback from readers, with longer reading times than for more traditional articles. At Nice-Matin, when a data article is published in the paid zone, people log into their account or click on the ad to see the article. The data content, provided it is well titled (in an interactive form), generates an action. “We are at around 20% of paywall use”, explains Damien Allemand.

At FigData, the Figaro’s data service, the observation is clear: “People are ready to pay,” says Stéphane Saulnier, the service’s editor in chief.

One data article (“Are you rich?”) generated the most subscriptions in the history of Le Figaro.

By comparison, “a good data article usually generates 40 subscriptions in a month. This one did a little more than 200 subscriptions over the same period,” explains Stéphane Saulnier. This evergreen (long-tail) topic is republished regularly. With each republication, it generates new subscriptions.

The article “Are You Rich?” published on April 18, 2019, has generated 628 subscriptions as of March 11, 2020.
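As a rough check of the orders of magnitude, the figures quoted above can be put side by side. The numbers come from the article; rounding "a little more than 200" down to 200 and averaging the 628 subscriptions evenly over the whole period are simplifying assumptions.

```python
from datetime import date

# Figures quoted in the article.
typical_monthly_subscriptions = 40   # "a good data article" in a month
are_you_rich_first_month = 200       # "a little more than 200", rounded down
total_subscriptions = 628            # as of March 11, 2020

# How many times better than a good data article in its first month.
ratio = are_you_rich_first_month / typical_monthly_subscriptions

# Average daily subscriptions between publication and the count date.
published = date(2019, 4, 18)
counted_on = date(2020, 3, 11)
days_online = (counted_on - published).days
subscriptions_per_day = total_subscriptions / days_online

print(f"First-month ratio: about {ratio:.0f}x a good data article")
print(f"Days online: {days_online}")
print(f"Average subscriptions per day: {subscriptions_per_day:.2f}")
```

The daily average hides the spikes that follow each republication, so it should be read as a summary figure rather than a description of how subscriptions actually arrived.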

Since the beginning of 2019, all of FigData’s production has gone premium and the results are interesting:
➡️ average reading time of a data article? 4 min 30 s, with peaks at 7 or 8 min depending on the subject.
➡️ internet users who come back and read articles published several days or weeks earlier, because this is evergreen content, and articles that are often dense and rich and present original angles.

“This content is worth subscribing to in order to access it,” says Stéphane Saulnier.

Why does it work for them and not for others? Perhaps because this type of subject and its treatment meet the expectations of Le Figaro’s readership (mostly retired or pre-retired people in the upper classes), and probably also because they have found an organization that works well: the three data journalists analyze, while the specialized journalists write, contextualize and use their contacts to enrich the analyses with interviews.

Complementarity and collaboration. Two key notions and two work habits that are often culturally foreign to journalists, but which we regularly see pay off.