A schematic of our experimental design is presented in Supplementary Fig. 1. We invited food and nutrition specialists to take part in a survey via an online interface that we developed. The survey set-up was as follows: the individuals taking the survey (hereafter, evaluators) were given a description of the NOVA classification system and its food assignment criteria. Then, the evaluators indicated whether they wanted to assess one or two lists of foods. In this study, we defined an “assignment” as the act of assigning a food to one of the four NOVA groups (NOVA1, NOVA2, NOVA3, or NOVA4). Evaluators were also asked to rate their level of confidence in each of their assignments. Using these data, we explored the relationships between the most common assignment (NOVAmaj) made by the evaluators (e.g., NOVA1maj = the most common assignment for a given food was NOVA1) and food nutritional quality. The latter was determined using several nutrient profiling systems.
One list contained 120 foods (hereafter, marketed foods), each accompanied by detailed ingredient information. These marketed foods came from an official database of commercially available packaged foods in France. We focused on three categories of foods, namely fresh dairy products, bread products, and mixed dishes, because they contain marketed foods commonly consumed in France and are thought to display diversity in recipes and formulations. Forty food products were randomly selected from each category, using a weighted approach to ensure product representativeness within categories (e.g., the number of sandwich breads in the sample reflected the proportion of sandwich breads within the bread product market as a whole). The products were identified using generic descriptors; no brand names were employed for reasons of confidentiality. From the database, we also acquired information on the foods’ ingredients (including food additives) and nutrient content (i.e., as presented on food packaging).
The other list contained 111 foods (hereafter, generic foods) drawn from a dietary survey that was performed as part of the Three-City Study (Bordeaux cohort) and that combined a food frequency questionnaire (FFQ) with a 24-h dietary recall approach [23, 24]. For each of the 54 FFQ food categories, we identified the most frequently consumed food based on the 24-h recall findings (e.g., apples were the most frequently consumed fruit); this information was used to create a list of 54 generic foods.
So that both lists were of similar size and structure, the list of generic foods was expanded by adding foods from the dairy (n = 22), bread (n = 13), and mixed dish (n = 22) categories using the 24-h recall data, which brought the total to 111 foods. In other words, while the list of marketed foods included a wide range of products from three categories, the list of generic foods contained a few foods from multiple food categories. Additional information about the foods in both lists, and about the foods that overlapped between the two lists, is available in the supplementary materials (see Supplementary Tables 1 and 2).
Since NOVA is mostly used by specialists, we specifically attempted to invite evaluators with at least a basic background in nutrition and/or the food sciences. We targeted four main groups: researchers with expertise in human nutrition, researchers with expertise in food technology, health professionals who provide nutritional guidance (i.e., medical doctors and dieticians), and skilled research and development professionals working in the food industry. We invited people to take part in the study by directly contacting scientific and/or clinical societies, research institutes, and professional associations; we requested that our invitation be restricted to their professional networks. Anyone wishing to participate could immediately log into the online survey interface; the names and affiliations of evaluators were kept fully anonymous, as per the European General Data Protection Regulation and French regulatory requirements. The exact number of invited professionals was not available.
The survey could be accessed from November 27, 2019 to February 8, 2020. The first page presented the survey’s objective. It was followed by a thorough description of the NOVA classification system and its food assignment criteria, directly translated into French from the two original articles written by NOVA’s creators [9, 10]. Links were provided to these publications and to a list of all the additives used in Europe (E number and technological functions). The entire survey is available in the online supplementary materials (OSM1 to OSM4).
Evaluators were first asked to self-assess their expertise in human nutrition and food technology using a Likert scale (0–6) and to indicate whether they wanted to work on one or both food lists. Evaluators who wanted to work on a single list were randomly given either the list of marketed foods or the list of generic foods. Evaluators who wanted to work on both lists received the two lists in a randomized order. In each list, foods were presented in blocks of five per page. The order of foods within a given block was random, and blocks were presented in random order. Returning to previous pages was not possible. Evaluators were asked to assign each food to a NOVA group and then rate their level of confidence in their assignment on a four-level scale from low to high. We ran a pilot version of the survey using 10 outside volunteers who represented the different types of desired evaluators. The goal was to verify survey feasibility and to estimate the time needed for its completion (~1 h/list).
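The page randomization described above can be sketched as follows. This is a minimal illustration, not the survey's actual implementation: the function name `build_pages`, the placeholder food labels, and the rule that blocks are formed from consecutive runs of five foods are all assumptions, since the text does not state how foods were allotted to blocks.

```python
import random

def build_pages(foods, block_size=5, seed=None):
    """Return survey pages: fixed blocks of foods, shuffled within and across blocks."""
    rng = random.Random(seed)
    # Assumed grouping rule: consecutive runs of `block_size` foods form a block.
    blocks = [list(foods[i:i + block_size]) for i in range(0, len(foods), block_size)]
    for block in blocks:
        rng.shuffle(block)  # random order of foods within a block
    rng.shuffle(blocks)     # random order of blocks (pages)
    return blocks

# Hypothetical labels standing in for the 120 marketed foods.
foods = [f"food_{i:03d}" for i in range(1, 121)]
pages = build_pages(foods, seed=42)
print(len(pages), "pages of", len(pages[0]), "foods")
```

Passing a seed makes a given evaluator's page order reproducible; leaving it as `None` draws a fresh order each time.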
At present, several systems are used to assess food nutritional quality. Here, we employed the Nutri-Score system, the SAIN,LIM system, and the Nutrient Rich Food (NRF) Index (version 9.3) [27, 28] to generate profiles for each food in the two lists; we also estimated their energy density levels (kcal/100 g). We obtained the nutritional information for these calculations from the OQALI database and the CIQUAL database; when a food was absent from the databases, we used the information for the most similar food that was present.
Briefly, the Nutri-Score system considers a food’s levels (per 100 g) of more beneficial nutrients (i.e., protein, fiber, and percentages of fruits, nuts, vegetables, olive oil, canola oil, and walnut oil) and less beneficial nutrients (i.e., energy, total sugar, sodium, and saturated fat). The food is then assigned to one of five classes, which range from A (highest nutritional quality) to E (lowest nutritional quality). In the SAIN,LIM system, SAIN stands for “score of nutritional adequacy of individual foods” and expresses the density of five beneficial nutrients (i.e., protein, fiber, vitamin C, calcium, and iron) per 100 kcal of a food. LIM stands for “limit” and expresses the levels of three less beneficial nutrients (sodium, free sugars, and saturated fatty acids) per 100 g of a food. Using thresholds for each score, four classes can be defined: 1 = high SAIN, low LIM (the best class); 2 = low SAIN, low LIM; 3 = high SAIN, high LIM; and 4 = low SAIN, high LIM (the worst class). The NRF Index arrives at a continuous composite nutritional score by subtracting the LIM score (expressed per 100 kcal instead of per 100 g) from the density score of nine beneficial nutrients (i.e., protein, fiber, vitamins A, C, and E, iron, calcium, potassium, and magnesium) per 100 kcal of a food. Higher scores indicate higher nutritional quality.
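The four-class SAIN,LIM rule can be written as a small lookup. The sketch below only illustrates the class logic described above; the function name `sain_lim_class` is hypothetical, and the default thresholds (SAIN of at least 5, LIM below 7.5) are assumed values commonly quoted for this system, not values stated in this section.

```python
def sain_lim_class(sain, lim, sain_threshold=5.0, lim_threshold=7.5):
    """Map a food's SAIN and LIM scores to one of the four SAIN,LIM classes.

    The default thresholds are assumptions, not taken from the text.
    """
    high_sain = sain >= sain_threshold
    low_lim = lim < lim_threshold
    if high_sain and low_lim:
        return 1  # high SAIN, low LIM (best class)
    if not high_sain and low_lim:
        return 2  # low SAIN, low LIM
    if high_sain and not low_lim:
        return 3  # high SAIN, high LIM
    return 4      # low SAIN, high LIM (worst class)

# Made-up scores, for illustration only.
print(sain_lim_class(sain=12.0, lim=2.0))
print(sain_lim_class(sain=1.5, lim=30.0))
```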
Identical but separate analyses were performed for each list.
To ensure the evaluators displayed caution and honesty when completing the survey, we performed a quality control test using five foods per list for which the NOVA group should have been obvious. From the list of marketed foods, we selected beef bourguignon and potatoes, fruit dairy dessert, Chinese fried rice, toasted bread with fruit chips (expected assignment of NOVA4 for all four foods), and plain yogurt (expected assignment of NOVA1) (see Supplementary Table 1). From the list of generic foods, we selected apple, lettuce, egg (expected assignment of NOVA1 for all three), butter (expected assignment of NOVA2), and soda (expected assignment of NOVA4). When evaluators arrived at an erroneous assignment for more than one test food, their data were excluded from the analysis.
We also excluded any data from evaluators who failed to assess all the foods on the list(s), which allowed us to better ensure that evaluators were committed to their task, and to limit confusion that may arise from statistical analyses based on different sample sizes of foods and/or evaluators.
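The two exclusion rules (incomplete lists, and more than one erroneous assignment among the five test foods) can be expressed as a single filter. The following is a sketch under stated assumptions: the data layout (a dictionary mapping each food to a NOVA group, with `None` for an unanswered food) and the names `keep_evaluator` and `EXPECTED_GENERIC` are hypothetical; the expected groups are those given above for the list of generic foods.

```python
# Expected NOVA groups for the five test foods in the list of generic foods.
EXPECTED_GENERIC = {"apple": 1, "lettuce": 1, "egg": 1, "butter": 2, "soda": 4}

def keep_evaluator(assignments, all_foods, expected, max_errors=1):
    """Return True if the evaluator's data pass both exclusion rules."""
    # Rule 1: every food on the list must have been assessed.
    if any(assignments.get(food) is None for food in all_foods):
        return False
    # Rule 2: at most `max_errors` erroneous assignments among the test foods.
    errors = sum(assignments[food] != group for food, group in expected.items())
    return errors <= max_errors

# Toy list: the five test foods plus one hypothetical extra food.
all_foods = list(EXPECTED_GENERIC) + ["baguette"]
careful = {"apple": 1, "lettuce": 1, "egg": 1, "butter": 3, "soda": 4, "baguette": 3}
careless = {"apple": 2, "lettuce": 1, "egg": 1, "butter": 3, "soda": 4, "baguette": 3}
incomplete = {"apple": 1, "lettuce": 1, "egg": 1, "butter": 2, "soda": 4}

print(keep_evaluator(careful, all_foods, EXPECTED_GENERIC))
print(keep_evaluator(careless, all_foods, EXPECTED_GENERIC))
print(keep_evaluator(incomplete, all_foods, EXPECTED_GENERIC))
```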
Description of the data
The NOVA system does not provide “gold-standard references” to which the evaluators’ assignments could be compared. To describe the raw data obtained (i.e., the assignments), we calculated, for each food, the percentage of assignments in each of the four NOVA groups and, for each list, the number of foods assigned to one, two, three, or four different NOVA groups.
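These two descriptive summaries are straightforward to compute. The sketch below assumes a hypothetical layout in which each food maps to the list of NOVA groups (1 to 4) it received, one entry per evaluator; the function name `describe` and the toy data are illustrative only.

```python
from collections import Counter

def describe(assignments_by_food):
    """Return per-food percentages per NOVA group and a count of foods
    by number of distinct NOVA groups they were assigned to."""
    percentages = {}
    foods_by_n_groups = Counter()
    for food, groups in assignments_by_food.items():
        counts = Counter(groups)
        total = len(groups)
        percentages[food] = {g: 100 * counts.get(g, 0) / total for g in (1, 2, 3, 4)}
        foods_by_n_groups[len(counts)] += 1  # number of distinct groups used
    return percentages, foods_by_n_groups

# Toy data: four evaluators, three foods.
data = {"apple": [1, 1, 1, 1], "bread": [3, 3, 4, 1], "soda": [4, 4, 4, 3]}
pct, n_groups = describe(data)
print(pct["bread"])
print(dict(n_groups))
```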
NOVA assignment patterns
Each evaluator assessed each food, assigning it to a NOVA group. Using these data, we obtained a frequency table for each food list (Supplementary Tables 1 and 2) on which a correspondence analysis (CA) was performed. The spatial pattern of these NOVA assignments was then graphically represented. Furthermore, for each food list, the degree of association between the foods and the NOVA assignments was quantified using Cramér’s V. The value of this coefficient varies from 0 to 1, where 1 signifies that the NOVA assignments were 100% consistent for each food (i.e., all the evaluators assigned a given food to the same group).
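For reference, Cramér’s V for a foods-by-groups frequency table is V = sqrt(χ² / (n · min(r − 1, c − 1))), where n is the total number of assignments and r and c are the numbers of rows and columns. The sketch below computes it in plain Python for illustration; it is not the routine used in the study, and dropping empty rows and columns when counting r and c is an assumption of this sketch.

```python
import math

def cramers_v(table):
    """Cramér's V for a frequency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    # Assumption: empty rows/columns do not count towards the table dimensions.
    n_rows = sum(1 for t in row_totals if t > 0)
    n_cols = sum(1 for t in col_totals if t > 0)
    k = min(n_rows, n_cols) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Rows are foods, columns are NOVA1..NOVA4; 10 evaluators per food.
consistent = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10], [10, 0, 0, 0]]
uninformative = [[5, 5, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5]]
print(cramers_v(consistent), cramers_v(uninformative))
```

In the first toy table every food is assigned to a single group by all evaluators, which is the fully consistent case described above; in the second, assignments are spread identically across all foods.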
Consistency among evaluators
We estimated Fleiss’ κ to quantify the degree of agreement in the evaluators’ NOVA assignments; we used an overall sample based on the mean of 1000 bootstrapped samples. Fleiss’ κ has a maximum of 1, which indicates full agreement; values at or below 0 indicate agreement no better than that expected by chance. Each bootstrapped sample was stratified by professional background. There were at least 10 evaluators representing each type of professional background, leading to a bootstrapped sample size of 70 evaluators. This strategy was chosen to detect whether evaluators with similar professional expertise showed higher concordance.
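The stratified bootstrap of Fleiss’ κ can be sketched as follows. This is an illustration under assumptions, not the study's code: the data layout (a list of `(background, ratings)` pairs, with ratings in the same food order for every evaluator), the function names, and the choice of drawing 10 evaluators with replacement from each background are all assumed here.

```python
import random
from collections import Counter, defaultdict

def fleiss_kappa(ratings, categories=(1, 2, 3, 4)):
    """Fleiss' kappa; `ratings` is a list of per-rater lists, one group per food."""
    n_raters = len(ratings)
    n_foods = len(ratings[0])
    totals = Counter()
    p_sum = 0.0
    for f in range(n_foods):
        counts = Counter(rater[f] for rater in ratings)
        totals.update(counts)
        # Observed agreement among rater pairs for this food.
        p_sum += (sum(c * c for c in counts.values()) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_sum / n_foods
    # Chance agreement from the overall share of each category.
    p_e = sum((totals[c] / (n_raters * n_foods)) ** 2 for c in categories)
    if p_e == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (p_bar - p_e) / (1 - p_e)

def bootstrap_kappa(evaluators, per_stratum=10, n_boot=1000, seed=0):
    """Mean kappa over bootstrap samples stratified by professional background."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for background, ratings in evaluators:
        strata[background].append(ratings)
    kappas = []
    for _ in range(n_boot):
        sample = []
        for members in strata.values():
            # Assumed: sampling with replacement within each background.
            sample.extend(rng.choices(members, k=per_stratum))
        kappas.append(fleiss_kappa(sample))
    return sum(kappas) / len(kappas)

# Toy check: two raters who systematically disagree on two foods.
print(fleiss_kappa([[1, 2], [2, 1]]))
```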
Food clusters arising from NOVA assignments
We performed hierarchical clustering on principal components (HCPC) to identify clusters of foods that displayed similar distributions of assignments among the four NOVA groups. If each food had been assigned to the same NOVA group by all the evaluators, the HCPC analysis would have yielded four clusters, each 100% composed of foods exhibiting the same NOVA assignment. Consequently, the clusters of foods reflected differences in assignment distributions, helping us identify similar and dissimilar distribution patterns. For example, HCPC could yield (i) a cluster in which most foods had been assigned to a given group or (ii) a cluster in which most foods had been assigned to three or four groups. The latter case would be a sign that evaluators were highly inconsistent in their assignments, and the foods in such clusters would merit further examination.
The clustering algorithm utilized Ward’s method, and the number of clusters was set to obtain the smallest amount of within-cluster variation possible. We determined the percentage of NOVA1, NOVA2, NOVA3, and NOVA4 assignments in each cluster.
We identified evaluators who produced atypical assignments based on a recent method described by Lindskou et al., which detects outliers in contingency tables. Then, excluding the data from these outlier evaluators, we performed sensitivity analyses on the Fleiss’ κ values and the HCPC results to verify the robustness of our main analyses.
NOVA assignments and nutritional quality
We first defined the most common assignment made by the evaluators for each food (i.e., NOVAmaj). For instance, if a food was assigned to NOVA1 by 5% of evaluators, NOVA2 by 15%, NOVA3 by 35%, and NOVA4 by 45%, we considered that food to be mainly assigned to NOVA4 (i.e., NOVA4maj). Using chi-squared tests, we explored the relationships between NOVAmaj categories and Nutri-Score and SAIN,LIM classes. Then, for each NOVAmaj category, the distributions of the values of Nutri-Score, SAIN, LIM, NRF 9.3, and energy density were graphed using boxplots, and statistical comparisons among NOVAmaj categories were carried out using a non-parametric test (Kruskal-Wallis).
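The NOVAmaj definition amounts to taking the mode of each food's assignments. The sketch below is illustrative: the function name `nova_maj` is hypothetical, and returning `None` when two groups tie for first place is an assumption, since the text does not say how ties were handled.

```python
from collections import Counter

def nova_maj(groups):
    """Return the most common NOVA group, or None if two groups tie (assumed rule)."""
    counts = Counter(groups)
    top = max(counts.values())
    winners = [g for g, c in counts.items() if c == top]
    return winners[0] if len(winners) == 1 else None

# The worked example from the text with 20 evaluators:
# 5% NOVA1, 15% NOVA2, 35% NOVA3, 45% NOVA4.
example = [1] * 1 + [2] * 3 + [3] * 7 + [4] * 9
print(nova_maj(example))
```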
We performed all the statistical analyses using R (v. 4.0.2); we employed the package irr to assess consistency among evaluators, FactoMineR to perform the CA and the HCPC, and DescTools to calculate the Cramér’s V values. For all statistical tests, an alpha level of 5% was used.