Sensors and data collection
To be effective, realistic, useful and relevant, the models developed in themes 1 and 2 must have access to abundant, high-quality data for their design, calibration or validation. However, as already mentioned in these two themes, it is extremely rare to have access to the quantity and quality of data required: in many of the areas in which UMMISCO and, more generally, the IRD are involved, data is in short supply, as the collection or production of quality data has until now been a luxury that few governments or institutions could afford, either through lack of resources or lack of interest. Data collection is therefore a major challenge, particularly in developing countries, where the acquisition of reliable data is essential to the design of models capable of supporting decision-makers. While themes 1 and 2 propose to develop complex strategies for generating or approximating synthetic data when data is not available, this theme will focus on the scientific assets and challenges of data production and collection in a context profoundly renewed by the arrival of low-cost embedded and connected solutions.
Scientific objectives and background
Today, we are witnessing a profound change in the way data is used and acquired. The advent of low-cost, low-tech solutions and FabLabs is opening up new perspectives: the reduced cost of many environmental sensors (water, air, soil) facilitates their dissemination and densification across the territories studied, and encourages the continuous production of large volumes of environmental data. “Reference” measurement systems are beginning to give way to an interconnected ecosystem of sustainable, repairable, replaceable and sometimes mobile micro-sensors. Whereas in the past, data were produced solely by governmental or scientific institutions, today they are the fruit of the active, combined participation of players with sometimes divergent capacities, interests and objectives. This proliferation of responsibilities and technological solutions is raising concerns among scientists and decision-makers, particularly as regards the quality and reliability of the data produced, and its usability for research and decision-making purposes. For example, the air quality agency in Dakar, one of the benchmarks in West Africa, was having difficulty maintaining its costly network of measuring stations and recently considered using low-cost micro-sensors, only to discover that, having been designed in the North, they were not adapted to the local climate, that the data produced were erratic and that their maintenance was proving problematic.
In many other examples, the production, storage and quality of data are less and less under the control of the players who need them, and this has a direct impact on the quality of the models available to them. It is therefore becoming urgent to propose solutions that enable scientists to obtain reliable data to conduct their research. In highlighting this theme for its renewal, UMMISCO wishes to build on nearly ten years of development to propose innovative research into sustainable data acquisition, production and processing, with a particular focus on the following four points:
- Development of sensors adapted to the specific needs of research and modeling activities in the South. The aim is to produce open or semi-industrial technological solutions that are sustainable, locally repairable and suitably priced, ensuring the production of quality data for researchers and stakeholders. The experience acquired by the unit through, for example, QameleO (Martiny et al. 2022) and its industrial partnerships, forms a solid basis for carrying out this activity and developing the scientific instruments that will enable us, in the future, to co-design models and data acquisition systems.
- Sensor deployment in collaboration with local communities. To be effective and sustainable, sensor networks need to adopt a morphology that takes account of the territory, climate, local issues and the resources available for installation and maintenance. The aim here is to develop participatory methods for qualifying the sensor networks to be deployed according to local context and issues. The UMMISCO centers' links with local authorities and stakeholders, and the scientific expertise acquired in multi-source methods, will enable sensor networks to be qualified, as proposed in the Aircrowd-Africa project, for example.
- Training and appropriation of sensor networks by stakeholders. A sensor network cannot be dissociated from the stakeholders who manage it. Acceptance of the instrument and its appropriation by these stakeholders is a sine qua non for ensuring its longevity and, for scientists, the continuation of acquisition in good conditions. The aim is therefore to build microcosms of committed and trained users, based on the UMMISCO network of partners, with a focus on collaboration with FabLabs belonging to the RFAO (Réseau de FabLabs d'Afrique de l'Ouest), such as Blolab (https://blolab.org/). This objective is in line with some of the objectives of theme 4 on citizen science.
- Storing, sharing and making data available. The use of data often exceeds the initial expectations that led to its production or acquisition. As a transversal theme to the case studies, we will set up an open platform for collecting, processing and sharing sensor data in real time. The foundations of this platform already exist within the QameleO framework, but it will have to evolve with regard to the concepts of intellectual property and citation, drawing on the work carried out at IRD on DataSud.
How can we integrate the aspects of continuous, real-time observation, the acceptance and appropriation of these observations by communities, and their assimilation into decision-support models? To meet this challenge, which combines technological and social issues, and in line with its objectives of developing participatory science (see theme 4), UMMISCO will work to involve local communities in the very process of sensor design, in particular via the FabLab networks in the South with which the unit has long-standing relationships. This design, as well as the long-term deployment of scientific sensors, will require us to overcome the following main obstacles:
- Develop low-cost, open-source sensors while ensuring measurement reliability and reproducibility. Sensors on the market are usually “black boxes” that cannot be modified by end-users, both for industrial reasons and to ensure measurement quality. Costs, maintenance times and dependence on suppliers make these solutions unsuitable for development contexts. For several years, UMMISCO has been developing open sensors such as QameleO or Waou, with the philosophy of enabling their construction, calibration and maintenance at local level (via its scientific partners or FabLabs). However, this approach comes up against the problem of reliability and reproducibility of acquisition: developing a sensor is not simply a matter of building an instrument, and it is important to establish protocols, qualification methods and models for fault detection and real-time recalibration (using AI in particular). The Waqatali (ANR Labcom 2022-2026) and AirCrowd (currently being submitted to the FID; AirCrowd Africa is currently being evaluated by BPIFrance) projects are part of this dynamic.
- Develop data assimilation techniques to integrate sensor data into simulation models in real time. Connecting the models developed in themes 1 and 2 to real, continuously-acquired data is a crucial challenge if these same models are to be used, for example, to support decision-makers in crisis situations. Data assimilation aims to automate the translation of a real measurement (temperature, humidity, CO2 concentration, etc.), made periodically and locally, into data, sometimes spatialized, which evolves with the temporality of the models using it (Ngom et al. 2021). This assimilation poses numerous technical and methodological problems, which the theme will address in the context of specific applications, but with the aim of producing generic methods.
- Design and integrate embedded models within sensors. Two of the main challenges of the approaches we propose lie in (1) reducing the volume of data produced; (2) ensuring that the data produced is relevant to needs, particularly as these evolve. These two challenges come up against the same obstacle, which is the ability to embed in individual sensors sufficiently powerful computational capabilities and models to process the data in situ, either to synthesize it (e.g. embedding AI to count vehicles on a stretch of road, to count pirogues in images or to detect sound events in acoustic recordings), or to calibrate or recalibrate it according to external conditions, or to transform it (e.g. embedding machine translation and speech processing models in microchips to facilitate their on-line use by investigators).
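To make the data assimilation obstacle above more concrete, the following minimal sketch shows one of the simplest possible schemes, a scalar Kalman filter, blending a model forecast with periodic sensor readings. It is an illustration only and not the method used in the unit's projects: the linear dynamics, the names (Estimate, forecast, assimilate, run) and all numerical values are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Estimate:
    """Model state (e.g. a pollutant concentration) and its error variance."""
    mean: float
    variance: float


def forecast(est: Estimate, decay: float, process_var: float) -> Estimate:
    """Advance the model by one time step (hypothetical linear dynamics).

    Uncertainty grows by process_var because the model is imperfect.
    """
    return Estimate(mean=decay * est.mean,
                    variance=decay ** 2 * est.variance + process_var)


def assimilate(est: Estimate, measurement: float, sensor_var: float) -> Estimate:
    """Blend the forecast with one sensor reading (scalar Kalman update).

    The gain weights the reading by the relative confidence in the model
    and in the sensor: a noisy low-cost sensor (large sensor_var) moves
    the estimate less than a reference instrument would.
    """
    gain = est.variance / (est.variance + sensor_var)
    return Estimate(mean=est.mean + gain * (measurement - est.mean),
                    variance=(1.0 - gain) * est.variance)


def run(initial: Estimate,
        readings: List[Optional[float]],
        decay: float = 1.0,
        process_var: float = 1.0,
        sensor_var: float = 4.0) -> List[Estimate]:
    """Step the model at its own pace; assimilate a reading when one exists.

    A reading of None stands for a model time step with no measurement
    (sensor offline, or sampling slower than the model clock).
    """
    est = initial
    history = []
    for reading in readings:
        est = forecast(est, decay, process_var)
        if reading is not None:
            est = assimilate(est, reading, sensor_var)
        history.append(est)
    return history


if __name__ == "__main__":
    # Illustrative values only: three model steps, one without a reading.
    for step in run(Estimate(mean=10.0, variance=4.0), [12.0, None, 11.0]):
        print(f"mean={step.mean:.3f} variance={step.variance:.3f}")
```

In this sketch, the uncertainty of the estimate grows during steps without a measurement and shrinks each time a reading is assimilated, which is the behaviour expected when the model clock and the sensor sampling rate differ. Spatialized or nonlinear models would require richer schemes, such as ensemble or variational methods.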
UMMISCO 4 will develop sensors dedicated to the themes addressed by the various centers that make up the unit, promoting the capitalization and transfer of skills between centers: air quality sensors, underwater acoustics sensors, traffic flow sensors, noise sensors, with the aim of building indicators or data, such as the degree of atmospheric pollution, marine biodiversity, traffic density, etc.
- Air quality and health. The aim is to develop an innovative, scientifically-validated technological suite for better air quality measurement (QameleO stations), and better positioning of these sensors (TeleSense), with a view to ultimately responding to the worsening air pollution situation in developing countries, with its health, environmental and economic consequences for the population and territories concerned. Our project is also unique in that it involves the local population and stakeholders, encouraging changes in individual lifestyles and behaviours, and ultimately prompting public action.
- Urban ecology. The sustainability of cities cannot be achieved without coherent greening of urban space and intelligent planning using decision-support tools. The Waqatali joint laboratory is part of this dynamic, combining the Internet of Things, artificial intelligence, participative decision support and innovative advisory services for local authorities to increase the environmental and societal co-benefits of urban greenery. Today's cities and societies are undergoing profound changes in response to the climatic, environmental and societal challenges facing both North and South. What was once considered an ornamental feature is now becoming an infrastructure that provides services to residents, in the same way as roads, bus networks and fiber optics. Thinking of planted spaces as urban infrastructure is a new idea that we are defending. The plant infrastructure is then qualified by its benefits (temperature, air quality, etc.) and its costs (economic, spatial and hydric). Its design and implementation, from sensor to decision-making aid, are the fruit of concerted action between experts, decision-makers and city managers, within the framework of sustainable urban planning policies.
- African languages. Data collection for African languages (all low-resource) has highlighted the value of embedding machine learning models in sound sensors. The aim is to facilitate data collection by automating pre-processing tasks (noise elimination, signal segmentation, etc.) and certain labeling tasks such as tone detection and diarization. In UMMISCO 4, it is planned to continue the work initiated in collaboration with the SYEL team at LIP6 (SU) to design and produce sound sensors based on programmable circuits for tone detection (Mba et al., 2022).
- Optimized irrigation. Irrigation strategies have had to adapt to environmental and climatic changes. Resilient irrigation strategies such as wastewater reuse or rainwater harvesting are becoming commonplace. To further improve their efficiency (i.e. save water while maximizing plant yields), approaches such as smart irrigation based on artificial intelligence and the use of advanced sensors (Navinkumar et al., 2021; Gao et al., 2023) are beginning to be tested. Modelling and sensors combined with agroecological practices are contributing to new approaches to sustainable irrigation, such as deficit irrigation developed for arid and semi-arid countries (Chalmers et al., 1981; Marsal et al., 2000; Kang et al., 2023; Bur et al., 2022).
- Cardiac biosignals. In this area, UMMISCO 4 aims to develop translational applications based on the automatic analysis of electrocardiograms by AI models, in particular to prevent the risk of sudden death. However, ECG acquisition in a clinical environment remains highly complex for developing countries, and is very often either unavailable or insufficient. Recently, the development of ECG sensors in the form of patches or wearable devices has enabled continuous ECG recording, but these sensors are often very expensive and their output is not suited to AI models trained on data from clinical sensors. The aim of UMMISCO will be to overcome these obstacles, by co-developing ECG recording devices with colleagues in the South, and adapting AI models so that they are effective with this type of data and validated in clinical studies.
In addition to the internal and cross-functional tasks assigned to all UMMISCO 4 themes, this theme will promote a range of practical sensor-related activities. First of all, this will involve designing, producing and distributing video tutorials to train researchers and users in data collection on the unit's themes. A practical seminar, focusing on one of these themes, will then be organized each year to complement these videos. Finally, regular hackathons will be organized in partner centers or FabLabs on topics such as pre-processing collected data to annotate it and improve its quality for machine learning.