Dictionary Series: What do we mean when we talk about frontier data?
By Viviana Cañón Tamayo, Frontier Data programme specialist at UNICEF HQ
Disclaimer: ideas here belong to the author and do not represent UNICEF
What do we mean when we talk about Frontier Data? Before we get into definitions, here are three real-life projects that are examples of frontier data:
Artificial Intelligence used to identify schools from satellite imagery - more about this project.
Forecasting models designed for Zika (using mobility data) - the results of this work on forecasting models for emerging diseases were published in Nature Communications. The work combined mobility data with other Big Data sources to develop an ensemble of 16 forecasting models in the context of the Zika epidemic. It was shared internally with some UNICEF Country Offices and with researchers working in this space. Some of the models have been used during the COVID-19 pandemic by some of UNICEF’s research partners.
Developing a methodology to understand mental health causes and needs using social media data (NOTE: this project is under development; therefore, no public information is available yet).
What do they have in common, and why do we categorize them as frontier data?
I define Frontier Data as an umbrella term that covers Big Data (such as web searches, social and news media, Call Data Records (CDRs) from phone companies, and satellite imagery) plus new data techniques (such as Artificial Intelligence, collective intelligence, and geospatial tools such as GIS).
It is not only about data or only about techniques; it is about the combination of both. So let’s break the term Frontier Data into three parts to understand this: (1) the use of Big Data, (2) the use of new data technologies and (3) the combination of the two.
1) The use of Big Data
I understand Big Data as “a huge volume of data that cannot be stored and processed using the traditional computing approach within a given time frame” (“Big Data explained in plain and simple English”). Broadly, these are data that exist in a digital format but are generated in various ways. According to a study commissioned by the UK Department for International Development (DFID) and published in October 2020, Big Data can be classified into different categories depending on its origin:
Active human-driven content (social media).
Passive human data (location data from mobile phones).
Geophysical (remote sensing data).
Digital administration (government records).
I would add synthetic data to this classification, defined as “data that is created manually or artificially apart from the data generated by real-world events. Various algorithms and tools are there to help us generate synthetic data, which is used in a wide variety of ways. This is generally needed to validate the model and to compare behavioral aspects of real data with the ones generated by the model.” (SINGH Kajal, “Synthetic data – key benefits, types, generation methods, and challenges”).
Big Data can also be classified as structured (e.g. a CSV file), semi-structured (e.g. the data contained in emails) or unstructured (e.g. an image file).
In the three projects mentioned at the beginning, the data used were geophysical data (satellite imagery), digital administration data coming from governments, passive human data (from mobile phones), and social media data.
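To make the structured, semi-structured and unstructured distinction described above more concrete, here is a minimal Python sketch. It is only an illustration: every value in it (the school records, the email and the image bytes) is invented for the example and does not come from any of the projects mentioned in this post.

```python
import csv
import io
from email import message_from_string

# Structured: a CSV file has a fixed schema (named columns, one record per row).
# The records below are made up for illustration.
csv_text = "school_id,enrolled\nA,120\nB,85\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: an email has well-defined headers but a free-text body.
# The message below is made up for illustration.
email_text = (
    "From: sender@example.org\n"
    "Subject: Field report\n"
    "\n"
    "The school reopened last week.\n"
)
message = message_from_string(email_text)
subject = message["Subject"]      # machine-readable field
body = message.get_payload()      # free text that still needs interpretation

# Unstructured: an image is just a sequence of bytes with no named fields.
# Only the first bytes of a PNG file are used here as a stand-in for a real image.
image_bytes = bytes([137, 80, 78, 71, 13, 10, 26, 10])

print(rows[0]["enrolled"])   # a column can be looked up by name
print(subject)               # a header can be looked up by name
print(len(image_bytes))      # an image only offers raw bytes until a model interprets it
```

The point of the sketch is that the further you move from structured towards unstructured data, the more you need new data technologies (for example Machine Learning) to extract meaning from it.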
2) The use of new data technologies
New data technologies include everything from blockchain and Artificial Intelligence (AI) to the Internet of Things (IoT) and Virtual Reality (VR).
In the examples above, the technologies used were Machine Learning (ML) and predictive analytics.
3) The combination of both: novel data and data technologies
To classify an idea, a project, or an initiative as frontier data, I believe you need to be sure it involves both Big Data and new data technologies, as described in this blog post. All three projects from the examples at the beginning have both of these elements.
What is frontier data in the development and humanitarian sector? As the recent report from MacGovern states:
I believe we haven’t yet seen the full potential of frontier data for the world, and we need to make sure we do what is necessary to make that happen. The first steps would be to invest in talent and to fix the fundamental issues with existing data and models so that they can be used in the developing world.
Want to find out more?
Drop us a line: Hello@dataforchildren.ed.ac.uk