How to get value from your environmental data: A guide
By: Jon Coello & Pete Redshaw
“Data is not information. Information is not knowledge.”
Clifford Stoll
“‘Data is’ sounds better than ‘data are’, so forgive us for our poor grammar.”
Us
We’ll leave it to a policymaker or in-house sustainability lead to say how information gaps can impede their ability to make good decisions with confidence – whether it’s implementing measures to offset emissions and environmental impacts or finding ways to reduce physical risks and enhance resilience. Time and time again, we’ve heard how uncertainty about the best course of action can lead to decision-making paralysis and a failure to really act on important environmental issues.
As software engineers and data scientists, the question for us is this: how can we make sure that environmental data is used as the basis for clear, impactful decision making?
After all, data alone is no guarantee of meaning, insight, or action. We’ve all spent evenings staring at spreadsheets waiting for a pattern or story to emerge. The latest AI tools show promise for helping us extract insights from our data and ideate on modelling and data processing approaches, but we have to use them judiciously and as part of a broader process. This is particularly true where deep understanding of the underlying system behaviour, transparency, explainability, and reproducibility are required, and regulatory requirements must be considered.
We find the answer lies in following a broad scientific process to maximise the value of the data you have at your disposal: getting clear on the question you want to answer, identifying trusted scientific models that embody a wealth of hard-won knowledge about the underlying environmental processes in question, utilising publicly available third-party data, and then using some smart techniques for combining your selected models and data to gain insights and make predictions.
In this short guide, we’ll explore how you can use your primary data to build solutions and accelerate the time to impact.
It’s time to go back to basics. How do you get value from your environmental data?
(1) Decide on the insights you want to provide.
As modellers, we often hear: ‘Can’t you just build me a model? I’ll figure out what to do with it later’. But a model is only as useful as the question it’s designed to answer.
A well-defined question – I want to understand X – gives us the direction we need. It tells us what kind of model to build, what level of resolution is required, and how much data is necessary. That’s the crux of it: if you know what you need to understand, and at what level of granularity, you can shape the model accordingly.
Let’s take farming as an example. Are you trying to forecast the likely crop yield from a piece of land in the next year, or predict how climate change will impact crop yields in the UK over the next decade? The questions are related but the choice of models, input data, and temporal and spatial resolution will differ significantly.
Data alone is not information, and it only becomes valuable when we extract meaning from it.
Transforming raw data into something meaningful – something actionable – requires careful design. In that sense, it’s a product-thinking challenge. In order to engineer it into a product, you need to figure out what’s most valuable to your target audience and deliver that value as quickly and inexpensively as possible.
(2) Know what publicly available datasets you can use to add value to your primary data.
Data is now more accessible than ever. Just a decade ago, accessing something as simple as weather data meant downloading .csv files from region-specific websites and spending days wrangling the raw numbers into a usable format. Today, tools like Google Earth Engine provide up-to-date, pre-processed, global datasets that can save a huge amount of engineering effort, at very affordable prices.
Since 2015, the Sentinel-2 mission has provided high-resolution satellite imagery of most of the Earth’s surface, updated every few days at 10-metre resolution. This has given rise to a huge range of proprietary and publicly available datasets derived from this raw imagery, giving us a wealth of insight into what is happening on the surface of the Earth.
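As an illustration of how little engineering this now takes, here’s a minimal sketch using the Earth Engine Python API (`earthengine-api`), assuming an authenticated account; the field location, dates and cloud threshold are placeholders:

```python
# Minimal sketch: build a cloud-filtered Sentinel-2 composite for a field
# and compute its mean NDVI using the Earth Engine Python API.
import ee

ee.Initialize()  # assumes you have already run `earthengine authenticate`

# Placeholder field: a ~200 m buffer around a point (longitude, latitude)
field = ee.Geometry.Point([-1.5, 52.0]).buffer(200)

composite = (
    ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
    .filterBounds(field)
    .filterDate("2024-04-01", "2024-09-30")
    .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20))  # drop cloudy scenes
    .median()  # simple, cloud-robust composite
)

# NDVI from the near-infrared (B8) and red (B4) bands
ndvi = composite.normalizedDifference(["B8", "B4"]).rename("NDVI")

# Average NDVI over the field at Sentinel-2's native 10 m resolution
mean_ndvi = ndvi.reduceRegion(
    reducer=ee.Reducer.mean(), geometry=field, scale=10
).get("NDVI")

print("Mean growing-season NDVI:", mean_ndvi.getInfo())
```

A few lines like these replace what used to be days of downloading and wrangling region-specific files.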
These are just two examples of how much high-quality data is out there to be used in innovative ways. You can almost always amplify the value of the primary data you collect, whether it’s high-quality field data or information gathered from your users, by intelligently combining it with carefully selected publicly available data.
The key is knowing where to look and who to ask. And if you’re not sure what’s available, talk to people who do.
(3) Choose the right models to give your data meaning and enable forecasting, hindcasting, and ‘filling in the gaps’.
Data is often noisy and sparse, both in time and space. Errors creep in that make raw data difficult to interpret. But a well-fitted model can turn scattered data points into meaningful underlying patterns, manage the noise, bridge the gaps, and enable you to make forecasts and hindcasts (simulations of past conditions).
This is where trusted models come in. Soil carbon models help estimate carbon sequestration. Crop growth models predict agricultural yields under shifting climate conditions. Models of industrial networks help us reason about the impacts of environmental policies. Other models help us understand risks such as flooding, erosion, landslides and wildfire, or the spread of disease.
Models encode understanding that helps us to extract meaning from data. Knowing what models exist in a field requires diving into the academic literature and understanding enough of the context to select the right model for your problem. Knowing how to use them requires understanding the maths and enough about software engineering to produce efficient implementations, which are often not readily available.
The right model can help you to transform your primary data into actionable insight, helping you or your end user identify trends, anticipate change, and make informed decisions.
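To make the idea concrete, here’s a minimal sketch that stands in for a ‘trusted model’ with a simple logistic growth curve (real crop or soil carbon models are far richer); it is fitted to a handful of noisy observations and then used to fill gaps and extrapolate. All figures are invented:

```python
# A minimal sketch: fit a simple logistic growth curve (a stand-in for a
# richer process model) to sparse, noisy observations, then use it to fill
# gaps and extrapolate. All numbers are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, k, r, t0):
    """Logistic growth: carrying capacity k, rate r, midpoint t0."""
    return k / (1.0 + np.exp(-r * (t - t0)))

# Sparse, noisy biomass observations (t in days, y in t/ha) -- illustrative
t_obs = np.array([10, 25, 40, 70, 95])
y_obs = np.array([0.4, 1.1, 3.0, 7.8, 9.1])

# Calibrate the model parameters to the data
params, cov = curve_fit(logistic, t_obs, y_obs, p0=[10.0, 0.1, 50.0])

# Use the fitted model to interpolate the gaps and forecast beyond day 95
t_grid = np.linspace(0, 130, 14)
y_fit = logistic(t_grid, *params)
for t, y in zip(t_grid, y_fit):
    print(f"day {t:5.1f}: modelled biomass {y:4.1f} t/ha")
```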
(4) Use probabilistic inference techniques to combine trusted models, third-party data, and your high-value primary data, extracting maximum insight with quantified uncertainty.
Your primary data is hugely valuable. It’s precise, specific, and directly relevant to your use case. In contrast, global datasets, while powerful, often come with greater uncertainty. A 10-metre resolution satellite dataset might seem impressive, but sometimes you need much higher precision, or access to information that cannot be measured through remote sensing, to capture the full picture.
The key is working within a framework that acknowledges and accounts for uncertainty. Large open-source datasets may lack the detail of primary data but still hold immense value, especially when their uncertainty can be quantified. By integrating both, you can strike the right balance: using broad datasets for scale and context, while relying on high-resolution primary data where accuracy matters most, or when additional crucial information must be gathered.
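One simple way to make ‘accounting for uncertainty’ concrete is inverse-variance weighting, which trusts each source in proportion to its precision. A minimal sketch with invented figures, combining a broad satellite-derived estimate with a precise field measurement:

```python
# A minimal sketch: combine a broad, uncertain satellite-derived estimate
# with a precise field measurement by inverse-variance weighting.
# All figures are invented for illustration.
import numpy as np

estimates = {
    "satellite": (42.0, 8.0),   # (value, standard deviation)
    "field":     (47.5, 2.0),
}

values = np.array([v for v, _ in estimates.values()])
sds = np.array([sd for _, sd in estimates.values()])

weights = 1.0 / sds**2
combined = np.sum(weights * values) / np.sum(weights)
combined_sd = np.sqrt(1.0 / np.sum(weights))

print(f"combined estimate: {combined:.1f} ± {combined_sd:.1f}")
```

The Bayesian techniques discussed below generalise this idea from single numbers to whole models.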
Process models provide another way to understand reality – this time from the perspective of the system behaviour encoded in the model and the parameters derived from experimental data. However, when applied to specific scenarios, their projections almost always deviate from what is actually observed. The key to extracting maximum insight lies in effectively combining process models with data, taking into account the uncertainty in both. Bayesian inference techniques provide a powerful framework for achieving this.
Bayesian inference techniques
A good modelling approach doesn’t just generate projections – it acknowledges and adapts to new data. Calibration and ongoing data assimilation help ensure that your model fits the data and updates as new data becomes available, rather than relying on assumptions that may drift from reality. Bayesian inference techniques provide a powerful way to calibrate models against baseline data and to continually assimilate new data as it becomes available.
Notes on Bayesian inference
Bayesian inference techniques provide a structured way to integrate new data into a model, continuously refining its accuracy and predictive power. Instead of simply accepting initial projections as fixed, Bayesian techniques allow us to fit models to noisy data and continually adjust projections as more data becomes available.
Bayesian inference techniques also quantify uncertainty in a robust way that balances the expected behaviour of a system encoded in a process model with data about the system’s behaviour in the real world. As more data, and in particular time-series data, is used for model calibration, the model incrementally reflects reality more accurately, including the uncertainty we have about the state of the real-world system and how it might change in the future. This gives us a sophisticated way to explore a range of possible outcomes and reason about probability, uncertainty and risk.
Using these approaches, we can build models that not only make predictions but also learn, refine, and adapt to the real world as data becomes available.
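As a minimal, illustrative sketch of the idea (a toy model with invented numbers, using a grid approximation rather than a production-grade inference library): a single uncertain decay rate in a simple soil carbon model is updated as noisy observations arrive, and the uncertainty in both the parameter and the resulting projection shrinks with each update:

```python
# A minimal sketch of Bayesian calibration by grid approximation:
# a toy soil carbon model C(t) = C0 * exp(-k * t) with one uncertain
# decay rate k, updated as noisy observations arrive. Numbers are invented.
import numpy as np

C0 = 60.0            # initial soil carbon stock (t C/ha), assumed known
sigma = 2.0          # assumed measurement noise (t C/ha)
k_grid = np.linspace(0.001, 0.1, 500)          # candidate decay rates
prior = np.ones_like(k_grid) / len(k_grid)     # flat prior over k

def model(t, k):
    return C0 * np.exp(-k * t)

# Noisy observations arriving over time: (year, measured stock)
observations = [(2, 55.1), (5, 49.8), (9, 44.0)]

posterior = prior.copy()
for t, y in observations:
    likelihood = np.exp(-0.5 * ((y - model(t, k_grid)) / sigma) ** 2)
    posterior *= likelihood
    posterior /= posterior.sum()   # renormalise after each update

    k_mean = np.sum(k_grid * posterior)
    k_sd = np.sqrt(np.sum((k_grid - k_mean) ** 2 * posterior))
    print(f"after year {t}: k = {k_mean:.4f} ± {k_sd:.4f}")

# Projection for year 20 (spread reflects parameter uncertainty only)
pred = model(20, k_grid)
pred_mean = np.sum(pred * posterior)
pred_sd = np.sqrt(np.sum((pred - pred_mean) ** 2 * posterior))
print(f"year-20 stock: {pred_mean:.1f} ± {pred_sd:.1f} t C/ha")
```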
(5) Less is more. Optimise your data collection to improve your user experience (UX) and reduce costs.
One of the keys to optimising data collection is understanding how to minimise the amount of new data required to deliver valuable insights, while reducing the hassle for users and keeping data collection costs down.
This is where a structured framework comes in. For instance, in soil carbon modelling, we can estimate future conditions with existing data. If we add a new data point, how much does it actually refine the prediction? Or, returning to our farming example, if we add a new dataset on precipitation levels, will it significantly improve accuracy? Answering these questions is essential for deciding whether the new data is worth the investment.
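A minimal sketch of this kind of ‘is another data point worth it?’ calculation, for the simplest possible case of estimating a single quantity with Gaussian uncertainty (all figures invented):

```python
# A minimal sketch of 'is another data point worth it?': for a simple
# Gaussian mean-estimation problem, posterior uncertainty after n
# measurements has a closed form, so we can see the diminishing returns
# of each extra sample. All figures are invented for illustration.
import numpy as np

prior_sd = 10.0   # prior uncertainty in the quantity of interest (e.g. t C/ha)
obs_sd = 4.0      # uncertainty of a single field measurement

def posterior_sd(n_samples):
    precision = 1.0 / prior_sd**2 + n_samples / obs_sd**2
    return np.sqrt(1.0 / precision)

previous = posterior_sd(0)
for n in range(1, 9):
    current = posterior_sd(n)
    gain = previous - current  # uncertainty removed by this extra sample
    print(f"{n} samples: sd = {current:4.2f}  (extra sample removed {gain:4.2f})")
    previous = current
```

Each extra sample removes less uncertainty than the one before it, which is exactly the diminishing-returns pattern behind the 80/20 observation that follows.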
The same principle applies to user interactions. Ideally, we should ask the fewest possible questions while still achieving a high degree of confidence in our results. As with so many things, our experience developing models tells us that the Pareto principle (80/20 rule) can apply here: 80% of the insight typically comes from around 20% of the data. It is always worth weighing up the value of collecting additional data against the hassle to users of asking them to provide it.
By using this approach, we can streamline data collection, reduce user burden, and ensure that every piece of data collected serves a clear purpose.
(6) Design data for regulatory compliance, and ensure it is (dis)aggregated in a way that enables reporting use cases.
Where applicable, you’ll also need to understand the regulations and reporting guidelines that govern your data. The way you aggregate or disaggregate information can determine how easily it aligns with different reporting requirements.
The reality is that every reporting framework has slightly different requirements, even though they often rely on similar underlying data. The key is structuring data efficiently so that it can be sliced and reformatted to meet multiple standards without too much additional effort.
For example, if you’re tracking carbon emissions for a farm, you might initially report at the farm level. But what if someone asks for emissions per tonne of crop yield? If your data was collected at a granular level, it’s easy to aggregate upwards to the farm (and potentially, regional or national level – you get the idea!). However, if you only captured farm-level data, breaking it down further may not be possible. Thinking ahead about granularity ensures flexibility when meeting different reporting needs.
Understanding emission scopes and reporting requirements
Understanding Scope 1, 2, and 3 emissions is critical for ensuring that data is structured correctly.
- Scope 1: Direct emissions from owned sources (e.g. fuel burned onsite)
- Scope 2: Indirect emissions from purchased energy, such as electricity
- Scope 3: Upstream and downstream emissions (e.g. transportation, supply chain activities)
Structuring your accounts so that emissions can be broken down by scope, upstream and downstream impacts are kept separate, and emissions and removals (i.e. sequestration) are tracked separately can make it much easier to satisfy the requirements of different reporting standards.
It’s all about assigning and sharing responsibility, and not everyone assigns responsibility for environmental impacts in the same way. Because different standards treat scopes differently, companies need to ensure correct disaggregation so that emissions are attributed to the right entity and appear on the correct balance sheet.
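As a minimal sketch of what this can look like in practice (the column names, yields and figures below are invented): record at the activity level, tag each record with its scope and whether it is an emission or a removal, then roll the same records up to farm totals, scope breakdowns or per-tonne intensities as different reports require:

```python
# A minimal sketch: keep emissions records at the activity level, tagged
# with scope and emission/removal, then aggregate the same records to
# whatever level a report needs. Column names and figures are invented.
import pandas as pd

records = pd.DataFrame([
    {"farm": "A", "field": "F1", "activity": "diesel",      "scope": 1, "flow": "emission", "tco2e": 12.0},
    {"farm": "A", "field": "F1", "activity": "electricity", "scope": 2, "flow": "emission", "tco2e": 3.5},
    {"farm": "A", "field": "F2", "activity": "fertiliser",  "scope": 3, "flow": "emission", "tco2e": 8.2},
    {"farm": "A", "field": "F2", "activity": "hedgerows",   "scope": 1, "flow": "removal",  "tco2e": -2.1},
])

yields_t = {"A": 450.0}  # crop yield per farm in tonnes (invented)

# Farm-level totals, keeping scopes and removals separate
by_scope = records.groupby(["farm", "scope", "flow"])["tco2e"].sum()
print(by_scope)

# Per-tonne intensity for a different reporting requirement
farm_totals = records.groupby("farm")["tco2e"].sum()
intensity = farm_totals / pd.Series(yields_t)
print(intensity.rename("tCO2e per tonne of crop"))
```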
Key standards to consider:
- GHG Protocol – The global framework for carbon accounting
- FLAG (Forest, Land, and Agriculture Guidelines) – Sector-specific guidance for land-based emissions
- CSRD (Corporate Sustainability Reporting Directive) – A key framework for EU-based reporting
- Science Based Targets initiative (SBTi) – Sustainability commitments framework for companies and financial institutions
- Voluntary carbon market standards e.g. Verra – For offsetting and credit trading
(7) Distill, simplify, and visualise results to ensure that the data ‘lands’ with the user and tells a compelling story.
Simplicity is key. No matter how smart your analysis, if the insights aren’t clear, they won’t drive action. You can do a heap of really good work, then fall at the last hurdle when delivering information that’s hard to digest.
It circles back to the core question: what insight are you trying to convey? You can build an exceptional model and generate highly accurate results, but if the final output is too complex, it loses impact. The challenge is expressing complex information in language that’s intuitive to as many people as possible.
Making results relatable
Saying ‘Emissions from this process are X per tonne’ means little without context. Is that good or bad? What does that compare to?
The key is to frame the data in ways that resonate with the user’s world. When considering carbon footprinting and other forms of environmental accountancy, expressing emissions or reductions as a proportion of something relatable, such as a person’s, an industry’s or a nation’s annual carbon footprint, rather than simply as an absolute number is usually a good start. Developing compelling visuals also helps information to ‘land’ with users.
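A tiny sketch of this kind of framing; the per-capita figure below is an assumed, illustrative value rather than an authoritative one:

```python
# A tiny sketch: express an emissions figure relative to something familiar.
# The per-capita figure is an assumed, illustrative value -- check an
# up-to-date source before using it in a real product.
AVG_PERSON_TCO2E_PER_YEAR = 4.7  # assumed global average, for illustration

def as_person_years(tco2e: float) -> str:
    person_years = tco2e / AVG_PERSON_TCO2E_PER_YEAR
    return (f"{tco2e:.1f} tCO2e is roughly the annual footprint of "
            f"{person_years:.1f} average people")

print(as_person_years(23.5))
```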
A well-structured model can transform data into information, but the final step – making it compelling, intuitive and relatable – is where you can create knowledge and drive action.
Move fast and fix things
Getting value from environmental data is as much about strategy and design as it is about science and technology. The good news is that by defining clear objectives, using trusted models, leveraging publicly available datasets, embracing decision making under uncertainty, and creating a customer-centric user experience, you can transform raw datasets into an impactful product.
And the even better news?
We can help you do that.
Find out more about partnering with Cirevo on your product journey here.