Dear AI, We Have Trust Issues

This article was originally posted as part of Dave Thau's series on Medium.

This essay is the third in a series on the promise and challenges of using AI and machine learning to create a planetary environmental management system.

So far in this series, I’ve talked about the motivation behind an AI-enabled planetary-scale environmental management system (which I’m calling G.AI.A), and described why environmental monitoring, though critical, is not sufficient. In addition to monitoring, our system will need to inform our decisions and help us understand their impacts. It will need to imagine futures, and provide us with some capacity to know which of those futures are possible, which are probable, which are unlikely, and which are impossible. It should also offer insights into tradeoffs, helping us understand which stakeholders are positively and negatively impacted by decisions.

All of this information is advice, and any good advisor must be able to justify their suggestions, to explain why they’re sound and why they’re appropriate. When working with people, we can simply ask them to produce the evidence and reasoning behind their recommendations. For an AI system, it’s less straightforward.

The need for a system to be able to explain itself is a small part of a bigger issue: how can you trust AI? As AI is deployed more widely, many kinds of ethical issues have arisen, and organizations deploying AI systems (companies like Baidu, Deloitte, Google, and Microsoft, as well as intergovernmental bodies like the United Nations, the OECD, and the G20) are struggling to describe and implement their own versions of trustworthy AI. These attempts overlap in numerous ways, and they all underscore the need for AI to be explainable, just, accountable, replicable, and useful. Ensuring each of these features is a challenge, and there are entire bodies of research and literature behind them. I’ll focus on just the first two here: explainability and justice.

AI, explain yourself!

Since the early days of AI, researchers have known that AI systems need to be able to explain their recommendations. The creators of MYCIN, an AI developed in the early 1970s to diagnose and recommend treatments for blood infections, learned from doctors that the ability to explain its diagnoses was one of the most important features they would need to see before trusting the system. The system performed as well as experts, and better than general practitioners, and the explanations were seen as informative when the system was used as intended. This work has been followed by 50 years of additional research on explainability, and we still have a great deal to learn about how explanation works and how to elicit good explanations from AI systems.

MYCIN made its decisions by following a system of rules, and explained itself by listing the rules it used in a decision and showing how it was that the rule applied. Rule-based AI systems are the easiest to explain. This extends to systems like decision trees, which are commonly used to classify environmental data. These systems, such as Classification and Regression Trees (CART) or the Random Forests technique, result in rules that can be read and understood by people. We can read these rules and understand both the information used to make an assessment and roughly how the system used that information to draw its conclusions. These systems are considered to be interpretable. As an example, consider an AI system built to identify animals photographed using motion-sensitive cameras. A (not very good) interpretable model might include a rule like ‘if it has wings, then if it is more than 4 inches long, it is a bird, otherwise, it is an insect.’
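To make the idea of an interpretable model concrete, here is a minimal sketch of the toy rule above, written so that it returns the rules it used alongside its answer, in the spirit of how MYCIN listed the rules behind a decision. The function name, the trace format, and the "unknown" fallback for wingless animals are assumptions made for illustration; they do not come from MYCIN or from any real camera-trap system.

```python
from typing import List, Tuple


def classify_with_trace(has_wings: bool, length_inches: float) -> Tuple[str, List[str]]:
    """Toy interpretable classifier: returns a label plus the rules that fired."""
    trace: List[str] = []
    if not has_wings:
        # Assumption: the rule in the text says nothing about wingless animals.
        trace.append("has_wings is False: no rule covers this case")
        return "unknown", trace
    trace.append("has_wings is True")
    if length_inches > 4:
        trace.append(f"length {length_inches} in is more than 4 in")
        return "bird", trace
    trace.append(f"length {length_inches} in is not more than 4 in")
    return "insect", trace


# Usage: the explanation is simply the list of rules that were applied.
label, trace = classify_with_trace(True, 10.0)
```

Because the whole model is two nested conditions, the explanation is the model itself; nothing has to be inferred after the fact.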

These rule-based systems are the easy case. But most of the approaches getting attention currently, neural networks and Bayesian methods, are not interpretable in the same way. These are the methods that are beating humans at our most challenging games, like Chess, Go, Poker, and StarCraft. Your smartphone uses them to categorize the pictures you take, and your email applications use them to predict what you’ll type next. Switching to conservation, they’re also the systems that sites like Wildlife Insights use to identify species in images taken by motion-sensitive cameras. These systems learn from massive amounts of data, and the internals of the resulting models are incomprehensible to humans. We can see that they’re performing well, but it is challenging to understand the details of why they are performing well. This lack of interpretability has led to an explosion of approaches to making these opaque systems explainable. Many of these methods are being rolled out into publicly available AI explainability tools like Google’s Explainable AI, Amazon’s SageMaker Clarify, and Fiddler’s Explainable AI. These tools don’t describe exactly how the AI systems work, but they do provide clues about what sorts of information the systems are using to make their decisions. Going back to our motion-sensitive camera example, an explainability tool for a species-detecting algorithm might inform a user that the wing area and the size of the animal in an image were major contributors to the system’s classification of something as a bird.
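One simple way to get this kind of clue is to treat the model as a sealed box, replace one input at a time with a neutral baseline value, and see how much the output score drops. The sketch below illustrates that ablation idea only. The stand-in model, its feature names (wing_area, body_size, brightness), and its weights are all invented for the example, and this is not a description of how the commercial tools named above work.

```python
import math
from typing import Callable, Dict


def opaque_bird_score(x: Dict[str, float]) -> float:
    """Hypothetical stand-in for a black-box model; returns a score in (0, 1)."""
    z = 2.0 * x["wing_area"] + 1.5 * x["body_size"] + 0.1 * x["brightness"] - 2.0
    return 1.0 / (1.0 + math.exp(-z))


def ablation_attribution(
    model: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Score drop when each feature, one at a time, is reset to its baseline."""
    full_score = model(x)
    attributions = {}
    for name in x:
        perturbed = dict(x)          # copy so the original input is untouched
        perturbed[name] = baseline[name]
        attributions[name] = full_score - model(perturbed)
    return attributions


example = {"wing_area": 1.0, "body_size": 1.0, "brightness": 1.0}
baseline = {name: 0.0 for name in example}  # assumption: 0.0 means "feature absent"
attributions = ablation_attribution(opaque_bird_score, example, baseline)
ranked = sorted(attributions, key=attributions.get, reverse=True)
```

Here `ranked` puts wing_area first and brightness last, which is the kind of clue described above: it says which inputs mattered most for this one prediction, not how the model works internally.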

What’s at stake for G.AI.A?

The explainability tools described above help people working on AI understand which aspects of a problem the AI is focusing on. One of the critical issues they address is the problem of AI models being biased by their input. There’s a common adage in computer science: garbage in, garbage out. If the data used to train an AI model is biased, the model itself will be biased. Such biases have been discovered repeatedly and include hiring systems that were biased in the candidates they selected (more men than women), facial recognition systems that perform significantly worse for people of color, and a deployed system that wrongly predicted that someone would return to prison almost twice as frequently for Black defendants as for white defendants (incorrect 45 percent of the time versus 23 percent of the time). Returning to the topic of medicine, a recent report showed that every one of sixty-two machine learning algorithms for detecting COVID-19 failed to be useful, often because of poor quality data or biased designs. The study authors emphasize the need for transparent methodologies in order to establish trust in the developed algorithms.
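A comparison like the 45 percent versus 23 percent above comes from splitting a model's errors by group. As a minimal sketch, the code below computes a false positive rate per group: of the people who did not return to prison, the share the model had flagged as high risk. The records are made-up toy data chosen to give round numbers, and the group labels "A" and "B" are placeholders; none of this is the real dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (group, predicted_high_risk, actually_returned)
Record = Tuple[str, bool, bool]


def false_positive_rate_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Per group: share of people who did NOT return but were flagged high risk."""
    flagged = defaultdict(int)
    did_not_return = defaultdict(int)
    for group, predicted_high_risk, actually_returned in records:
        if not actually_returned:
            did_not_return[group] += 1
            if predicted_high_risk:
                flagged[group] += 1
    return {group: flagged[group] / did_not_return[group] for group in did_not_return}


# Toy data only: four non-returning people per group, plus one returner each.
toy_records = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", False, False),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", False, False),
    ("A", True, True), ("B", True, True),
]
rates = false_positive_rate_by_group(toy_records)
```

In this toy data the model wrongly flags group A twice as often as group B (0.5 versus 0.25), even though an overall accuracy figure would hide the difference.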

Tracking down the sources of these biases and errors can be very difficult and time consuming. This is true for any complex software system. For example, after NASA tried and failed to identify a bug in Toyota’s Camry throttling system that led to a fatal accident, it took 18 months and a team of engineers to find the error (here are the slides presented at the trial).

These are all high stakes cases, and the higher the stakes, the more critical the need for clear explanations. G.AI.A, our AI-enabled planetary environmental management system, will need explanations for decisions it supports around infrastructure development, natural disaster response, energy system deployment, land use change, impact investment, and pollution reporting. Errors in suggestions in any of these areas could have catastrophic impacts on people, communities, and nature. A good example from my home state of California occurred in 2018 when an algorithm misclassified air quality around the state capitol during wildfire season. The organization that reported good air quality in the midst of several forest fires does not provide details about how their algorithm works, but it does claim that machine learning and big data are involved.

Machine learning and other artificial intelligence approaches are currently being applied to identifying species from motion-detecting cameras, helping national park guards patrol for poachers, predicting likely future forest loss, and developing species habitat models. In each of these cases, the systems are making suggestions that will need explanations. In many cases the systems are acting with incomplete information and will need to explain not only their suggestions, but also how certain they are that they are correct. This communication of uncertainty is at least as complicated as the ability to explain.

Who wants to know?

Biases, such as the ones described above, occur and persist to a considerable degree because of ‘the people in the room.’ The gender and ethnic imbalances in the population of AI practitioners are well-established. National imbalances in AI development are also clear. Fewer than 10 countries were represented in a recent report on the top 100 AI companies, all in the Northern hemisphere (and 85% in North America). These imbalances negatively impact the success of AI projects and must be addressed. In the G.AI.A context, when talking about a planetary system for environmental monitoring, we must ask who contributes, who controls, and who benefits. Only 60% of the world’s population is online, and in the countries the UN classifies as least developed, only 20% of the population is online. Some of the planet’s best land managers are among the least connected. At least 32% of the Earth’s land is held by Indigenous people and local communities, and Indigenous people are at least as good as, and often better than, other agencies at sustainability. These communities need to play an active role in the development and support of any system for global environmental management.

As seen in the figure above, the number of people online is steadily growing, but we have a long way to go before there’s global equity in digital access. As we develop G.AI.A we must work toward addressing these imbalances in AI development and access.

Is it worth it?

Ensuring equitable access to the development and benefits of AI will be a challenging and ongoing effort. Equally challenging will be determining the tradeoffs between the costs and benefits of AI. As reported by the Global E-waste Statistics Partnership in their 2020 E-waste monitor report, 53.6 million metric tons of electronic waste were generated in 2019. That’s 7.3 kilograms (16 pounds) per person on the planet. Around 20% of that is IT-related e-waste. According to the European Framework Initiative for Energy and Environmental Efficiency in the ICT Sector, almost 10% of all energy used and 4% of carbon emissions come from IT. AI models are expensive to create. According to researchers from Google and UC Berkeley, a single training of OpenAI’s GPT-3 language model emitted 552 metric tons of carbon dioxide, roughly equivalent to 3 round trip flights of a passenger jet between San Francisco and New York. Kate Crawford’s “The Atlas of AI” enumerates a multitude of environmental impacts of AI and the technologies that support it. The question is, do the environmental gains attributable to AI outweigh the environmental costs? Do the societal gains outweigh the societal costs? How can we even measure that? Going back to GPT-3, machine learning models are trained very infrequently compared to how often they are used, so perhaps 3 cross-country round trip flights is a price worth paying. But how do we know?
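As a quick sanity check on the figures above, the arithmetic can be written out. This sketch assumes the 2019 total is 53.6 million metric tons, which is the magnitude consistent with 7.3 kilograms per person, and it takes the "3 round trip flights" comparison at face value; the variable names are just for illustration.

```python
KG_PER_METRIC_TON = 1000

# E-waste: total divided by per-person share gives the implied population.
ewaste_metric_tons_2019 = 53.6e6      # assumption: 53.6 million metric tons
ewaste_kg_per_person = 7.3
implied_population = ewaste_metric_tons_2019 * KG_PER_METRIC_TON / ewaste_kg_per_person

# GPT-3 training: emissions divided by the number of equivalent round trips.
gpt3_training_tons_co2 = 552
equivalent_round_trips = 3
tons_co2_per_round_trip = gpt3_training_tons_co2 / equivalent_round_trips

print(f"Implied world population: {implied_population / 1e9:.2f} billion")
print(f"Implied CO2 per round trip: {tons_co2_per_round_trip:.0f} metric tons")
```

The implied population comes out at roughly 7.3 billion, and the implied figure of about 184 metric tons per round trip is on the scale of a whole aircraft rather than a single passenger's share.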

Measuring the impacts of environmental efforts is notoriously difficult. Attributing measured impacts to specific sources is even harder. That said, unless we actively monitor both the positive and negative environmental and societal impacts of the technologies we deploy, we won’t be able to decide whether or not they’re worth it. The good news is that this isn’t news. On the environmental side, most of the large cloud computing providers are trying to drive down the carbon footprint of their data centers, and AI researchers, including the authors of the aforementioned Google/UC Berkeley paper, are working to build more energy-efficient machine learning frameworks. On the societal side, technology companies are conscious of and trying to address societal issues like child labor. These efforts are a good start, and we as consumers, producers, and regulators of technologies must insist that they are continued and expanded. With regard to G.AI.A, measurement and evaluation of the underlying environmental and societal impacts of the system must be built into the system.

Steps to ensure responsibility and transparency for G.AI.A

The trustworthiness of AIs has been a known problem since the early days and will certainly exist for our AI-backed planetary environment management system. AI providers know this, and are working to address it. No matter what they come up with, it falls to the users and developers of AIs to use the explanation tools provided, to identify and remove bias, and to adjust their approaches accordingly.

Turning to the question of who contributes to, controls, and benefits from G.AI.A, we need to ensure that the power of AI is not concentrated in the hands of a few. Returning to the difference between interpretability and explainability, Cynthia Rudin, one of the leading researchers in the field, argues that interpretable models, those that a person can read and understand directly, can be as powerful as the black-box models that come out of neural net and deep learning approaches. They are also more transparent and, since the simplest models can be written down on a sheet of paper, easier to share. Whatever explainability tools AI vendors develop are still operating on opaque and often corporate-controlled models, which are inherently untrustworthy because you can never know how they really work. Rudin’s proposal to encourage responsible governance is to mandate that, for high-stakes decisions, interpretable models be deployed unless there is a significantly better performing black-box model.

Where a non-interpretable model is used, it should be clearly documented. Google’s Model Cards approach is a step in this direction. Such documentation will help ensure that models are not used in unintended ways. Harking back to the medical system MYCIN, although the system made good decisions and explained them well, it wasn’t adopted because it didn’t fit well into the doctors’ workflow. Researchers attempted to convert the system into a teaching tool, but they found that its explanations lacked the depth needed to be educational. An even more thorough form of documentation would be something like an Open Science ‘research object’ (see these results from Cyverse for example).

Finally, as with everything having to do with our environment, we need to ensure that we work in a sustainable and equitable manner. Many of the intergovernmental organizations I mentioned at the beginning of this piece are focusing on that. One outcome of their work is the recently launched Digital Planet for Sustainability effort, which is developing a roadmap for an inclusive and green digital future.

Predictions and advice

This post focused on our fundamental trust issues with artificial intelligence systems and described a few ways to overcome them. The next post will turn to the kinds of suggestions, predictions, and advice that we can hope to get from G.AI.A, and will address some of the challenges we’ll face even when we have trustworthy artificial intelligence. Stay tuned!

Thanks to my esteemed reviewers, commenters, and contributors: Adia Bey, Aurélie Shapiro, Azalea Kamellia, Bob Lalasz, Dan Morris, Debora Pignatari Drucker, Diana Anthony, Erik Lindquist, Gregoire Dubois, Holly Grimm, Karin Tuxen-Bettman, Kirsten Menger-Anderson, Johanna Prüssmann, Mayra Milkovic, Nasser Olwero, Nicholas Clinton, Sophie Galloway, Tanya Birch, and Tyler Erickson.

article / 14 November 2022