
Poster session 3

Thu 20, 11:00–12:00 PDT

#235 Are AI Ethics Conferences Different and More Diverse Compared to Traditional Computer Science Conferences?

Daniel Acuna, Lizhen Liang

Even though computer science (CS) has had a historical lack of gender and race representation, its AI research eventually affects everybody. Being partially rooted in CS conferences, “AI ethics” (AIE) conferences such as FAccT and AIES have quickly become distinct venues where AI’s societal implications are discussed and solutions proposed. However, it is largely unknown whether these conferences improve upon the historical representational issues of traditional CS venues. In this work, we explore AIE conferences’ evolution and compare them across demographic characteristics, publication content, and citation patterns. We find that AIE conferences have increased their internal topical diversity and impact on other CS conferences. Importantly, AIE conferences are highly differentiable, covering topics not represented in other venues. However, and perhaps contrary to the field’s aspirations, white authors are more common, while seniority and black researchers are represented similarly to CS venues. Our results suggest that AIE conferences could increase efforts to attract more diverse authors, especially considering their sizable roots in CS.

#63 Person, Human, Neither: The Dehumanization Potential of Automated Image Tagging

Pınar Barlas, Kyriakos Kyriakou, Styliani Kleanthous, Jahna Otterbacher

Following the literature on dehumanization via technology, we audit six proprietary image tagging algorithms (ITAs) for their potential to perpetuate dehumanization. We examine the ITAs’ outputs on a controlled dataset of images depicting a diverse group of people for tags that indicate the presence of a human in the image. Through an analysis of the (mis)use of these tags, we find that there are some individuals whose ‘humanness’ is not recognized by an ITA, and that these individuals are often from marginalized social groups. Finally, we compare these findings with the use of the ‘face’ tag, which can be used for surveillance, revealing that people’s faces are often recognized by an ITA even when their ‘humanness’ is not. Overall, we highlight the subtle ways in which ITAs may inflict widespread, disparate harm, and emphasize the importance of considering the social context of the resulting application.

#50 Explainable AI and Adoption of Financial Algorithmic Advisors: An Experimental Study

Daniel Ben David, Yehezkel Resheff, Talia Tron

We study whether receiving advice from either a human or an algorithmic advisor, accompanied by five types of Local and Global explanation labelings, has an effect on the readiness to adopt, willingness to pay, and trust in a financial AI consultant. We compare the differences over time and in various key situations using a unique experimental framework where participants play a web-based game with real monetary consequences. We observed that accuracy-based explanations of the model in the initial phases lead to higher adoption rates. When the performance of the model is immaculate, the kind of explanation matters less for adoption. Using more elaborate feature-based or accuracy-based explanations helps substantially in reducing the adoption drop upon model failure. Furthermore, using an autopilot increases adoption significantly. Participants assigned to the AI-labeled advice with explanations were willing to pay more for the advice than those assigned to the AI-labeled advice with the "No-explanation" alternative. These results add to the literature on the importance of XAI for algorithmic adoption and trust.

#74 Uncertainty as a Form of Transparency: Measuring, Communicating, and Using Uncertainty

Umang Bhatt, Javier Antoran, Yunfeng Zhang, Vera Liao, Prasanna Sattigeri, Riccardo Fogliato, Gabrielle Gauthier Melançon, Ranganath Krishnan, Jason Stanley, Omesh Tickoo, Lama Nachman, Rumi Chunara, Madhulika Srikumar, Adrian Weller, Alice Xiang

Algorithmic transparency entails exposing system properties to various stakeholders for purposes that include understanding, improving, and contesting predictions. Until now, most research into algorithmic transparency has predominantly focused on explainability. Explainability attempts to provide reasons for a machine learning model’s behavior to stakeholders. However, understanding a model’s specific behavior alone might not be enough for stakeholders to gauge whether the model is wrong or lacks sufficient knowledge to solve the task at hand. In this paper, we argue for considering a complementary form of transparency by estimating and communicating the uncertainty associated with model predictions. First, we discuss methods for assessing uncertainty. Then, we characterize how uncertainty can be used to mitigate model unfairness, augment decision-making, and build trustworthy systems. Finally, we outline methods for displaying uncertainty to stakeholders and recommend how to collect information required for incorporating uncertainty into existing ML pipelines. This work constitutes an interdisciplinary review drawn from literature spanning machine learning, visualization/HCI, design, decision-making, and fairness. We aim to encourage researchers and practitioners to measure, communicate, and use uncertainty as a form of transparency.
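One concrete way to surface the kind of predictive uncertainty discussed above is to train a small ensemble and report the spread of its predictions; a minimal sketch along those lines (the model, data, and ensemble size are illustrative assumptions, not taken from the paper):

```python
# Minimal sketch: ensemble-based predictive uncertainty (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test = X[:1500], X[1500:]
y_train = y[:1500]

# Train an ensemble on bootstrap resamples of the training data.
ensemble = []
for seed in range(10):
    Xb, yb = resample(X_train, y_train, random_state=seed)
    ensemble.append(LogisticRegression(max_iter=1000).fit(Xb, yb))

# Predictive mean and disagreement (std) across ensemble members.
probs = np.stack([m.predict_proba(X_test)[:, 1] for m in ensemble])
mean_prob = probs.mean(axis=0)    # point prediction
uncertainty = probs.std(axis=0)   # spread: one simple uncertainty signal

# Flag the least certain predictions for closer scrutiny.
review_idx = np.argsort(-uncertainty)[:10]
print(mean_prob[review_idx], uncertainty[review_idx])
```

Predictions with large ensemble disagreement are natural candidates for extra scrutiny or deferral, which is one way uncertainty can augment decision-making.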

#103 Fairness and Machine Fairness

Clinton Castro, David O’Brien, Ben Schwan

Prediction-based decisions, which are often made by utilizing the tools of machine learning, influence nearly all facets of modern life. Ethical concerns about this widespread practice have given rise to the field of fair machine learning and a number of fairness measures, mathematically precise definitions of fairness that purport to determine whether a given prediction-based decision system is fair. Following Reuben Binns (2017), we take “fairness” in this context to be a placeholder for a variety of normative egalitarian considerations. We explore a few fairness measures to suss out their egalitarian roots and evaluate them, both as formalizations of egalitarian ideas and as assertions of what fairness demands of predictive systems. We pay special attention to a recent and popular fairness measure, counterfactual fairness, which holds that a prediction about an individual is fair if it is the same in the actual world and any counterfactual world where the individual belongs to a different demographic group (cf. Kusner et al. 2018).

#253 Reconfiguring Diversity and Inclusion for AI Ethics

Nicole Chi, Emma Lurie, Deirdre K. Mulligan

Activists, journalists, and scholars have long raised critical questions about the relationship between diversity, representation, and structural exclusions in data-intensive tools and services. We build on work mapping the emergent landscape of corporate AI ethics to center one outcome of these conversations: the incorporation of diversity and inclusion in corporate AI ethics activities. Using interpretive document analysis and analytic tools from the values in design field, we examine how diversity and inclusion work is articulated in public-facing AI ethics documentation produced by three companies that create application and services layer AI infrastructure: Google, Microsoft, and Salesforce. We find that as these documents make diversity and inclusion more tractable to engineers and technical clients, they reveal a drift away from civil rights justifications that resonates with the “managerialization of diversity” by corporations in the mid-1980s. The focus on technical artifacts — such as diverse and inclusive datasets — and the replacement of equity with fairness make ethical work more actionable for everyday practitioners. Yet, they appear divorced from broader DEI initiatives and relevant subject matter experts that could provide needed context to nuanced decisions around how to operationalize these values and new solutions. Finally, diversity and inclusion, as configured by engineering logic, positions firms not as “ethics owners” but as ethics allocators; while these companies claim expertise on AI ethics, the responsibility of defining who diversity and inclusion are meant to protect and where it is relevant is pushed downstream to their customers.

#14 Designing Shapelets for Interpretable Data-Agnostic Classification

Riccardo Guidotti, Anna Monreale

Time series shapelets are discriminatory subsequences that are representative of a class, and their similarity to a time series can be used to successfully tackle the time series classification problem. The literature shows that Artificial Intelligence (AI) systems adopting classification models based on time series shapelets can be interpretable, more accurate, and significantly faster. Thus, in order to design a data-agnostic and interpretable classification approach, in this paper we first extend the notion of shapelets to different types of data, i.e., images, tabular data, and text. Then, based on this extended notion of shapelets, we propose an interpretable data-agnostic classification method. Since shapelet discovery can be time-consuming, especially for data types more complex than time series, we exploit a notion of prototypes to find candidate shapelets, reducing both the time required to find a solution and the variance of the shapelets. Extensive experiments on datasets of different types show that the data-agnostic prototype-based shapelets returned by the proposed method enable interpretable classification that is also fast, accurate, and stable. In addition, we show, and prove, that shapelets can form the basis of explainable AI methods.
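The basic primitive behind shapelet-based classification is the sliding-window distance between a short subsequence and a longer series; the sketch below illustrates that primitive and its use as an interpretable feature vector (the data and candidate shapelets are made up, and this is not the authors' data-agnostic extension):

```python
# Minimal sketch: distance from a time series to candidate shapelets (illustrative only).
import numpy as np  # requires NumPy >= 1.20 for sliding_window_view

def shapelet_distance(series: np.ndarray, shapelet: np.ndarray) -> float:
    """Minimum Euclidean distance between the shapelet and any
    equal-length window of the series (lower = better match)."""
    m = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    dists = np.sqrt(((windows - shapelet) ** 2).sum(axis=1))
    return float(dists.min())

def shapelet_features(series, shapelets):
    """Interpretable feature vector: 'how close does this series come to shapelet k?'."""
    return np.array([shapelet_distance(series, s) for s in shapelets])

series = np.sin(np.linspace(0, 6 * np.pi, 120)) + 0.1 * np.random.randn(120)
shapelets = [np.sin(np.linspace(0, np.pi, 20)),   # bump-shaped candidate
             np.linspace(-1.0, 1.0, 20)]          # ramp-shaped candidate
print(shapelet_features(series, shapelets))
```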

#67 Who Gets What, According to Whom? An Analysis of Fairness Perceptions in Service Allocation

Jacqueline Hannan, Huei-Yen Winnie Chen, Kenneth Joseph

Algorithmic fairness research has traditionally been linked to the disciplines of philosophy, ethics, and economics, where notions of fairness are prescriptive and seek objectivity. Increasingly, however, scholars are turning to the study of what different people perceive to be fair, and how these perceptions can or should help to shape the design of machine learning, particularly in the policy realm. The present work experimentally explores five novel research questions at the intersection of the "Who," "What," and "How" of fairness perceptions. Specifically, we present the results of a multi-factor conjoint analysis study that quantifies the effects of the specific context in which a question is asked, the framing of the given question, and who is answering it. Our results broadly suggest that the "Who" and "What," at least, matter in ways that 1) are not easily explained by any one theoretical perspective and 2) have critical implications for how perceptions of fairness should be measured and/or integrated into algorithmic decision-making systems.

#161 Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End

Ramaravind Kommiya Mothilal, Divyat Mahajan, Chenhao Tan, Amit Sharma

Feature attributions and counterfactual explanations are popular approaches to explain an ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes to alter the model’s predictions. To unify these approaches, we provide an interpretation based on the actual causality framework and present two key results in terms of their use. First, we present a method to generate feature attribution explanations from a set of counterfactual examples. These feature attributions convey how important a feature is to changing the classification outcome of a model, especially on whether a subset of features is necessary and/or sufficient for that change, which attribution-based methods are unable to provide. Second, we show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency. As a result, we highlight the complementarity of these two approaches. Our evaluation on three benchmark datasets (Adult-Income, LendingClub, and German-Credit) confirms the complementarity. Feature attribution methods like LIME and SHAP and counterfactual explanation methods like Wachter et al.'s and DiCE often do not agree on feature importance rankings. In addition, by restricting the features that can be modified for generating counterfactual examples, we find that the top-k features from LIME or SHAP are often neither necessary nor sufficient explanations of a model’s prediction. Finally, we present a case study of different explanation methods on a real-world hospital triage problem.
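The necessity/sufficiency reading of feature subsets can be probed directly against a black-box classifier once counterfactual examples are available. A rough sketch of such a probe (the helper names and the simple value-swapping test are assumptions for illustration, not the authors' method):

```python
# Rough sketch: test whether a feature subset S is sufficient and/or necessary
# for flipping a model's prediction, given counterfactual examples (illustrative only).
import numpy as np

def flips_prediction(model, x, x_cf, subset):
    """Does copying the counterfactual's values for `subset` into x flip the prediction?"""
    x_mod = x.copy()
    x_mod[subset] = x_cf[subset]
    return model.predict([x_mod])[0] != model.predict([x])[0]

def sufficiency(model, x, counterfactuals, subset):
    # Fraction of counterfactuals for which changing ONLY the subset flips the outcome.
    return np.mean([flips_prediction(model, x, cf, subset) for cf in counterfactuals])

def necessity(model, x, counterfactuals, subset, n_features):
    # Fraction of counterfactuals for which changing everything EXCEPT the subset
    # fails to flip the outcome (i.e., the subset was needed for the change).
    rest = [i for i in range(n_features) if i not in subset]
    return np.mean([not flips_prediction(model, x, cf, rest) for cf in counterfactuals])
```

Passing the top-k features of an attribution method as `subset` gives a rough check of whether those features are actually necessary or sufficient for changing the prediction.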

#136 Measuring Group Advantage: A Comparative Study of Fair Ranking Metrics

Caitlin Kuhlman, Walter Gerych, Elke Rundensteiner

Ranking evaluation metrics play an important role in information retrieval, providing optimization objectives during development and means of assessment of deployed performance. Recently, fairness of rankings has been recognized as crucial, especially as automated systems are increasingly used for high impact decisions. While numerous fairness metrics have been proposed, a comparative analysis to understand their interrelationships is lacking. Even for fundamental statistical parity metrics which measure group advantage, it remains unclear whether metrics measure the same phenomena, or when one metric may produce different results than another. To address these open questions, we formulate a conceptual framework for analytical comparison of metrics. We prove that under reasonable assumptions, popular metrics in the literature exhibit the same behavior and that optimizing for one optimizes for all. However, our analysis also shows that the metrics vary in the degree of unfairness measured, in particular when one group has a strong majority. Based on this analysis, we design a practical statistical test to identify whether observed data is likely to exhibit predictable group bias. We provide a set of recommendations for practitioners to guide the choice of an appropriate fairness metric.
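For readers unfamiliar with statistical-parity-style ranking metrics, here is a small sketch of one common group-advantage formulation (position-discounted exposure per group); it is illustrative only and not necessarily one of the specific metrics compared in the paper:

```python
# Sketch: exposure-based group advantage in a ranking (one common formulation; illustrative only).
import numpy as np

def group_exposure(ranking_groups, group):
    """Average position-discounted exposure received by members of `group`.
    ranking_groups[i] is the group label of the item ranked at position i (0 = top)."""
    discounts = 1.0 / np.log2(np.arange(2, len(ranking_groups) + 2))  # DCG-style discount
    mask = np.array([g == group for g in ranking_groups])
    return discounts[mask].sum() / max(mask.sum(), 1)

ranking = ["A", "A", "B", "A", "B", "B"]   # group label per ranked position
advantage = group_exposure(ranking, "A") / group_exposure(ranking, "B")
print(advantage)  # > 1 means group A gets disproportionate exposure per member
```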

#141 A Framework for Understanding AI-Induced Field Change: How AI Technologies Are Legitimized and Institutionalized

Benjamin Larsen

Artificial intelligence (AI) systems operate in increasingly diverse areas, from healthcare to facial recognition, the stock market, autonomous vehicles, and so on. While the underlying digital infrastructure of AI systems is developing rapidly, each area of implementation is subject to different degrees and processes of legitimization. By combining elements from institutional theory and information systems theory, this paper presents a conceptual framework to analyze and understand AI-induced field change. The introduction of novel AI agents into new or existing fields creates a dynamic in which algorithms (re)shape organizations and institutions, while existing institutional infrastructures determine the scope and speed at which organizational change is allowed to occur. Where institutional infrastructure and governance arrangements, such as standards, rules, and regulations, are still underdeveloped, the field can move fast but is also more likely to be contested. The institutional infrastructure surrounding AI-induced fields generally remains underdeveloped, which could be an obstacle to the broader institutionalization of AI systems going forward.

#270 Participatory Algorithmic Management: Elicitation Methods for Worker Well-Being Models

Min Kyung Lee, Ishan Nigam, Angie Zhang, Joel Afriyie, Zhizhen Qin, Sicun Gao

Artificial intelligence is increasingly being used to manage the workforce. Algorithmic management promises organizational efficiency, but often undermines worker well-being. How can we computationally model worker well-being so that algorithmic management can be optimized for and assessed in terms of worker well-being? Toward this goal, we propose a participatory approach for worker well-being models. We first define worker well-being models: Work preference models—preferences about work and working conditions, and managerial fairness models—beliefs about fair resource allocation among multiple workers. We then propose elicitation methods to enable workers to build their own well-being models leveraging pairwise comparisons and ranking. As a case study, we evaluate our methods in the context of algorithmic work scheduling with 25 shift workers and 3 managers. The findings show that workers expressed idiosyncratic work preference models and more uniform managerial fairness models, and the elicitation methods helped workers discover their preferences and gave them a sense of empowerment. Our work provides a method and initial evidence for enabling participatory algorithmic management for worker well-being.
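One standard way to turn pairwise comparisons into a preference model is a Bradley-Terry-style fit; the sketch below is a minimal stand-in for that idea (the items, comparisons, and use of logistic regression are assumptions, and the authors' elicitation methods are richer than this):

```python
# Minimal sketch: fit per-item preference scores from pairwise comparisons
# (Bradley-Terry model via logistic regression; illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

n_items = 4  # e.g., four candidate shift schedules
# Each comparison: (winner, loser), meaning the worker preferred `winner`.
comparisons = [(0, 1), (0, 2), (2, 1), (3, 1), (0, 3), (3, 2)]

# Encode each comparison as a difference-of-indicators row labeled 1
# ("the first item won"), plus the mirrored row labeled 0.
X, y = [], []
for w, l in comparisons:
    row = np.zeros(n_items)
    row[w], row[l] = 1.0, -1.0
    X.append(row);  y.append(1)
    X.append(-row); y.append(0)

clf = LogisticRegression(fit_intercept=False, C=10.0).fit(np.array(X), y)
scores = clf.coef_.ravel()   # higher score = more preferred item
print(np.argsort(-scores))   # items ranked by inferred preference
```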

#114 Feeding the Beast: Superintelligence, Corporate Capitalism and the End of Humanity

Dominic Leggett

Scientists and philosophers have warned of the possibility that humans, in the future, might create a ‘superintelligent’ machine that could, in some scenarios, form an existential threat to humanity. This paper argues that such a machine may already exist, and that, if so, it does, in fact, represent such a threat.

#280 Towards Accountability in the Use of Artificial Intelligence for Public Administrations

Michele Loi, Matthias Spielkamp

We argue that the phenomena of distributed responsibility, induced acceptance, and acceptance through ignorance constitute instances of imperfect delegation when tasks are delegated to computationally-driven systems. Imperfect delegation challenges human accountability. We hold that direct public accountability via public transparency and indirect public accountability via transparency to auditors in public organizations can each be both instrumentally ethically valuable and required as a matter of deontology by the principle of democratic self-government. We analyze the regulatory content of 16 guideline documents about the use of AI in the public sector, mapping their requirements to those of our philosophical account of accountability, and conclude that while some guidelines refer to processes that amount to auditing, the debate would benefit from more clarity about the nature of auditors' entitlements and the goals of auditing, not least in order to develop ethically meaningful standards against which different forms of auditing can be evaluated and compared.

#210 Unpacking the Expressed Consequences of AI Research in Broader Impact Statements

Priyanka Nanayakkara, Jessica Hullman, Nicholas Diakopoulos

The computer science research community and the broader public have become increasingly aware of negative consequences of algorithmic systems. In response, the top-tier Neural Information Processing Systems (NeurIPS) conference for machine learning and artificial intelligence research required that authors include a statement of broader impact to reflect on potential positive and negative consequences of their work. We present the results of a qualitative thematic analysis of a sample of statements written for the 2020 conference. The themes we identify broadly fall into categories related to how consequences are expressed (e.g., valence, specificity, uncertainty), areas of impacts expressed (e.g., bias, the environment, labor, privacy), and researchers’ recommendations for mitigating negative consequences in the future. In light of our results, we offer perspectives on how the broader impact statement can be implemented in future iterations to better align with potential goals.

#121 The Deepfake Detection Dilemma: A Multistakeholder Exploration of Adversarial Dynamics in Synthetic Media

Aviv Ovadya, Sean McGregor, Claire Leibowicz

Synthetic media detection technologies label media as either synthetic or non-synthetic and are increasingly used by journalists, web platforms, and the general public to identify misinformation and other forms of problematic content. As both well-resourced organizations and the non-technical general public generate more sophisticated synthetic media, the capacity for purveyors of problematic content to adapt induces a "detection dilemma": as detection practices become more accessible, they become more easily circumvented. This paper describes how a multistakeholder cohort from academia, technology platforms, media entities, and civil society organizations active in synthetic media detection and its socio-technical implications evaluates the detection dilemma. Specifically, we offer an assessment of detection contexts and adversary capacities sourced from the broader, global AI and media integrity community concerned with mitigating the spread of harmful synthetic media. A collection of personas illustrates the intersection between unsophisticated and highly-resourced sponsors of misinformation in the context of their technical capacities. This work concludes that there is no "best" approach to navigating the detection dilemma, but derives a set of implications from multistakeholder input to better inform detection process decisions and policies in practice.

#38 Fair Bayesian Optimization

Valerio Perrone, Michele Donini, Muhammad Bilal Zafar, Robin Schmucker, Krishnaram Kenthapadi, Cedric Archambeau

Fairness and robustness in machine learning are crucial when individuals are subject to automated decisions made by models in high-stakes domains. To promote ethical artificial intelligence, fairness metrics that rely on comparing model error rates across subpopulations have been widely investigated for the detection and mitigation of bias. However, fairness measures that rely on comparing the ability to achieve recourse have been relatively unexplored. In this paper, we present a novel formulation for training neural networks that considers the distance of data observations to the decision boundary such that the new objective: (1) reduces the disparity in the average ability of recourse between individuals in each protected group, and (2) increases the average distance of data points to the boundary to promote adversarial robustness. We demonstrate that models trained with this new objective are fairer and more adversarially robust, with similar accuracy, than models trained without it. We also investigate a trade-off between the recourse-based fairness and robustness objectives. Moreover, we qualitatively motivate and empirically show that reducing recourse disparity across protected groups also improves fairness measures that rely on error rates. To the best of our knowledge, this is the first time that recourse disparity across groups is considered to train fairer neural networks.
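A rough PyTorch sketch of the kind of objective described, using the absolute logit as a crude stand-in for distance to the decision boundary (the penalty form, proxy, and weights are assumptions, not the paper's exact formulation):

```python
# Rough sketch: cross-entropy plus (1) a recourse-disparity penalty and
# (2) a boundary-distance reward, with |logit| as a crude proxy for
# distance to the decision boundary (illustrative only).
import torch
import torch.nn.functional as F

def fair_robust_loss(logits, labels, group, lam_disparity=1.0, lam_margin=0.1):
    """logits: (N,) raw scores of a binary classifier; group: (N,) 0/1 protected group."""
    ce = F.binary_cross_entropy_with_logits(logits, labels.float())

    margin = logits.abs()                  # proxy for distance to the boundary
    dist_g0 = margin[group == 0].mean()
    dist_g1 = margin[group == 1].mean()

    disparity = (dist_g0 - dist_g1).abs()  # term (1): equalize average recourse
    robustness = -margin.mean()            # term (2): push points away from the boundary

    return ce + lam_disparity * disparity + lam_margin * robustness

# Usage inside a training loop:
# loss = fair_robust_loss(model(x).squeeze(-1), y, g); loss.backward(); opt.step()
```

In practice each batch would need members of both groups for the disparity term to be well defined.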

#187 Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective

Flavien Prost, Pranjal Awasthi, Alex Beutel, Jilin Chen, Nick Blumm, Ed H. Chi, Li Wei, Aditee Kumthekar, Trevor Potter, Xuezhi Wang

In this work we study the problem of measuring the fairness of a machine learning model under noisy information. Focusing on group fairness metrics, we investigate the particular but common situation when the evaluation requires controlling for the confounding effect of covariate variables. In a practical setting, we might not be able to jointly observe the covariate and group information, and a standard workaround is to then use proxies for one or more of these variables. Prior works have demonstrated the challenges with using a proxy for sensitive attributes, and strong independence assumptions are needed to provide guarantees on the accuracy of the noisy estimates. In contrast, in this work we study using a proxy for the covariate variable and present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible. Furthermore, our theory identifies potential sources of errors and decouples them into two interpretable parts, $\epsilon_{obs}$ and $\epsilon_{un}$. The first part, $\epsilon_{obs}$, depends solely on the performance of the proxy, such as precision and recall, whereas the second part, $\epsilon_{un}$, captures correlations between all the variables of interest. We show that in many scenarios the error in the estimates is dominated by $\epsilon_{obs}$ via a linear dependence, whereas the dependence on the correlations $\epsilon_{un}$ only constitutes a lower order term. As a result we expand the understanding of scenarios where measuring model fairness via proxies can be an effective approach. Finally, we compare, via simulations, the theoretical upper-bounds to the distribution of simulated estimation errors and show that assuming some structure on the data, even weak, is key to significantly improve both theoretical guarantees and empirical results.

#151 A Step Toward More Inclusive People Annotations for Fairness

Candice Schumann, Susanna Ricco, Utsav Prabhu, Vittorio Ferrari, Caroline Pantofaru

The Open Images Dataset contains approximately 9 million images and is a widely accepted dataset for computer vision research. As is common practice for large datasets, the annotations are not exhaustive, with bounding boxes and attribute labels for only a subset of the classes in each image. In this paper, we present a new set of annotations on a subset of the Open Images dataset called the MIAP (More Inclusive Annotations for People) subset, containing bounding boxes and attributes for all of the people visible in those images. The attributes and labeling methodology for the MIAP subset were designed to enable research into model fairness. In addition, we analyze the original annotation methodology for the person class and its subclasses, discussing the resulting patterns in order to inform future annotation efforts. By considering both the original and exhaustive annotation sets, researchers can also now study how systematic patterns in training annotations affect modeling.

#226 FaiR-N: Fair and Robust Neural Networks for Structured Data

Shubham Sharma, Alan Gee, David Paydarfar, Joydeep Ghosh

This paper presents a fairness principle that can be used to evaluate decision-making based on predictions. We propose that a decision rule for decision-making based on predictions is fair when the individuals directly subjected to the implications of the decision enjoy fair equality of chances. We define fair equality of chances as obtaining if and only if the individuals who are equal with respect to the features that justify inequalities in outcomes have the same statistical prospects of being benefited or harmed, irrespective of their morally irrelevant traits. The paper characterizes – in a formal way – the way in which luck is allowed to impact outcomes in order for its influence to be fair. This fairness principle can be used to evaluate decision-making based on predictions, a kind of decision-making that is becoming increasingly important to theorize around in light of the growing prevalence of algorithmic decision-making in healthcare, the criminal justice system, and the insurance industry, among other areas. It can be used to evaluate decision-making rules based on different normative theories and is compatible with the broadest range of normative views according to which inequalities due to brute luck can be fair.

#32 Machine Learning and the Meaning of Equal Treatment

Joshua Simons, Sophia Adams Bhatti, Adrian Weller

Approaches to non-discrimination are generally informed by two principles: striving for equality of treatment, and advancing various notions of equality of outcome. We consider when and why there are trade-offs in machine learning between respecting formalistic interpretations of equal treatment and advancing equality of outcome. Exploring a hypothetical discrimination suit against Facebook, we argue that interpretations of equal treatment which require blindness to difference may constrain how machine learning can be deployed to advance equality of outcome. When machine learning models predict outcomes that are unevenly distributed across racial groups, using those models to advance racial justice will often require deliberately taking race into account. We then explore the normative stakes of this tension. We describe three pragmatic policy options underpinned by distinct interpretations and applications of equal treatment. A status quo approach insists on blindness to difference, permitting the design of machine learning models that compound existing patterns of disadvantage. An industry-led approach would specify a narrow set of domains in which institutions were permitted to use protected characteristics to actively reduce inequalities of outcome. A government-led approach would impose positive duties that require institutions to consider how best to advance equality of outcomes and permit the use of protected characteristics to achieve that goal. We argue that while machine learning offers significant possibilities for advancing racial justice and outcome-based equality, harnessing those possibilities will require a shift in the normative commitments that underpin the interpretation and application of equal treatment in non-discrimination law and the governance of machine learning.

#175 Comparing Equity and Effectiveness of Different Algorithms in an Application for the Room Rental Market

David Solans, Francesco Fabbri, Caterina Calsamiglia, Carlos Castillo, Francesco Bonchi

Machine Learning (ML) techniques have been increasingly adopted by the real estate market in the last few years. Applications include, among many others, predicting the market value of a property or an area, advanced systems for managing marketing and ads campaigns, and recommendation systems based on user preferences. While these techniques can provide important benefits to the business owners and the users of the platforms, algorithmic biases can result in inequalities and loss of opportunities for groups of people who are already disadvantaged in their access to housing. In this work, we present a comprehensive and independent algorithmic evaluation of a recommender system for the real estate market, designed specifically for finding shared apartments in metropolitan areas. We were granted full access to the internals of the platform, including details on algorithms and usage data over a period of 2 years. We analyze the performance of the various algorithms deployed in the recommender system and assess their effects across different population groups. Our analysis reveals that introducing a recommender system algorithm facilitates finding an appropriate tenant or a desirable room to rent, but at the same time, it strengthens performance inequalities between groups, further reducing the opportunities of certain minorities to find a rental.

#259 Differentially Private Normalizing Flows for Privacy-Preserving Density Estimation

Chris Waites, Rachel Cummings

Normalizing flow models have risen as a popular solution to the problem of density estimation, enabling high-quality synthetic data generation as well as exact probability density evaluation. However, in contexts where individuals are directly associated with the training data, releasing such a model raises privacy concerns. In this work, we propose the use of normalizing flow models that provide explicit differential privacy guarantees as a novel approach to the problem of privacy-preserving density estimation. We evaluate the efficacy of our approach empirically using benchmark datasets, and we demonstrate that our method substantially outperforms previous state-of-the-art approaches. We additionally show how our algorithm can be applied to the task of differentially private anomaly detection.
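As a rough illustration of the general recipe (maximum-likelihood training of a flow with differentially private SGD), here is a sketch with a tiny affine coupling layer and naive per-example gradient clipping plus Gaussian noise; the architecture is made up and the privacy accounting (the actual epsilon/delta bookkeeping) is omitted, so this is not the authors' method:

```python
# Sketch: training a tiny affine-coupling flow with DP-SGD-style updates
# (per-example gradient clipping + Gaussian noise); illustrative only.
import math
import torch
import torch.nn as nn

class TinyCouplingFlow(nn.Module):
    """One affine coupling layer over 2-D data: z1 = x1, z2 = x2*exp(s(x1)) + t(x1)."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 2))

    def log_prob(self, x):
        x1, x2 = x[:, :1], x[:, 1:]
        s, t = self.net(x1).chunk(2, dim=1)
        z = torch.cat([x1, x2 * torch.exp(s) + t], dim=1)
        log_base = -0.5 * (z ** 2 + math.log(2 * math.pi)).sum(dim=1)  # standard normal base
        return log_base + s.squeeze(1)                                 # + log|det Jacobian|

def dp_sgd_step(model, batch, optimizer, clip=1.0, noise_mult=1.0):
    """One DP-SGD-style step: clip each example's gradient, sum, add Gaussian noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x in batch:                                    # per-example gradients (slow but simple)
        optimizer.zero_grad()
        loss = -model.log_prob(x.unsqueeze(0)).mean()  # negative log-likelihood
        loss.backward()
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
        scale = torch.clamp(clip / (norm + 1e-12), max=1.0)
        for acc, p in zip(summed, params):
            acc += p.grad * scale
    optimizer.zero_grad()
    for p, acc in zip(params, summed):
        noise = torch.randn_like(p) * noise_mult * clip
        p.grad = (acc + noise) / len(batch)
    optimizer.step()

flow = TinyCouplingFlow()
opt = torch.optim.SGD(flow.parameters(), lr=1e-2)
data = torch.randn(256, 2) @ torch.tensor([[1.0, 0.5], [0.0, 0.8]])  # toy correlated 2-D data
for _ in range(50):
    idx = torch.randint(0, len(data), (32,))
    dp_sgd_step(flow, data[idx], opt)
```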

#88 Who’s Responsible? Jointly Quantifying the Contribution of the Learning Algorithm and Data

Gal Yona, Amirata Ghorbani, James Zou

A learning algorithm A trained on a dataset D is revealed to have poor performance on some subpopulation at test time. Where should the responsibility for this lie? It can be argued that the data is responsible, if for example training A on a more representative dataset D’ would have improved the performance. But it can similarly be argued that A itself is at fault, if training a different variant A’ on the same dataset D would have improved performance. As ML becomes widespread and such failure cases more common, these types of questions are proving to be far from hypothetical. With this motivation in mind, in this work we provide a rigorous formulation of the joint credit assignment problem between a learning algorithm A and a dataset D. We propose Extended Shapley as a principled framework for this problem, and experiment empirically with how it can be used to address questions of ML accountability.
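As a toy illustration of the credit-assignment setup, here is a vanilla two-player Shapley split between "the algorithm" and "the data" (illustrative only; the authors' Extended Shapley generalizes beyond this, and the scores below are made up):

```python
# Toy sketch: vanilla Shapley credit split between "the algorithm" and "the data"
# for a performance metric v(S); illustrative only, not Extended Shapley.
from itertools import permutations

def shapley_two_players(v):
    """v maps a frozenset drawn from {'A', 'D'} to a performance score."""
    players = ["A", "D"]   # A = learning algorithm choice, D = dataset choice
    n_orders = len(list(permutations(players)))
    credit = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            credit[p] += (v(frozenset(coalition)) - before) / n_orders
    return credit

# Hypothetical subgroup accuracies: baseline, swapping in a better algorithm A',
# a more representative dataset D', or both.
scores = {frozenset(): 0.60, frozenset({"A"}): 0.70,
          frozenset({"D"}): 0.75, frozenset({"A", "D"}): 0.90}
print(shapley_two_players(lambda S: scores[S]))  # {'A': 0.125, 'D': 0.175}
```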

#46 RelEx: A Model-Agnostic Relational Model Explainer

Yue Zhang, David Defazio, Arti Ramesh

In recent years, considerable progress has been made on improving the interpretability of machine learning models. This is essential, as complex deep learning models with millions of parameters produce state-of-the-art performance, but it can be nearly impossible to explain their predictions. While various explainability techniques have achieved impressive results, nearly all of them assume each data instance to be independent and identically distributed (iid). This excludes relational models, such as Statistical Relational Learning (SRL), and the recently popular Graph Neural Networks (GNNs), resulting in few options to explain them. While work on explaining GNNs does exist, such as GNN-Explainer, it assumes access to the model's gradients to learn explanations, which limits both its applicability to non-differentiable relational models and its practicality. In this work, we develop RelEx, a model-agnostic relational explainer that explains black-box relational models with access only to the outputs of the black box. RelEx is able to explain any relational model, including SRL models and GNNs. We compare RelEx to the state-of-the-art relational explainer, GNN-Explainer, and to relational extensions of iid explanation models, and show that RelEx achieves comparable or better performance while remaining model-agnostic.
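The general flavor of output-only relational explanation can be illustrated with a simple edge-perturbation probe: score each edge by how much dropping it changes the black-box prediction for a target node. This is a generic baseline sketch, not RelEx itself:

```python
# Generic sketch: score each edge of a graph by how much removing it changes a
# black-box relational model's prediction for a target node (not RelEx itself).
import numpy as np

def edge_importances(black_box, adj, features, target_node):
    """black_box(adj, features) -> (n_nodes, n_classes) class probabilities."""
    base = black_box(adj, features)[target_node]
    scores = {}
    for i, j in zip(*np.nonzero(np.triu(adj, k=1))):  # each undirected edge once
        adj_pert = adj.copy()
        adj_pert[i, j] = adj_pert[j, i] = 0            # drop the edge
        pert = black_box(adj_pert, features)[target_node]
        scores[(int(i), int(j))] = float(np.abs(base - pert).sum())
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Edges with the highest scores form a candidate explanation subgraph for the target node's prediction.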