Sat 8, Spotlight I (w/coffee)

10:00 – 10:25
Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification

ABSTRACT. Modern face recognition systems leverage datasets containing images of hundreds of thousands of specific individuals’ faces to train deep convolutional neural networks to learn an embedding space that maps an arbitrary individual’s face to a vector representation of their identity. The performance of a face recognition system in face verification (1:1) and face identification (1:N) tasks is directly related to the ability of an embedding space to discriminate between identities. Recently, there has been significant public scrutiny into the source and privacy implications of large-scale face recognition training datasets such as MS-Celeb-1M and MegaFace, as many people are uncomfortable with the idea of their face being used to train dual-use technologies such as face recognition systems. However, the actual impact of an individual’s inclusion in such a dataset on a derived system’s ability to recognize them has not previously been studied. In this work, we audit ArcFace, a state-of-the-art, open-source face recognition system, in a large-scale face identification experiment with more than one million distractor images. We find a rank-1 face identification accuracy of 79.71% for individuals present in the model’s training data and an accuracy of 75.73% for those not present. This modest difference in accuracy demonstrates that modern face recognition systems are biased towards individuals they are trained on, which has serious privacy implications when one considers that none of the major open-source face recognition training datasets gathers informed consent from individuals during collection.
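
For readers unfamiliar with the metric, rank-1 identification accuracy can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name, the toy two-dimensional embeddings, and the use of cosine similarity are all assumptions made for the example. A probe counts as correct when its most similar gallery entry, distractors included, carries the same identity.

```python
import numpy as np

def rank1_accuracy(probe_emb, probe_ids, gallery_emb, gallery_ids):
    """Fraction of probes whose most similar gallery embedding shares their identity."""
    # L2-normalize rows so that a dot product equals cosine similarity.
    p = probe_emb / np.linalg.norm(probe_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = p @ g.T                      # shape: (num_probes, num_gallery)
    best = np.argmax(sims, axis=1)      # index of the top-ranked gallery entry
    return float(np.mean(gallery_ids[best] == probe_ids))

# Toy data (hypothetical): identity -1 marks a distractor image.
gallery_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
gallery_ids = np.array([0, 1, -1])
probe_emb = np.array([[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]])
probe_ids = np.array([0, 1, 1])

accuracy = rank1_accuracy(probe_emb, probe_ids, gallery_emb, gallery_ids)
print(f"rank-1 accuracy: {accuracy:.4f}")
```

In this toy case the third probe is closest to the distractor, so it is counted as a miss. Comparing this accuracy between probes whose identities were in the training data and probes whose identities were not is the kind of contrast the abstract reports.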

Monitoring Misuse for Accountable ‘Artificial Intelligence as a Service’

ABSTRACT. AI is increasingly being offered ‘as a service’ (AIaaS). This entails service providers offering customers access to pre-built models for tasks such as object recognition, text translation, text-to-voice conversion, and facial recognition, to name a few. The offerings enable customers to easily integrate a range of powerful ML-driven capabilities into their applications. Customers access these models through the provider’s APIs, sending the data to which the model is to be applied and receiving the results in return.

However, there are many situations in which the use of ML can be problematic. AIaaS services typically represent generic functionality, available to customers at ‘a few clicks’. Providers may therefore, for reasons of reputation or responsibility, seek to ensure that the AIaaS services they offer are being used by customers for ‘appropriate’ purposes.

This paper introduces and explores a concept in which AIaaS providers uncover situations of possible service misuse by their customers. Illustrated through topical examples, we consider the technical usage patterns that could signal situations warranting scrutiny, and raise some of the legal and technical challenges of monitoring for misuse. In all, by introducing this concept, we indicate a potential area for further inquiry from a range of perspectives.

Steps Towards Value-Aligned Systems

ABSTRACT. Algorithmic (including AI/ML) decision-making artifacts are an established and growing part of our decision-making ecosystem. They have become near-indispensable as tools to help manage the flood of information we need to make timely, effective decisions in an increasingly complex world. The current literature is awash with examples of how individual artifacts violate societal norms and expectations (e.g. violations of fairness, privacy, or safety norms). Against this backdrop, we highlight the need for principled frameworks for assessing value misalignment in AI-equipped sociotechnical systems. One trend in research explorations of value misalignment in artifacts is the focus on the behavior of singular tech artifacts. In this discussion, we outline and argue for a more structured systems-level approach for assessing value-alignment in sociotechnical systems. The discussion focuses primarily on fairness audits. We use the opportunity to highlight how adopting a system perspective improves our ability to explain and address value misalignments. Our discussion ends with an exploration of priority questions that demand attention if we are to assure the value alignment of whole systems, not just individual artifacts.

Social and Governance Implications of Improved Data Efficiency

ABSTRACT. Many researchers work on improving the data efficiency of machine learning. What would happen if they succeed? This paper explores the socio-economic impact of increased data efficiency. Specifically, we examine the intuition that data efficiency will erode the barriers to entry protecting incumbent data-rich AI firms, exposing them to more competition from data-poor firms. We find that this intuition is only partially correct: data efficiency makes it easier to create ML applications, but large AI firms may have more to gain from higher performing AI systems. Further, we find that the effects on privacy, data markets, robustness, and misuse are complex. For example, while it seems intuitive that misuse risk would increase along with data efficiency, as more actors gain access to any level of capability, the net effect crucially depends on how much defensive measures are improved. More investigation into data efficiency, as well as research into the “AI production function”, will be key to understanding the development of the AI industry and its societal impacts.

Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments

ABSTRACT. As AI systems become prevalent in high-stakes domains such as surveillance and healthcare, researchers now examine how to design and implement them in a safe manner. However, the potential harms caused by systems to stakeholders in complex social contexts, and how to address them, remain unclear. In this paper, we explain the inherent normative uncertainty in debates about the safety of AI systems. We then address this as a problem of vagueness by examining its place in the design, training, and deployment stages of AI system development. We adopt Ruth Chang’s theory of intuitive comparability to illustrate the dilemmas that manifest at each stage. We then discuss how stakeholders can navigate these dilemmas by incorporating distinct forms of dissent into the development pipeline, drawing on Elizabeth Anderson’s work on the epistemic powers of democratic institutions. We outline a framework of sociotechnical commitments to formal, substantive and discursive challenges that address normative uncertainty across stakeholders, and propose the cultivation of related virtues by those responsible for development.

Auditing Algorithms: On Lessons Learned and the Risks of Data Minimization

ABSTRACT. In this paper, we present the Algorithmic Impact Assessment (AIA) of personalized wellbeing recommendations delivered through Telefónica Alpha’s app REM!X. The main goal of the AIA was to identify potential algorithmic biases in the recommendations that could lead to discrimination against protected groups. The assessment was conducted through a qualitative methodology that included five focus groups with developers and a digital ethnography relying on user comments reported in the Google Play Store. To minimize the collection of personal information, as required by best practice and the GDPR, the REM!X app did not collect gender, age, race, religion, or other protected attributes from its users. This limited the algorithmic assessment and the ability to control for different algorithmic biases. Nevertheless, based on indirect evidence, the AIA identified four hypothetical domains that put the levels of fairness and discrimination at risk. Our analysis provided important insights about the impact of color blindness on algorithmic audit and transparency, and how to address it.

Conservative Agency via Attainable Utility Preservation

ABSTRACT. Reward functions are easy to misspecify; although designers can make corrections after observing mistakes, an agent pursuing a misspecified reward function can irreversibly change the state of its environment. If that change precludes optimization of the correctly specified reward function, then correction is futile. For example, a robotic factory assistant could break expensive equipment due to a reward misspecification; even if the designers immediately correct the reward function, the damage is done. To mitigate this risk, we introduce an approach that balances optimization of the primary reward function with preservation of the ability to optimize auxiliary reward functions. Surprisingly, even when the auxiliary reward functions are randomly generated and therefore uninformative about the correctly specified reward function, this approach induces conservative, effective behavior.
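
The trade-off described above can be illustrated with a small sketch. The abstract does not give a formula, so the penalty form used here (the mean absolute change in auxiliary action-values relative to a do-nothing action, scaled by a weight `lam`) is an assumption for illustration; the function name and all numbers are hypothetical.

```python
def aup_style_reward(primary_reward, q_aux_action, q_aux_noop, lam=1.0):
    """Primary reward minus a penalty for changing attainable auxiliary value.

    q_aux_action[i]: value attainable for auxiliary reward i after taking the action.
    q_aux_noop[i]:   value attainable for auxiliary reward i after doing nothing.
    The penalty form is an assumed illustration, not taken from the abstract.
    """
    if not q_aux_action or len(q_aux_action) != len(q_aux_noop):
        raise ValueError("auxiliary value lists must be non-empty and equal in length")
    # Average shift in how well each auxiliary reward could still be optimized.
    penalty = sum(abs(qa - qn) for qa, qn in zip(q_aux_action, q_aux_noop))
    penalty /= len(q_aux_action)
    return primary_reward - lam * penalty

# Hypothetical values: a careful action barely changes what remains attainable,
# while a destructive one (e.g. breaking equipment) wipes out auxiliary value.
careful = aup_style_reward(1.0, [0.5, 0.4], [0.5, 0.5], lam=2.0)
destructive = aup_style_reward(1.5, [0.0, 0.0], [0.5, 0.5], lam=2.0)
print(careful, destructive)
```

Under these made-up numbers the destructive action earns a higher primary reward but a lower penalized reward, which is the conservative preference the abstract describes.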

Deepfake for Medical Video De-Identification: Privacy Protection and Diagnostic Information Preservation

ABSTRACT. Data sharing for medical research has been difficult as open-sourcing clinical data may violate patient privacy. Creating openly available datasets on medical videos, especially videos where faces are necessary for diagnosis, is infeasible unless the ethical requirements are met. Traditional face de-identification methods wipe out facial information entirely, making it impossible to analyze facial behavior. Recent advancements in whole-body keypoint detection also rely on facial input to estimate body keypoints. Both facial and body keypoints are critical in some medical diagnoses, and keypoint invariance after de-identification is of great importance. Here, we propose a solution using deepfakes, the face swapping technique. While this swapping method has been criticized for invading privacy and portrait rights, it could conversely protect privacy in medical video: patients’ faces could be swapped to a proper target face and become unrecognizable. However, it remains an open question to what extent the swapping de-identification method affects the automatic detection of body keypoints. In this study, we apply the deepfake technique to Parkinson’s Disease examination videos to de-identify subjects, and quantitatively show that face-swapping as a de-identification approach is reliable and keeps the keypoints almost invariant, significantly better than traditional methods. This study proposes a pipeline for video de-identification and keypoint preservation, clearing up ethical restrictions for medical data sharing. This work could make open-source, high-quality medical video datasets more feasible and promote future medical research that benefits our society.

Adoption Dynamics and Societal Impact of AI Systems in Complex Networks

ABSTRACT. We propose a game-theoretical model to simulate the dynamics of AI adoption on scale-free networks with and without link rewiring. This formalism allows us to understand the impact of the adoption of AI systems for society as a whole, addressing some of the concerns on the need for regulation. Using this model we study the adoption of AI systems, the distribution of the different types of AI (from selfish to utilitarian), the appearance of clusters of specific AI types, and the impact on the fitness of each individual. We suggest that the entangled evolution of individual strategy and network structure constitutes a key mechanism for the sustainability of utilitarian and human-conscious AI. In contrast, in the absence of rewiring, a minority of the population can easily foster the adoption of selfish AI and gain a benefit at the expense of the remaining majority.

Proposal for Type Classification for Building Trust in Medical Artificial Intelligence Systems

ABSTRACT. This paper proposes the establishment of “Medical Artificial Intelligence (AI) Types (MA Types)” that classify AI in medicine not only by technical system requirements but also by their implications for healthcare workers’ roles and users/patients. MA Types can be useful to promote discussion regarding the purpose and application of the clinical site. Although MA Types are based on the current technologies and regulations in Japan, that does not hinder the potential reform of the technologies and regulations. MA Types aim to facilitate discussions among physicians, healthcare workers, engineers, the public/patients, and policymakers on AI systems in medical practice.

Balancing the Tradeoff Between Clustering Value and Interpretability

ABSTRACT. Graph clustering groups entities (the vertices of a graph) based on their similarity, typically using a complex distance function over a large number of features. Successful integration of clustering approaches in automated decision-support systems hinges on the interpretability of the resulting clusters. This paper addresses the problem of generating interpretable clusters, given features of interest that signify interpretability to an end-user, by optimizing interpretability in addition to common clustering objectives. We propose a $\beta$-interpretable clustering algorithm that ensures that at least a $\beta$ fraction of nodes in each cluster share the same feature value. The tunable parameter $\beta$ is user-specified. We also present a more efficient algorithm for scenarios with $\beta=1$ and analyze the theoretical guarantees of the two algorithms. Finally, we empirically evaluate our approaches using four real-world datasets. The interpretability of the clusters is complemented by generating explanations in the form of labels denoting the feature values of the nodes in the clusters, using frequent pattern mining.
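
The interpretability condition stated in this abstract can be checked with a few lines of code. The sketch below only verifies the condition on a given clustering; it is not the proposed clustering algorithm, and the function name and toy data are hypothetical.

```python
from collections import Counter

def is_beta_interpretable(clusters, feature, beta):
    """Return True if, in every non-empty cluster, at least a `beta` fraction
    of nodes share the same value of the feature of interest."""
    for nodes in clusters:
        if not nodes:
            continue  # an empty cluster imposes no constraint
        counts = Counter(feature[n] for n in nodes)
        # Size of the largest group of nodes with an identical feature value.
        top_count = counts.most_common(1)[0][1]
        if top_count / len(nodes) < beta:
            return False
    return True

# Toy example (hypothetical): six nodes, one categorical feature, two clusters.
feature = {"a": "red", "b": "red", "c": "blue",
           "d": "blue", "e": "blue", "f": "red"}
clusters = [["a", "b", "c"], ["d", "e", "f"]]

print(is_beta_interpretable(clusters, feature, 0.6))   # each cluster is 2/3 uniform
print(is_beta_interpretable(clusters, feature, 0.75))  # 2/3 falls short of 0.75
```

With $\beta=1$ the check requires every cluster to be uniform in the feature value, which is the special case the abstract singles out.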

Contextual Analysis of Social Media: The Promise and Challenge of Eliciting Context in Social Media Posts with Natural Language Processing

ABSTRACT. While natural language processing affords researchers an opportunity to automatically scan millions of social media posts, there is growing concern that automated computational tools lack the ability to understand context and nuance in human communication and language. This article introduces a critical systematic approach for extracting culture, context, and nuance in social media data. The Contextual Analysis of Social Media (CASM) approach considers and critiques the gap between inadequacies in natural language processing tools and differences in geographic, cultural, and age-related variance of social media use and communication. CASM utilizes a team-based approach to the analysis of social media data, explicitly informed by community expertise. We use CASM to analyze Twitter posts from gang-involved youth in Chicago. We designed a set of experiments to evaluate the performance of a support vector machine using CASM hand-labeled posts against a distant model. We found that the CASM-informed hand-labeled data outperforms the baseline distant labels, indicating that the CASM labels capture additional dimensions of information that content-only methods lack. We then question whether this is helpful or harmful for gun violence prevention.