Academic Insights

Shaozeng Zhang: How Anthropological Methods Can Break Open the AI 'Black Box'

January 28, 2026 | By The Future Anthropologists

Many people mistakenly believe that anthropology is merely a discipline for “studying the Other,” for seeking out “distant” or “exotic” cultural remnants. In fact, anthropology has never been synonymous with backwardness. It is not confined to the past but oriented toward the future. What it investigates is not “spectacle,” but the deep logic of human societies. Especially in an era of rapid development in technology, media, and globalization, anthropology is becoming active at the very forefront of contemporary life in unprecedented ways, with more and more anthropologists engaging in research on artificial intelligence.

Genevieve Bell is one of the most important contemporary anthropologists and technology experts. Her work spans anthropology, design, artificial intelligence, and future technological development. She is best known as Intel’s former Chief Anthropologist and is currently a professor at the Australian National University (ANU). She also founded the 3A Institute (Autonomy, Agency and Assurance Institute), dedicated to AI ethics and the governance of emerging technologies. Bell firmly believes that technology is not a neutral tool, but a product embedded in culture, society, history, and power relations. She emphasizes that understanding how humans use and imagine technology requires anthropological fieldwork methods and cultural insight.

At the University of Cambridge, anthropologist Beth Singler studies the “religious imagination of artificial intelligence”: why do people tend to deify AI? At MIT, anthropologist Stefan Helmreich examines “gender and power in AI voice recognition”: why are voice assistants so often female, and how does this relate to the cultural coding behind technology? At the École des Hautes Études en Sciences Sociales (EHESS) in France, anthropologist Dominique Boullier focuses on the “reorganization of human behavior in an algorithmic society.” When social media captures attention through recommendation systems, are our choices truly free? Algorithms shape how we understand the world and reorganize the public sphere.

Anthropologist Zhang Shaozeng at Oregon State University proposes renewing big data analysis from an anthropological perspective: epistemologically, big data can be reconceived through the lens of archaeology; methodologically, big data analysis should draw on anthropology’s holistic and rigorous approaches, which help re-examine the question of data “authenticity.”

On July 12, 2025, the “Future Anthropologists” project group invited Zhang Shaozeng to present an online lecture titled “Enhancing the Human in Artificial Intelligence: Participating in the Design of Machine Learning Algorithm Models with Anthropological Methods.” Zhang shared an interdisciplinary research project he has conducted over the past several years, which originated from a seemingly accidental yet symbolically meaningful encounter across disciplines.

In early 2019, at a parents’ meeting at their children’s kindergarten, Zhang met a computer science professor from the same university who specializes in artificial intelligence. Although they came from different disciplinary backgrounds, casual conversation while waiting for the meeting led them to discover a shared concern about the ethical issues of artificial intelligence, which sparked the possibility of collaboration. In particular, when discussing the 2015 incident in which Google’s image recognition system mistakenly labeled photos of two African Americans as “gorillas,” both resonated strongly with the problem of algorithmic bias in AI. On the one hand, the computer scientist felt deep regret over such technical failures; on the other hand, the anthropologist recognized that the incident was closely related to anthropology’s long-standing concern with the categories of “human” and “kind.” If artificial intelligence is a core technology shaping the future society in which their children will live, then as parents they also bear responsibility for designing better AI and shaping a better future society. This apparently accidental yet inspiring dialogue gave rise to the idea for a cross-disciplinary research project.

The two scholars subsequently applied jointly for the Early-Concept Grant for Exploratory Research (EAGER) under the U.S. National Science Foundation (NSF), aiming to explore how theoretical knowledge and methodological experience from both disciplines could be incorporated into the design process of artificial intelligence. The project officially launched in 2020 and lasted three years, with the core objective of developing more humanized AI systems. Notably, the term “humanized AI” did not originally come from the research design itself, but emerged later from participant feedback in user-experience evaluations, reflecting the public’s intuitive expectation for the integration of technology and the humanities.

The project team consisted of two principal investigators (from computer science and anthropology) and several graduate students, forming a research group grounded in interdisciplinary collaboration. Within this framework, the project not only focused on technical innovation, but also emphasized cross-disciplinary dialogue and methodological sharing, introducing ethical concerns and knowledge structures from the social sciences—especially anthropology—into AI research.

The project coincided with the global COVID-19 pandemic. Although the public health crisis imposed many constraints, because the research relied primarily on algorithm design and remote collaboration rather than traditional on-site ethnographic fieldwork, it was still able to proceed. In fact, the timing created opportunities for interdisciplinary online cooperation. Nevertheless, the early stage was not easy. Although both principal investigators were highly motivated, differences in research language, paradigms, and operational styles between anthropology and computer science posed challenges to effective collaboration. To address this, the team established an intensive collaboration rhythm, holding biweekly meetings from the outset and gradually breaking the research goals into operational sub-tasks through continuous discussion.

To further strengthen the methodological framework and interdisciplinary integration, the team also organized two rounds of expert consultations, inviting scholars with both anthropological and computational backgrounds as advisors. For example, Professor Paul Dourish, although based in a computer science department, has deep anthropological training; Dr. Melissa Cefkin, with a PhD in anthropology, has long worked in high-tech companies and previously served as Chief Scientist of Nissan’s autonomous driving team. Feedback from these experts positively influenced both the theoretical depth and practical orientation of the project.

At the methodological level, the research team reflexively clarified the basic structure and historical lineage of the two core concepts, “algorithm” and “artificial intelligence.” Although “AI” has become a highly popular term, many computer scientists prefer more specific technical terms such as machine learning and deep learning to avoid the ambiguities caused by overly broad concepts. The project focused on optimizing machine learning model design, especially on introducing anthropological ethical concerns and inductive logic into the entire process of algorithm training.

The team pointed out that machine learning is essentially a process of training, calibrating, and optimizing a base model through iterative cycles of data input and output. This process shares methodological principles with ethnographic approaches such as Grounded Theory, which follow a “data-driven, theory-generating” logic. In machine learning, data are fed into the model, outputs are observed, and the algorithm is adjusted; similarly, ethnographic research repeatedly collects field materials, induces meanings, and gradually builds concepts and theory. This shared logic provides a productive methodological foundation and dialogue space between anthropology and AI research.
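
To make this parallel concrete, here is a minimal sketch of the train-observe-adjust cycle the analogy refers to; the toy data, model choice, and adjustment rule are all illustrative assumptions rather than the project's actual setup.

```python
# Minimal sketch of the iterative "train, observe, adjust" cycle described above.
# The toy data, model, and adjustment rule are hypothetical illustrations.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

regularization = 1.0
for round_ in range(5):
    # 1. Feed data into the model (training).
    model = LogisticRegression(C=regularization, max_iter=1000).fit(X_train, y_train)
    # 2. Observe the output (validation accuracy).
    acc = accuracy_score(y_val, model.predict(X_val))
    print(f"round {round_}: C={regularization:.2f}, accuracy={acc:.3f}")
    # 3. Adjust and repeat, much as ethnographers revisit field material
    #    and refine their emerging concepts.
    regularization *= 0.5
```

Each pass through the loop plays the role of a fresh round of fieldwork: the researcher inspects what the model produced and decides how to adjust before iterating again.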

Therefore, a basic starting point of the study was to rethink the role and agency of humans in algorithm construction. Instead of treating AI as a “black-box technology” detached from sociocultural contexts, the project advocates advancing more scientifically rigorous and socially grounded technological practices through methodological reflection and collaboration.

To address the tendency of machine learning models to fail under high uncertainty, the researchers designed and implemented an “abstention option.” The development of AI represents a new wave of automation—automation of human intellectual activity—so traditional AI systems naturally pursue the idea that “the algorithm must produce an output.” This project, however, moves in the opposite direction by introducing an “unable to judge” response into the model. When the system encounters insufficient information or marginal cases, it can acknowledge the limits of its judgment and pass such cases to human analysts. This strategy deepens the idea of human-in-the-loop, transforming AI from a closed, authoritative black box into an open, adjustable system of human–machine collaboration.
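
As a rough illustration of how such an abstention option can sit on top of an ordinary classifier (a sketch under assumptions, not the project's implementation; the confidence threshold of 0.75 is hypothetical):

```python
# Sketch of an "abstention option" layered on a probabilistic classifier.
# The threshold and the toy model are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=8, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

def predict_with_abstention(model, X, confidence_threshold=0.75):
    """Return the predicted class, or None ("unable to judge") when the model
    is not confident enough, so the case can be passed to a human analyst."""
    probabilities = model.predict_proba(X)
    confidence = probabilities.max(axis=1)
    predictions = probabilities.argmax(axis=1).astype(object)
    predictions[confidence < confidence_threshold] = None  # abstain
    return predictions

decisions = predict_with_abstention(model, X[:10])
print(decisions)  # a mix of class labels and None values
```

Anything returned as None would be queued for a human analyst rather than forced into a label.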

Correspondingly, the researchers optimized the algorithm’s user interface, enabling operators to intervene in and evaluate the system during training and decision-making, thereby increasing user control and trust. To further enhance social adaptability and user acceptance, the team conducted online user-experience studies in both China and the United States, exploring how users in different cultural contexts understand, prefer, and ethically evaluate AI systems. This process enriched the social-semantic layer of algorithmic output and provided a practical basis for subsequent fairness-weight adjustment strategies.

The study particularly emphasizes that “fairness” is not a single, stable technical parameter, but a value-laden and plural concept whose meaning varies greatly across philosophical traditions, social structures, and historical experiences. Therefore, instead of defining fairness statically, the researchers propose treating it as a variable weight that users can adjust according to their own definitions or application needs.
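
Purely as an illustrative sketch, one way to picture a user-adjustable fairness weight is an objective that trades prediction error off against a fairness penalty; the demographic-parity gap used below is only one of many possible fairness metrics, and the weight lambda_fairness is meant to be chosen by users rather than fixed by developers.

```python
# Sketch: "fairness" as a user-adjustable weight rather than a fixed parameter.
# The metric and weight are illustrative assumptions, not the project's formulas.
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Difference in positive-prediction rates between two groups (0 and 1)."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def weighted_objective(y_true, y_pred, group, lambda_fairness=0.5):
    """Lower is better: prediction error plus a user-weighted fairness penalty."""
    error = np.mean(y_true != y_pred)
    return error + lambda_fairness * demographic_parity_gap(y_pred, group)

# Hypothetical predictions and group labels, for illustration only.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(weighted_objective(y_true, y_pred, group, lambda_fairness=0.8))
```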

In computer science, the human-in-the-loop mechanism has become a popular design strategy, but in practice it is often limited to late-stage user testing. This project seeks to extend the strategy structurally, foregrounding the “human” at multiple core stages such as data selection, algorithm modeling, and risk assessment. It highlights the subjective positions of AI system designers and the structural social biases embedded in data, and it grants humans the capacity to interpret and intervene in machine-learning algorithms.

The research process also revealed both the challenges and potentials of interdisciplinary collaboration. For example, in communication with computer science colleagues, the researchers noticed that what social scientists call a “case” is often referred to as a “sample” in computing, whereas in social science a “sample” usually means a collection of cases obtained through scientific sampling. Such terminological differences initially caused communication barriers, but they also pushed collaborators to clarify epistemic boundaries and promoted deeper disciplinary integration.

At the model-design stage, the researchers further attempted to introduce fairness as a quantifiable weight into the algorithmic structure. This means that when the model encounters highly uncertain inputs or potential discriminatory risks, its judgment automatically triggers a cautious mechanism: lowering decision confidence or transferring the case to human intervention. This logic not only enhances the ethical controllability of the system, but also responds to real problems at the technology–society interface.
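
A minimal sketch of such a cautious mechanism, with hypothetical thresholds, might route each case like this:

```python
# Sketch of the "cautious mechanism" described above: a case is deferred to a
# human reviewer when the model is unsure OR when a fairness check flags risk.
# Both the threshold and the risk flag are illustrative assumptions.
def route_decision(confidence, fairness_risk_flag, confidence_threshold=0.8):
    """Return 'auto' to let the model decide, or 'human' to hand the case over."""
    if confidence < confidence_threshold or fairness_risk_flag:
        return "human"
    return "auto"

print(route_decision(confidence=0.95, fairness_risk_flag=False))  # -> "auto"
print(route_decision(confidence=0.62, fairness_risk_flag=False))  # -> "human"
print(route_decision(confidence=0.95, fairness_risk_flag=True))   # -> "human"
```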

However, any attempt to “parameterize” fairness risks standardization and compression of complexity. Justice is expressed differently across philosophical, political, social, and cultural dimensions. Therefore, the authors argue that the operationalization of fairness should ultimately be defined and evaluated by specific user communities rather than imposed unilaterally by developers.

The study selected two widely used machine-learning datasets, the Adult Dataset and the COMPAS Dataset, and critically examined their historical origins and social contexts from an anthropological perspective. Both are frequently used in academic research and for AI training and benchmarking. The Adult Dataset is drawn from the 1994 U.S. Census database and is mainly used to predict whether an individual’s annual income exceeds $50,000. The COMPAS Dataset consists of risk-assessment questionnaires used in the criminal justice system to predict recidivism among released individuals; the version most commonly used in research was compiled in Broward County, Florida.

Although such standard datasets are structured and reusable, they are deeply embedded in specific historical, cultural, and technical contexts. For example, the original COMPAS questionnaire includes items such as: “How hard is it for you to find a job above minimum wage compared to others?” While seemingly neutral and quantifiable, this question contains strong subjective judgments shaped by region, ethnicity, and structural social factors. Since the data were collected in the U.S. South in the 1990s—a region long marked by racial and economic tensions—the questionnaire likely encoded racial bias and structural inequality from the outset.

The researchers treat these datasets as human artifacts, and use path-tracking methods from Science and Technology Studies (STS) to reconstruct their formation processes. This approach borrows from ethnographic and archaeological analyses of material culture, aiming to expose the social relations and political metaphors already encoded in data—what can be called encoded bias. The study further shows that even benchmark datasets regarded as “objective” are constrained by historical, cultural, and regional conditions. Using them uncritically in AI design risks reproducing or amplifying structural bias.
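
As an illustration of what such a critical examination can look like in practice, the following sketch surfaces group-level disparities that are already present in a benchmark dataset before any model is trained. It assumes a local CSV copy of the Adult data with the commonly used column names (“sex,” “race,” “income”); the file path is hypothetical.

```python
# Sketch of a simple "data archaeology" check on a benchmark dataset before any
# model is trained. Assumes a local CSV of the Adult data with the usual column
# names ("sex", "race", "income"); the file path is hypothetical.
import pandas as pd

adult = pd.read_csv("adult.csv")  # hypothetical local copy of the Adult dataset

# Share of records labeled ">50K" within each group: disparities visible here
# are properties of the census snapshot itself, not of any later algorithm.
for column in ["sex", "race"]:
    positive_rate = (
        adult.assign(high_income=adult["income"].str.contains(">50K"))
             .groupby(column)["high_income"]
             .mean()
    )
    print(positive_rate, "\n")
```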

Through collaboration with computer scientists, the team identified recurring hard cases during model training. For instance, three respondents with nearly identical backgrounds were classified differently by an income-prediction model, exposing uncertainty at decision boundaries. These cases motivated the introduction of the abstention option: when facing highly uncertain inputs, the model chooses “cannot predict” rather than forcing a decision. This reduces systemic error risk and creates a practical pathway for cooperation between AI and human experts.
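
A simple way to surface such boundary cases for human review, sketched here on a toy model with an illustrative margin of 0.1 around the decision boundary:

```python
# Sketch: flagging "hard cases" near the decision boundary for human review.
# The 0.1 margin and toy model are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=8, random_state=2)
model = LogisticRegression(max_iter=1000).fit(X, y)

probabilities = model.predict_proba(X)[:, 1]
hard_cases = np.where(np.abs(probabilities - 0.5) < 0.1)[0]
print(f"{len(hard_cases)} cases fall near the decision boundary and could be "
      "handed to human analysts instead of being forced to a label.")
```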

Based on these insights, the team designed an operational user interface allowing end users to adjust the balance between fairness and prediction accuracy. Early versions used a continuous slider, but user testing showed that tiered settings were more practical. The system therefore adopted graded options such as “high fairness” and “medium fairness” for contexts like bank-loan evaluation. This design breaks the traditional AI black box by visualizing parameters and outcomes and granting users limited intervention rights.
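
A minimal sketch of how such graded options might map onto concrete weights; the numeric values, and the extra “low fairness” tier, are assumptions added only for illustration:

```python
# Sketch: graded fairness settings instead of a continuous slider. The tier
# names mirror those mentioned above; the numeric weights and the "low
# fairness" tier are hypothetical additions.
FAIRNESS_TIERS = {
    "high fairness":   0.9,   # e.g., strict screening in a bank-loan evaluation
    "medium fairness": 0.5,
    "low fairness":    0.1,   # accuracy-oriented uses
}

def fairness_weight(tier: str) -> float:
    """Translate a user-facing tier into the weight used by the model objective."""
    return FAIRNESS_TIERS[tier]

print(fairness_weight("high fairness"))  # -> 0.9
```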

Final cross-cultural user tests in China and the United States showed that users perceived higher transparency and controllability, especially when dealing with socially sensitive issues such as race and distributive justice.

Overall, by integrating data archaeology, critical path tracking, explainable model design, and human-intervention mechanisms, the study demonstrates how anthropological methods can innovate AI design and enhance its ethical grounding.

During the discussion session, many scholars and students participated, including Huang Yu (Minzu University of China) and Xiong Zhiying (Hubei Minzu University).

Huang Yu raised an important issue about benchmarking datasets in AI. She noted that mainstream AI research relies heavily on benchmarking models to train and evaluate systems. This tradition comes from applied science, emphasizing predictive accuracy and efficiency while neglecting value dimensions such as fairness and justice. Historically, benchmarking is closely linked to U.S. commercial and military systems, for example in credit assessment. It prioritizes coverage and efficiency rather than ethical legitimacy. She referenced Ordered Society, which critiques how AI systems structure and manage information in ways that embed power logics. Through constant prediction and ranking, these systems shape social “order,” which is not always neutral or just.

Zhang Shaozeng acknowledged the early usefulness of benchmarks in standardizing and comparing AI systems. However, as AI matures, researchers increasingly realize that benchmark datasets are not neutral universal tools but situated instruments carrying historical, cultural, and political bias. He described them as traces of human behavior, produced by specific societies and power structures. In practice, this bias ranks different groups’ language, logic, and cognition.

Using large language models as an example, Zhang noted that AI often favors white middle-class English grammar and expression, marginalizing Black English, immigrant English, and non-standard accents. This reflects an implicit definition of “best expression.” Such bias may slide toward a form of new eugenics, privileging certain ways of thinking and speaking as more worthy of replication.

Zhang admitted that even interdisciplinary teams cannot easily define the “best” fairness. Therefore, the project adopts a “lazy method”: letting future user communities participate in defining fairness in AI systems. This respects social complexity—fairness standards cannot be detached from real contexts, class differences, and cultural backgrounds. He emphasized that addressing AI ethics cannot rely on computer science or anthropology alone, but requires interdisciplinary cooperation. Encouraging humanities scholars to engage in AI design is not only academically meaningful, but also a moral responsibility.

Next, Xiong Zhiying questioned whether traditional anthropological notions of culture—often static, homogeneous, and typological—remain adequate in rapidly changing, hybrid societies. Facing technological transformations such as AI, he asked whether anthropology needs more dynamic and complex analytical frameworks.

In response, Zhang stressed that anthropological methods apply not only to distant societies but also to our own technological practices, such as AI design and use. He agreed that the concept of culture in U.S. anthropology, influenced by 19th-century German traditions, emphasized typology and stability, and that such a concept now struggles to capture present-day realities. In AI research, cultural change is so rapid that the search for a “typical culture” quickly becomes outdated.

Drawing on his own China–U.S. user studies, Zhang observed surprisingly similar responses in the two countries, because AI is still new in both societies, though subtle differences remained: Chinese users were more proactive, U.S. men tended to be more conservative, and women were more open. This points to strong heterogeneity even within the same society, for example along gender lines. He encouraged maintaining openness, flexibility, and critical use of the culture concept when studying emerging technologies, rather than clinging to static and typological frameworks.