Despite its ambiguities and inadequacies, language is the most significant and universal way we can attune other human beings to our own experiences. It is the building block of our understanding of each other, a key that turns communication into understanding in a world where every single human being, all eight billion of us, has a rich internal life, carved out by our encounters with the world and our responses to it.
It was unsurprising, then, that pioneers of artificial intelligence such as Alan Turing used language – a significant centrepiece of human cognition – as the ultimate benchmark for a machine to prove it could perform as intelligently as a human. This model, however, also imagines as its baseline a human of only one typology: undifferentiated by culture, conditioning, gender or geography. This, too, is unsurprising, given the failure of imagination among “great” philosophers and thinkers, who were unable to envision anyone apart from a white cis man as the singular template for humanity. As later critics of this language-first approach argued, the central flaw of this model was its elision of phenomenological, embodied experience: human beings are not brains in jars, but are fundamentally shaped by their perception of and interaction with the worlds around them, through their physical bodies and their sensory encounters.
Alison Adam, a groundbreaking feminist thinker in technoscience and information systems, challenged this perspective in a landmark essay written in 2000, where she privileges identity as core to human subjectivity, which had been traditionally “deleted” in the disciplinary trajectories of AI, due to an:
The intersectional implications of Adam’s ideas have fortunately been built upon by feminist thinkers who grapple with the persistent reality that central to the hype around AI are (still) men. While there might be some non-white faces in the mix, these coteries are still typified by those in possession of extreme privilege, by way of generational wealth, race, caste and class. Those who occupy these spaces of “professional masculinity” employ and fund many like them – educated in elite institutions, possessing an apolitical technoutopian outlook – who replicate these limitations in the technologies that they build. These technologies now form the building blocks on which most of the AI in the world is built: large language models (LLMs).
While AI has been powering our technologies for many decades now, the generative AI moment, typified by ChatGPT and its ilk, is significant in its ability to allow technology users to interact with AI first-hand. Generative AI tools, built on large language models that allow them to interpret human speech with context and nuance rather than merely translating individual words to make meaning, strive to appear as intuitive as human beings, treating this as the true metric of their success. In commercial and transactional settings, such abilities can be welcome in a world where consumers want fast and efficient service (though, as the personal experiences of many demonstrate, this is not always the case!). In a world which dedicates less and less of its resources to people in crisis or on the margins, a logical alternative might be to try to bridge the gap in service and care provision using similar technologies, and this idea is being increasingly embraced by organisations delivering frontline services both in India and elsewhere.
The inadequacy of existing AI-powered tools to meet the needs of those who encounter gender-based or sexual violence is poignantly demonstrated by the Third Eye Portal’s excellent project, a Dictionary of Violence. This brilliantly conceived exercise draws on the expertise of experienced grassroots case workers. It aims to build a lexicon of the words used to describe the subjectivities, experiences and encounters of women who face violence in rural areas, while acknowledging the complexity of those words’ multifaceted uses and contexts. By finding the vocabularies that allow sharing, empathy and understanding, the suffocating silence of victimhood emerges into a sisterhood of survival, helping to give voice to sometimes equivocal concepts in “everyday words, common words, some unique words”.
Even if we were to leave aside the improbability of these women having access to digital devices that would allow them privacy and time enough to seek help online, the existence of a project like this exposes the violence that is being waged upon the unrepresented through technologies that are constantly touted as a shortcut and panacea for difficult and complex situations. Indian languages are only now beginning to be represented somewhat adequately by LLMs, with usable datasets available for only eleven major Indian languages. There is a very real danger that some minority languages will not be represented at all, given that investment in these vastly expensive projects flows from sites of incumbent political and economic power in the country. Indigenous categories of queerness particular to the Indian subcontinent, for example, also stand to be undermined by the hegemony of western, anglocentric models for understanding and describing sexualities and sexual ways of being.
The opportunity for participation for those who are currently under-represented in the datasets that power these AI models is still grotesquely lacking, and the diversity that could promise any sort of shift in the subjectivities represented in these models is still very far from being a reality in India. Much of this is due to the narratives that drive AI “innovation”, which glorify mega-scale undertakings in the dangerous belief that bigger is inevitably better. Moreover, given that marginalised and minority communities do not necessarily preserve their histories only through codified language, the potential for these legacies to be as visible online as those of hegemonic cultures is drastically reduced.
In order for any sort of meaningful shift to happen, there needs to be an acknowledgement that the current trajectory of AI development will only serve the causes and needs of a global elite. A glimmer of hope in what feels like a desperate darkness is the idea of small language models, which are more economically and environmentally sound because they can run on relatively low-bandwidth infrastructures, unlike energy-guzzling generative AI models. These are based on curated distillations from large language models and are currently being used in scenarios that privilege community input and co-creation, in countries like Brazil, Indonesia, New Zealand and Norway, and for a range of African languages.
While this alternative is proving to be a somewhat feasible response to the violence of LLMs, it also forces communities to visibilise and externalise their cultural practices and heritage, and to violate traditional protocols that privilege secret and sacred knowledges particular to those communities. These processes have to be careful and community-led, and must allow these communities to leverage their data to their own advantage, a concept known as data sovereignty.
Nevertheless, these developments do contain within them the possibilities of more representative digital spaces – the potential of projects like the Dictionary of Violence to act as nuanced datasets is vast. There must be an understanding that it is only through these participatory engagements that AI will be able to bear earnest witness to the myriad variety of gendered subjectivities that characterise our experience of being human in this world: to name and define concepts that are inextricable from the lifeblood that powers our existence.
References:
- Adam, A. (2000). Deleting the subject: A feminist reading of epistemology in Artificial Intelligence. Minds and Machines, 10(2), 231–253. https://doi.org/10.1023/a:1008306015799
- Dictionary of violence. The Third Eye. (2025, February 27). https://thethirdeyeportal.in/the-learning-lab/dictionary-of-violence/
- Tanner, B. & Kerry C.F. Chinasa (2025, March 19). Can small language models revitalize indigenous languages? Brookings. https://www.brookings.edu/articles/can-small-language-models-revitalize-indigenous-languages/
Cover image by Christopher Bill on Unsplash