What is the main finding of the UJ word embeddings study on SA news?

The study confirms socioeconomic race bias: Black-associated terms link to poverty/crime, White to wealth/safety across all 39 outlets analyzed.

How do word embeddings detect racism?

Word embeddings like Word2Vec map words by context. Race dimension (Black minus White vectors) shows stereotype associations via cosine similarity.

Which outlets showed strongest bias?

Business/finance (Moneyweb, Business Day) White-positive; community/gov (Daily Sun, SABC) Black-negative. All biased systemically.

What corpus was used?

27,140 COVID-19 vaccination articles (2020-2023) from 76 outlets, 39 with ≥100 articles, via Media Monitoring Africa.

SAHRC 2000 inquiry key outcomes?

Found institutional racism via stereotypes, lack of diversity; recommended training, black recruitment, stronger codes.

How validated?

Correlates with prior studies (r=0.75), human ratings (r=0.61, α=0.54). WEAT p<0.001.

Limitations of the method?

Word2Vec static, no context; COVID focus; binary Black/White.

Implications for SA media?

Targeted audits, diversity hires, NLP training for editors.

Health bias findings?

Weaker; unhealthy terms nearer Black pole (marginal p), possibly COVID effect.

Future research?

BERT models, full archives, multi-race dimensions, speaker embeddings.

Authors and affiliation?

Nnaemeka Ohamadike, Kevin Durrheim, Mpho Primus; University of Johannesburg Psychology.

Racism in SA News: Word Embeddings Audit | AcademicJobs


A groundbreaking new preprint from researchers at the University of Johannesburg has brought computational rigor to a long-standing debate in South Africa: the persistence of racial bias in news media coverage. Titled "Tracing Bias to Its Sources: A Word Embedding Audit of Racism in South African News Outlets," the study employs word embeddings—a natural language processing (NLP) technique—to dissect language patterns across 39 major news platforms. By analyzing over 27,000 COVID-19 vaccination articles published between January 2020 and May 2023, the authors reveal not just the existence of socioeconomic race bias but pinpoint which outlets contribute most.

The findings confirm what qualitative studies have suggested for decades: South African news language associates concepts of poverty, welfare dependency, and crime more closely with Black-associated terms, while wealth, investment, and safety cluster around White-associated ones. This "inferential racism," as termed in prior research, operates subtly through word associations rather than overt slurs, making it harder to detect but no less damaging to public perceptions and social cohesion.

Historical Roots of Media Bias in Post-Apartheid South Africa

South Africa's news media carries the scars of apartheid, when outlets were state tools for white supremacy propaganda, suppressing Black voices and justifying segregation. The 1994 democratic transition spurred reforms, but challenges persisted. The South African Human Rights Commission's (SAHRC) landmark 2000 "Faultlines" inquiry—sparked by complaints over cartoons and columns stereotyping Black people as corrupt or incompetent—concluded that media institutions were "characterised as racist" due to cumulative effects on dignity and equality. It highlighted underrepresentation (few Black sub-editors), stereotypes in crime/poverty coverage, and white-dominated ownership.

Research by the Media Monitoring Project (MMP) in 1999, titled "The News in Black and White," documented similar patterns: Black South Africans framed as victims or perpetrators, whites as experts or benefactors. Despite diversification efforts, a 2022 analysis by Govenden found "inferential racism" enduring into 2014. Commercial pressures, niche targeting (e.g., business papers for affluent readers), and legacy newsroom cultures have slowed transformation, with self-regulation bodies like the Press Ombudsman handling few racism complaints effectively. Read the full SAHRC Faultlines report.

Understanding Word Embeddings: A Crash Course in AI Bias Detection

Word embeddings are vector representations of words in a high-dimensional space, capturing semantic relationships based on co-occurrence in text. Word2Vec, developed at Google in 2013, popularized the approach: by learning to predict words from their context, the model picks up regularities such as "king" - "man" + "woman" ≈ "queen". To detect bias, researchers construct "dimensions" such as race (the average vector of "Black/African/Zulu" terms minus that of "White/European/Afrikaner" terms) and project stereotype vocabularies onto them. A positive cosine similarity places a word closer to the Black pole; a negative one places it closer to the White pole.
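The projection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the three-dimensional vectors in TOY are invented for the example (real Word2Vec vectors would have hundreds of dimensions), and the function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def race_dimension(embeddings, black_terms, white_terms):
    """Average Black-term vector minus average White-term vector."""
    black_mean = mean_vector([embeddings[t] for t in black_terms])
    white_mean = mean_vector([embeddings[t] for t in white_terms])
    return [b - w for b, w in zip(black_mean, white_mean)]

# Invented toy vectors, for illustration only (not taken from the study).
TOY = {
    "black": [0.9, 0.1, 0.2],
    "african": [0.8, 0.2, 0.1],
    "white": [0.1, 0.9, 0.2],
    "european": [0.2, 0.8, 0.1],
    "township": [0.7, 0.2, 0.3],
    "investor": [0.2, 0.7, 0.3],
}

axis = race_dimension(TOY, ["black", "african"], ["white", "european"])
scores = {word: cosine(TOY[word], axis) for word in ("township", "investor")}
```

With these toy values "township" gets a positive score (Black pole) and "investor" a negative one (White pole), which is the sign convention the article describes.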

In bias tests like the Word Embedding Association Test (WEAT), terms like "poverty" or "grant" scoring high on the Black pole indicate negative racial association. This method scales to millions of words, revealing implicit biases invisible to manual review. While static (lacking sentence context, unlike BERT), it's robust for large corpora and validated against human judgments.
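For readers unfamiliar with WEAT, the effect size can be sketched as follows, using the standard formulation: each target word's association is its mean cosine similarity to one attribute set minus its mean similarity to the other, and the effect size is the difference of the two target-set means divided by the standard deviation over all target words. The vectors below are invented, the use of the sample standard deviation is an assumption, and the sign of the result depends on which sets are passed first, so this sketch does not reproduce the study's reported values. The permutation test that yields the p-value is omitted.

```python
import math
import statistics

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def association(word, attrs_a, attrs_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    mean_a = statistics.mean([cosine(word, a) for a in attrs_a])
    mean_b = statistics.mean([cosine(word, b) for b in attrs_b])
    return mean_a - mean_b

def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Standardized difference in association between the two target sets."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    # Assumption: sample standard deviation over all target words.
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)

# Invented two-dimensional vectors, for illustration only.
black_attrs = [[1.0, 0.1], [0.9, 0.2]]
white_attrs = [[0.1, 1.0], [0.2, 0.9]]
poverty_targets = [[0.8, 0.3], [0.7, 0.2]]
wealth_targets = [[0.3, 0.8], [0.2, 0.7]]

effect = weat_effect_size(poverty_targets, wealth_targets, black_attrs, white_attrs)
flipped = weat_effect_size(wealth_targets, poverty_targets, black_attrs, white_attrs)
```

Swapping the two target sets flips the sign of the effect size, which is why a reported value only makes sense alongside the ordering of the sets.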

The Study's Methodology: From Corpus to Outlet Vectors

The researchers curated a corpus from Media Monitoring Africa's database: 27,140 articles on COVID-19 vaccinations from 76 outlets, filtered to the 39 outlets with ≥100 articles each. Articles were prefixed with outlet names (e.g., "News24_news24") so that each outlet received its own embedding. Ten Word2Vec Skip-gram models were trained with Gensim (dimension 200, window 10) on bootstrap resamples of 3,900 articles (100 per outlet).
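The resampling step might look roughly like the sketch below. It assumes that "bootstrap" means drawing with replacement within each outlet, which the article does not state. The OUTLET_ prefix format, the function names, and the placeholder corpus are all hypothetical.

```python
import random

def tag_article(outlet, tokens):
    """Prepend a hypothetical outlet token so the outlet gets its own vector."""
    return [f"OUTLET_{outlet}"] + list(tokens)

def bootstrap_resamples(articles_by_outlet, per_outlet=100, n_resamples=10, seed=0):
    """Build balanced resamples: per_outlet articles drawn from every outlet.

    Assumption: articles are drawn with replacement within each outlet.
    """
    rng = random.Random(seed)
    resamples = []
    for _ in range(n_resamples):
        sample = []
        for outlet in sorted(articles_by_outlet):
            drawn = rng.choices(articles_by_outlet[outlet], k=per_outlet)
            sample.extend(tag_article(outlet, tokens) for tokens in drawn)
        resamples.append(sample)
    return resamples

# Placeholder corpus: 39 made-up outlets with 120 tokenized articles each.
corpus = {
    f"outlet{i}": [["vaccine", "rollout", str(j)] for j in range(120)]
    for i in range(39)
}
resamples = bootstrap_resamples(corpus)
```

Each resample holds 39 × 100 = 3,900 tagged articles, matching the figure in the text. With Gensim 4.x, each one could then be passed to something like Word2Vec(sentences, vector_size=200, window=10, sg=1); any other training settings are not given in the article.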

The race dimension was built from validated word pairs (Black-white, African-European, etc.). Socioeconomic stereotype terms came from MMP (1999), Talbot & Durrheim (2012), and ChatGPT curation: Black-pole terms included "township," "SASSA grant," "R350 relief," and "NSFAS"; White-pole terms included "investor," "taxpayer," and "privilege." Health stereotypes were tested too. Outlet vectors were averaged and projected onto the race dimension to obtain bias scores. The approach was validated via WEAT (effect size -1.04, p<0.001), correlation with a prior study (r=0.75), and ratings from 26 South Africans (r=0.61, α=0.54). Access the full preprint.

Main Findings: Consistent Socioeconomic Race Bias Across the Board

The race dimension replicated prior biases: the Black pole sat near poverty and crime terms ("informal settlement," "gangster," "load shedding"), the White pole near prosperity terms ("economy," "business rescue"). All 39 outlets showed the bias, with no outliers, which points to a systemic pattern.

Business/finance (Moneyweb, Business Day, Fin24): Strongest White-positive (high wealth links), low Black-negative.
Metropolitan/community/gov (Daily Sun, Cape Argus, SABC): Strongest Black-negative (welfare/crime), low White-positive.
Mid-tier: The Conversation, Bhekisisa balanced but biased both ways.

Health bias was weaker (WEAT -0.36, marginal p): unhealthy terms ("HIV," "TB") sat nearer the Black pole, possibly amplified by the COVID focus.

Chart showing race bias scores across South African news outlets from word embeddings study

Business Outlets vs. Community Papers: Divergent Worlds?

Business media (e.g., Financial Mail, cosine -0.15 toward the White pole) reflect their audience (affluent, white-skewed) and prioritize investor concerns. Community papers (Daily Sun, 0.12 toward the Black pole) cover townships and grants, mirroring their readers' demographics but reinforcing stereotypes. SABC sits closer to neutral, but all outlets embed bias via topic selection and framing.

Human Validation and Robustness Checks

South Africans rated 40 socioeconomic terms on race association (1 = Black, 7 = White), and the ratings correlated strongly with the embeddings (r=0.61). Sensitivity tests (different seeds, dimensions) confirmed stability. Health terms showed a weaker correlation with human ratings (r=0.27), suggesting subtler bias.
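A correlation of this kind can be computed with a plain Pearson coefficient, sketched below. The four terms, their mean ratings, and their embedding scores are made-up numbers for illustration, not data from the study. Reverse-coding the 1-7 scale so that higher means more Black-associated is an assumption made here to match the sign of the embedding scores.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration only (1 = Black, 7 = White).
mean_ratings = {"township": 1.8, "grant": 2.4, "economy": 4.9, "investor": 6.1}
# Made-up embedding scores (positive = nearer the Black pole).
embedding_scores = {"township": 0.21, "grant": 0.12, "economy": -0.05, "investor": -0.18}

terms = sorted(mean_ratings)
# Assumption: reverse-code ratings so higher values mean more Black-associated.
reversed_ratings = [8 - mean_ratings[t] for t in terms]
r = pearson_r(reversed_ratings, [embedding_scores[t] for t in terms])
```

Without the reverse-coding step the same data would give a negative r of equal size, so the direction of the rating scale matters when reading a reported correlation.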

Implications for Journalism and Democracy

This audit shows bias as institutional, not individual: newsrooms' audience targeting perpetuates divides. In SA's fragile democracy, skewed portrayals erode trust and fuel polarization (e.g., farm murders vs. township crime). COVID coverage amplified inequities: Black poverty vs. white privilege frames hindered a unified response. Published companion paper in EPJ Data Science.

Towards Solutions: AI Audits, Diversity, Training

Scalable Monitoring: Embeddings enable regular audits, holding outlets accountable.
Diversity Quotas: More Black editors/sub-editors per SAHRC recs.
Training: NLP literacy, bias workshops (SANEF-led).
Policy: Strengthen Press Council, incentivize balanced coverage.

Higher Education's Role: NLP for Social Justice

UJ's Psychology Dept exemplifies interdisciplinary higher ed impact: NLP meets social psych. Tools empower researchers tracking bias evolution. For academics, explore careers in computational social science via research positions or faculty roles.

Visualization of race dimension in word embeddings from SA news study

Limitations and Future Directions

Static embeddings miss context; future work could use BERT or XLNet. The COVID corpus limits generalizability, so analyses should expand to full archives. Coloured and Indian dimensions should be included, and bias tracked longitudinally after the study.

This UJ preprint revives SAHRC calls for media transformation, offering AI as an ally against subtle racism. With elections looming, unbiased reporting is vital for cohesion. Researchers urge newsrooms: audit language, diversify, prioritize society over niches. Explore SA higher ed opportunities at AcademicJobs South Africa.

Word Embeddings Audit Exposes Persistent Racism in South African News Media

UJ Preprint Pinpoints Race Bias Across 39 Outlets Using COVID Coverage Corpus
