
Language models seem to treat "masculine and White concepts... as the 'default' value."
Anyone familiar with HR practices probably knows of the decades of studies showing that resumes with Black- and/or female-presenting names at the top get fewer callbacks and interviews than those with white- and/or male-presenting names—even if the rest of the resume is identical. A new study shows those same kinds of biases also show up when large language models are used to evaluate resumes instead of humans.
In a new paper published during last month's AAAI/ACM Conference on AI, Ethics, and Society, two University of Washington researchers ran hundreds of publicly available resumes and job descriptions through three different Massive Text Embedding (MTE) models. These models—based on the Mistral-7B LLM—had each been fine-tuned with slightly different sets of data to improve on the base LLM's abilities in "representational tasks including document retrieval, classification, and clustering," according to the researchers, and had achieved "state-of-the-art performance" on the Massive Text Embedding Benchmark (MTEB).
Rather than asking for precise term matches from the job description or evaluating via a prompt (e.g., "does this resume fit the job description?"), the researchers used the MTEs to generate embedding-based relevance scores for each resume and job description pairing. To measure potential bias, the resumes were first run through the MTEs without any names (to check for reliability) and were then run again with various names that achieved high racial and gender "distinctiveness scores" based on their actual use across groups in the general population. The top 10 percent of resumes that the MTEs judged as most similar to each job description were then analyzed to see if the names for any race or gender groups were chosen at higher or lower rates than expected.
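For readers who want to see the mechanics, here is a minimal sketch of this kind of embedding-based screening and bias check. It assumes the sentence-transformers library and uses a small off-the-shelf model as a stand-in for the study's Mistral-7B-based MTEs; the model name, sample resumes, and group labels are illustrative, not the researchers' actual setup.

```python
# Minimal sketch: score resumes against a job description by embedding
# similarity, keep the top decile, and compare group selection rates.
from collections import Counter

import numpy as np
from sentence_transformers import SentenceTransformer, util

# Stand-in model for illustration; the study used Mistral-7B-based MTEs.
model = SentenceTransformer("all-MiniLM-L6-v2")

job_description = "Seeking a data analyst with SQL and Python experience..."

# Each resume is paired with the demographic group implied by the name on it
# (hypothetical examples, not the paper's data).
resumes = [
    {"group": "white_male", "text": "John Smith\nData analyst, 5 years..."},
    {"group": "black_female", "text": "Lakisha Washington\nData analyst, 5 years..."},
    # ... hundreds more resume/name pairings
]

# Embed the job description and all resumes, then score by cosine similarity.
job_vec = model.encode(job_description, normalize_embeddings=True)
resume_vecs = model.encode([r["text"] for r in resumes], normalize_embeddings=True)
scores = util.cos_sim(job_vec, resume_vecs).squeeze(0).numpy()

# Keep the top 10 percent of resumes by relevance score.
k = max(1, int(0.10 * len(resumes)))
top_idx = np.argsort(scores)[::-1][:k]

# Compare how often each group appears in the top decile vs. its overall share.
overall = Counter(r["group"] for r in resumes)
selected = Counter(resumes[i]["group"] for i in top_idx)
for group, total in overall.items():
    expected = k * total / len(resumes)
    observed = selected.get(group, 0)
    print(f"{group}: selected {observed}, expected ~{expected:.1f}")
```

If the model were unbiased with respect to names, each group's share of the top decile would track its share of the resume pool; a consistent gap between observed and expected counts is the kind of signal the researchers were measuring.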
The issue at hand with any sort of AI is the data. Unlike humans, a data set can't hide its bias: a model trained on it will reproduce whatever the data contains. This study once again shows clearly what's in the training data, and that can only be addressed with better or new data. Adjusting model weights can help somewhat, but the best fix is going to be better data sets.