543 Transcript

Dr. Jeremy Sharp

Dr. Jeremy Sharp (00:00)
Hey folks, welcome back to the podcast. Today is a clinical episode of sorts. I'm doing a research review of an article published in The Clinical Neuropsychologist in 2025 titled "Artificial Intelligence and Natural Language Processing in Modern Clinical Neuropsychology: A Narrative Review." Brittany Wolf was the primary author, or only author, on this article. And in the spirit of AI discussion, which is everywhere right now, I wanted to

dig into the literature a little bit and find some uses of AI that fall outside kind of our typical conversations in terms of report writing and efficiency and productivity and things like that. So today I’m going to be talking through this article and what it has found. And it’s super interesting, at least from my perspective. We’re going to primarily focus on the use of natural language processing in neuropsychology. So what does this mean? It’s basically like looking at

I would say digital biomarkers in speech and how those biomarkers can be used and analyzed by machine learning to help guide our diagnostic process. So this sounds like kind of sci-fi, but it is happening and it’s got a lot of potential and I think it’s just the path toward this kind of intervention or strategy or

analysis is accelerating very quickly. So let's dig in and learn about natural language processing. Before we totally do that though, I want to remind all of you of The Testing Psychologist membership opportunity called CRAFT. It's going to be launching in mid-January. If you're interested in finding out more about this and being first to have access to the community,

you can go to thetestingpsychologist.com slash craft and get on the interest list. This is going to be a community built all around applying knowledge and accountability and support. So the idea is taking what you may have learned here in the podcast and putting it into practice with others who are similarly motivated, with one-on-one support from me. If that sounds interesting, like I said, go to thetestingpsychologist.com slash craft.

Dr. Jeremy Sharp (02:13)
All right, everybody, we're back and we're talking about one of my favorite topics, which is technology and AI. And this time we're digging a little bit deeper and getting a bit more in the weeds in terms of methodology and tools and things like that. We're talking about natural language processing and machine learning and that kind of thing. I know unless you have just been living under a rock, you know, AI, artificial intelligence is all over the place. We see it in the news, in our emails, it's in…

basically everything we do. And it's certainly coming up in our clinical reports. So beyond the hype, I think, of AI helping us with writing reports and generating backgrounds and analyzing test results and so forth, what is actually happening with AI in our field? There are some folks out there who are doing some really cool things to push forward the diagnostic framework.

Specifically, natural language processing and machine learning are kind of changing the way that we do assessment and diagnosis and monitor neurocognitive conditions. So today we're going to take a deep dive. I put together a bunch of resources, but primarily drawing from this 2025 article, like I said, in The Clinical Neuropsychologist by Brittany Wolf. So we're going to go through kind of a fundamental question, I guess: is AI ready for the neuropsych

clinic? We're going to cover kind of the current state of digital biomarkers, quote unquote. So I'll look at specific applications in ADHD and Alzheimer's and psychosis, talk about some ethics and some of the different concepts involved with this approach. So whether you are like a total tech enthusiast or this has you completely scared and wanting to run away, I'd say stay tuned and let's dig into it. So the first

part I want to tackle is just, I guess, the landscape and some definitions here. So I may have already lost many of you at natural language processing and machine learning, which is totally fine. So before we dig into specific disorders, just like level setting a little bit on terminology, because I think AI has become a little bit of a junk term, to be honest, over the last couple of years. So what I'm talking about here,

in terms of natural language processing, or NLP, that's the common acronym: it's defined as a suite of computational techniques that facilitate the analysis and representation of naturally occurring text at various linguistic levels. Okay, what does that actually mean? In our context, an NLP marker is a digitally acquired (which means either video or audio recorded, typically audio)

quantifiable measure of human language production that reflects the state of biological or neurocognitive processes. So this is essentially recording speech and analyzing it for different characteristics. Why does this matter? Well, as the article states, our field has experienced pretty limited technological innovation over the past 100 years. 100 years, folks,

compared to other medical disciplines, right? So we are relying on tests that are very, very old and have only recently even started to see measures that are digitally native and starting to leverage technology like item response theory and measuring, you know, latent variables and things like that, process variables with technology.

So we still rely like pretty heavily on subjective interpretation and like time-intensive paper-and-pencil or iPad measures. So NLP offers the potential to turn a lot of that material, namely verbal production or speech, into objective, like, digital biomarkers. So essentially giving us data, not just from the test results, but from the interactions with our clients. So

I think here it's important to think about like the sheer amount of data that we discard in the evaluation process, right? So some of us are maybe transcribing interviews or feedbacks or even testing sessions, like recording them. Many of us are trying to transcribe just paper and pencil, like writing down what people say. But let's think about it. Like we, you know, somehow record a response to an item on the WAIS, right? Like a Similarities item,

and then we score it, you know, zero, one or two, and then we move on. But these NLP algorithms that are out there can actually analyze the acoustic features of the speech, like pitch and pause duration, the lexical features, like vocabulary richness, and the semantic structure, okay? So like coherence of the speech. So there is a ton of data that we are just discarding in the testing process that could be

really, really helpful. And current research suggests that these text-based features actually contribute more to model accuracy than audio markers alone, which is super interesting. So the words that the patients say might actually matter more than how they sound when we feed them into the models. So this is, like I said, super interesting. I mean, even recording this episode, I'm sort of…

flabbergasted I guess at the technology that’s happening here. So there are moments here along the recording process where I pause and I’m thinking, my gosh, that’s really cool. But let’s dive into some clinical applications here and we can start with ADHD. So this is a population I think that many of us see day to day. ADHD is a big question.

Diagnosing ADHD in adolescents and adults and kids is complex. We just had a huge discussion in our testing consult group at our practice yesterday about how to diagnose ADHD and what to do with disparate or discordant data. Anyway, complex process. We’re often relying on these subjective rating scales that can be prone to bias. Then you throw in testing results that may or may not actually reflect real world executive functioning.

Okay, so there’s a super interesting study published in May 2025 that looked at using NLP, natural language processing, to analyze what they call self-defining memories in adolescents. Why is this important? Okay, so they asked adolescents with ADHD and a control group to recount important life events. Using stylometric analysis, they found that adolescents with ADHD produced narratives that were shorter.

less lexically diverse and less cohesive. The machine learning tool that they built could distinguish between the ADHD and control groups with, they say, up to 100% precision in their sample. This is fascinating. So this matches what we sometimes talk about: narrative coherence and how folks with ADHD can be like, quote unquote, all over the place or scattered or

you know, bouncing from thing to thing. And this is like a kind of rigorous analysis of the speech patterns to actually validate that. But what's interesting here, I think, is the psychological mechanism. So the study suggests that these linguistic markers reflect difficulties in emotion regulation and cognitive organization. So specifically, the ADHD group overused the indefinite pronoun "on" in French, OK, which is similar to "one" or like

a generalized "we" in English, which the researchers guessed might serve a distancing function to reduce emotional reliving of memories. OK, so again, why is this important? So this tells us that NLP isn't just counting words. It's actually detecting a narrative style that reflects executive functioning concerns in a way that our standard batteries might miss. And just speaking personally,

this level of nuance in analyzing speech is certainly nothing that I am capable of in just listening to someone, right?
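To make "stylometric analysis" a little more concrete: at its simplest, it means computing quantitative style features from a transcript. Here's a minimal sketch in Python of the kinds of features the study reportedly looked at, narrative length, lexical diversity, and a crude cohesion proxy. To be clear, this is an illustration of the general idea, not the study's actual pipeline; the feature names and the connective list are my own.

```python
import re

def stylometric_features(narrative: str) -> dict:
    """Toy stylometric features: narrative length, lexical diversity
    (type-token ratio), and a crude cohesion proxy (share of sentences
    that open with a causal/temporal connective)."""
    tokens = re.findall(r"[a-z']+", narrative.lower())
    sentences = [s.strip() for s in re.split(r"[.!?]+", narrative) if s.strip()]
    # Hypothetical connective list -- not taken from the study
    connectives = {"because", "so", "then", "after", "before", "when"}
    cohesive = sum(1 for s in sentences if s.split()[0].lower() in connectives)
    return {
        "length_tokens": len(tokens),
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
        "connective_starts": cohesive / len(sentences) if sentences else 0.0,
    }

# Two made-up narratives: a sparse, repetitive one vs. a richer one
sparse = "We went out. We went out. It was fine."
rich = ("Last summer we hiked a steep trail, because my brother dared me. "
        "Then we camped by the lake and watched the storm roll in.")
```

On these made-up examples, the richer narrative scores longer and more lexically diverse, which is the direction of the group differences the study described; the real models obviously use far more sophisticated features than this.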

Okay, so that's an ADHD application. Let's go to more of like an Alzheimer's, you know, mild cognitive impairment clinical situation. Okay, so we can pivot a little bit. This area, neurodegeneration and related concerns, probably has like the most robust research, I think. So the goal here is early detection. So again, another research paper recently found that digital voice biomarkers like

lexical, semantic, and acoustic scores demonstrated higher diagnostic performance for detecting MCI compared to the Boston Naming Test. So that's a significant finding for us. The Boston Naming Test is kind of a staple neuropsych measure. I think a lot of folks use it. It's a little bit fraught at this point with the pictures, but it's been around for a long time. But here we're finding that a brief voice analysis outperformed it.

They also found that lexical-semantic scores, meaning the content and meaning of speech, were actually associated with CSF amyloid-beta levels. So this is getting into adult neuropsychology, right? And even if that's not your area of specialty (certainly not mine), the takeaway here is essentially that the way a patient speaks could predict their biological

predisposition or status of Alzheimer’s, you know, before dementia actually sets in. There’s another research paper in 2024 that took it a step further. They used voice recordings from neuropsych testing to predict progression from MCI to Alzheimer’s within six years. The models had an accuracy of about 78.5%. Okay, so that’s not like awesome, but imagine being able to tell a patient with MCI,

that they have a 75% chance, you know, like relatively high confidence based on their speech patterns today, of what the trajectory might look like over the next several years. I do want to emphasize though that the type of speech task does matter. So there are kind of like some common tasks that get used. But a study in 2025 found that we might need to like broaden

the frame, I suppose, and the kinds of tasks that we are using. So they compared different speech tasks, like autobiographies or news event descriptions and noun descriptions. And they found that describing specific nouns, like explaining abstract and concrete concepts (sound familiar from one of our WAIS subtests?), resulted in the highest classification accuracy, which is about 80% for prodromal

Alzheimer's, which outperformed some of the other kind of traditional tasks that we give them. So if you're designing batteries or research protocols, asking patients to define words or concepts might yield kind of richer digital data than describing a picture, for example. All right, one more clinical application. We can talk about psychosis and schizophrenia.

So disorganized speech, I think we all know, is a hallmark of schizophrenia, but it is notoriously hard to quantify reliably using just our ears and a rating scale. It’s pretty subjective. We have some guidelines, of course, but still, I think it’s pretty subjective. NLP, though, is changing this. So again, recent research assessed some of these language markers in first episode psychosis and people at high clinical risk.

And they found that measures like semantic coherence, again, and on-topic scores were significant differentiators for psychosis. And an interesting, kind of compelling concept that emerged from this literature is something called perplexity. So one group used a large language model to quantify how unexpected a patient's word choice is given the context, and they called this metric perplexity.

So they found higher perplexity or higher unexpected word choices in untreated first episode schizophrenic patients. And then mechanistically, they kind of mapped this onto the brain’s semantic network using fMRI. They found that this perplexity in speech was linked to some specific imbalances in, I’m quoting here, excitation and inhibition in the inferior frontal gyrus and middle temporal gyrus.
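Since perplexity is doing a lot of work here, a quick concrete illustration might help. Perplexity is a standard language-modeling quantity: roughly, the average "surprise" of a model at each word given the preceding context. The study used a large language model; as a minimal sketch of the same idea, here's a toy bigram model in Python. The corpus and sentences are made up purely for illustration.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count unigrams and bigrams from a tiny training corpus."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(set(tokens))
    return unigrams, bigrams, vocab_size

def perplexity(tokens, unigrams, bigrams, vocab_size):
    """Perplexity = exp of the average negative log-probability of each
    word given the previous one. Higher = more 'unexpected' word choices.
    Add-one smoothing keeps unseen bigrams from zeroing out."""
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))

# Made-up toy corpus and test sentences
corpus = ("the dog chased the cat and the cat ran up the tree "
          "and the dog sat under the tree").split()
uni, bi, v = train_bigram(corpus)

fluent = "the dog chased the cat".split()
scrambled = "cat the chased dog the".split()
# Scrambling the word order makes each word less predictable from
# the previous one, so perplexity goes up
```

Scrambled word order yields higher perplexity than fluent order here, which is the intuition behind flagging "unexpected" word choices, just at vastly smaller scale than the LLM-based metric in the research.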

So essentially, like the loss of semantic control, which is kind of the brain’s ability to clamp down on irrelevant associations, is measurable in the unpredictability of the patient’s speech. We can also see acoustic markers in this research as well. So they are also playing a role. Some folks found that pausing behavior, specifically longer pauses, was associated with negative symptoms in youths that were at a higher clinical risk for psychosis.

And they said that it suggests a breakdown in discourse planning, which I think is just how they decide what they’re going to say. So a few different clinical applications here. Like as you can tell, there’s a lot of research going on in this area. But as always, we want to move from theory to practice as much as possible. So how does this actually show up in our practice given that not all of us are data scientists and we certainly don’t have the time to run these

large language models or, you know, NLP programs between our own clients. So there are tools emerging. All right. So we’re seeing the rise of ambient AI and digital scribes. You guys have probably seen all of this, right? So Heidi is one. Gemini or Google Workspace has a transcription feature. Like pretty much any meeting software now has an AI transcription feature. So

a recent systematic review evaluated AI-based speech recognition for clinical documentation. And even though the results were mixed regarding time savings, primarily because editing sometimes takes longer than writing the note, the newer LLM-based systems are showing promise in reducing burnout and improving documentation completeness. In our specific field, there is work on something called NAT.

It's an NLP annotation tool that's designed to facilitate like phenotyping from electronic health records. So the tool extracts data and highlights keywords in clinical notes, like memory loss or ADLs, and helps clinicians kind of differentiate cognitive status faster and with higher inter-rater reliability than just manual chart review.
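The core mechanic of that kind of annotation tool, scanning notes for clinically meaningful phrases and surfacing them for review, is simple to picture in code. Here's a rough Python sketch; the keyword lexicon is entirely hypothetical, since the tool's actual lexicon isn't described in the episode, and real tools use much richer NLP than substring matching.

```python
# Hypothetical keyword lexicon -- illustrative phrases only,
# not taken from the actual annotation tool
COGNITIVE_FLAGS = {
    "memory loss": "memory",
    "forgets appointments": "memory",
    "needs help with adls": "daily functioning",
    "word-finding difficulty": "language",
}

def flag_note(note: str) -> list:
    """Return (phrase, category) pairs found in a clinical note --
    a rough sketch of what an annotation tool highlights for review."""
    lowered = note.lower()
    return [(phrase, category)
            for phrase, category in COGNITIVE_FLAGS.items()
            if phrase in lowered]

note = ("Patient's spouse reports gradual memory loss over the past year; "
        "patient now needs help with ADLs such as managing medications.")
```

Running `flag_note(note)` on this made-up note surfaces the memory and daily-functioning phrases, which is the kind of highlighting that speeds up chart review.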

That's promising. We have to address, though, what is called the black box problem. So if an AI tells you a patient has a high probability of Alzheimer's based on a speech sample, but you can't see why, can you trust that? And I'm guessing this is going to be a really big hurdle for a lot of clinicians to get over. And that brings us to a critical concept called human in the loop. Okay. So a recent paper

addressed this head-on, and they argue that the traditional machine learning framework, like where you train a model and lock it and deploy it, is insufficient for our field, essentially. So they advocate for what they call active learning, where the model identifies cases it's uncertain about and then asks the human clinician to label them. This increases the model robustness and allows us to

kind of understand the known unknowns, like where the model might know it’s confused and where the model is confident but wrong. I’ve talked about this, I think, a lot in different AI episodes, like the idea that we have to be involved in the process when we’re using AI. Very little is like a set it and forget it or a completely done for you situation, even though that’s very tempting. So I think the same applies here.

We need to view these models not as replacements, more as consultants that require a clinician in the loop. And that might make intuitive sense, but for any of you who have used AI a lot, myself included, it's really tempting to just let it kind of do its thing. And it is very convincing. So for me, it's sort of a constant process to remind myself, hey, you have to double check absolutely everything that comes out of this language model.

So, you know, we're the ones that have to integrate the output with the patient's history and behavioral observations and life and everything. And I think that's where we still have a job, thankfully. Okay. So kind of a dense episode here, but I want to start to wrap up by thinking about challenges and ethics. We're like treading into this territory already. All right. The first one is bias. This is super common.

A lot of the training data for these speech models that I’ve talked about comes from highly educated English speaking, predominantly white cohorts. Does that sound familiar to anyone as far as research history? So this raises pretty massive concerns, obviously, about algorithmic bias. If we use these tools on patients from different cultural or linguistic backgrounds, there’s a huge risk in misclassifying dialect and storytelling styles as disorganized or impaired.

So that's one place cross-linguistic research is definitely lagging behind at this point. We know less about neurodegenerative disorders of language in other languages than we do in one language, which is English. The second thing is privacy. Language data is inherently identifiable, right? You can't just strip a name and call it anonymous if the patient is telling a story about their life. So we absolutely need HIPAA-compliant de-identification methods and clear

governance in how this data is stored and used by third-party AI vendors. That relates to the third thing, which is security. There's a fascinating study by Levine and others about PVTs, performance validity tests. So they posed questions to ChatGPT about how to feign or fake neuropsych tests. And while the AIs showed some ethical opposition to helping people cheat,

the chatbot still provided responses that were rated as moderate to high threat to test security in a significant percentage of cases. So just the accessibility of this information via AI, I think, is a real threat to the validity of our assessments.

So where do we go from here? What is the verdict here? Before I share the verdict, I’m curious what y’all are thinking, actually. Where are your heads at right now? Post some comments on the episode, either in Spotify or Apple or on the blog. But here’s what I think. Here’s what the research would say. AI and NLP, I think, are moving from experimental novelty to clinically actionable technologies.

Like I said in the beginning, it’s accelerating quickly and it’s closer than we think. We’re seeing accuracy rates that rival or in some cases exceed traditional paper and pencil tests for detection of MCI and Alzheimer’s and even psychosis. The translation gap is real though. So we need tools that are user friendly and transparent and for instance, like integrated into our EHRs.

in a better way. We need to be trained in foundational AI literacy. If you’ve all been listening, you know that I’m a big proponent of AI competency and operationalizing that and making sure that we understand basic concepts like, I mean, what’s happening in these models and that will allow us to critically evaluate these tools. So by all indications, the future of neuropsych looks like a hybrid model.

You can imagine a dashboard that kind of tracks your patient's speech acoustics and semantic complexity longitudinally, flagging subtle issues in the moment. And if you're working with someone, you know, even long-term, like months before it may show up on neuropsych tests. It's almost like you could imagine like an AI scribe that drafts your report while you focus entirely on the patient, which sounds kind of cool. But

For now, our role is the same. We are the experts, right? I keep coming back to this in all these AI episodes. We are the experts. We are the safeguards. We have to push for transparency and validation with diverse populations and ethical standards in any of these tools that we adopt. So that's where I'm going to close with all of this. Appreciate you

listening to this. I know, like I said, it's kind of a dense episode. But the actual article, which is great, is linked in the show notes, so you can go check that out, of course. And like I said at the beginning, if you are looking to jump into a community of accountability and taking some of the knowledge, you know, if you want to talk about AI, apply some AI, figure out the risks and benefits of AI in a community format, jump onto

the waitlist for CRAFT. That's thetestingpsychologist.com slash craft. Doors will be opening in early to mid-January and I'd love to see many of you there.

