Dr. Jeremy Sharp (00:37)
Hey folks, I am really glad to have NovoPsych Psychometrics sponsoring the show. If you do structured assessment work, then you will likely love NovoPsych. NovoPsych brings 150 plus standardized measures into one platform. What I particularly like is the extra layer of psychometric interpretation. So it helps you understand what scores actually mean. So the results are easier to communicate. If you are interested in high quality measures for personality, disability, ADHD, or autism, you can try NovoPsych with a 15 day free trial via the link in the show notes, which is novopsych.com slash testing psychologist. That’s N-O-V-O-P-S-Y-C-H.com slash testing psychologist.
Dr. Jeremy Sharp (01:23)
Hey everyone. Welcome back to The Testing Psychologist. Today kicks off a mini series on autism that I’m going to be tackling over the next probably month to six weeks. Most of the episodes will happen sequentially, though there will be a couple of times where we have some business episodes or an interview thrown in there. But this is essentially a four part mini series on autism and the state of autism assessment here in 2026. So what are we going to tackle?
Episode one, which is today, is going to focus on the ADOS2. So how well is the ADOS2 actually working these days? Episode two, we’re going to focus on the female autism phenotype. Is this a different presentation or is it a diagnostic blind spot? Episode three is going to go into camouflaging and masking. So an overview of those concepts and kind of the state of the research right now.
And then episode four is going to go a little bit deeper and take more of a philosophical bent around this question of, you know, can you truly mask autism spectrum symptoms? And if you can, is that some kind of indicator that maybe it’s not actually autism? So lots coming up over the next month to six weeks in this autism realm. I’m excited about this. Did a lot of research for these episodes and I think there’s some great content coming up.
So for the episode today, this is, you know, essentially revisiting the idea that the ADOS2 is the gold standard, quote unquote, instrument for assessing autism. It is the most widely used diagnostic instrument and we’re going to critically examine its performance and its limitations and appropriate clinical use. So we’ll start with an overview. We’ll talk about like overall diagnostic performance of the ADOS. We will talk about the intersection of clinical judgment and the ADOS results.
We’re going to talk about specificity issues and performance in quote unquote high functioning individuals. I know that term is not completely OK right now, but that’s why I put it in quotes. We’re going to talk about sex and race and ethnic bias in the ADOS. So we’re going to talk about lots of different things today and then of course wrap up with some clinical recommendations and takeaways.
So if you are in the autism spectrum assessment world and have wrestled with these questions, then now’s a great time to follow or subscribe to the podcast. Make sure that you don’t miss any episodes coming up over the next several weeks on this topic. And if you are a practice owner who would like some support in your practice, Crafted Practice Retreat is coming up this summer. So we are into April, I think, at this point. And yeah, the retreat’s coming up in late July. This is the fourth year that I’m doing it. It’s all inclusive. It’s a business retreat.
You do not have to have everything together in your practice. In fact, it’s better if things are a little bit messy and you come in with something that you’d like to work on and really have some space to tackle. You can go to thetestingpsychologist.com slash crafted practice and get more info and schedule a pre-retreat call just to see if it’s a good fit. I’ve talked to several folks here recently and we’re just working through whether the retreat is a good fit.
In most cases, it usually is. So that’s a spoiler, but I’m happy to chat with you and talk through that and make sure that it’s the right choice for you this summer. All right. For now, let’s dive into the ADOS2.
Dr. Jeremy Sharp (04:59)
Okay, we are back and we’re just going to jump right into it. So I’m going to start this episode with a, just an introduction, kind of an overview of the ADOS for folks who may not be familiar or who have maybe lost touch with some of the aspects of the ADOS. So the ADOS, the original ADOS was developed back in the, gosh, late nineties, maybe early two thousands. So I was trained on the original ADOS, if that gives you any idea of when I was in grad school and starting to do this work.
But ADOS2 has been out for many years now, I think late 2000s, maybe early 2010s. But we have gotten pretty familiar with it, and there’s been quite a bit of research on it. So here’s the thing. It is important, right? Autism spectrum disorder affects approximately 1 in 36 kids in the US at this point.
Diagnosis tends to rely pretty heavily on behavioral observation and clinical judgment, right? Like we still don’t have a blood test or a brain scan or any kind of reliable biomarker to diagnose autism. Enter the ADOS2 and many other observational measures. So ADOS2 stands for Autism Diagnostic Observation Schedule. This is the second edition, like I said, and it has over the years become kind of the de facto gold standard for autism assessment instruments worldwide.
But we’re going to look at how well this gold standard instrument actually performs and for whom does it work or not work so well. Now I’ve done episodes in the past on the ADOS2 and the ADI-R and, you know, the authors of the ADOS, Cathy Lord and her colleagues, you know, have come out and said, hey, don’t rely solely on ADOS results. You know, yes, it’s a great instrument and it’s got to be part of a comprehensive battery. So we’ll put that out there at the beginning.
I’m not, you know, offering any like hot takes necessarily here that go against, you know, the authors’ desires. But there is some nuance and a lot of discussion, especially in recent years, about the utility of the ADOS with different populations and so forth. All that said, these are all just teasers. I’m going to go back to my overview here. So what is the ADOS2 exactly?
So essentially it’s a semi-standard, sorry, semi-structured standardized assessment of communication and social interaction and play and restricted or repetitive behaviors. Typically, you know, it’s administered in about 60 minutes. There is a formal training that you should undertake to administer the ADOS, though I will say the training is kind of all over the place. You know, there are many folks out there, myself included, who were trained by trainers and
you know, got experience through a lot of observation and so forth. But these days there are plenty of formal trainings to undergo and that is the preferred means of learning the ADOS. Now there are two sort of tiers. There’s the clinical training and then there’s the research grade training. If you can do the research training, of course that’s best. It is more time intensive, but it theoretically ensures more reliability in rating and scoring and things like that.
So either way, do the formal training if you haven’t. And typically, we administer this in about 40 to 60 minutes. There are four modules is what they call them. Actually, technically, there are five modules. There’s the Todd mod or toddler module for little, little kiddos. There’s module one for pre-verbal or single word users. This is typically pretty young, you know, toddlers as well. Module two is when phrase speech comes into the picture.
Module 3 is fluent speech and then module 4 is fluent speech plus. So module 4 is typically reserved for teens and adults, for those with more advanced language. And you tend to select the ADOS module based on the level of language. Many of you know this, but I’m just reiterating some important points. And the scoring, it produces what they call algorithmic scores that result in either a classification of autism spectrum or autism.
OK, now that language at this point, of course, with the DSM-5 is outdated. Autism spectrum is the less severe outcome and autism is the formal, you know, full autism diagnosis, quote unquote. OK, so what’s appealing about this? Well, it’s appealing because it standardized structured observation opportunities
that might not occur naturally. So it created this series of interactions and activities that kind of pull for certain behaviors and give the clinician the opportunity to observe some of those key characteristics that we think align with autism spectrum disorder. It allows for comparison across clinicians and settings. It creates a common language for research and clinical practice. And it reduces, I think, but does not
eliminate by any means subjectivity in diagnosis. And anyone who has done the ADOS for any amount of time, I’m sure you recognize there is still quite a bit of subjectivity in the process. So let’s pivot over to some numbers and some statistics. OK, so what is the research actually saying about the ADOS? Well, there have been a number of meta-analyses over the years on the ADOS. And we’re going to start with sensitivity.
OK, so this is essentially, you know, how does the ADOS do at correctly identifying individuals who actually have autism spectrum disorder? The sensitivity is around 90 percent. OK, so it correctly identifies about nine out of 10 individuals who actually have autism based on independent diagnostic processes. Specificity, on the other hand, this is how well the ADOS correctly rules out
the individuals who do not have an independent autism diagnosis. And this is around, numbers vary, but about 70 to 85%. So this means that ADOS2 correctly rules out about seven to eight-ish individuals out of 10 who don’t have an autism diagnosis from an independent process. These numbers, again, came from meta-analytic reviews. They should be relatively stable.
So again, sensitivity of about 90 percent, specificity of about 75-ish percent. So how does this compare to other instruments? Well, the ADOS2 tends to outperform other standardized autism diagnostic tools in sensitivity while maintaining reasonable specificity.
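As a rough illustration of the two definitions above, here is a minimal Python sketch. The counts are hypothetical (100 people with an independent autism diagnosis and 100 without), chosen only so they reproduce the approximate figures quoted in the episode. They are not data from any study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of people WITH an independent diagnosis that the tool flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of people WITHOUT an independent diagnosis that the tool rules out."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, picked to match the rounded figures in the episode.
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=75, false_pos=25)

print(f"sensitivity = {sens:.2f}")  # 0.90
print(f"specificity = {spec:.2f}")  # 0.75
```

In this made-up sample, 10 of the 100 diagnosed individuals are missed, while 25 of the 100 non-diagnosed individuals are flagged anyway.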
So it compares pretty well in terms of sensitivity, it’s reasonable with specificity. One of the other major tools, the ADI-R, which is a parent interview, has similar sensitivity, but is way more time intensive. Again, if you’ve done the ADI-R, it can take hours. So similar, but takes longer. And then there are things like screening tools like the M-CHAT, which is used in pediatric offices quite a bit.
It’s not great with adults, you know, and it has a much higher sensitivity, but much lower specificity. So the M-CHAT kind of deliberately casts a very wide net and is not, you know, not as good at ruling out individuals who do not have autism.
OK, so we do have a little bit of a false positive problem, you could tell. So in one large study, about 29% of the kids without autism received false positive ADOS2 classifications. So that’s one factor.
The autism spectrum classification itself, so again, that’s the less severe, quote unquote, classification from the scoring algorithm. So the autism spectrum classification showed pretty poor predictive directionality. So 56% of those kids were diagnosed with autism, 44% were not. So it’s pretty mixed. And what this means essentially is that among kids who score in that autism spectrum range, so they meet the threshold for something but not a full autism threshold on the ADOS, nearly half of them don’t actually have autism.
Again, the autism cutoff provides better specificity with favorable sensitivity. So you can have more confidence in your diagnosis if they reach the full, you know, the autism classification.
So a little bit of just a note on research versus clinical settings. The ADOS does tend to perform better in research settings compared to clinical settings. That makes sense. I think that’s true for pretty much any measure that we administer. But I sort of just made that up just based on my gut feeling. So somebody check me on that. Evidence from real world clinical context is still relatively limited. Research samples, obviously, are often more carefully selected and may not represent like the heterogeneity of clinical populations. I think this is pretty.
So let’s transition to a discussion around clinical judgment and the ADOS2. So what do we do when these essentially disagree? So there’s a great study out of the Journal of the American Medical Association, JAMA, Pediatrics in 2022 that looked at what happens when developmental behavioral pediatricians make diagnoses based on clinical assessment alone versus incorporating the ADOS2 results as well.
So they found 90% agreement between clinical judgment and ADOS2 informed diagnosis. So clinician diagnostic certainty was the most robust predictor of consistency. That just kind... When clinicians were highly certain, the ADOS2 results rarely changed their minds. But when the clinicians were uncertain, the ADOS2 sometimes helped, but sometimes added confusion.
I would imagine that many of us have had this experience as well, where it’s almost like using CPTs in ADHD assessment. If you’re feeling pretty certain, it’s not going to add a whole lot. If you are uncertain, it can sometimes help, but it also can be confusing, depending on the outcome.
So the evidence consistently emphasizes that diagnostic tools should inform but not supersede clinical judgment. Right. This is what I alluded to at the beginning. ADOS2 should be regarded as more of an adjunctive aid within a comprehensive multidisciplinary assessment. All right. So just to make it super clear, a score below the cutoff should not automatically rule out autism. A score above the cutoff should not automatically confirm autism.
Now some of you may be thinking, what does comprehensive assessment even mean?
Dr. Jeremy Sharp (16:53)
Hey, everyone. I’m really excited that NovoPsych Psychometrics is sponsoring the show. NovoPsych is a platform for psychologists who care deeply about assessment and testing and want their self-report measures to be the very best. NovoPsych has an extensive library of 150 standardized instruments with strong coverage across the presentations many of us assess every day, like disability, functional impact, autism, ADHD, and a wide range of symptom measures.
You can also use it for broad personality assessments like the Big Five or go deeper when you’re looking to understand personality pathology. What makes NovoPsych different isn’t just the range of scales, it is the quality of the experience. So I really appreciate the depth of psychometric info that it provides and the clear graphs and visualizations that make results easier to interpret and communicate. If you want to try NovoPsych psychometrics, you can access a 15 day free trial via the link in the show notes, which is
novopsych.com slash testing psychologist. That’s N-O-V-O-P-S-Y-C-H dot com slash testing psychologist.
Dr. Jeremy Sharp (18:01)
It means a great developmental history from caregivers if possible. It means cognitive and adaptive testing. It means speech-language evaluation if you can do it. It means clinical observation across contexts. It means self-report and it means assessment of co-occurring conditions. So pretty straightforward explanation of comprehensive evals. This is going to be necessary if you’re looking at autism. And the ADOS is just a
I want to talk a little bit more about the specificity problem and who gets misidentified in this process. So let’s start first with non-autism clinical populations. OK. So in the research, like when the comparison group is typically developing children, specificity is reasonable. That’s that 70 to 85% number.
But, and this is relevant for a lot of us in the clinical realm, when the comparison group is kids with other developmental or psychiatric conditions, specificity drops significantly to like 50%. So again, this is the clinically relevant part, that we aren’t typically distinguishing autism from typical development. We’re distinguishing autism from ADHD and anxiety and language disorders and intellectual or cognitive concerns and that kind of thing.
So what are the things that can tend to produce false positives on the ADOS? One is social anxiety disorder. So avoidance and just awkwardness in social situations is going to score pretty highly on the ADOS. ADHD, of course, so inattention leading to missed social cues, impulsivity, disrupting the conversations, and maybe resulting in some inappropriate behavior. That can certainly score high on the ADOS.
Language disorders: any communication difficulty is going to flag pretty clearly on the ADOS. Regarding intellectual disability, you know, we’re going to have developmental delays theoretically across domains. And so, you know, communication is one of those, engagement, reciprocal behavior. These are all domains that we’re looking at on the ADOS. And so that’s going to be a concern as well. And there is some research around attachment stuff as well.
So, you know, does the kid have just social difficulties stemming from kind of early adversity or attachment concerns? So any of those disorders where they’re disrupting the communication and reciprocal interaction, those are going to score pretty highly on the ADOS. You can also run into some difficulties with repetitive or routine behavior. You know, kids with executive functioning concerns like rigidity, even some OCD stuff, you know, can flag on the ADOS in those areas.
So what does this mean as far as like clinical implication? Well, I mean in a specialty clinic where most of the referrals have some developmental concern, the ADOS2’s ability to distinguish autism from not autism is substantially weaker. And that’s where like we have to integrate the ADOS2 findings with a developmental history and pattern of difficulties and response to intervention.
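To make the effect of that specificity drop concrete, here is a small Python sketch of positive predictive value (the chance that a positive ADOS2 classification reflects an actual autism diagnosis), computed with Bayes' rule. The sensitivity of 0.90 and the specificities of 0.80 and 0.50 are rounded from the figures mentioned in the episode. The 50% prevalence is purely an assumption for illustration, not a number from the episode or from any study.

```python
def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """P(autism | positive result) via Bayes' rule."""
    true_pos = sens * prevalence                # flagged and autistic
    false_pos = (1 - spec) * (1 - prevalence)   # flagged but not autistic
    return true_pos / (true_pos + false_pos)


SENS = 0.90        # approximate sensitivity quoted in the episode
PREVALENCE = 0.50  # hypothetical share of referrals who are autistic

scenarios = [
    ("typically developing comparison", 0.80),  # within the 70 to 85% range
    ("clinical comparison", 0.50),              # the "like 50%" figure
]

for label, spec in scenarios:
    ppv = positive_predictive_value(SENS, spec, PREVALENCE)
    print(f"{label}: specificity {spec:.2f} -> PPV {ppv:.2f}")
# typically developing comparison: specificity 0.80 -> PPV 0.82
# clinical comparison: specificity 0.50 -> PPV 0.64
```

Under these assumptions, roughly one in three positive classifications would be a false positive once specificity falls to 50%, which is one way to see why the result has to be weighed against history and other findings.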
So now I want to get into a discussion specifically around quote unquote high functioning individuals or individuals with low support needs, we might say. This has been an area of debate over the years. Like how does the ADOS do at identifying autistic individuals who have lower support needs? Let’s define this population just for a second. Again, quote unquote high functioning typically refers to individuals with average or above average
intellectual ability and fluent language. OK, so this is typically we’re looking at like module three or module four, depending on the age of the individual. You may think of this like historically, you know, these are the individuals who fell on that Asperger’s syndrome classification before the DSM-5 kind of unified everything on the autism spectrum. All right. So what’s the good news? Good news is that module four shows pretty good diagnostic performance for
adults with an IQ over 70. That’s a big range of course, but that’s out there. The research is pretty good on that. The revised algorithms developed for the ADOS 2 substantially improved sensitivity for high functioning, low support ASD compared to the original ADOS. OK, so that’s the good news. Challenges though.
Earlier research on the original ADOS did note lower specificity and sometimes sensitivity for distinguishing kids with milder presentations of autism. The other thing, individuals with really strong verbal abilities can kind of talk their way through the assessment and come across as pretty capable. Learned social scripts and compensation strategies might not be captured by the algorithm items.
OK, so those of you who do a lot of ADOSes, as you know, like not every single item that we score ends up in the algorithm. So sometimes those, again, learned social scripts and compensation strategies might not be captured by the actual algorithm items. Lastly, I mean, the assessment is a relatively brief, like structured interaction. So individuals who struggle in unstructured or novel or prolonged social situations might appear competent. Might.
But this is where I’ll put in a plug where I run into a lot of folks who are doing kind of like modified ADOSes, especially module four, because you don’t want to make the person feel awkward in these kind of stilted activities or childish activities. But I think that’s actually where some of the gold comes from, because a lot of those activities are less structured. So I’m thinking if you are familiar with module four or three, these are the activities outside the questions.
So there are several question-based activities where you just are having a conversation or asking questions of the individual. The others are the ones that are more unstructured, more novel, and more likely to elicit some behaviors that might be valuable in the process of diagnosing autism.
So then we have like kind of a compensation paradox as well, where the very individuals who’ve developed the most sophisticated coping strategies are the ones most likely to be missed, right? And so it creates a weird kind of incentive where the harder that you’ve worked to adapt or the more intervention you may have had, the less likely you are to be recognized as being on the autism spectrum.
Let’s pivot just for a bit to the sex and race discussion. This has also come up a lot. OK, so sex differences in ADOS2 scores. Females with autism tend to score significantly lower than males on the ADOS2 total and subscale scores. Females are less likely to show atypicalities on most social communication items.
And a 2024 study of module four results found that female scores did not correlate with ADI-R scores, indicating weakness when applied to adult women or female presenting individuals. There’s also a little bit of kind of item level analysis out there that we could talk about. This is again from JAMA in 2022. So the hand mannerisms items specifically demonstrated consistent bias across all the ADOS modules.
So moderate effect sizes would suggest an underestimate of autism in females. The restricted repetitive behavior algorithm includes only four items. And so, you know, if you have like 25% of those items, like one out of four, that consistently underestimates autism in females, that’s particularly problematic, I would say. There are also some factor structure differences where, you know, confirmatory factor analysis suggests that
like the latent factor structure of the ADOS2 module 3 might differ between males and females. So it kind of challenges the assumption that the ADOS2 measures the same constructs identically across sexes. That’s essentially what we’re getting at here, that it may not be consistent. So then we get into this kind of interpretation debate. And this will be the topic of a future episode here in this mini series. It brings up these questions like, do these differences in performance,
you know, reflect true phenotypic variation, like, you know, where female presenting individuals are exhibiting fewer or less intense autistic behaviors, or do they reflect more measurement bias where, you know, the tool, the ADOS, is less sensitive to female presentations? We think the answer is likely both, but the clinical consequence is the same, which is that females tend to be under-identified.
Now, let’s talk a little bit about the race considerations as well. So the same JAMA Network Open study also examined the racial component. Some items showed differential functioning by race, though findings were less consistent than sex differences. I mean, the ADOS2 was developed and validated primarily on white samples. So read into that what you will. It seems problematic.
Cross-cultural validity, I think, remains like a huge area requiring more research to investigate that. So that seems pretty clear to me. The hope is that we’re going to be able to bring on an expert or two in this particular area in the not so distant future who can comment specifically on cross-cultural assessment with the ADOS2. All right. So I’ve thrown a lot of information at you.
I’m going to start to wrap up here with some clinical recommendations and takeaways just to pull it all together. So if you are a clinician and you’re using the ADOS2, one, never use the ADOS2 as the sole basis for diagnosis. It is one piece of a comprehensive eval. Scores below the cutoff do not rule out autism, especially in females, low support or high functioning individuals, and those who camouflage or mask.
We’re going to talk a lot about camouflaging and masking in episodes three and four of this mini series. The third thing, scores above the cutoff do not automatically confirm autism, especially in individuals with other conditions that might affect social behavior, which is quite a lot, to be honest. The fourth thing, you want to consider the base rate. So in a specialty clinic, false positives are more likely than in a general population screen, simply because we are running into many kids with developmental concerns
and there’s a lot of overlap. The fifth thing, you definitely want to integrate developmental history. You know, the ADOS2 captures current behavior, but autism is a lifelong neurodevelopmental disorder. So developmental history is super important. Even with adults, we have to do our best to get some kind of developmental history. And then the last thing is just being aware of sex differences where females may require lower thresholds or different behavioral examples.
So big picture, ADOS2 I think is a really valuable tool. We use it all the time in our practice. We’re not gonna stop. But it was developed based on a particular understanding of autism, one that was derived primarily from males. And as our understanding of autism’s heterogeneity expands, the diagnostic tools have to as well. I mean, there are a lot of folks asking about the ADOS3 and what’s gonna be different there. I know they’re working on it. Gold standard is certainly not a perfect standard.
OK, so it is at this point, I think, the best instrument, but not a perfect instrument by any means. So what’s coming up next? We are going to be talking more about this female autism phenotype. So like I said, one of the most significant limitations, I think, of the ADOS is its performance with females. So it’s not just a measurement problem. It does reflect
deeper questions about whether autism itself looks different in females. And so in the next episode, that’s what I’m going to be digging into, this concept of female autism phenotype. So stay tuned. Like I said, subscribe or follow depending on where you are listening and make sure that you catch all the upcoming episodes. As always, thank you for being here.
