This episode is brought to you by PAR.
PAR offers the RIAS-2 and RIST-2 Remote to remotely assess or screen clients for intelligence. And they also offer in-person e-Stimulus Books for these two tests for in-person administration. Learn more at parinc.com.
Hey, everyone. Welcome back to another episode of the podcast. Today is a clinical interview day. We’re talking about a topic that all of us should care a lot about but many of us, including myself, don’t pay enough attention to. And that topic is measurement error in pediatric assessment.
[00:01:00] My guest today is Dr. Vanessa Torres van Grinsven. Vanessa currently works at the section for research methods of the Department of Special Education and Rehabilitation, Faculty of Human Sciences, at the University of Cologne. She runs the Consultancy Centre for Empirical Research of the Faculty of Human Sciences and teaches research methods from a qualitative, quantitative, and mixed methods perspective. I need to correct myself actually. As of April 1st, she started working as an Assistant Professor in Methodology at the Faculty of Psychology of the Open University of the Netherlands, Department of Theory, Methods, and Statistics. Vanessa studied Social and Cultural Anthropology and Philosophy at the University of Barcelona in Spain and later on obtained her Ph.D. at Utrecht University in the Netherlands, on a research methodological [00:02:00] topic, after which she specialized in research methods for the social and behavioral sciences, including an interest in measurement issues and personality diversity.
I talk with Vanessa about the topic of measurement error in pediatric assessment. We get into, of course, the basics of measurement error: what that means and why we need to care about it, but more importantly, we talk about how it manifests in our testing. So we spend a lot of time on the intersection of personality and test performance and the idea or the debate, I suppose, about tests being completely objective and sterile, or whether there is a more organic interaction between the tests and the individuals taking them.
It’s a great episode. I think there’s a lot to take away as always, and Vanessa was a fabulous guest. So without further ado, let’s [00:03:00] listen to my conversation with Dr. Vanessa Torres van Grinsven.
Hey Vanessa, welcome to the podcast.
Dr. Vanessa: Hi Jeremy. Thank you. It’s really a pleasure to be here and talk about some issues.
Dr. Sharp: Yeah. It’s funny. I reach out to a lot of folks to come on the podcast and sometimes I’m a little more certain that they would be open to it but in your case, not only is there a lot of geographic distance between us, but time zone stuff and you have no idea who I am. So I am really grateful that you responded to my random email and agreed to come on and talk about measurement error in pediatric testing. Thanks.
[00:04:00] Dr. Vanessa: You’re welcome. And that’s true. I do get some spam but yes, of course, it’s always very important to talk about research results and disseminate them, and a podcast is a really good opportunity to talk about my research work and disseminate it, to let other people hear about it and not only by reading an article or something.
Dr. Sharp: Sure, yeah, good. Well, I’m glad you’re here and I’m excited to talk about it.
Dr. Vanessa: If I can say something more, I think it’s very good that you do this, that you record this podcast and place them online. I think it’s a very good way to disseminate knowledge. So thank you for that. Thank you for your work.
Dr. Sharp: Oh yeah. Well, I appreciate that. Yeah, it’s really all due to guests like yourself who are willing to come on and have these conversations. So it’s a joint effort for sure.
So [00:05:00] we are talking about measurement error in pediatric assessment. I will start with the question that I always start with, which is, of all the things that you could focus on in your career in psychology and in this whole world, why spend your time researching this particular topic?
Dr. Vanessa: Why this? Well, to start with, and as I also described in my article, these tests, they’re used a lot. The WISC is used a lot for diagnostics. And then based on these diagnostics, decisions are made for children about treatment but also maybe about schooling. Not only the WISC but also many other standardized tests are used nowadays in our society to assess children, and then based on these tests, a lot of important decisions are made.
[00:06:00] So for that, I think there is really general importance in doing research into these tests to make sure that they function properly. So, that is maybe the rationale behind this research. I’m not a psychologist but I’m a methodologist with a background in philosophy and anthropology, but I rolled into this methodology field doing my PhD research and then teaching research methodology. And then I got into the field of psychology and research methodology for psychology and from this into measurement and diagnostics. So, that is the trajectory through which I got into this.
I’m interested in this topic. [00:07:00] And this interest comes from this professional background but also, as a mom, I have three kids. I have a lot of experience with all these standardized tests that kids need to do nowadays.
Another thing is, I talked about my Ph.D. research, I also worked in the field of survey methodology and questionnaire construction. In this field, there is a lot of focus on measurement error and on trying to reduce this measurement error using certain procedures, for example, qualitative pretest procedures that are not used in psychometric testing.
So with a combination of this professional background and experience with my kids, I got interested in [00:08:00] this topic. And so based on or related to testing of my kids, I also got into discussions with psychologists, and because of this background I have in questionnaire construction from survey methodology, I did ask them, do you not think that maybe some measurement error could occur due to certain characteristics? I thought about things like anxiety or fear of failure, insecurity, or maybe attention issues.
And these few psychologists that I talked to, I don’t know about all psychologists in the world, of course, they were of the opinion that this could not happen. So this really sparked my interest in whether it’s really possible.
And from my background in survey methodology and questionnaire construction and also knowing a bit how psychometric tests are constructed, this sparked my interest and [00:09:00] I decided to do some research into this, which I have just started actually. So there is a lot of research to do.
Dr. Sharp: Sure. There are so many pieces of that that we could talk about. It sounds like for you, you had a similar experience to a lot of us where it’s this intersection of personal experience with professional interest, and you have the good fortune, I suppose, or the skillset to really dig into something that felt important like your kids getting assessed and realizing like, hey, there could be some problems here but people don’t really pay attention to those problems.
Dr. Vanessa: That’s true. And it’s also related maybe to general discussion about these tests that we have in the Netherlands and issues that could go wrong. I also had the luck that I had the time and availability to do this. This is also a luxury, I think, [00:10:00] to be able to choose your research topic in some way and something that really sparks my personal interest and I think is really important in general nowadays.
Dr. Sharp: Yeah. I completely agree. I think that this is a topic that for a lot of practicing psychologists maybe flies under the radar a bit. It’s something that we learn about maybe in graduate school. We have courses on test construction and standardization and measurement error and the statistical, like the methodological aspects of test development.
Maybe I’m just speaking for myself, but I think a lot of us quickly forget about all that stuff once we get into practice and just get in the habit of using certain measures and we lose that scrutiny or that microscope that we applied back early in our career [00:11:00] when evaluating measures. So, I’m glad that we’re having this conversation. It’s very important, right?
Dr. Vanessa: Yeah, thank you.
Dr. Sharp: Well, let’s start at the beginning. I would love to have you define measurement error as it pertains to intelligence testing. That was the focus of your article, so we can at least start there.
Dr. Vanessa: Yes. Measurement error, I think the best way to talk about this is to focus on the viewpoint of measurement instruments. So in psychology as in other social and behavioral sciences, we design measurement instruments. So, there is a concept or a construct we’re interested in, and there are many different types of constructs and concepts in psychology and other sciences, for example, motivation but also intelligence.
It starts [00:12:00] also with the definition of that concept that we are interested in and that we want to measure somehow. Then based on this construct, we define a measurement instrument, and every measurement instrument has issues of course. No measurement instrument is perfect. That may be also something we forget in practice that no measurement instrument is perfect and there could always be some error or some bias.
So we have a measurement instrument to measure intelligence, so then we have a certain definition of intelligence. That is also an issue actually but maybe not a topic of discussion today, what the definition is of intelligence. I think there is also a lot of literature and a lot of discussion about that. So you could fill, I think, a whole podcast only talking about the definition of intelligence.
[00:13:00] Dr. Sharp: Absolutely.
Dr. Vanessa: But let’s say we have this definition of intelligence maybe in general as a general mental capacity. Then what we want to measure in this test is this general mental capacity. And then measurement error would mean that while we are measuring this general mental capacity, there are actually also other factors that come into play in the results of this test.
Other factors influence the score, so to speak, that would, let’s say, not be part of this original concept of intelligence. You could then define that as measurement error. So when there are other factors, you can also call them confounding variables, that have an influence on this measurement, the score would not be [00:14:00] what is called the true score. It would not correspond to the true score, also called the consensus score.
These are two different ways to denote it: the true score or the consensus score. So you could say that there is a consensus on what intelligence is, what would be the valid score to measure, and if we get a score that does not correspond to this consensus score, then we have measurement error. We could define that as measurement error from a very methodological point of view.
Dr. Sharp: Right. Is it an oversimplification to say measurement error is anything that influences the score that is not the pure construct that you’re trying to measure? It’s any external factor. It’s anything that might make its way into the score that’s not exactly [00:15:00] what you’re trying to test.
Dr. Vanessa: I think that would be a good definition for that. Yes, exactly.
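Editor's illustration: the definition discussed above can be written compactly in the notation of classical test theory. This is a minimal sketch rather than anything the guest presented. The symbols X (observed score), T (true score), C (systematic contribution of a confounding variable such as test anxiety), and E (random error) are assumed here for illustration, and splitting the error into C and E is one common way to separate systematic bias from random noise.

```latex
% Classical test theory: the observed score is the true score plus error.
X = T + E

% Splitting the error into a systematic part C (e.g. the effect of a
% confounding variable) and a random part E' with mean zero:
X = T + C + E', \qquad \mathbb{E}[E'] = 0

% Random error averages out over repeated measurement; the systematic
% part does not, so the expected observed score is shifted by C:
\mathbb{E}[X \mid T, C] = T + C
```

Read this way, the concern raised in the conversation is about C: a factor that shifts scores consistently for some children does not disappear when the test is repeated.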
Dr. Sharp: Okay. Maybe we could talk about some examples and kind of bring this to life. What are some of those variables that might interfere and lead to some measurement error?
Dr. Vanessa: Yes. I got interested in this issue and so I had different research ideas. I could not yet carry them all out but I started with doing a review, so I just wanted to know, has that already been researched? Is there research into this, what you could call measurement error?
In the literature, it is not always called measurement error, it could also be called test bias. Actually, in psychometric testing development, there is a lot of literature on test [00:16:00] bias. Test fairness is related to this test bias.
But what I was interested in is a different type of test bias that has not been researched that much, at least not from the psychometric development point of view. It has been researched, fortunately, so I did find some research in my review. I did not do a systematic review, so it is a review of the literature but not a systematic review. I just found some research and I wrote about it. And for me, that was already, let’s say, important enough or good enough to write an article about a summary, like look, this is some research I found, empirical research that shows how some factors get to influence the score in an intelligence test.
And some of these [00:17:00] researchers, they do focus on the WISC because I wanted to focus on the most used intelligence test worldwide which is, according to some authors, the WISC, the Wechsler Intelligence Scale for Children. I thought it would be good to make it more concrete and focus on that test and select research that was done on that test. And from here they found several sources of measurement error. I call them sources of measurement error.
So again, sometimes terminology is a bit different in the literature. I feel that sources is a good way to talk about them. And they found several sources. What comes back a lot is, for example, anxiety in children, depression, social-emotional [00:18:00] problems, stress and coping variables.
Motivation seems to be an important confounding variable, so motivation to take the test. Extrinsic motivation is when you think about material incentives. Examinee interest and cooperation. Some researchers measured traits like avoidance, inattentiveness, and uncooperative behaviors. These have an influence on the results of the test. Negative emotionality and then again, avoidance.
What also comes back a lot is gender but I personally am not that much interested in gender but more in what could be behind it because gender is often related to personality traits. My personal interest is more in these personality traits and how [00:19:00] they would be related to the results of these tests.
Dr. Sharp: Could we go down that path just a bit since you are interested in that area, the personality components?
Dr. Vanessa: Yes, I’m especially interested in personality components that would have an influence on the results of these tests. And so from the viewpoint of test bias and test fairness, I also said that there’s been a lot of research within psychometrics to do research on this test bias related to a cultural background and ethnicity but there has been no research into personality traits.
I guess also because there are no data. To be able to do research on test bias due to personality traits, [00:20:00] you have to have data for that. You have to have collected data for that. I did do an analysis myself but this was not with children. I found a very nice data set where there was some kind of verbal ability test but there were also some data on personality traits and demographics.
I did actually find a strong correlation between personality traits and the result of the test and especially the response behavior because, for me, that’s one important point. So I’ve just been mentioning some examples of sources of measurement error. And this can be explained when you look at taking such a test from the side of the process performance approach that I discussed in my article.
So that’s [00:21:00] not something that I thought of myself. I cite some researchers in my article and they were the first to describe this process performance approach to intelligence test results. And this is opposed to another approach that sees intelligence more as a stable characteristic.
And this then also goes back to the concept of intelligence and related to the concept of personality. In general, and there’s also discussion about this but in general, in psychology, intelligence and personality are seen as different aspects of a person, but at the same time, also research shows that they are fairly correlated.
At least research shows that certain tests, like intelligence tests but also other standardized mental ability [00:22:00] tests, like in the USA, where tests are also done in school, I think they’re called the GPA and the SAT, are very much correlated with personality and also with intelligence tests. Empirical analysis shows that the results of intelligence tests are correlated with personality traits.
And then you could then again approach this from different viewpoints but if you still want to see intelligence as an aspect that is different than personality, if you want to see them as a dichotomy, then you could wonder whether maybe there is an influence of these personality traits on the results of an intelligence test. And you could explain that.
So, using the process performance approach but also using a more interactionist approach [00:23:00] as opposed to the modular approach. Generally, tests are seen as an objective and neutral, especially neutral, measurement instrument. Another point of view is seeing, let’s say, the test-taking process as an interaction.
So an interaction occurs between the test and the person. And in this interaction, the personality comes into play. This then leads to a certain response process which then leads to the outcome of the test. So this is a whole different way of looking at this test and at the process of getting to the score from that test.
Dr. Sharp: Yeah. And to me, that just makes intuitive sense that it would be more of an interaction [00:24:00] between the test and the person, personality factors are going to make a huge difference in how someone responds to a test but also interactional components between the examiner and the clients. That’s a whole other area that I imagine we could dig into.
Dr. Vanessa: Yes, and that makes it maybe even more complex. Maybe you could have a test and a person taking the test by themselves, so there is an interaction between the test and this person. But when you have a test administrator, so with children, I believe when you do the WISC, there’s always a test assessor, you have an even more complicated interaction: you have the child, the test administrator, and the test in this interaction.
And this approach, this interactionist approach, may be something that comes more from qualitative research and has not really gone into quantitative types of [00:25:00] research, of which psychometric test development is a part. I think that it could be an improvement to use this interactionist approach to do research on the response process occurring when going through this process of doing this test.
Dr. Sharp: Yeah. Let me back up just a little bit. You talked about personality characteristics being related to test-taking and outcomes. And for folks who are not familiar with that research, can you share some of that in terms of what personality characteristics we know of that are related to test performance? Anything that we know about that?
Dr. Vanessa: I think that there is no research on that. In the review in my article [00:26:00] there are personal, emotional, and motivational factors that I discussed, but there is nothing much yet about personality.
Dr. Sharp: Okay, maybe I misunderstood. I apologize. I thought you had said that there was a relationship there with…
Dr. Vanessa: Oh yeah, but that’s true. Yes. But that’s not from the viewpoint of measurement error; there is research into the correlation between personality and test results, so both GPA and SAT and also intelligence tests. And so if I remember well, and actually I have this research here, so maybe I can just try to remember.
In personality, of course, you have different personality inventories, and one of the most used is the Big Five Personality Inventory. And especially the traits, [00:27:00] openness to new experience and I think neuroticism are related to results of intelligence measures. This does actually fit in certain frameworks of intelligence where openness to new experience, one part of it is also a thing even called intellect. So that is already conceptually a correlation between these.
Can I mention the literature if someone would like to read it? There is a chapter by DeYoung called Intelligence and Personality, which is published in the Cambridge Handbook of Intelligence. And here this dichotomy between intelligence and personality is discussed and also some correlations. I think the only significant correlation that he finds is between openness as in personality traits, [00:28:00] openness to new experience, and intelligence.
And then there is research where GPA and SAT scores are also correlated with personality inventories. I think they also used the Big Five personality inventory if I’m correct, because there are different personality inventories that are based on different conceptualizations of personality and intelligence. I think they also mainly find a correlation between openness to new experiences and the results of the GPA and SAT, or I do not know how you call them in the US, are they called SAT?
Dr. Sharp: Yes. SAT.
Dr. Vanessa: SAT, okay. And so this could mean that these two are actually not really separated, that they’re part of the same [00:29:00] thing, phenomenon. In reality, it could also mean that we are confusing the measurements. These are two different viewpoints that you could have to explain these research results.
And then, oh, yes. Also, these people who did research on the correlations between the GPA and SAT scores, they also found correlations with conscientiousness and maybe also agreeableness but the largest correlations, if I remember well now are with openness to new experience.
Dr. Sharp: Yes. That makes sense to me. And so these are factors that we need to be aware of as we are administering these tests to our clients, which gets back to this idea of an [00:30:00] interaction between the test and the client. Can you say any more about this, like when you say an interaction between the test and the client and what that entails?
Dr. Vanessa: Yes. I can look at it from this viewpoint that I have from questionnaire construction and survey methodology where you look in detail at the questionnaire. So you look at certain types of traits also of this questionnaire, like question wording, answer format, even layout. In survey methodology, it is proven that all these are really related to the data that you get from such a questionnaire.
So how you word the question, of course, but even the order of the question and the answer format has a lot of influence on the data that you get. And when you look at intelligence tests and [00:31:00] at the WISC for example, of course, you always have the content of what is asked for. So the WISC, do we say it like this in English?
Dr. Sharp: Yeah.
Dr. Vanessa: The WISC has several indexes or indices, again, I’m not sure how you say it in English, and subtests, each of which is supposed to measure a certain ability in the child. So that is, let’s say, the content but there’s also the form. So this is asked in a certain way. Some parts of the WISC-III, at least, I’m not exactly aware of the WISC-V, I think you’ll be able to tell me better, have a time restriction, for example.
Dr. Sharp: Sure.
Dr. Vanessa: Research shows that certain personality traits react [00:32:00] differently to this time restriction, which you can also logically explain. If someone is more insecure or has anxiety or has fear of failure, this time restriction can function in a negative way on the performance. So this would lead to a lower performance but there could also be personalities for which this would lead to a higher performance. So I can also imagine a personality that is actually stimulated by a time restriction.
So in this way, personality interacts with the form in which this trait is tested. And you could look like this at each one of these subtests and look at all the characteristics of each subtest, which I have not done yet, but this would be one of the next pieces of research to do, to really look in detail at all of these subtests and to [00:33:00] make an inventory of the form, how they are questioned, and to research how different personalities react differently to these different subtests.
So on the other hand, there could also be, for example, personalities who are more impulsive in their reaction, and depending on how the test is designed, this then can have a negative or a positive influence on the results of the test, on the score. And what is important to know or to think of is whether this influence, so I talked about time restriction, it can be related to, for example, anxiety or impulsivity, whether these are part of the concept that you want to measure, or whether they are external factors and confounding variables in this measurement.
But then the thing is that also about this, there can [00:34:00] also be discussion. So someone could say, but this is all part of intelligence. Again this discussion, what are we measuring? And are we really measuring what we want to measure?
Dr. Sharp: Mm-hmm. Yeah, the question was coming to my mind that’s sort of a devil’s advocate question, which is okay, yes, I think it’s a given that personality, we’ll just say personality, is going to influence test performance right? And we’ll just take anxiety. Some folks get more anxious when things are timed and that decreases their performance. Some folks get excited about time pressure and that amplifies their performance and makes it better.
So the devil’s advocate question is, well, can’t we just trust that all of that variability is accounted for in the test development and standardization? How much do we really have to pay attention to that?
Let’s [00:35:00] take a break to hear from our featured partner.
The RIAS-2 and RIST-2 are trusted gold standard tests of intelligence. For clinicians using tele-assessment, PAR offers the RIAS-2 Remote, which allows you to remotely assess clients and the RIST-2 Remote which lets you screen clients remotely for general intelligence. For those practicing in the office, PAR has In-person e-stimulus Books for both the RIAS-2 and the RIST-2. These are electronic versions of the original paper stimulus books that are an equivalent, convenient, and more hygienic alternative when administering these tests in person. Learn more at parinc.com\rias2_remote.
All right, let’s get back to the podcast.
Dr. Vanessa: I would say, based on the review of research that I did, that we cannot trust that. We can’t.
[00:36:00] Dr. Sharp: I thought you might say that.
Dr. Vanessa: There is really this influence and, let’s say, I don’t have answers. I just have this review, let’s say. I think this deserves more research. And up to now, all these tests, they’re researched a lot with all these psychometric procedures, but we could also do a more qualitative type of research into this response process. So that is also my suggestion, to do more in-depth and detailed research into the response process occurring, to see this from the interactionist point of view, from the process performance point of view, and wonder.
So when we do an intelligence test, we assume that we get or we want to get a maximum performance of that person, but depending on how the test is designed in relation to personality, we [00:37:00] could or could not get this maximal performance. And this is maybe more normative. It’s not scientific but normative. Maybe if we want to give everyone a fair chance, we should do research into how to give everyone, all the different personalities, this fair chance of getting this maximum performance.
And if time restriction is negative to certain personalities or positive to certain personalities then maybe this is not fair. So we could think of this from the point of test fairness and inclusivity and also diversity. What do we want to do with this? And this is maybe a normative question and based on our values, so the choices we would make about this test.
Dr. Sharp: Yeah. This is such an interesting [00:38:00] discussion. I couldn’t anticipate this is where we were going to go but here we are and I’m really interested in this whole thing. My question is, yes, theoretically I agree with that proposal, that our tests should be as fair as possible to as many people as possible. Like if we’re really trying to measure intelligence, we should find a way to do that in a way that maximizes everyone’s potential on these tests, right?
Dr. Vanessa: Mm-hmm.
Dr. Sharp: On the flip side, and this is just a theoretical question, I’ll put it to you to see what you think. With the amount of variability in personality and performance factors among kids, let’s just say kids, is it possible to do that without having 90 different intelligence tests to capture all these factors that [00:39:00] would maximize people’s performance?
Dr. Vanessa: Well, I don’t know. I think maybe possibly not, maybe yes. This would make the situation very complicated and maybe it would not be possible to design a test that can be adaptive to different personality traits or that can be really completely neutral. So maybe this is not possible.
I think for now, considering the state of the art at this moment, I think it would deserve a chance to continue doing the research on this and to try to improve these tests in general, but maybe also a solution would be to accept that this occurs but then especially to take this into account in how we interpret tests and how we use the results of these [00:40:00] tests.
Nowadays results are maybe used very much as absolute values. So you have an absolute value: below this value, this happens; below that value, that happens. And maybe it’s important, from a more qualitative viewpoint, to then also look at these other issues, and that, for example, a psychologist, when doing a test, can maybe already identify certain issues in children which research shows have an influence.
So when you do this interpretation and you decide on treatment, or you decide on something different like schooling, then you take all these other considerations also into account. Maybe that would also be a part of the solution to this issue.
[00:41:00] Dr. Sharp: Yeah, that strikes me as a much more realistic and timely solution to this problem. Let’s assume maybe we could develop tests that maximize everyone’s potential maybe but that will probably take time. And in the meantime though, we are left with this question of, okay, how do we use the data that we have as far as, if we notice that a kid is really anxious during a timed test? I’m curious how you look at it in terms of interpreting that information. How do you actually use that or how would you recommend that we use that when we’re interpreting the data? Even if qualitatively, I think that’s how it would have to happen, right?
Dr. Vanessa: Yeah, of course. That’s something that I don’t know. I think I also wouldn’t be [00:42:00] able to tell you because you know more about that than I do. You are a psychologist. You are practicing and you are allowed to make those kinds of decisions.
Dr. Sharp: That’s fair.
Dr. Vanessa: And I’m not even, I don’t have the papers for that, let’s say. Maybe in general, in society, we could try to be more tolerant of these issues and less strict with these kinds of test results. And this again has to do also with inclusivity and diversity. So maybe in general, we live in a society that asks a lot of people and we always have to perform at our maximum. And maybe some people then are disadvantaged by that.
But I think in general, as research shows that these things happen, this could be taken into account in the interpretation. And [00:43:00] it’s true, it would take a lot of years to do research on this response process and, from there, to develop new tests. So that would be a long shot.
Maybe an issue also is that some of these most used tests like the WISC are protected by copyright and they cost a lot of money. And this also makes it harder to do research on them. So it makes it harder for researchers like me. I’m interested in the response process and I would like to do research on this because I feel it’s an important topic for society, but it’s hard because of this protection of the test.
So maybe also, I do not know if that’s a solution, but moving towards more open access tests that are more easily available for researchers to get into and to research on, to [00:44:00] help develop them and improve them would maybe also be something to think about.
Dr. Sharp: Yeah. I think it’s okay to dream a little bit and think about how we could figure this out a little bit more. It makes me think about the push for digital assessment measures and all the capabilities that we think we have, as assessment measures become more digital, where we’re measuring things behind the scenes that clients may not be aware of. I’m thinking like response time or whether they get the answer right or not.
Digital assessment hides some of those factors a little better than paper and pencil administration. I wonder if that opens some doors to make testing a little less [00:45:00] intimidating for those folks who don’t do well if they feel more anxious during test-taking situations, if we can hide some of those variables a little bit. That’s just one thing that comes to mind for me.
Dr. Vanessa: Yeah. I think that could be a solution because then you would have all these data that would at least help you to assess these issues. And then you could also use them as a control in the final score. Then still the issue is that this has not been done that much yet. I talked about these pretest procedures that use a certain methodology, and they seem to be less used in the development of psychometric tests, though they’re also part of the standards.
There is this document, written jointly by the APA and [00:46:00] some other American organizations. I think I can look up the name, but maybe it’s not so important. The thing is, these pretests are also part of these standards for the development of psychometric tests.
What is not clear to me is how often or how much they’re really used in the development of these tests. And that would be also something, again, to do research on. So yes, it’s the APA, the American Educational Research Association, and the National Council on Measurement in Education. So there are standard guidelines that they published, and pretest procedures, according to these guidelines, are part of psychometric test development.
What I’m not sure about is how much these are really used in psychometric test development. I would [00:47:00] advocate, and I also do this in my article, to do this more or a lot. For example, the WISC could be investigated in detail using these pretest procedures. With this, you could identify issues that occur in each of the different subtests. And with this, you could then redesign the test and improve it to make it more fair.
And this would also then be the case for these digital types of testing because you still have the interaction of the test with the person. And design issues can have an influence on the results and can interact with different personality factors. So even in that case, it would be important to add pretest procedures more intensively to the design process of [00:48:00] psychometric tests.
Dr. Sharp: Do you have any idea how widespread that is now? Like this pre-testing procedure during test development, or is that actually happening?
Dr. Vanessa: I think it should be happening somehow but maybe it’s not happening that much. That would be also a good, let’s say, meta-analysis to collect all these articles and see how many researchers or developers have used the pretest procedures. I also have this advisory function for PhD students and at the department where I work at the university in Cologne, they do develop a lot of measurement instruments.
I advise some of them to do these pre-test procedures and they had not heard of it before. So from their supervisors, this advice did not come. So I know that [00:49:00] the idea was not there yet to do this. I guess it should be done somehow but maybe not that much.
Dr. Sharp: I see. Yes, there are so many directions that we could go with this discussion. I love this. I think the important theme that I’m pulling from all of this is that there are so many factors that we need to be aware of as we’re administering these tests, that we really can’t trust that just because a test is well-known or widely used, that the results are going to be 100% accurate. We have to take these personality factors into account.
It also gives me confidence that robots are not going to take our jobs anytime soon because it’s not just the test [00:50:00] administration, of course, it’s the observation and interpretation of these other factors that might be influencing someone’s performance. And until we have really, really good biometric measurement and facial scanning and who knows what else while someone is taking a test, that falls to us to do that work.
Dr. Vanessa: Yeah, I think even so with all these, let’s say robots or algorithms, it’s always still what you put in there is what comes out. And what you put in there is written by people. Even that is always a human factor. And only maybe it gets more generalized but it doesn’t mean it’s neutral. I don’t think that algorithms are usually neutral or at least there can be some biased component in there.
Dr. Sharp: Yeah. I think that’s fair. [00:51:00] We’ve touched on lots of points here. This is actually shocking, but it seems like there is a lot of room in this area to do more research, which is funny to me. It seems like this is kind of an intuitive piece of what we do, that of course, personality is going to influence test performance, but it sounds like there’s a lot of room actually to dig into this, is that right? And do more research?
Dr. Vanessa: I think there is a lot of room to dig into this and do more research. Like I said, this psychometric test bias research has not really been done with personality traits. So there is a lot of room in there to improve these tests. And especially it’s also this viewpoint that I mentioned, the interactionist viewpoint.
[00:52:00] It could be a shift in viewpoint because, like I said, this is maybe more of a viewpoint that comes from the qualitative type of research, where you see measurement instruments not as neutral, or actually never as neutral, but there is always this interaction. And when you see things from this point of view, then I think especially it is a realization that the real world is really very complex and it is very hard to model with statistical procedures. But this is also then a viewpoint that you can discuss. Some people have the viewpoint that we can model reality. Others would say, reality is so complex, can we even model it? This is then more a philosophical discussion about your worldview, how you see the world, and how you want to analyze the world.
Dr. Sharp: Mm-hmm. You’re so right.
[00:53:00] Dr. Vanessa: It is more opposed to this idea of a measurement instrument as neutral, even if it’s standardized or especially when it’s standardized. That’s something that, it is again not my idea, but I worked at the Statistical Institute of the Netherlands in the development of questionnaires and surveys, and there was this discussion: should we have standardized measurement instruments or should we adapt them to different situations? Actually, what I’m doing here is transposing this whole idea into the area of psychometric testing: the discussion of adaptive testing, but not adaptive testing as it’s generally known. In general, adaptive testing is known as testing that adapts to the level of the person. What I mean by adaptive testing is adapting to, for [00:54:00] example, different personality traits.
Dr. Sharp: I love that you’ve opened this door. Do you have any examples of current measures either in the neuropsychological world or even outside our world that are adaptive measures that we might know of or could look to as examples of how that could work?
Dr. Vanessa: I don’t know of any, at least not when you look at standardized measurement instruments. In psychology, you also have all kinds of unstandardized measurement instruments, and you will know more about this than me. There are more, how would you call them? Maybe unstructured, or is it the TAT? But I’m sure you know all of them, maybe you can [00:55:00] mention a few of them.
Dr. Sharp: Sure. Like projective tests, the Rorschach, the performance-based tests. We do have those measures. And we have more qualitative measures as well. So that to me begs the question, and I would love to get your perspective as a methodologist, statistician, test developer, and philosopher, from your background, is it just a given that we give up the idea of standardization if we make tests more adaptable to the individual taking them?
Dr. Vanessa: That could be maybe the endpoint. I’m not sure. I think that we moved toward standardization to make testing more objective. That was the idea behind it, making tests more neutral. But considering all these interaction elements, and that tests maybe can never be completely neutral, maybe we would move again towards these more qualitative [00:56:00] diagnostic instruments.
But maybe also the solution would be to combine them. So you could combine a psychometric test with a more unstructured measurement instrument. And maybe this would give you the best view and would make it possible for you to get the best conclusion out of these tests. That would be something to work on. To work maybe a bit back to the more unstructured types of measurement and combine them with the standardized measurement instruments. This is just something I’m making up now. I don’t know.
Dr. Sharp: Totally fine to make things up on this podcast. I’m sure there are folks out there who are just like, oh my gosh, if we get away from standardized objective measures, where does that [00:57:00] even leave us as a field and is that going to be valid and what are we even doing with our lives?
I am being dramatic, but it is an interesting question to me because I see both sides, right? I mean, the need for objective assessment is important and it seems like we have really good data to suggest that personality and other factors are really influencing folks’ performance on these measures that we tend to rely on quite heavily to make important decisions in kids’ lives. That’s a conundrum.
Dr. Vanessa: Yeah. And it’s a way in which you impact people’s lives but also shape reality. You could also think of that. We shape reality when using these tests and then making decisions based on these tests. And it has a lot of implications. It is really something to think about.
Dr. Sharp: It really is. This is such a cool conversation. I feel [00:58:00] like we are generating more questions than we’re answering, which to me is a really engaging discussion. So it’s going to have me thinking. I am curious, as we start to wrap up here, if you see a clear path forward in terms of research focus or anything that we could look at that would help with this problem.
Dr. Vanessa: Research focus. I think one of the research foci, do you say it like this, should be to do qualitative in-depth detailed research into this response process to see what happens there in the response process and try to relate it to certain [00:59:00] variables like personality factors.
I would also think that a large part of this research goes through maybe also testing psychologists like you because, for example, the WISC has been used for many years already by people like you. And I’m sure there is a lot of knowledge there, a lot of information to be collected, written up, and disseminated that can be used to evaluate these tests and to further develop and improve them. So I think that would be an important issue.
And then, of course, there’s also the dissemination. I’m also not really aware of how far this is covered in research in the world. Maybe people are already doing research or already writing guidelines to take this into account, and then how far this is disseminated among psychologists over the world. So that is also an important part, I [01:00:00] think, of science, of research. So of course it’s not useful to do research and to write it up, and then nobody reads it and it’s not disseminated. So that would be maybe also an important focus in research.
And then again to try to generate data on personality factors and to relate them to test bias in psychometric testing. I think it would be good then, like, if I would choose one test, I would choose the WISC. For all the reasons that we have discussed, it’s one of the most used tests in the world. It’s used for a lot of decisions, so maybe this is an important test to do research on, but of course, this would be something for Pearson. I think this is published by Pearson, right? It would be actually a task for Pearson, I guess, to do this research and to develop and look more into this.
Dr. Sharp: Sure. I’m sure they’re listening.
[01:01:00] Dr. Vanessa: Let me think. There could be also more research on more unstructured tests like projective tests and how they can be used, maybe jointly with psychometric tests, to draw conclusions for diagnostics. So there could be research into that. I think that would be a good way of doing research. Actually, I have a colleague who does research on that. So that would be good. I think that’s all I can think about at this moment. I’m sure that in 5 or 10 minutes when we have closed down, I will have a new idea but…
Dr. Sharp: Of course.
Dr. Vanessa: … that’s how it is.
Dr. Sharp: Yeah. No, that’s plenty. This has been a fantastic discussion. I appreciate you going all sorts of different directions as [01:02:00] we talked and bearing with some curveball questions, but this is important research. We will, of course, link to your article in the show notes that we’ve referenced and a couple of the other things that we’ve mentioned. But I really appreciate you coming on especially because this is evening time for you being over in Europe. I’m just grateful for your knowledge and for your time. Thanks.
Dr. Vanessa: It was my pleasure. Thank you for listening to me. I hope I had something interesting to say, and maybe we can talk again sometime.
Dr. Sharp: I would love that.
Dr. Vanessa: So thank you a lot.
Dr. Sharp: All right, y’all. Thank you so much for tuning into this episode. Always grateful to have you here. I hope that you take away some information that you can implement in your practice and in your life. Any resources that we mentioned during the episode will be listed in the show notes, so make sure [01:03:00] to check those out.
If you like what you hear on the podcast, I would be so grateful if you left a review on iTunes or Spotify, or wherever you listen to your podcasts.
And if you’re a practice owner or aspiring practice owner, I’d invite you to check out The Testing Psychologist mastermind groups. I have mastermind groups at every stage of practice development, beginner, intermediate, and advanced. We have homework, we have accountability, we have support, we have resources. These groups are amazing. We do a lot of work and a lot of connecting. If that sounds interesting to you, you can check out the details at thetestingpsychologist.com/consulting. You can sign up for a pre-group phone call and we will chat and figure out if a group could be a good fit for you. Thanks so much.
The information contained in this podcast and on The Testing Psychologist website is intended for informational and educational purposes only. Nothing in this podcast or on the website is intended to be a substitute for professional, psychological, psychiatric, or medical advice, diagnosis or treatment. Please note that no doctor-patient relationship is formed here, and similarly, no supervisory or consultative relationship is formed between the host or guests of this podcast and listeners of this podcast. If you need the qualified advice of any mental health practitioner or medical provider, please seek one in your area. Similarly, if you need supervision on clinical matters, please find a supervisor with an expertise that fits your needs. [01:05:00]