Thursday, April 21, 2016

Issues in Science

The magazine First Things recently published an article on the problems of science by Silicon Valley programmer William A. Wilson. I'm writing here not to critique his points, as he raises a number of issues that students in my Embodied Cognition seminar have raised throughout the semester (I'll include some of their comments at the bottom of the post), but rather to do what science does: extend his analysis and tie it to the existing literature and commentary.


In the article, Wilson makes the following points (summarized and bulleted below - my own comments italicized):

  1. Science has a replication problem (citing the OSC attempt to replicate 100 psychology studies - though see a critique here and the backlash here for one example) - Here's a suggestion for how to fix this issue - however, a caution on how we approach replications here and here
  2. Positive Result Publication Bias - the "harder" a science, the fewer positive results it reports - commentary on that paper here
  3. Journals - Impact Factor and Bias
  4. Experimenter bias/fraud 
    1. "Experimenter effect"- when an experimenter expects a certain result, they are more likely to find it - Here's a suggestion on how to fix that here as well as another suggestion here
    2. Running and publishing only the experiments that are likely to be accepted for publication. I think he then cites this article from Psychological Science about questionable research practices, but I can't tell for sure since he includes no links or citations
    3. Data analysis as an art - p-hacking and HARKing (see the sketch after this list)
  5. Self-Correcting Nature of Science is nonexistent
    1. Scientific Dogma - ignoring or outright covering up results that don't agree with accepted dogma - This example from PLoS earlier this year may be an example - the authors credited the hand as being created by intelligent design, only to have the paper retracted after scientists critiqued the rationale and conclusions
    2. Peer review doesn't do its job - You can find critiques of the peer-review process here and here, or from this blog here
  6. Institution of Science is Old and Resistant to Change
    1. Those in power don't want to change (I'll call this the Pyramid Scheme Critique)
    2. Systematic Inequalities are present
  7. Cult of Science
      1. Humans, with their biases and errors, cannot achieve science's goals
      1. Here's a commentary on addressing biases
    2. "Scientism" - science as a "holy" or "true" discipline
    3. Cult leaders who aren't researchers of note or promise proselytize a faulty and disingenuous "truth"
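Since p-hacking can feel abstract, here's a minimal simulation sketch of one common form of it, optional stopping (testing after every batch of participants and stopping as soon as p < .05). The parameters and numbers below are illustrative assumptions of my own, not anything from Wilson's article; the point is simply that even with no true effect, peeking at the data drives the false-positive rate well above the nominal 5%.

```python
# Minimal sketch (illustrative parameters): optional stopping as a form
# of p-hacking. Both groups are drawn from the SAME distribution, so any
# "significant" result is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def one_phacked_study(n_start=20, n_max=100, batch=10, alpha=0.05):
    a = list(rng.normal(0, 1, n_start))
    b = list(rng.normal(0, 1, n_start))
    while True:
        p = stats.ttest_ind(a, b).pvalue
        if p < alpha:
            return True               # stopped as soon as p < .05
        if len(a) >= n_max:
            return False              # gave up; correctly non-significant
        a.extend(rng.normal(0, 1, batch))   # add another batch and re-test
        b.extend(rng.normal(0, 1, batch))

n_sims = 2000
false_positives = sum(one_phacked_study() for _ in range(n_sims))
print(f"False-positive rate with optional stopping: {false_positives / n_sims:.1%}")
# Prints something in the neighborhood of 15-20%, not the nominal 5%.
```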
Wilson has very nicely and succinctly summarized a lot of issues that I, along with other scientists, have been thinking about for a number of years. If I were to summarize his points into one, it would be this: while science is one path to finding "truth," we have to understand that the biases of the people pursuing science, along with the current limitations of both understanding and measurement, mean that we might not always be right. A little over six months before this article was published there was an interesting piece in the NYT and a follow-up on NPR about the role of science in finding truth. I want to include a longish quote from George Johnson in his NYT article here:
Science, through this lens, doesn’t discover knowledge, it “manufactures” it, along with other marketable goods.
Altruism and compassion toward the feelings of others represent the best of human impulses. And it is good to continually challenge rigid categories and entrenched beliefs. But that comes at a sacrifice when the subjective is elevated over the assumption that lurking out there is some kind of real world.
The widening gyre of beliefs is accelerated by the otherwise liberating Internet. At the same time it expands the reach of every mind, it channels debate into clashing memes, often no longer than 140 characters, that force people to extremes and trap them in self-reinforcing bubbles of thought.
In the end, you’re left to wonder whether you are trapped in a bubble, too, a pawn and a promoter of a “hegemonic paradigm” called science, seduced by your own delusions.
Marcelo Gleiser in his response for NPR also had a nice quote:
Creationism, the anti-vaccine movement, resistance to genetically modified crops, cellphone radio waves, fluoridation, the ongoing global climate change debate, the risk of certain high energy physics experiments (see my post from last week), all point to a curious "personalization" of science. It's as if scientific issues are simply matters of opinion — and not the product of a very thorough process of consensus-building among technically trained people.
Getting back to the issue of truth, Gleiser notes:
Granted, "truth" is a loaded word. What do scientists mean when they say science finds the truth, anyway? One needs to be very careful here, for the very nature of the scientific enterprise implies that truths can shift as knowledge progresses.
A website I love to use to introduce people to the practice of science is Understanding Science: How Science Really Works from UC Berkeley. In their article about science and truth, they note that science tries to build knowledge about the natural world; there are a number of other "truths" to which science can't speak, including faith/spiritual beliefs, cultural truths, and truths that may be subjective or relative. So, to fall into Wilson's noted "science is self-correcting" trap: the fact that we've identified the problems Wilson notes, and that we have a number of experts and non-experts identifying ways to address these issues, is the scientific process at work. I don't think there's a group that froths at the mouth more trying to disprove something from their own ranks than scientists. So, with groups like the OSC, the Open Access/Open Data movements, and improvements to scientific methodology and statistical analysis, science continues to progress.
---

An addendum: I'm interested to see that this article hasn't been taken up by scientists. It could be that all of the links I've included demonstrate that scientists are well aware of these issues and are already harping on one another. It could also be that scientists don't want to take on a religious magazine like First Things, especially when the article discusses publication bias and human bias in science but is written in a religious journal and makes the point that we can't trust people because of the agendas they have. Perhaps, framed from this religious standpoint, scientists write off the critique as a religion-versus-science argument; they might have taken up the same article, written as is, had it been published in a secular outlet.

While I've not seen the article tweeted or blogged about by scientists, I do see it blogged about by religious people and organizations, including Intelligent Design supporters. They end their article with this:
If he were active in any other area of science, he could not possibly have gotten away with writing as frankly as this.
Here are some students' thoughts on Issues in Science, saying and thinking the same things:

We had read these three papers for one of the last classes on Embodied Cognition:
Cesario, J. (2014). Priming, replication and the hardest science. Perspectives on Psychological Science, 9, 40-48.
Everett, J.A.C., & Earp, B.D. (2015). A tragedy of the (academic) commons: Interpreting the replication crisis in psychology as a social dilemma for early-career researchers. Frontiers in Psychology.
Ioannidis, J.P.A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.

Cesario's paper addresses many issues that have come up in our class discussions, and I appreciate his explanations for why we see unsatisfying contradictory results in research so frequently. Previously we have brought up concerns regarding the reliability of an effect which has only been replicated by the lab that found the original results. Cesario gives a structure to the replication process which makes this practice necessary, rather than questionable. I agree that it is important for the original researchers to show that they can replicate an effect before anyone else attempts to. Further, I think that by reconstructing each feature of the original study, they will be given more opportunity to consider whether there are elements of the study design contributing to the findings in a way that was not planned for or measured.
He also provides a different perspective on the idea of replicability in multiple contexts, which is a metric we have used to assess the strength of a given effect. We have discussed the influences that situational elements can have on a person's thoughts and behaviors, and sometimes change our expectations for the outcome of a study as a function of cultural context and geographic location. Yet, we do tend to conceive of effects that can be replicated by labs in different countries as somehow stronger. Are effects more important if they can be held universally?
I found the Cesario article very interesting because it proposed a clear framework through which to examine replications in psychology. The proposal that labs should be tasked with replicating their results multiple times before other labs attempt replications seems like a fantastic way to ensure that there is a solid initial foundation for the presence of an effect. However, this proposal runs into the same "tragedy of the commons" problem identified by Everett and Earp. Replicating studies is simply not in the best interests of the initial researchers, because they are putting the validity of their scientific contributions at risk and using precious resources and time that could be used to advance their careers and reputations. Thus, the only way for a proposal like Cesario's to work would be for a prestigious journal to require that all submitted articles provide direct replications.

Perhaps even more importantly, I think that psychology researchers should do more to exhaustively record data in a standardized way. In an age where huge swathes of data can be stored online extremely easily, scientists should work to store data on temperature, dates, and even pictures of the lab environment and materials used. I also think it would be useful to have a standardized questionnaire that records basic facts about every participant (age, race, education, etc.). This data should be stored in a standardized format in order to make it easy for a computer to search for patterns across studies. Cesario claims that failed replications are often the result of changes in minute variables. Thus, the only way to determine what these variables and patterns are is to meticulously record as many aspects of the study as possible. Only then can replications stop being ambiguous and start contributing to our understanding of complex effects.
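As a concrete illustration of the kind of standardized record this student describes, here's a minimal sketch; the field names and values are hypothetical, not an existing standard. The idea is just that one machine-readable file per session makes later pattern-mining across studies possible.

```python
# A minimal sketch of a standardized per-session record (hypothetical
# field names, not an existing standard). One machine-readable file per
# session makes it possible to later search many studies for the minute
# variables that might moderate failed replications.
import json
from datetime import datetime, timezone

session_record = {
    "study_id": "priming_warmth_2016_03",            # hypothetical identifiers
    "lab": "example_university_cognition_lab",
    "session_time_utc": datetime.now(timezone.utc).isoformat(),
    "room_temperature_c": 21.5,
    "experimenter_id": "RA_04",
    "participant": {
        "age": 20,
        "gender": "female",
        "education": "undergraduate",
        "native_language": "English",
    },
    "materials_photo": "materials/session_0412.jpg",  # photo of lab setup
}

# Write one JSON file per session so a computer can aggregate and search
# these records across studies.
with open("session_0412.json", "w") as f:
    json.dump(session_record, f, indent=2)
```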

I found both Ioannidis' and Cesario's papers very interesting, specifically because they seem to contradict one another. Ioannidis understands that a "pure gold standard" is essentially unattainable, yet suggests some approaches to improve replication studies and post-study probability. Ioannidis believes in tightening up theory, method, and execution in order to obtain replication results that can give us a better reading on whether a theory holds water or not. Cesario, on the other hand, seems to place less emphasis on the results of replication studies (for the topic of priming, at least) because, among other things, individuals may vary greatly in how they perceive, behave, and react to stimuli. I admit that I was quite skeptical of Cesario's paper at first, especially when I read the abstract. But when I read the rest of the paper, I found that he explained this point convincingly. It's almost a shame that he makes a good point, since I feel like his argument places the field of psychology in a nebulous, grey area where it doesn't fit with the "true" sciences and the scientific method.
This is why I’m glad I read Everett and Earp’s paper last. I like the authors’ idea to make direct replications a requirement for a PhD. It sounds like a practical endeavor that would greatly benefit the field of Psychology. Despite Psychology’s flaws, I’m glad there are still sharp minds and good ideas that could greatly improve the community and research.

With respect to replication studies, we have discussed multiple studies that have not necessarily "replicated" an experiment in its original form; unless the replication was completely true to the original study, there is always some change that, while claimed not to have significant effects on the results, may have introduced subtle variability in the results. In addition, there have been multiple studies in which experimenters worked in milliseconds; hence the statistical significance of the data is highly dependent on extremely precise and tight responses and measurements. This type of data collection is suspect to begin with, since, unless the study is done perfectly, subtle errors can occur that obscure the results. However, such studies are necessary to perform since so much of embodied cognition looks at the influences of priming and metaphor.

On another note, while the theoretical basis for many of the studies we've discussed makes sense, there is still a chance that the authors slightly manipulated the data to get larger effect sizes and statistical significance. However, determining the validity of the results for both the original studies and the replications can be difficult depending on the experiment. In addition, there are other small factors, like experimenter bias, internal validity, and whether or not the participants know what the experimenters were studying, that can influence the results.
Though this sounds dramatic, the corrupt nature of psychological academia represents a larger issue; psychologists' motivations aren't necessarily to find new truths about human cognition and behavior, but to publish multiple papers, gain recognition, and make money to essentially stay alive. While this pollutes psychological academia with false information, there may be a positive that comes out of the potentially false data: it makes psychology sexy to the general public. By making the results more attractive, psychologists are gradually increasing the amount of potential funding they can receive to perform more accurate experiments. This can increase financial security among psychologists and motivate them to publish completely correct and valid data. On the contrary, psychologists can fall into a hole of continuously producing false data in order to maintain a constant stream of funding. Whichever occurs for a given psychologist is ultimately up to them and how much they want to truly add to the field of psychology.

I like the idea from the Everett & Earp (2015) article of requiring graduate students to run a replication study in their field in order to receive a PhD in psychology. I think that having graduate students do this is a much better idea than having undergraduate students run replication studies, and cleans up a lot of the problems with the original idea from Frank and Saxe (2012). If all graduate students were required to run replication studies, the amount of information about the effects found for various theories would increase exponentially. I think that one negative possibility is that there would just be a lot of mixed results created, as we have seen from arguments among different replication studies and original authors. Though I think that the more information and more study attempts the better, I don't know what psychologists would make of it if tons of replication attempts just showed conflicting results. If all graduate students around the country every year created more and more replication attempts, would this be an overload of information? How would we keep track of all of these results? How would we determine which replications are good replications and which aren't? I think that it is a really promising idea, but would like to talk the concept through with the authors to get more information.
In terms of the other two articles, I thought that they worked interestingly in conjunction with one another. Cesario's idea that expectations about replications are inappropriate for priming studies seems to somewhat conflict with Ioannidis's view that most published research findings are actually false. Cesario's article seems to call for less scrutiny of published results, while Ioannidis's article asks for much more of it. I see the validity in both points - though there seems to be a possibility of falseness in many psychology research findings, maybe that is due to undetectable confounding variables from a variety of sources. I think that what needs to happen is more exploration of why a replication attempt fails if it does, in order to find possible mediating factors or a specification of which contexts a certain theory applies to.

The readings for today's class were some of my favorites so far in the course. Something I've been struggling with in Embodied Cognition is determining which effects are real and which exist because of a Type I error or some other bias. These readings described the issues in psychological science, disagreements about those issues within the field, and potential solutions very well. Asking all Ph.D. candidates to perform replication studies in order to graduate, as suggested by Everett and Earp (2015), is a great way to increase the verifiability of psychological science. Though psychology graduate students want and need to conduct their own original research, they still benefit from performing exercises that help them learn new skills. Replicating existing research teaches the psychology researchers of tomorrow these important skills while improving the psychology research of today.

In his article about priming effects and their replications, I think Joseph Cesario made some good arguments. For example, Cesario's (2014) belief and suggestion that initial replication attempts should be performed by the authors of the original study where an effect was found makes total sense. People are very fickle, and this means that most psychological effects (including priming effects) are likely to be fickle and subject to slight environmental or contextual changes as well. To avoid the effects of environmental or contextual changes to the experiment, a study's original authors should perform the first several replication attempts of their findings. Then, if the original authors replicate their original results, other researchers should attempt direct replications of the study to see if the effects carry over to new populations and areas. Another argument from Cesario (2014) that I agree with is that conceptual replications don't help create social verification for psychological research. All conceptual replications do is create additional studies that need to be directly replicated, since the experimental effects of priming studies seem to be so sensitive. That said, I disagree with Cesario's (2014) belief that priming effects cannot or should not have invariance. While I bet most priming effects are not invariant and cannot be seen in most or all people, I think that for priming to be taken seriously in the psychological research community there need to be some effects that are close to being universal.
We have been struggling with the issue of failed (and occasionally successful) replication studies throughout the semester. It has always been a struggle between the original study saying it has found something interesting and another study coming up with null findings. However, throughout this, we generally saw it as a question of who is right: the original experiment or the replication? After reading Cesario's piece on priming, I believe we have been asking the wrong question. Both studies could be correct, or they both could be wrong. Even changing the site of a study can affect the results of the experiment, even if the methodology stays the same. We have touched upon this idea some when we consider the cultural differences between participant groups, but I don't feel that we have ever allowed the conclusion to be that maybe both studies are right. But the question then becomes: if both studies are correct, which one is useful?
This question of what is useful and what is not gets at many critics' strongest point: if both can be correct, then what does that really tell us about the human mind? How does it add to the field? At this point, I would agree with Cesario when he says it doesn't mean much. Without having the experiments replicated by the same labs, it is impossible to tell whether or not an experiment demonstrates an effect for that population. Without a base of some very specific examples of experiments that are replicable, even within the same lab, it will be difficult for theories of social priming to gain any traction. Einstein is quoted as saying, "Insanity is doing the same thing over and over again and expecting different results." By this definition, psychology could be helped by introducing some insanity to the scientific process.
