Power analysis for a between-sample experiment

Understanding statistical power is essential if you want to avoid wasting your time in psychology. The power of an experiment is its sensitivity – the likelihood that, if the effect tested for is real, your experiment will be able to detect it.

Statistical power is determined by the type of statistical test you are doing, the number of people you test and the effect size. The effect size is, in turn, determined by the reliability of the thing you are measuring, and how much it is pushed around by whatever you are manipulating.

Since it is a common test, I’ve been doing a power analysis for a two-sample (two-sided) t-test, for small, medium and large effects (as conventionally defined). The results should worry you.


This graph shows you how many people you need in each group for your test to have 80% power (a standard desirable level of power – meaning that if your effect is real you’ve an 80% chance of detecting it).

Things to note:

  • even for a large effect (0.8) you need close to 30 people in each group (total n = 60) to have 80% power
  • for a medium effect (0.5) this is more like 70 people in each group (total n = 140)
  • the required sample size increases dramatically as effect size drops
  • for small effects (0.2), the sample required for 80% power is around 400 in each group (total n = 800).

What this means is that if you don’t have a large effect, studies with a between-groups analysis and a total n of less than 60 aren’t worth running. Even if you are studying a real phenomenon you aren’t using a statistical lens with enough sensitivity to be able to tell. You’ll get to the end and won’t know if the phenomenon you are looking for isn’t real or if you just got unlucky with who you tested.
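To get a feel for where numbers like these come from, here is a minimal sketch in Python. It is not the R code behind the graph: it uses the textbook normal approximation n = 2 * ((z_alpha/2 + z_power) / d)^2, so its answers run a person or two per group below an exact t-test calculation. The function name `n_per_group` is my own, chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate n per group for a two-sided, two-sample comparison.

    Normal approximation only: an exact t-test power calculation
    gives slightly larger answers.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for label, d in [("large", 0.8), ("medium", 0.5), ("small", 0.2)]:
    n = n_per_group(d)
    print(f"{label:6s} d={d}: about {n} per group, {2 * n} in total")
```

Because the effect size is squared in the denominator, halving the effect size roughly quadruples the sample you need, which is why the curve climbs so steeply for small effects.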

Implications for anyone planning an experiment:

  • Is your effect very strong? If so, you may rely on a smaller sample (For illustrative purposes, the effect size of the male-female height difference is ~1.7, so large enough to detect with a small sample. But if your effect is this obvious, why do you need an experiment?)
  • You really should prefer within-sample analysis, whenever possible (power analysis of this left as an exercise)
  • You can get away with smaller samples if you make your measure more reliable, or if you make your manipulation more impactful. Both of these will increase your effect size, the first by narrowing the variance within each group, the second by increasing the distance between them
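The last point can be made concrete with the standardised effect size (Cohen's d), which is the difference between the group means divided by the within-group standard deviation. A small sketch, with made-up numbers purely for illustration:

```python
def cohens_d(mean_a, mean_b, within_group_sd):
    """Standardised difference between two group means."""
    return (mean_a - mean_b) / within_group_sd


# Hypothetical values, not taken from any real study.
baseline = cohens_d(105.0, 100.0, 10.0)       # 5-point difference, SD of 10
more_reliable = cohens_d(105.0, 100.0, 5.0)   # same difference, noise halved
stronger = cohens_d(110.0, 100.0, 10.0)       # same noise, difference doubled

print(baseline, more_reliable, stronger)  # 0.5 1.0 1.0
```

Either route turns a medium effect into a large one, and so cuts the sample you need for the same power.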

Technical note: I did this by cribbing code from Rob Kabacoff’s helpful page on power analysis. Code for the graph shown here is here. I use and recommend RStudio.

New grant: Reduced habitual intrusions : an early marker for Parkinson’s Disease?

I am very pleased to announce that the Michael J Fox Foundation has funded a project I lead titled ‘Reduced habitual intrusions : an early marker for Parkinson’s Disease?’. The project is for 1 year, and is a collaboration between a psychologist (myself), a neuroscientist (Pete Redgrave), a clinician specialising in Parkinson’s (Jose Obeso, in Spain) and a computational linguist (Colin Bannard, in Liverpool). Mariana Leriche will be joining us as a post-doc.

The idea of the project stems from the hypothesis that Parkinson’s Disease will be specifically characterised by a loss of habitual control in the motor system. This was proposed by Pete, Jose and others in 2010. Since my PhD I’ve been interested in automatic processes in behaviour. One phenomenon which seems to offer particular promise for exploring the interaction between habits and deliberate control is the ‘action slip’. This is an error where a habit intrudes into the normal stream of intentional action – for example, when you put the cereal into the fridge, or when someone greets you by asking “Isn’t it a nice day?” and you say “I’m fine thank you”. An interesting prediction of the Redgrave et al theory is that people with Parkinson’s should make fewer action slips (in contrast to all other types of movement errors, which you would expect to increase as the disease progresses).

The domain we’re going to look at this in is typing, which I’ve worked with before, and which – I’ve argued – is a great domain for looking at how skill, intention and habit combine in an everyday task which generates lots of easily coded data.

I feel the project reflects exactly the kind of work I aspire to do – cognitive science which uses precise behavioural measurement, informed by both neuroscientific and computational perspectives, and in the service of an ambitious but valuable goal. Now, of course, we actually have to get on and do it.

New paper: wiki users get higher exam scores

Just out in Research in Learning Technology is our paper Students’ engagement with a collaborative wiki tool predicts enhanced written exam performance. This is an observational study which tries to answer the question of how students on my undergraduate cognitive psychology course can improve their grades.

One of the great misconceptions about studying is that you just need to learn the material. Courses and exams which encourage regurgitation don’t help. In fact, as well as memorising content, you also need to understand it and reflect that understanding in writing. That is what the exam tests (and what an undergraduate education should test, in my opinion). A few years ago I realised, marking exams, that many students weren’t fulfilling their potential to understand and explain, and were relying too much on simply recalling the lecture and textbook content.

To address this, I got rid of the textbook for my course and introduced a wiki – an editable set of webpages, with which the students would write their own textbook. An inspiration for this was a quote from Francis Bacon:

Reading maketh a full man,
conference a ready man,
and writing an exact man.

(the reviewers asked that I remove this quote from the paper, so it has to go here!)

Each year I cleared the wiki and encouraged the people who took the course to read, write and edit using the wiki. I also kept a record of who edited the wiki, and their final exam scores.

The paper uses this data to show that people who made more edits to the wiki scored more highly on the exam. The obvious confound is that people who score more highly on exams will also be the ones who edit the wiki more. We tried to account for this statistically by including students’ scores on their other psychology exams in our analysis. This has the effect – we argue – of removing the general effect of students’ propensity to enjoy psychology and study hard, and isolating the additional effect of using the wiki on my particular course.

The result, pleasingly, is that students who used the wiki more scored better on the final exam, even accounting for their general tendency to score well on exams (as measured by grades for other courses). This means that even among people who generally do badly in exams, and did badly on my exam, those who used the wiki more did better. This is evidence that the wiki is beneficial for everyone, not just people who are good at exams and/or highly motivated to study.

Here’s the graph, Figure 1 from our paper:


This is a large effect – the benefit is around 5 percentage points, easily enough to lift you from a mid 2:2 to a 2:1, or a mid 2:1 to a first.

Fans of wiki research should check out this recent paper Wikipedia Classroom Experiment: bidirectional benefits of students’ engagement in online production communities, which explores potential wider benefits of using wiki editing in the classroom. Our paper is unique for focussing on the bottom line of final course grades, and for trying to address the confound that students who work harder at psychology are likely to both get higher exam scores and use the wiki more.

The true test of the benefit of the wiki would be an experimental intervention where one group of students used a wiki and another did something else. For a discussion of this, and discussion of why we believe editing a wiki is so useful for learning, you’ll have to read the paper.

Thanks go to my collaborators. Harriet reviewed the literature and Herman installed the wiki for me, and did the analysis. Together we discussed the research and wrote the paper.

Full citation:
Stafford, T., Elgueta, H., Cameron, H. (2014). Students’ engagement with a collaborative wiki tool predicts enhanced written exam performance. Research in Learning Technology, 22, 22797. doi:10.3402/rlt.v22.22797

New paper: Performance breakdown effects dissociate from error detection effects in typing

This is the first work on typing that has come out of C’s PhD thesis. C’s idea, which inspired his PhD, was that typing would be an interesting domain in which to look at errors and error monitoring. Unlike most discrete trial tasks which have been used to look at errors, typing is a continuous performance task (some of our subjects can type over 100 words per minute, pressing around 10 keys a second!). Furthermore, the response you make to signal an error is highly practiced – you press the backspace. Previous research on error signalling hasn’t been able to distinguish between effects due to the error and effects due to having to make an unpracticed response to signal that you know you made the error.

For me, typing is a fascinating domain which contradicts some notions of how actions are learnt. The dichotomy between automatic and controlled processing doesn’t obviously apply to typing, which is rapid and low effort (like habits), but flexible and goal-orientated (like controlled processes). A great example of how typing can be used to investigate the complexity of action control comes from this recent paper by Gordon Logan and Matthew Crump.

In this paper, we asked skilled touch-typists to copy type some set sentences and analysed the speed of typing before, during and after errors. We found, in contrast to some previous work which had used unpracticed discrete trial tasks to study errors, that there was no change in speed before an error. We did find, however, that typing speeds before errors did increase in variability – something we think signals a loss of control, something akin to slipping “out of the zone” of concentration. A secondary analysis compared errors which participants corrected against those they didn’t correct (and perhaps didn’t even notice they made). This gave us evidence that performance breakdown before an error isn’t just due to the processes that notice and correct errors, but – at least to the extent that error correction is synonymous with error detection – performance breakdown occurs independently of error monitoring.

Here’s the abstract

Mistakes in skilled performance are often observed to be slower than correct actions. This error slowing has been associated with cognitive control processes involved in performance monitoring and error detection. A limited literature on skilled actions, however, suggests that preerror actions may also be slower than accurate actions. This contrasts with findings from unskilled, discrete trial tasks, where preerror performance is usually faster than accurate performance. We tested 3 predictions about error-related behavioural changes in continuous typing performance. We asked participants to type 100 sentences without visual feedback. We found that (a) performance before errors was no different in speed than that before correct key-presses, (b) error and posterror key-presses were slower than matched correct key-presses, and (c) errors were preceded by greater variability in speed than were matched correct key-presses. Our results suggest that errors are preceded by a behavioural signature, which may indicate breakdown of fluid cognition, and that the effects of error detection on performance (error and posterror slowing) can be dissociated from breakdown effects (preerror increase in variability)

Citation and download: Kalfaoğlu, Ç., & Stafford, T. (2013). Performance breakdown effects dissociate from error detection effects in typing. The Quarterly Journal of Experimental Psychology, 67(3), 508-524. doi:10.1080/17470218.2013.820762

Tracing the Trajectory of Skill Learning With a Very Large Sample of Online Game Players

I am very excited about this work, just published in Psychological Science. Working with an online game developer, I was able to access data from over 850,000 players. This allowed Mike Dewar and me to look at the learning curve in an unprecedented level of detail. The paper is only a few pages long, and there are some great graphs. Using this real-world learning data set we were able to show that some long-established findings from the literature hold in this domain, as well as confirm a new finding from this lab on the value of exploration during learning.

However, rather than the science, in this post I’d like to focus on the methods we used. When I first downloaded the game data I thought I’d be able to use the same approach I was used to using with data sets gathered in the lab – look at the data, maybe in a spreadsheet application like Excel, and then run some analyses using a statistics package, such as SPSS. I was rudely awakened. Firstly, the dataset was so large that my computer couldn’t load it all into memory at one time – meaning that I couldn’t simply ‘look’ at the data in Excel. Secondly, the conventional statistical approaches and programming techniques I was used to either weren’t appropriate or didn’t work. I spent five solid days writing MATLAB code to calculate the practice vs mean performance graph of the data. It took two days to run each time and still didn’t give me the level of detail I wanted from the analysis.

Enter Mike Dewar, dataist, currently employed in the New York Times R&D Lab. Speaking to Mike over Skype, he knocked up a Python script in two minutes which did in 30 seconds what my MATLAB script had taken two days to do. It was obvious I was going to have to learn to code in Python. Mike also persuaded me that the data should be open, so we started a GitHub repository which holds the raw data and all the analysis scripts.
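To give a flavour of why this kind of job suits a short script, here is a minimal sketch of a practice vs mean performance calculation that never holds the whole dataset in memory. It is not Mike’s script and it does not use the real file layout: the column names `attempt` and `score` are assumptions made for illustration, and the four-row sample stands in for the real data.

```python
import csv
import io
from collections import defaultdict


def mean_score_by_attempt(rows):
    """Stream records one at a time and return {attempt number: mean score}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        attempt = int(row["attempt"])
        totals[attempt] += float(row["score"])
        counts[attempt] += 1
    return {a: totals[a] / counts[a] for a in sorted(totals)}


# Tiny made-up stand-in for the real file. With a file on disk you would
# pass csv.DictReader(open(path, newline="")) instead.
sample = io.StringIO(
    "player,attempt,score\n"
    "a,1,100\n"
    "b,1,300\n"
    "a,2,250\n"
    "b,2,350\n"
)
curve = mean_score_by_attempt(csv.DictReader(sample))
print(curve)  # {1: 200.0, 2: 300.0}
```

Only two running totals per attempt number are kept, so memory use depends on how many attempts players make rather than on how many rows the file has.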

This means that if you want to check any of the results in our paper, or extend them, you can replicate our exact analysis, inspecting the code for errors or interrogating the data for patterns we didn’t spot. There are obvious benefits to the scientific community of this way of working. There are even benefits to us. When one of the reviewers questioned a cut-off value we had used in the analysis, we were able to write back that the exact value didn’t matter, and invited them to check for themselves by downloading our data and code. Even if the reviewer didn’t do this, I’m sure our response carried more weight since they knew they could have easily checked our claim if they had wanted. (Our full response to the first reviews, as well as a pre-print of the paper, are also available via the repository.)

Paper: Stafford, T. & Dewar, M. (2014). Tracing the Trajectory of Skill Learning With a Very Large Sample of Online Game Players. Psychological Science

Data and Analysis code:

New project: “Bias and Blame: Do Moral Interactions Modulate the Expression of Implicit Bias?”

The Leverhulme Trust has awarded a 36 month grant to the University of Nottingham, for a project led by my collaborator Dr Jules Holroyd, with support from myself. The project title is “Bias and Blame: Do Moral Interactions Modulate the Expression of Implicit Bias?” (abstract below). The aim is to conduct experiments to advance our understanding of how implicit biases are regulated by ‘moral interactions’ (these are things such as being blamed, or being held responsible). The grant will pay for a post-doc (Robin Scaife) in Sheffield and a PhD student (as yet unknown, let us know if you’re interested!) in Nottingham.

Obviously, this is something of a departure for myself, at least as far as the topic goes (which is why Jules leads). I’m hoping my background in decision making and training in experimental design will help me navigate the new conceptual waters of implicit bias. Some credit for inspiring the project should go to Jenny Saul and her Bias Project, and before that, Alec Patton and his faith in interdisciplinary dialogue that helped get Jules and myself talking about how experiments and philosophical analysis could help each other out.

Project Abstract:

This project will investigate whether moral interactions are a useful tool for regulating implicit bias. Studies have shown that implicit biases – automatic associations which operate without reflective control – can lead to unintentionally differential or unfair treatment of stigmatised individuals. Such biases are widespread, resistant to deliberate moderation, and have a significant role in influencing judgement and action. Strategies for regulating implicit bias have been developed, tested and evaluated by psychologists and philosophers. But neither have explored whether holding individuals responsible for implicit biases may help or hinder their regulation. This is what we propose to do.

New PhD Student: Angelo Pirrone

Angelo joins us in the Department, to run experimental studies of decision making. He is second supervised by James Marshall who is a Reader in Computer Science, and head of the Behavioural and Evolutionary Theory Lab. Angelo’s funding comes from the cross-disciplinary Neuroeconomics network I lead: “Decision making under uncertainty: brains, swarms and markets”. We’re hoping to use computational, neuroscientific and evolutionary perspectives to guide the development of behavioural studies of perceptual decision making. More about this, and the neuroeconomics network, soon. In the meantime – welcome to Sheffield, Angelo!

Update August 2016: Well, that went quick! Angelo is writing up and looking for post-doctoral positions. His CV is here

New paper: No learning where to go without first knowing where you’re coming from: action discovery is trajectory, not endpoint based.

We’ve a new paper out in Frontiers in Cognitive Science: No learning where to go without first knowing where you’re coming from: action discovery is trajectory, not endpoint based. This was work done by Martin Thirkettle and Tom Walton as part of the IM-CLeVeR project.

The research uses our joystick task (Stafford et al, 2012) to look at how people learn a novel arbitrary action (in this case moving the joystick to a particular position). By comparing a condition (A) where the start point of the movement is always the same with a condition (B) where the start point moves around, we are able to look at the way people find it easiest to learn novel actions. In condition (A) you could learn the correct action by identifying the target location OR you could learn the correct action by identifying a target trajectory to make (which, since you always start from the same place, would work just as well to get you to the target location). In condition (B) you can’t rely on this second strategy: you have to identify the target location and head towards it from wherever you start. Surprisingly, participants in our experiment were very bad at this second condition – so much so that over the number of trials we gave them, they didn’t appear to learn anything about the target location and so acquired no novel action. This suggests that we have a strong bias to rely on trajectories of movement when acquiring novel actions, rather than code them by arbitrary spatial end points.

The paper: Thirkettle, M., Walton, T., Redgrave, P., Gurney, K., & Stafford, T. (2013). No learning where to go without first knowing where you’re coming from: action discovery is trajectory, not endpoint based. Frontiers in Cognitive Science, 4, 638. doi:10.3389/fpsyg.2013.00638

The paper is published as part of our Special Topic in Frontiers on Intrinsic motivations and open-ended development in animals, humans, and robots

Stafford, T., Thirkettle, M., Walton, T., Vautrelle, N., Hetherington, L., Port, M., Gurney, K.N., Redgrave, P. (2012), A Novel Task for the Investigation of Action Acquisition, PLoS One, 7(6), e37749.

New paper: The Discovery of Novel Actions Is Affected by Very Brief Reinforcement Delays and Reinforcement Modality

In this paper, in press at the Journal of Motor Behavior, we build on our previous work which developed a novel task for investigating how we learn actions. Our interest is in how the motor system connects what we’ve been doing with what happens. When something you do causes a change in the world you want to identify what exactly it was that you did that had the effect. Our hypothesis is that the machinery of the subcortical basal ganglia does this job for us – in the domain of motor learning. One key feature of the basal ganglia architecture is the speed with which dopamine signalling responds to external events. Profs Redgrave and Gurney have argued that this rapidity is because even millisecond delays in event signalling lead to a disproportionate increase in the difficulty of connecting the correct part of what you’ve done with the event. In other words, with delay you easily lose track of what it was that you did that caused a surprising outcome.

This is the context for the experiments reported in the new paper. These experiments show that our task has a very high sensitivity to delay – of the order of 100 ms. This fits with the Redgrave-Gurney theory of dopamine signalling, and the delay is considerably briefer than those found in previous work looking at the effects of delay on motor learning. This is because, we argue, previous work uses response frequency (of an already learnt action) as the dependent variable, whereas our task is better designed to look at the emergence of new actions as they are in the process of being learnt.

Here’s the abstract:

The authors investigated the ability of human participants to discover novel actions under conditions of delayed reinforcement. Participants used a joystick to search for a target indicated by visual or auditory reinforcement. Reinforcement delays of 75–150 ms were found to significantly impair action acquisition. They also found an effect of modality, with acquisition superior with auditory feedback. The duration at which delay was found to impede action discovery is, to the authors’ knowledge, shorter than that previously reported from work with operant and causal learning paradigms. The sensitivity to delay reported, and the difference between modalities, is consistent with accounts of action discovery that emphasize the importance of a time stamp in the motor record for solving the credit assignment problem.

And the citation:

Walton, T., Thirkettle, M., Redgrave, P., Gurney, K. N., & Stafford, T. (2013). The Discovery of Novel Actions Is Affected by Very Brief Reinforcement Delays and Reinforcement Modality. Journal of Motor Behavior, 45(4), 351-360.

New paper: “The path to learning: Action acquisition is impaired when visual reinforcement signals must first access cortex”

Using a cunning experimental design we provide evidence which supports a new theory of how the brain learns new actions. Back in 2006, our professors Redgrave and Gurney proposed this theory, centered around the subcortical brain area the basal ganglia and the function of the neurotransmitter dopamine. This was exciting for two reasons: because it proposed a theory of what these parts of the brain might do, based on our understanding of the pathways involved and the computations they might support, and because it was a theory in flat contradiction to the most popular theory of dopamine function, the reward prediction error hypothesis.

We set out to test this theory. We used a novel task to assess action-outcome learning, in which human subjects moved a joystick around until they could identify a target movement. We didn’t record the dopamine directly – a tall order for human subjects – but instead used our knowledge of what triggers dopamine to compare two learning conditions: one where dopamine would be triggered as normal, and one where we reasoned the dopamine signal would be weakened.

We did this by using two different kinds of reinforcement signals, either a simple luminance change (i.e. a white flash), or a specifically calibrated change in colour properties (visual psychophysics fans: a shift along the tritan line). The colour change signal is only visible to some of the cells in the eye, the s-cone photoreceptors. Importantly, for our purposes, this means that although the signal travels the cortical visual pathways it does not enter the subcortical visual pathway to the superior colliculus. And the colliculus is the main, if not only, route to trigger dopamine release in the basal ganglia.

So by manipulating the stimulus properties we can control the pathways the stimulus information travels. Either the reinforcement signal goes directly to the colliculus and so to the dopamine (luminance change condition), or the signal must travel through visual cortex first and then to the colliculus, ‘the long way round’, to get to the dopamine (s-cone condition).

The result is a validation for the action-learning hypothesis: when reinforcement signals are invisible to the colliculus learning new action-outcome associations is harder. We also did an important control experiment which showed that the impairment due to the s-cone signals couldn’t be matched by simple transport delay of the stimulus information; this suggests the s-cone signal is weaker, not just slower in terms of dopaminergic action. You can read the full thing here.

The results aren’t conclusive – no behavioural experiment which didn’t record dopamine directly could be – but we think it is a strong result. Popper said there are two kinds of results to be most interested in. One was the experiment which proved a theory wrong. The other – which we believe this is – is an experiment which confirms a bold hypothesis. There are no other theories which would suggest this experiment, and only the Redgrave and Gurney theory predicted the result we got before we got it. This makes it a startling validation for the theory and that is why we’re really proud of the paper.

This work was funded by our European project, IM-CLeVeR, and all the difficult work was done by Martin Thirkettle, building on Tom Walton’s foundation.

Thirkettle, M., Walton, T., Shah, A., Gurney, K., Redgrave, P., & Stafford, T. (2013). The path to learning: Action acquisition is impaired when visual reinforcement signals must first access cortex. Behavioural Brain Research, 243, 267–272. doi:10.1016/j.bbr.2013.01.023

