The epidemiologic practice of examining whether a given “external” exposure is associated with a given health outcome has served us well for many years. In this “black box” approach, little is said about how the exposure modifies the health outcome. Today we expect an exposure–disease statement to be accompanied by a plausible mechanistic causal path, producing a hypothesis rich in empirical content that can be used to make testable predictions. Given the flexible nature of biologic evidence, we can usually meet this requirement with varying degrees of success. This practice has been widely adopted, except in genome-wide association studies (GWAS) in genetic epidemiology, where inductive principles prevail.
It is sometimes overlooked that most epidemiologic studies still collect data on only one step in the postulated causal path. Nor have we been thorough in making the case for black box studies of proximal determinants, especially when these determinants can be modified at little or no risk. What is often lacking is an understanding of how to interpret empirical data from a study of proximal determinants that was inspired by a suggested causal path.
Recently, the International Agency for Research on Cancer (IARC) classified night shift work as probably carcinogenic (2A) based on a few positive epidemiologic studies combined with animal experiments and plausible biological evidence (1). A more recent update is available (2). The short version of this hypothesis states that night shift work (NSW), with its exposure to light at night, reduces the production of melatonin, which acts to prevent breast cancer (3, 4). In reality, the full hypothesis is more elaborate (5), but the short version will do for now.
Obtaining lifelong measures of melatonin is impractical, but recording shift work (6), the previous step in the path, is feasible; our study could then be based on the directed acyclic graph (DAG) shown in figure 1.
According to this figure, we estimate path 1 and try to adjust for C4 (confounders), but we often have no empirical data on what should be adjusted for at C2 and C3, because we have no measurements of melatonin. In this model, C2 establishes a backdoor path between NSW and breast cancer, as does C4. The C4 path can be blocked if we have data on the potential confounders of the link between NSW and breast cancer, but we often have no data with which to identify C2. Since we have not measured melatonin, neither a positive nor a negative association between NSW and breast cancer, even after backdoor path 4 has been fully blocked, confirms or falsifies the causal path from NSW to melatonin to breast cancer. It “only” shows that NSW is causally linked to breast cancer through one or more paths, reminding us that (i) the level at which we draw inferences should be the level at which we measure and (ii) ecological fallacies exist outside studies based on aggregates of individuals. If the causal diagram is complete, we will therefore be able to reduce the incidence of breast cancer by reducing the occurrence of NSW, but not necessarily by giving melatonin to night shift workers.
That is not necessarily a serious problem. NSW may be our target for intervention, and individual exposure could perhaps be reduced by distributing night shifts across more people, possibly bringing each person's exposure down to a no-risk level if a threshold for an effect exists. Whether this operates through a mechanism involving melatonin need not matter for public health practice.
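The backdoor reasoning above can be checked mechanically. The following is a minimal Python sketch, assuming a reconstruction of figure 1 in which C4 confounds NSW and breast cancer (BC), C2 confounds NSW and melatonin (MEL), C3 confounds melatonin and breast cancer, and a direct NSW → BC edge stands in for any paths that do not pass through melatonin. The node names, the edge list, and the helper functions are hypothetical illustrations, not the published figure. The code lists the backdoor paths from NSW to BC and reports which remain open for a given adjustment set, using the standard d-separation blocking rules.

```python
# Hypothetical reconstruction of figure 1 (an assumption, not the published figure):
#   C4 -> NSW, C4 -> BC    confounders of NSW and breast cancer
#   C2 -> NSW, C2 -> MEL   confounders of NSW and melatonin
#   C3 -> MEL, C3 -> BC    confounders of melatonin and breast cancer
#   NSW -> MEL -> BC       the postulated melatonin path
#   NSW -> BC              stand-in for any other causal paths
EDGES = {
    ("C4", "NSW"), ("C4", "BC"),
    ("C2", "NSW"), ("C2", "MEL"),
    ("C3", "MEL"), ("C3", "BC"),
    ("NSW", "MEL"), ("MEL", "BC"),
    ("NSW", "BC"),
}


def children(node):
    return {b for a, b in EDGES if a == node}


def parents(node):
    return {a for a, b in EDGES if b == node}


def descendants(node):
    """All nodes reachable from `node` by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        for child in children(stack.pop()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def simple_paths(path, end):
    """Yield paths from path[-1] to `end`, ignoring arrow direction, never revisiting a node."""
    if path[-1] == end:
        yield path
        return
    for nxt in children(path[-1]) | parents(path[-1]):
        if nxt not in path:
            yield from simple_paths(path + [nxt], end)


def is_blocked(path, adjusted):
    """Apply the d-separation blocking rules to a single path."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = node in children(prev) and node in children(nxt)
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is adjusted for.
            if node not in adjusted and not (descendants(node) & adjusted):
                return True
        elif node in adjusted:
            # Adjusting for a non-collider (confounder or mediator) blocks the path.
            return True
    return False


def open_backdoor_paths(exposure, outcome, adjusted):
    """Backdoor paths begin with an arrow INTO the exposure; return those still open."""
    paths = []
    for parent in parents(exposure):
        for path in simple_paths([exposure, parent], outcome):
            if not is_blocked(path, set(adjusted)):
                paths.append(path)
    return sorted(paths)


print(open_backdoor_paths("NSW", "BC", {"C4"}))        # → [['NSW', 'C2', 'MEL', 'BC']]
print(open_backdoor_paths("NSW", "BC", {"C4", "C2"}))  # → []
```

Under these assumptions, adjusting for C4 alone leaves the path through C2 open, which is the situation described above; adding C2 closes it. The path through C3 passes through melatonin as a collider and is therefore already blocked as long as melatonin is not conditioned on, so C3 becomes relevant only when the question shifts from the effect of NSW to the role of melatonin.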
Suppose now that we have recorded data only on job titles and no data on lifelong night shift work, because job titles, but not working hours, tend to be recorded over time. Using a job exposure matrix (JEM), we can base our study on the DAG shown in figure 2 (3, 7).
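A JEM assigns each job title an exposure value, so that a recorded job history can be converted into an estimated exposure for each person. The following is a minimal Python sketch, assuming a JEM that gives, for each job title, the proportion of workers in that job who work night shifts. The job titles, the proportions, and the function name `expected_night_shift_years` are hypothetical and chosen only for illustration; they are not taken from any real JEM or from references (3, 7).

```python
# Hypothetical JEM: proportion of workers in each job title who work night shifts.
# These numbers are invented placeholders, not estimates from any real JEM.
JEM = {
    "nurse": 0.6,
    "office clerk": 0.0,
    "factory operator": 0.4,
}


def expected_night_shift_years(job_history, jem):
    """Sum the years spent in each job, weighted by the JEM's night shift proportion.

    job_history is a list of (job_title, years) tuples. Unknown titles raise KeyError,
    so gaps in the matrix are not silently treated as zero exposure.
    """
    return sum(years * jem[title] for title, years in job_history)


history = [("office clerk", 5), ("nurse", 10), ("factory operator", 5)]
print(expected_night_shift_years(history, JEM))
```

For this made-up history the score is 10 × 0.6 + 5 × 0.4 = 8 expected night shift years. The score measures NSJ rather than NSW: everyone with the same job history receives the same value regardless of the hours they actually worked, which is why the two appear as separate nodes in the causal diagram.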
As before, if we find a positive association between jobs with a high proportion of night shift work (NSJ) and breast cancer (path 1) after having closed all type C5 “backdoors”, then, assuming the DAG is complete, jobs with many night shift workers are causally linked to breast cancer. If we have fully adjusted for backdoor paths of types C5 and C2 (we may have data from other sources to identify the backdoor paths from NSJ to NSW), we can say that this causal link is due to shift work. However, without data on confounders C3, the role of melatonin is still speculative. We can claim that the association is due to reduced melatonin production only if we have closed all backdoor paths.
There is no reason to regret that we operate outside the black box when we study distal determinants. We are interested in a possible harmful effect of, for example, shift work or mobile phones, whether it is caused by radiofrequency exposure, melatonin, or something else. The public health issue is that we have 6 billion mobile phone users, many of whom work night shifts, and any type of negative health effect is of concern whatever the mechanism may be.
Adding a potential causal path to our hypothesis may allow us to make more testable predictions, but without data on every link in the causal path, it says little about whether the path is right or wrong. Failing to recognize this may lead to preventive interventions within the black box that address the wrong causes.
Concluding remarks
Adding a possible causal path to our hypothesis will often provide more testable predictions but does not provide much stronger inference when we have no data from within the black box. One such testable prediction is that the effect of shift work may be blocked by giving melatonin to shift workers or by having night workers work under light with less effect on melatonin levels (eg, light in a colored spectrum). Women working night shifts in “red light districts” have used this precautionary principle for centuries.