Why do we need birth cohorts?

Introduction

Health is largely determined by the exposome, i.e., the environmental exposures a person is subjected to throughout their lifetime. With the revolution in genetic techniques over the past decades, we have learned a lot about the genetic underpinnings of chronic diseases such as heart disease, cancer, and type 1 and type 2 diabetes. Yet genetics is only one part of the puzzle and does not fully explain why these diseases develop. In most non-communicable diseases, genetics provides fertile ground on which environmental exposures can initiate the disease process. However, studying these exposomic disease determinants is a much more demanding task than analysing the genome. Mapping all the possible exposomic determinants is a tremendous undertaking and, to make things more difficult, the exposome changes throughout the lifetime, whereas the genome stays relatively unchanged. Finding exposomic disease determinants means shooting at a moving target, and success requires sophisticated study designs.

Image 1: An example of the many layers of the exposome we are exposed to throughout life.

Case-control studies

In genome research, genetic association studies (genome-wide association studies and others) have made major discoveries using retrospective case-control study designs. These studies compare patients who have a disease or outcome of interest (cases) with people from the same population who do not (controls). Case-control studies are cost-effective: they provide relatively high statistical power to detect relationships with a limited total number of observations. In retrospective studies, investigators compare how frequently exposure to a risk factor is present in each group, and from that comparison they estimate the relationship between the risk factor and the disease. Since genes do not change after a disease develops, this study design works relatively well for genetic studies.
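To make the comparison of exposure frequencies concrete, the sketch below computes an odds ratio from a 2x2 table of cases and controls, with an approximate 95% confidence interval (Woolf's logit method). The function name and the counts are hypothetical and chosen only for illustration; they do not come from any HEDIMED cohort.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Return (odds ratio, lower, upper) with an approximate 95% CI (Woolf's method).

    Assumes all four counts are greater than zero.
    """
    or_ = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR) from the four cell counts
    se_log = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    lower = math.exp(math.log(or_) - 1.96 * se_log)
    upper = math.exp(math.log(or_) + 1.96 * se_log)
    return or_, lower, upper

# Hypothetical counts: 40 of 100 cases and 20 of 100 controls were exposed
or_estimate, ci_lower, ci_upper = odds_ratio(40, 60, 20, 80)
print(f"OR = {or_estimate:.2f}, 95% CI {ci_lower:.2f} to {ci_upper:.2f}")
```

An odds ratio above 1 with a confidence interval that excludes 1 suggests that the exposure is more common among cases. It says nothing about whether the exposure came before the disease.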

Although case-control studies have given medicine a map of the genetics of disease, they shed only limited light on the link between the exposome and chronic diseases. This is because differences in the exposome between cases and controls can often be explained by reverse causation. Reverse causation means that the observed difference is a consequence of the disease (or of the preclinical disease process) rather than its cause. As the disease progresses the exposome changes, and reverse causation can often explain differences in microbiota, infections, gene expression and epigenetics.

The alternative: Prospective cohort studies

To get a view on the exposome of a disease we need to use a different study design. Prospective studies follow a cohort of study participants throughout the disease process.

The term cohort (from the Latin cohors, plural cohortes) originally described a Roman military unit. Today, cohort refers to a group of individuals with similar baseline characteristics who are followed over time. In demography, a birth cohort is all people born in the same year, usually in a given country or region. In research, the members of a cohort share a set of characteristics and are followed over time, with biological samples collected multiple times during follow-up. Longitudinal means that data are collected from the same individuals at multiple follow-up times.

An example of a cohort is the group of participants in the Finnish Type 1 Diabetes Prediction and Prevention (DIPP) study, which is included in HEDIMED. The children selected into the DIPP study share a set of characteristics that qualify them to be part of the cohort. In the case of DIPP, the shared characteristic is genetic susceptibility to type 1 diabetes.

In DIPP and in other cohorts, samples are collected at set intervals, allowing exposomic determinants to be analysed at different stages of disease. Although prospective studies cannot prove causation, their major contribution is that the exposome can be analysed before the disease develops. If the exposome before disease development differs between participants who go on to develop the disease and participants who do not, we can probably rule out reverse causation as the explanation for the difference. Limiting the effect of reverse causation is the key benefit of prospective studies.

However, cohort studies also have their downsides. They require a considerable amount of time and resources and are usually of limited size. They also often suffer from participant drop-out during follow-up (“loss to follow-up”), which can lead to selection bias and decreased statistical power.

Image 2: The members of a cohort share the same characteristics, e.g. with regard to the risk of developing a disease, and are followed throughout the disease process.

The best of both worlds: Case-control studies nested within large longitudinal cohort studies

One benefit of cohort studies is that case-control studies can be carried out within them, which is a win-win. Such studies are often referred to as nested case-control studies. Here, we can compare cases who develop the disease outcome with a selection of controls without the disease, using data and samples from the period before disease development. The distinction between such nested (or prospective) case-control studies and traditional, retrospective case-control studies is essential and often overlooked. In addition to limiting the effect of reverse causation, nested case-control studies also address a weakness of traditional case-control studies: selection bias.

Selection bias, which often haunts traditional, retrospective case-control studies, is minimized in cohort studies. Selection bias arises when cases and controls do not represent the same source population, so that differences between the groups may reflect how they were selected rather than the disease. Controls in a case-control study should be recruited from the same population that gave rise to the cases but should not develop the disease outcome during the specified follow-up time. It is often difficult to identify the source population exactly, and especially difficult to recruit controls, because they are often less motivated to take part in studies on the causes of a given disease than people who already suffer from that disease. In retrospective case-control studies this often results in control groups that differ from cases with respect to characteristics such as education and lifestyle.
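The idea of drawing controls from the same cohort as the cases can be sketched in a few lines. The example below is a simplified illustration, not the procedure used in any HEDIMED cohort: for each case, it picks controls who stayed disease-free and were still under follow-up at the age when the case was diagnosed. The function name, field names and toy data are all hypothetical.

```python
import random

def sample_nested_controls(cohort, controls_per_case=2, seed=0):
    """For each case, sample controls from cohort members who never developed
    the disease and were still under follow-up at the case's age at diagnosis.

    Each participant is a dict with:
      'id'           - participant identifier
      'followup_end' - age at diagnosis (cases) or at end of follow-up (non-cases)
      'is_case'      - True if the participant developed the disease
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    matched_sets = []
    for case in (p for p in cohort if p["is_case"]):
        age = case["followup_end"]
        # Eligible controls: disease-free participants still followed at this age
        eligible = [p for p in cohort
                    if not p["is_case"] and p["followup_end"] > age]
        chosen = rng.sample(eligible, min(controls_per_case, len(eligible)))
        matched_sets.append({"case": case["id"], "age": age,
                             "controls": [c["id"] for c in chosen]})
    return matched_sets

# Hypothetical toy cohort (ages in years)
toy_cohort = [
    {"id": "A", "followup_end": 3.0, "is_case": True},
    {"id": "B", "followup_end": 10.0, "is_case": False},
    {"id": "C", "followup_end": 2.0, "is_case": False},  # dropped out early
    {"id": "D", "followup_end": 8.0, "is_case": False},
    {"id": "E", "followup_end": 6.0, "is_case": True},
    {"id": "F", "followup_end": 5.0, "is_case": False},
]
matched = sample_nested_controls(toy_cohort)
for s in matched:
    print(s)
```

Because cases and controls come from the same recruited cohort, they share a source population by construction. Samples collected before the matching age can then be compared within each matched set.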

Table 1: Advantages and disadvantages of case-control, cohort and nested case-control studies.

Birth cohorts and pregnancy cohorts

Most prospective studies follow adults and are useful for understanding diseases that develop in adulthood. However, immune-mediated diseases such as type 1 diabetes, celiac disease, asthma, and allergy (the diseases studied by HEDIMED) often develop in childhood, and the disease process starts very early in life. To capture the exposome of a developing immune-mediated disease, we need prospective birth cohorts in which samples are collected from a very young age.

In birth cohorts, participants are identified at or around birth and recruited to join the cohort. Mothers, children, and perhaps other family members are actively followed up for the collection of biological material and of specific clinical and genetic information. While it is costly and time-consuming to establish birth cohort studies, they are essential for identifying early life exposures that may contribute to disease processes.

Whereas birth cohorts start at or around birth, pregnancy cohorts start during pregnancy. Pregnancy cohorts recruit pregnant women and follow mothers and their children (and sometimes also the fathers) longitudinally. Large-scale pregnancy cohorts are even rarer than birth cohorts, but they are better suited than birth cohorts for relating exposures in utero, or exposures the mothers are subjected to, to later disease in the offspring.

HEDIMED includes both birth and pregnancy cohorts. The birth cohorts of HEDIMED include MIDIA, DIPP, COPSAC and DiPiS/CiPiS among others and the pregnancy cohorts of HEDIMED include MoBa, as well as pregnancy sub-cohorts of DIPP and COPSAC.

Image 3: In birth cohorts, participants are identified at or around birth and recruited to join the cohort.

Causation

Neither case-control studies, cohort studies nor nested case-control studies can prove causation between an exposure and a disease. What cohort studies and nested case-control studies can do for the exposome is provide a study design that allows more reliable statistical associations between the exposome and disease to be established. To prove causation an intervention study is needed, in which one group is given an intervention for a disease and the other group is not. In HEDIMED, the PREVALL intervention trial is designed to intervene in the development of allergies in children, COPSAC studies vitamin D supplementation for the prevention of asthma, and Lund University studies a probiotic bacteria intervention for the prevention of celiac disease.

More information about the cohorts in HEDIMED: https://hedimed.eu/cohorts/

Authors

The blog post was prepared by HEDIMED researchers. The framework and background information for the post were provided by Lars Stene. The post was written by Johannes Malkamäki, drafted and edited by Daniel Schmidtmann, and checked by Leena Hakola, Lars Stene and Heikki Hyöty.

Sources

Manolio, T. A., J. E. Bailey-Wilson, and F. S. Collins. 2006. Genes, environment and the value of prospective cohort studies. Nat Rev Genet 7:812-820.

Norris, J. M., R. K. Johnson, and L. C. Stene. 2020. Type 1 diabetes-early life origins and changing epidemiology. Lancet Diabetes Endocrinol 8:226-238.

Virgin, H. W., and J. A. Todd. 2011. Metagenomics and personalized medicine. Cell 147:44-56.

Wijmenga, C., and A. Zhernakova. 2018. The importance of cohort studies in the post-GWAS era. Nat Genet 50:322-328.