Aalto University and FinnGen visualization collaboration

2023

LAVAA: Lightweight Association Viewer Across Ailments


The LAVAA volcano plot tool lets researchers view not only the significance of a variant's PheWAS results, but also the direction and magnitude of its effect across phenotypes. Additional attributes, namely the number of cases and whether the variant is in a credible set for the phenotype, can also be visualized and exported to a publication-ready file.

Background

A GWAS (genome-wide association study) is a statistical analysis that identifies which genetic variants are associated with a particular phenotype. A variant is a DNA change that may be associated with a medical phenotype. A phenotype, also called a trait, is an observable property of an organism; examples include height, hair color, and characteristics that can be measured in the laboratory, such as levels of hormones or blood cells.

GWAS are performed to identify biological mechanisms affecting the phenotype and to predict the phenotype from genomic information. A PheWAS (phenome-wide association study) is the complementary study: it looks at all the phenotypes associated with a particular variant.

Challenge

Usually, PheWAS data is presented as a Manhattan Plot, in which phenotypes, grouped by category, are displayed along the x-axis, with the negative logarithm of the association p-value for each phenotype displayed on the y-axis. Each point on the Manhattan plot signifies a phenotype, and triangles pointing up or down indicate the direction of effect (risk or protective). Example: https://r5.finngen.fi/variant/4-110755885-T-A

However, this type of representation cannot show clustering among the data points that fall into either the protective or the risk category, and it becomes hard to read when there are too many categories on the x-axis. On top of that, as a static visualization, the plot does not enable any interactive filtering that could further help the analysis. To tackle these issues, Eric Fauman presented an alternative concept, using Volcano Plots for PheWAS data, at F2F in March 2021.

Visualization

A Volcano Plot is a type of scatterplot commonly used for statistical applications. It is popular in transcriptomic analysis, as it enables quick visual identification of significant data points. Using it for PheWAS data highlights the biologically interesting features, allowing the viewer to compare different phenotypes quickly and to spot interesting patterns, such as larger effect sizes in certain disease subgroups even when those associations do not reach significance because of reduced statistical power.

The LAVAA tool allows researchers to better analyse PheWAS data and export their findings directly for publications.

In the LAVAA interface, the y-axis of the Volcano Plot shows the negative log10 p-value of each data point. As the p-value indicates statistical significance, points higher on the y-axis have higher statistical significance. The x-axis shows the beta value, that is, the effect size. The further a point is from the center (0), the larger its effect size. Points on the negative side are associated with a reduced risk of disease, while points on the positive side are associated with an increased risk of disease.

When considering both axes together, points that fall into the upper corners have both high significance and a large effect size, and are potentially interesting for further examination.
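As a minimal sketch of how a PheWAS result maps onto the plot, the snippet below converts a beta and a p-value into plot coordinates. It assumes the conventional volcano layout, with the effect size on the horizontal axis and the negative log10 p-value on the vertical axis, which is what places the interesting points in the upper corners. The rows, phenotype names and function names are hypothetical and are not taken from the LAVAA code.

```python
import math

# Hypothetical PheWAS rows for one variant: (phenotype, beta, p-value).
rows = [
    ("phenotype_a", 0.80, 1e-12),
    ("phenotype_b", -0.65, 1e-9),
    ("phenotype_c", 0.05, 0.3),
]


def volcano_point(beta, p_value):
    """Return (x, y) with x = effect size and y = -log10(p-value)."""
    return beta, -math.log10(p_value)


def direction(beta):
    """Label the direction of effect from the sign of beta."""
    if beta > 0:
        return "risk"
    if beta < 0:
        return "protective"
    return "neutral"


# Map each phenotype to its position on the plot.
points = {name: volcano_point(beta, p) for name, beta, p in rows}

for name, (x, y) in points.items():
    print(f"{name}: x={x:+.2f}, y={y:.1f}, {direction(x)}")
```

A smaller p-value gives a larger y, so strongly significant associations rise to the top while the sign of beta pushes them to the left (protective) or right (risk).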

Key features of the visualization

The colours of the scatterplot points signify disease categories and the halo around them signifies the number of cases for each datapoint. The chart also marks the threshold of statistical significance, and a legend lists the categories of all datapoints above that threshold. The legend also allows the user to visually filter specific categories, giving a better idea of the mass of datapoints in each category.
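The legend behaviour described above can be sketched as a filter followed by a count per category. The threshold of 5e-8 is the commonly used genome-wide significance level and is only an assumption here, since the text does not state which value LAVAA uses; the records and the function name are hypothetical as well.

```python
import math
from collections import Counter

# Assumed threshold: the conventional genome-wide significance level.
SIGNIFICANCE_P = 5e-8

# Hypothetical PheWAS results for one variant.
results = [
    {"phenotype": "pheno_1", "category": "Cardiovascular", "p": 1e-10},
    {"phenotype": "pheno_2", "category": "Cardiovascular", "p": 3e-9},
    {"phenotype": "pheno_3", "category": "Respiratory", "p": 1e-12},
    {"phenotype": "pheno_4", "category": "Neurological", "p": 0.02},
]


def legend_categories(results, p_threshold=SIGNIFICANCE_P):
    """Count datapoints per category that lie above the significance line."""
    line = -math.log10(p_threshold)  # height of the threshold on the plot
    counts = Counter(
        r["category"] for r in results if -math.log10(r["p"]) > line
    )
    return dict(counts)


print(legend_categories(results))
```

Categories with no datapoint above the line, such as the hypothetical "Neurological" entry, do not appear in the result.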

A hover tooltip gives the category, phenotype, p-value, beta value and number of cases for each specific datapoint.

Interaction with the visualization

An area of datapoints in the visualization can be selected by brushing. After the selection, the relevant data is presented as a table that can be downloaded directly as a CSV file. The selection can be further modified in the brushed area or cleared by using the “clear selection” button.
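In data terms, brushing amounts to keeping the points that fall inside a rectangle and serializing them as a table. The sketch below illustrates that idea under assumed field names (`beta`, `neg_log10_p`) and with hypothetical data; it does not reproduce the tool's own implementation.

```python
import csv
import io

# Hypothetical datapoints already expressed in plot coordinates.
points = [
    {"phenotype": "pheno_a", "beta": 0.8, "neg_log10_p": 12.0},
    {"phenotype": "pheno_b", "beta": -0.65, "neg_log10_p": 9.0},
    {"phenotype": "pheno_c", "beta": 0.05, "neg_log10_p": 0.5},
    {"phenotype": "pheno_d", "beta": 0.6, "neg_log10_p": 10.2},
]


def brush(points, x_range, y_range):
    """Return the points inside the brushed rectangle (bounds inclusive)."""
    x_min, x_max = sorted(x_range)
    y_min, y_max = sorted(y_range)
    return [
        p for p in points
        if x_min <= p["beta"] <= x_max and y_min <= p["neg_log10_p"] <= y_max
    ]


def to_csv(rows):
    """Serialize the selected rows as CSV text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Brush the upper-right corner: significant, risk-increasing associations.
selection = brush(points, x_range=(0.5, 1.0), y_range=(8, 15))
csv_text = to_csv(selection)
print(csv_text)
```

Clearing the selection corresponds to discarding `selection`; modifying it corresponds to calling `brush` again with new bounds.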

There is also an option for showing convex hulls and for viewing only the datapoints above the line of significance. A convex hull is the smallest convex polygon that encloses a set of points; here it outlines the area covered by the datapoints of the same category.
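To illustrate what a convex hull is, the sketch below computes one with Andrew's monotone chain algorithm. The text does not say which algorithm the tool uses, so this is a stand-in chosen for brevity, and the sample coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower hull from left to right.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull from right to left.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Hypothetical datapoints of one category as (beta, -log10 p) pairs.
category_points = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
hull = convex_hull(category_points)
print(hull)
```

The interior point (0.5, 0.5) is dropped, leaving only the vertices of the outline. Running the function once per category would give one outline per colour.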

The created selections can also be labeled directly in the interface and downloaded as JPEG or SVG files for direct use in presentations or publications.

Links

Tutors

Partners

Contact
