What is a volcano plot?
A volcano plot is a type of scatter plot used to quickly identify meaningful changes within a very large data set. It does this by plotting a measure of the statistical significance of a change (e.g., p-value) on the y-axis against the magnitude of that change (fold-change) on the x-axis.
When are volcano plots used?
Volcano plots are increasingly popular in ‘omics’ type experiments (e.g., genomics, proteomics, and metabolomics) that typically compare two conditions (e.g., wild-type vs. mutant or healthy vs. disease) and measure many thousands of features (e.g., genes or proteins), each with replicate data. By separating these features by the magnitude of the difference between the two conditions (on the x-axis) and the statistical significance of that difference (on the y-axis), it’s possible to quickly pick out those that display a large change that is also statistically significant.
How are volcano plots made?
A volcano plot is constructed by plotting the negative log of the p-value (usually base 10) on the y-axis. This results in data points with low p-values (highly significant) appearing toward the top of the plot. The x-axis displays the fold-change between the two conditions; this is plotted as the log of the fold-change (usually base 2) so that changes in both directions appear equidistant from the centre. For example, a doubling and a halving sit at +1 and -1 respectively. Data sets plotted in this way often resemble an erupting volcano, which accounts for the name. The data points in the top-right and top-left sectors are of most interest because they differ the most between the two conditions and that difference is supported with high statistical confidence.
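As a rough illustration of the two transformations described above, here is a minimal Python sketch. The function names (`volcano_coords`, `draw_volcano`), the base-2 log for fold-change, and the example values are our own assumptions for illustration; the plotting helper also assumes matplotlib is installed and is only one of many ways to draw the figure.

```python
import math

def volcano_coords(fold_change, p_value):
    """Map a fold-change and a p-value to (x, y) volcano-plot coordinates."""
    x = math.log2(fold_change)   # doubling -> +1, halving -> -1 (base 2 is an assumption)
    y = -math.log10(p_value)     # smaller p-value -> higher on the plot
    return x, y

def draw_volcano(results, path="volcano.png"):
    """Scatter the transformed points. Assumes matplotlib is available."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it
    xs, ys = zip(*(volcano_coords(fc, p) for fc, p in results))
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, s=10)
    ax.set_xlabel("log2(fold-change)")
    ax.set_ylabel("-log10(p-value)")
    fig.savefig(path)

# Made-up (fold_change, p_value) pairs, for illustration only.
example = [(2.0, 0.01), (0.5, 0.01), (1.1, 0.6)]
coords = [volcano_coords(fc, p) for fc, p in example]
print(coords)
```

The first two made-up points have the same p-value and opposite two-fold changes, so they land at the same height, one unit either side of the centre line. The third, a small and non-significant change, sits near the bottom centre.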
Let’s consider an example
Here we’ll use data from a proteomics experiment comparing wild-type plants versus mutant plants, with the aim of quickly identifying those proteins that have a very different abundance under these conditions. The same approach applies to any situation where you are comparing two conditions, have replicate data, and have many data points.
In this example we have two conditions (wild-type and mutant), replicate data (×3 replicates for wild-type and ×3 replicates for the mutant), and many data points (for around 1300 proteins).
Let’s consider the data for just one of those proteins, called Q9M0A7. In the wild-type, the abundance values for Q9M0A7 were 258, 310, and 297 in our three replicates. In the mutant, by contrast, the abundance values for Q9M0A7 were 18, 8, and 30.
Dividing the mean wild-type abundance (about 288) by the mean mutant abundance (about 19) gives a fold-change of around 15 (a big change! There is around 15 times more Q9M0A7 in the wild-type than in the mutant). Then, by calculating the log2 of the fold-change, we have a value of 3.9 that can be plotted on the x-axis of our volcano plot.
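The arithmetic for Q9M0A7 can be checked with a few lines of Python. This is only a sketch: it assumes the fold-change is the ratio of the replicate means (wild-type divided by mutant) and that the log is base 2, which is what reproduces the values of around 15 and 3.9 given above.

```python
import math
from statistics import mean

wild_type = [258, 310, 297]  # abundance values quoted in the text
mutant = [18, 8, 30]

# Assumption: fold-change = mean(wild-type) / mean(mutant)
fold_change = mean(wild_type) / mean(mutant)
log2_fold_change = math.log2(fold_change)

print(f"fold-change = {fold_change:.1f}, log2 = {log2_fold_change:.1f}")
# prints: fold-change = 15.4, log2 = 3.9
```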
When calculating the significance of this difference using a t-test, we get a p-value of 0.000086 (highly significant). Then, after calculating the negative log10 of the p-value, we can plot 4.06 on the y-axis of our volcano plot.
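For readers who want to follow the significance step in code, here is a hedged sketch using only the Python standard library. It assumes a two-tailed Student's (equal-variance) t-test, which the text does not specify, and the helper names are hypothetical. The p-value helper uses a closed form that is valid only for 4 degrees of freedom, which is what 3 + 3 replicates give. On the rounded abundances quoted above it returns a p-value just under 0.0001, near but not exactly the quoted 0.000086; the gap may come from rounding of the values shown or from a different t-test variant.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Student's (equal-variance) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df

def two_tailed_p_df4(t):
    """Two-tailed p-value from the closed-form t CDF. Valid ONLY for df == 4."""
    s = abs(t) / math.sqrt(4 + t * t)
    return 1 - (3 * s - s ** 3) / 2

t_stat, df = pooled_t([258, 310, 297], [18, 8, 30])
assert df == 4  # the closed form above does not apply otherwise
p_recomputed = two_tailed_p_df4(t_stat)

# The y-axis value uses the NEGATIVE log10, so small p-values become large
# positive numbers; here we transform the p-value quoted in the text.
p_quoted = 0.000086
y_quoted = -math.log10(p_quoted)
print(f"t = {t_stat:.2f}, recomputed p = {p_recomputed:.6f}, y = {y_quoted:.2f}")
```

With more replicates, or for anything beyond a quick check, a statistics package's t-test is the more sensible choice than a hand-written closed form.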
Because Q9M0A7 showed a large change between our two samples (higher in the wild-type, so its log fold-change is positive) and this change was highly significant, it falls into the top-right sector of our volcano plot, where it can be easily picked out as a protein of interest.
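Deciding which sector a point falls into can be sketched as a small function. The cut-offs used here (at least a two-fold change, i.e. |log2 fold-change| of 1 or more, and p below 0.05) are common conventions but are our assumptions, not values taken from this example; the function name `sector` is likewise hypothetical.

```python
import math

def sector(log2_fc, neg_log10_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Classify a point by its volcano-plot coordinates (cut-offs are assumptions)."""
    significant = neg_log10_p >= -math.log10(p_cutoff)
    large_change = abs(log2_fc) >= fc_cutoff
    if not (significant and large_change):
        return "not of interest"
    return "top-right" if log2_fc > 0 else "top-left"

# Coordinates for Q9M0A7 from the worked example
print(sector(3.9, 4.06))  # prints: top-right
```

A protein with the same significance but the opposite direction of change (for example, a log2 fold-change of -3.9) would be reported as top-left.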
In a hurry? We provide a data-to-figure service and can produce your volcano plot from as little as $40.
Still hungry for more information? Wikipedia do a nice job describing this type of plot.