Principal Component Analysis
Posted by ludesi in 2DE Knowledge Base
Andreas Ekefjard, CTO and Biostatistician at Ludesi, answers:
Q: “What is Principal Component Analysis and how can I use it for my 2D gel data?”
Principal Component Analysis (PCA) is a data analysis method that can be used for multiple purposes, for example:
- For verifying that biological differences exist.
- For finding outliers among the samples.
- For finding unknown subgroups among the known groups.
- For verifying that technical replicates are more similar than biological replicates.
You can easily create a PCA plot of your data in REDFIN by clicking the PCA symbol in the top right corner of the results viewing environment. Note that each small dot in the PCA plot represents one sample, and each large dot represents an artificial average sample for one group. Samples that lie close to each other have similar protein expression profiles. If the dots of each group cluster tightly together and apart from the other groups, this supports the conclusion that biological differences between the groups exist.
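To make the two kinds of dots concrete, here is a minimal Python sketch using NumPy. It assumes the data are arranged as a matrix with one row per gel and one column per protein spot; the function names `pca_scores` and `group_centroids`, the toy numbers, and the plain mean-centering are illustrative assumptions, not REDFIN's actual implementation, whose preprocessing is not described here.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PC coordinates of each sample (one row of X per gel, assumed layout)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each protein on its mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # the "small dots"

def group_centroids(scores, groups):
    """Mean PC coordinates per group (the "large dots").

    Because the projection is linear, this equals projecting the group's
    average sample onto the same components.
    """
    groups = np.asarray(groups)
    return {g: scores[groups == g].mean(axis=0) for g in np.unique(groups)}

# Toy data (hypothetical): 6 gels x 3 proteins, groups differ mainly in protein 1
X = [[1.0, 0.0, 0.0], [1.1, 0.1, 0.0], [0.9, -0.1, 0.0],
     [5.0, 0.0, 0.0], [5.1, 0.1, 0.0], [4.9, -0.1, 0.0]]
groups = ["control"] * 3 + ["treated"] * 3
scores = pca_scores(X)
centroids = group_centroids(scores, groups)
```

Passing `scores` and the centroid coordinates to any scatter-plot function gives a plot of the kind described above; the sign of each component is arbitrary, so a plot may appear mirrored.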
If, on the other hand, one sample from one group clusters with the samples of another group, this indicates a possible outlier in your data. For clinical samples, one possible explanation is that the sample was misclassified in the clinic.
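One simple way to turn this idea into a check is to flag samples that lie closer, in PC space, to another group's average than to their own. The sketch below is a hypothetical heuristic for illustration only: the function `flag_possible_outliers` and the toy data are invented here and are not part of REDFIN. A flagged sample is a candidate for a closer look, not proof of an outlier or a misclassification.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PC coordinates of each sample (one row of X per gel, assumed layout)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def flag_possible_outliers(X, groups, n_components=2):
    """Hypothetical heuristic: return indices of samples that lie closer,
    in PC space, to another group's centroid than to their own group's.
    Each centroid is computed without the sample being tested."""
    groups = np.asarray(groups)
    scores = pca_scores(X, n_components)
    flagged = []
    for i, own_group in enumerate(groups):
        nearest, nearest_dist = None, np.inf
        for g in np.unique(groups):
            mask = groups == g
            mask[i] = False               # leave sample i out of the centroid
            if not mask.any():            # group has no other members
                continue
            dist = np.linalg.norm(scores[i] - scores[mask].mean(axis=0))
            if dist < nearest_dist:
                nearest, nearest_dist = g, dist
        if nearest is not None and nearest != own_group:
            flagged.append(i)
    return flagged

# Toy data: 2 proteins; the last gel is labelled "A" but resembles group "B"
X = [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2],    # group A
     [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],     # group B
     [5.1, 5.0]]                             # labelled A
groups = ["A", "A", "A", "B", "B", "B", "A"]
print(flag_possible_outliers(X, groups))     # [6]
```

Leaving the tested sample out of its own group's centroid keeps a strong outlier from pulling that centroid toward itself and hiding.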
PCA is ‘unbiased’
An important characteristic of PCA is that it is unsupervised: no information about the samples’ group membership is used in the analysis, so the method can be thought of as unbiased. If the PCA plot divides the samples into groups that resemble the biological groups, this is a strong indication that biological differences between the groups exist!
How does PCA work?
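PCA uses all the information in your data set. If there are 1000 different proteins, each gel can be described by 1000 variables, one for each protein. This is far too much information to display in an intuitive way. PCA therefore replaces the original variables with a few new ones, the principal components, so that the data can be presented in a two- or three-dimensional plot. Each principal component is a weighted combination of the protein variables, and the components are ordered by how much of the total variation between the gels they capture. The reduction is mathematically optimal in the sense that the first two or three components retain as much of that variation as any two- or three-dimensional projection of the data can.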
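The steps behind this reduction can be sketched in a few lines of Python with NumPy: center each protein variable, compute a singular value decomposition (SVD), and keep the leading components. This is one standard way to compute PCA, offered as a sketch under the assumption of one row per gel and one column per protein; it is not necessarily how REDFIN computes it, and any transformation or scaling applied to the data before PCA is left out. The function name `pca` and the simulated data are illustrative.

```python
import numpy as np

def pca(X, n_components=2):
    """Principal component analysis via SVD.

    X: array-like of shape (n_gels, n_proteins), one row per gel (assumed layout).
    Returns (scores, explained_ratio):
      scores          -- (n_gels, n_components) coordinates for the plot
      explained_ratio -- fraction of total variance captured by each kept component
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)            # center each protein on its mean
    # Rows of Vt are the principal directions, sorted by decreasing
    # singular value, i.e. by decreasing variance captured.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T  # project the gels onto the top components
    variance = S ** 2                  # proportional to the variance per component
    explained_ratio = variance[:n_components] / variance.sum()
    return scores, explained_ratio

# Simulated example: 6 gels x 1000 proteins of random noise
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 1000))
scores, explained_ratio = pca(X, n_components=2)
print(scores.shape)  # (6, 2)
```

Here `explained_ratio` reports how much of the total variation the plotted components actually retain, and the columns of `scores` are uncorrelated with each other by construction.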
If you want to read more, a good place to start is Wikipedia, which contains many useful links to other sites.
You can also watch the following video showing how to use the PCA functionality in REDFIN for 2D gel data: PCA Tutorial