News and updates
New releases and related tools will be announced through the mailing list.
Getting Help
Questions about Cufflinks and Cuffdiff should be posted on our Google Group. Please use tophat.cufflinks@gmail.com for private communications only. Please do not email technical questions to Cufflinks contributors directly.
Releases
version 2.2.0 (5/25/2014)
- Source code
- Linux x86_64 binary
- Mac OS X x86_64 binary
Related Tools
- Monocle: Single-cell RNA-Seq analysis
- CummeRbund: Visualization of RNA-Seq differential analysis
- TopHat: Alignment of short RNA-Seq reads
- Bowtie: Ultrafast short read alignment
Publications
- Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology. doi:10.1038/nbt.1621
- Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology. doi:10.1186/gb-2011-12-3-r22
- Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. doi:10.1093/bioinformatics/btr355
- Trapnell C, Hendrickson D, Sauvageau M, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology. doi:10.1038/nbt.2450
Contributors
- Cole Trapnell
- Adam Roberts
- Geo Pertea
- David Hendrickson
- Loyal Goff
- Martin Sauvageau
- Brian Williams
- Ali Mortazavi
- Gordon Kwan
- Jeltje van Baren
- John Rinn
- Steven Salzberg
- Barbara Wold
- Lior Pachter
Getting started
Install quick-start
Installing a pre-compiled binary release
To make Cufflinks easy to install, we provide a few binary packages that spare users the occasionally frustrating process of building Cufflinks from source, which requires installing the Boost libraries. To use a binary package, download the appropriate one for your machine, untar it, and make sure the cufflinks, cuffdiff and cuffcompare binaries are in a directory listed in your PATH environment variable.
Building Cufflinks from source
To build Cufflinks, you must have the Boost C++ libraries (version 1.47 or higher) installed on your system. See below for instructions on installing Boost.
Installing Boost
If you are on a 32-bit Linux system, type (all on one line):
    bjam --prefix=<YOUR_BOOST_INSTALL_DIRECTORY> --toolset=gcc architecture=x86 address_model=32 link=static runtime-link=static stage install
If you are on a 64-bit Linux system, type (all on one line):
    bjam --prefix=<YOUR_BOOST_INSTALL_DIRECTORY> --toolset=gcc architecture=x86 address_model=64 link=static runtime-link=static stage install
Installing the SAM tools
Installing the Eigen libraries
Building Cufflinks
Testing the installation
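As a first sanity check after installing, one might confirm that the binaries named in the quick-start (cufflinks, cuffdiff and cuffcompare) can be found through PATH. The following is only a minimal sketch, assuming a POSIX shell; the helper name missing_tools is hypothetical and is not part of the Cufflinks package.

```shell
#!/bin/sh
# missing_tools: print the name of each argument that cannot be found on PATH.
# Hypothetical helper for illustration; not shipped with Cufflinks.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Check the three binaries mentioned in the install quick-start.
missing=$(missing_tools cufflinks cuffdiff cuffcompare)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "cufflinks, cuffdiff and cuffcompare are all on PATH"
fi
```

If a binary is reported as missing, the directory holding the untarred release probably still needs to be added to PATH. This check only confirms that the programs can be located, not that they run correctly on real data.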
Common uses of the Cufflinks package
Cufflinks includes a number of tools for analyzing RNA-Seq experiments. Some of these tools can be run on their own, while others are pieces of a larger workflow. The complexity of your workflow depends on what you want to achieve with your analysis. For a complete discussion of how Cufflinks can help you with your analysis, please see our protocol paper. The paper includes a diagram (Figure 2) describing how the various parts of the Cufflinks package (and its companion tool TopHat) fit together. As of version 2.2.0, you can also run Cuffquant and Cuffnorm to make large-scale analyses easier to handle. The figure below is an updated version of Figure 2 showing how the two utilities released after the protocol paper appeared fit into the workflow:
You can use Cuffquant to pre-compute gene expression levels for each of your samples, which can save time if you have to re-run part of your analysis. Using Cuffquant also makes it easier to spread the computational load for many samples across multiple computers. If you don't want to perform differential expression analysis, you can run Cuffnorm instead of Cuffdiff. Cuffnorm produces simple tables of expression values that you can load into R (for example) to cluster samples and perform other follow-up analysis.
Discovering novel genes and transcripts
RNA-Seq is a powerful technology for gene and splice variant discovery. You can use Cufflinks to help annotate a new genome or to find new genes and splice isoforms of known genes, even in well-annotated genomes. Annotating genomes is a complex and difficult process, but we outline here a basic workflow that should get you started. The workflow also includes examples of the commands you'd run to implement each step. Suppose we have RNA-Seq reads from human liver, brain, and heart.
We recommend that you use TopHat to map your reads to the reference genome. For this example, we'll assume you have paired-end RNA-Seq data. You can map reads as follows:
    tophat -r 50 -o tophat_brain /seqdata/indexes/hg19 brain_1.fq brain_2.fq
    tophat -r 50 -o tophat_liver /seqdata/indexes/hg19 liver_1.fq liver_2.fq
    tophat -r 50 -o tophat_heart /seqdata/indexes/hg19 heart_1.fq heart_2.fq
The commands above are just examples of how to map reads with TopHat. Please see the TopHat manual for more details on RNA-Seq read mapping.
The next step is to assemble each tissue sample independently using Cufflinks. Assemble each tissue like so:
    cufflinks -o cufflinks_brain tophat_brain/accepted_hits.bam
Cufflinks writes each assembly to a transcripts.gtf file in its output directory. List the assembly files, one per line, in a text file (here assemblies.txt), for example:
    cufflinks_brain/transcripts.gtf
Now run the merge script:
    cuffmerge -s /seqdata/fastafiles/hg19/hg19.fa assemblies.txt
The final, merged annotation will be in the file merged_asm/merged.gtf. At this point, you can use your favorite browser to explore the structure of your genes, or feed this file into downstream informatic analyses, such as a search for orthologs in other organisms. You can also explore your samples with Cuffdiff and identify genes that are significantly differentially expressed between the three conditions. See the workflows below for more details on how to do this.
To compare the merged assembly against a known annotation, run Cuffcompare:
    cuffcompare -s /seqdata/fastafiles/hg19/hg19.fa -r known_annotation.gtf merged_asm/merged.gtf
Cuffcompare will produce a number of output files that you can parse to select novel genes and isoforms.
Identifying differentially expressed and regulated genes
There are two workflows you can choose from when looking for differentially expressed and regulated genes using the Cufflinks package. The first workflow is simpler and is a good choice when you aren't looking for novel genes and transcripts. This workflow requires not only a reference genome but also a reference gene annotation in GFF format (GFF3 and GTF2 formats are accepted). The second workflow, which includes steps to discover new genes and new splice variants of known genes, is more complex and requires more computing power.
The second workflow can use and augment a reference gene annotation GFF if one is available.
Differential analysis without gene and transcript discovery
We recommend that you use TopHat to map your reads to the reference genome. For this example, we'll assume you have paired-end RNA-Seq data. Suppose you have RNA-Seq from a knockdown experiment where you have two biological replicates of a mock condition as a control and two replicates of your knockdown. Note: Cuffdiff will work much better if you map your replicates independently, rather than pooling the replicates from one condition into a single set of reads.
Note: While a GTF of known transcripts is not strictly required at this stage, providing one will improve alignment sensitivity and, ultimately, the accuracy of Cuffdiff's analysis. You can map reads as follows:
    tophat -r 50 -G annotation.gtf -o tophat_mock_rep1 /seqdata/indexes/hg19 \
Then run Cuffdiff, giving the replicates of each condition as a comma-separated list:
    cuffdiff annotation.gtf mock_rep1.bam,mock_rep2.bam \
Differential analysis with gene and transcript discovery
Follow the protocol for gene and transcript discovery listed above. Be sure to provide TopHat and the assembly merging script with a reference annotation if one is available for your organism, to ensure the highest possible quality of differential expression analysis. Then run Cuffdiff with the merged annotation:
    cuffdiff merged_asm/merged.gtf liver1.bam,liver2.bam brain1.bam,brain2.bam
As shown above, the replicate BAM files for each condition must be given as a comma-separated list. If you put spaces between replicate files instead of commas, cuffdiff will treat them as independent conditions.
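Because the comma-versus-space distinction is easy to get wrong when file lists are built by a script, it can help to assemble each condition's replicate list programmatically. The sketch below is one possible approach, assuming a POSIX shell; join_replicates is a hypothetical helper rather than part of Cufflinks, and the BAM file names are the illustrative ones used above. It only prints the resulting command (a dry run) instead of executing cuffdiff.

```shell
#!/bin/sh
# join_replicates: join all arguments with commas, which is the form cuffdiff
# expects for the replicates of a single condition.
# Hypothetical helper for illustration; not shipped with Cufflinks.
join_replicates() {
    # Run in a subshell so the IFS change does not leak to the caller.
    (
        IFS=,
        printf '%s\n' "$*"
    )
}

# One comma-separated list per condition (file names are illustrative).
liver=$(join_replicates liver1.bam liver2.bam)
brain=$(join_replicates brain1.bam brain2.bam)

# Dry run: print the command rather than running it. Conditions are separated
# by spaces, replicates within a condition by commas.
echo "cuffdiff merged_asm/merged.gtf $liver $brain"
```

Printing the command first makes it easy to confirm that there are exactly as many space-separated groups as conditions before starting a long run.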