Amplicon sequencing processing
Generated by: George Chlipala
Report date: December 18, 2024
Overview
When you publish manuscripts based on data generated at our facility, we would greatly appreciate an acknowledgement of our efforts. Please cite our facility as follows (for example):
Basic processing of the raw data was performed by the University of Illinois at Chicago Research Informatics Core (UICRIC).
We adhere to a general policy for acknowledgements and authorship as established by the Association for Biomolecular Resource Facilities (ABRF), and we support the following statement from the ABRF.
The existence of core facilities depends in part on proper acknowledgment in publications. This is an important metric of the value of most core facilities. Proper acknowledgment of core facilities enables them to obtain financial and other support so that they may continue to provide their essential services in the best ways possible. It also helps core personnel to advance in their careers, adding to the overall health of the core facility.
Please contact us for assistance in drafting manuscripts.
- This report provides a high-level summary of the basic bioinformatic analysis included with the amplicon sequencing services provided by the UIC Research Resources Center (RRC). The goal of these services is to provide investigators with basic information on the abundance of taxa present in the samples. The analysis covers basic processing of the raw sequence data, including read merging, adapter and quality trimming, chimera checking, and processing with DADA2 to generate a table of abundance data and associated taxonomic annotations.
- There were 20 samples in this project.
- Method: PEAR [1] (version: v0.9.11)
Forward and reverse reads were merged using PEAR.
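To illustrate what merging a read pair means, the following is a minimal sketch that joins a forward read and a reverse read when the end of the forward read exactly matches the start of the reverse-complemented reverse read. It is not PEAR's algorithm, which scores candidate overlaps statistically and uses base qualities; the function names and the `min_overlap` default are assumptions made for illustration only.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_pair(fwd, rev, min_overlap=10):
    """Merge a read pair by exact overlap; return None if no overlap is found.

    Illustrative only: mismatches and base qualities are ignored.
    The min_overlap default is a hypothetical value, not a reported setting.
    """
    rc = reverse_complement(rev)
    # Try the longest possible overlap first, then shorter ones.
    for overlap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        if fwd[-overlap:] == rc[:overlap]:
            return fwd + rc[overlap:]
    return None


# Toy example: a 22-base fragment sequenced from both ends with 15-base reads.
fragment = "ACGTTGCAAGGCTTACCGATGA"
forward_read = fragment[:15]
reverse_read = reverse_complement(fragment[7:])
merged = merge_pair(forward_read, reverse_read, min_overlap=5)
```

In this toy example the two reads overlap by eight bases, so the merged sequence reconstructs the original fragment.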
Figure 1. Merging results
Table 1. Sequence merging statistics
- Method: cutadapt [2] (version: 4.4)
Sequence trimming using cutadapt.
- -a = ^GTGCCAGCMGCCGCGGTAA...AAACTYAAAKRAATTGRCGG$
- --trim-n
- --max-n = 0
- -q = 20
- -m = 300
- --trimmed-only
- -e = 0.10
- --report = minimal
- Method: Quality trimming
Quality trimming based on quality threshold and length parameters.
- min length = 300
- p = 20
- Method: Adapter trimming
Adapter/primer sequences were trimmed from the reads.
- 5' adapter = GTGCCAGCMGCCGCGGTAA
- 3' adapter = AAACTYAAAKRAATTGRCGG
- Method: Adapter filter
Reads that lack the adapter/primer sequences were discarded.
- Method: Ambiguous nucleotide trimming
Ambiguous nucleotides (N) were trimmed from the ends, and reads with internal ambiguous nucleotides were discarded.
- max-n = 0
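The adapter, ambiguous-nucleotide, and length steps listed above can be sketched in plain Python as follows. This is an illustrative approximation, not cutadapt's implementation: it requires exact matches to the degenerate primers (the actual run allowed an error rate of 0.10), it omits quality trimming because that step needs per-base quality scores, and the function names are hypothetical. The primer sequences and the minimum length of 300 are taken from the parameters reported above.

```python
import re

# IUPAC degenerate base codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "Y": "[CT]", "K": "[GT]",
    "W": "[AT]", "S": "[CG]", "N": "[ACGT]",
}

FIVE_PRIME = "GTGCCAGCMGCCGCGGTAA"    # 5' adapter from the report
THREE_PRIME = "AAACTYAAAKRAATTGRCGG"  # 3' adapter from the report
MIN_LENGTH = 300                      # minimum length from the report


def primer_to_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer)


# Anchored, linked pattern: 5' primer at the start, 3' primer at the end.
LINKED = re.compile(
    "^" + primer_to_regex(FIVE_PRIME) + "(.*)" + primer_to_regex(THREE_PRIME) + "$"
)


def trim_read(seq, min_length=MIN_LENGTH):
    """Return the trimmed insert, or None if the read would be discarded."""
    match = LINKED.match(seq)
    if match is None:
        return None                      # adapter filter: primers not found
    insert = match.group(1).strip("N")   # trim ambiguous bases from the ends
    if "N" in insert:
        return None                      # internal ambiguous base (max-n = 0)
    if len(insert) < min_length:
        return None                      # shorter than the minimum length
    return insert


# Toy read: both primers (one valid expansion each) around a 320-base insert.
toy_insert = "ACGT" * 80
toy_read = "GTGCCAGCAGCCGCGGTAA" + toy_insert + "AAACTCAAAGGAATTGACGG"
trimmed = trim_read(toy_read)
```

A read is kept only if both primers are found at its ends, no N remains after end trimming, and the remaining insert is at least 300 bases long.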
Figure 1. Trimming results
Table 1. Trimming statistics
- Method: VSEARCH reference-based [3] (version: v2.25.0)
Chimeric sequences were identified using the VSEARCH algorithm by comparison against a reference database.
Figure 1. Chimera checking results
Table 1. Chimera checking statistics
- Method: DADA2 amplicon denoising [4] (version: 1.30.0)
Amplicon Sequence Variants (ASVs) were inferred using DADA2.
Figure 1. Comparison of observation and sequence counts in each sample
Figure 2. Comparison of samples and sequence counts for ASVs/OTUs
Table 1. Observation (ASV/OTU) and total sequence counts per sample
Figure 1. Summary of major level 2 taxa
Figure 2. Summary of taxonomic annotation depth
Table 1. Summary of major level 2 taxa
Table 2. Summary of taxonomic annotation depth, grouped by level 2 taxa
- Method: Taxonomic filter
Data were filtered to remove read counts associated with particular taxa.
- filter = c__Chloroplast,f__mitochondria,D_4__Mitochondria,D_3__Chloroplast,Chloroplast,Mitochondria,Synthetic_Rhodanobacter_Spike-In
- Method: Relative sequence abundance
Read counts were normalized as a fraction of the total sequence count in each sample.
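The taxonomic filter and the relative abundance normalization can be sketched together as shown below. This is a minimal illustration under stated assumptions: taxonomy strings are assumed to be semicolon-delimited, a filter term is assumed to match a rank label exactly, and normalization is assumed to use the totals that remain after filtering. The function names and the toy data are hypothetical; only the filter terms come from the report.

```python
# Filter terms as listed in the report.
FILTER_TERMS = {
    "c__Chloroplast", "f__mitochondria", "D_4__Mitochondria",
    "D_3__Chloroplast", "Chloroplast", "Mitochondria",
    "Synthetic_Rhodanobacter_Spike-In",
}


def is_filtered(taxonomy, terms=FILTER_TERMS):
    """True if any rank label in the taxonomy string matches a filter term.

    Assumes semicolon-delimited ranks and exact label matching.
    """
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    return any(rank in terms for rank in ranks)


def filter_and_normalize(counts, taxonomy):
    """Drop filtered ASVs, then convert counts to per-sample fractions.

    counts:   {asv_id: {sample_id: read_count}}
    taxonomy: {asv_id: taxonomy_string}
    """
    kept = {asv: row for asv, row in counts.items()
            if not is_filtered(taxonomy[asv])}
    # Total remaining reads per sample, computed after filtering.
    totals = {}
    for row in kept.values():
        for sample, n in row.items():
            totals[sample] = totals.get(sample, 0) + n
    return {
        asv: {sample: (n / totals[sample] if totals[sample] else 0.0)
              for sample, n in row.items()}
        for asv, row in kept.items()
    }


# Hypothetical toy data for illustration only.
toy_counts = {
    "asv1": {"S1": 30, "S2": 10},
    "asv2": {"S1": 10, "S2": 30},
    "asv3": {"S1": 60, "S2": 0},
}
toy_taxonomy = {
    "asv1": "Bacteria; Firmicutes",
    "asv2": "Bacteria; Proteobacteria",
    "asv3": "Bacteria; Cyanobacteria; Chloroplast",
}
relative = filter_and_normalize(toy_counts, toy_taxonomy)
```

In the toy data, asv3 is removed as a chloroplast sequence, and the remaining fractions in each sample sum to 1.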
Figure 1. Filtering stats
Figure 2. PCA plots of normalized data (relative sequence abundance)
Table 1. Filtering stats
Citations
1. Zhang J, Kobert K, Flouri T, and Stamatakis A. (2014). PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics, 30(5):614-620.
2. Martin M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1):10-12. doi: 10.14806/ej.17.1.200
3. Rognes T, Flouri T, Nichols B, Quince C, and Mahé F. (2016). VSEARCH: a versatile open source tool for metagenomics. PeerJ, 4:e2584. doi: 10.7717/peerj.2584
4. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, and Holmes SP. (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13:581-583. doi: 10.1038/nmeth.3869
5. Wang Q, Garrity GM, Tiedje JM, and Cole JR. (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology, 73(16):5261-5267. doi: 10.1128/AEM.00062-07