Overview

When you publish manuscripts based on data generated at our facility, we would greatly appreciate an acknowledgement of our efforts. Please cite our facility as follows (for example):

Basic processing of the raw data was performed by the University of Illinois at Chicago Research Informatics Core (UICRIC).

We adhere to a general policy for acknowledgements and authorship as established by the Association for Biomolecular Resource Facilities (ABRF), and we support the following statement from the ABRF:

The existence of core facilities depends in part on proper acknowledgment in publications. This is an important metric of the value of most core facilities. Proper acknowledgment of core facilities enables them to obtain financial and other support so that they may continue to provide their essential services in the best ways possible. It also helps core personnel to advance in their careers, adding to the overall health of the core facility.

Please contact us for assistance in drafting manuscripts.

  • This report provides a high-level summary of the basic bioinformatic analysis included with the amplicon sequencing services provided by the UIC Research Resources Center (RRC). The goal of this analysis is to give investigators basic information on the abundance of taxa present in the samples. It covers basic processing of the raw sequence data: read merging, adapter and quality trimming, chimera checking, and processing with DADA2 to generate a table of abundance data and associated taxonomic annotations.
  • There were 20 samples in this project.

Method: PEAR [1] (version: v0.9.11)

Forward and reverse reads were merged using PEAR.
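To illustrate what merging a read pair means, here is a minimal sketch that joins a forward read with the reverse complement of its mate when the two overlap exactly. It is not PEAR's algorithm, which scores candidate overlaps statistically and tolerates mismatches; the function names and the `min_overlap` parameter are hypothetical.

```python
# Naive exact-overlap merge; an illustration only, not PEAR's statistical method.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=10):
    """Merge a read pair on the longest exact overlap, or return None.

    min_overlap is a hypothetical parameter chosen for this sketch.
    """
    rc = reverse_complement(reverse)
    # Try the longest possible overlap first, then shrink.
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None


fragment = "ACGTACGTTTGACCAGTAGGCTA"
forward_read = fragment[:16]
reverse_read = reverse_complement(fragment[8:])
merged = merge_pair(forward_read, reverse_read, min_overlap=5)
```

In this toy case the two reads share an 8-base overlap, so the merged sequence reconstructs the original fragment.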

Figure 1. Merging results

Table 1. Sequence merging statistics

Method: cutadapt [2] (version: 4.4)

Sequences were trimmed using cutadapt.

Custom Parameters
  • -a = ^GTGCCAGCMGCCGCGGTAA...AAACTYAAAKRAATTGRCGG$
  • --trim-n
  • --max-n = 0
  • -q = 20
  • -m = 300
  • --trimmed-only
  • -e = 0.10
  • --report = minimal
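The parameters above can be assembled into a single cutadapt invocation. The sketch below builds the argument list in Python; the flags and values are the ones listed in this report, while the input and output file names and the `-o` output option placement are assumptions made for illustration.

```python
import shlex

# Linked adapter as listed in the report: anchored 5' primer ... anchored 3' primer.
LINKED_ADAPTER = "^GTGCCAGCMGCCGCGGTAA...AAACTYAAAKRAATTGRCGG$"


def build_cutadapt_command(merged_fastq, trimmed_fastq):
    """Return a cutadapt argument list using the parameters from the report.

    merged_fastq and trimmed_fastq are hypothetical file names.
    """
    return [
        "cutadapt",
        "-a", LINKED_ADAPTER,   # linked 5'/3' primer pair
        "--trim-n",             # trim N bases from read ends
        "--max-n", "0",         # discard reads with remaining N bases
        "-q", "20",             # quality cutoff
        "-m", "300",            # minimum length after trimming
        "--trimmed-only",       # discard reads in which no adapter was found
        "-e", "0.10",           # maximum error rate for adapter matching
        "--report", "minimal",
        "-o", trimmed_fastq,
        merged_fastq,
    ]


cmd = build_cutadapt_command("sample.merged.fastq", "sample.trimmed.fastq")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it without shell quoting issues around the `^` and `$` anchors.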

Method: Quality trimming

Quality trimming based on quality threshold and length parameters.

Custom Parameters
  • min length = 300
  • p = 20
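As a rough illustration of quality trimming, the sketch below applies the running-sum approach described in the cutadapt documentation to the 3' end of a read, then applies the minimum length filter. It assumes that `p = 20` above is the quality cutoff and that qualities are already decoded to integer Phred scores; the function names are hypothetical.

```python
def quality_trim_3prime(qualities, cutoff=20):
    """Return the cut index for 3' quality trimming (keep qualities[:index]).

    Walks from the 3' end, summing (cutoff - quality); the read is cut at the
    position where this running sum is largest.
    """
    running_sum = 0
    best_sum = 0
    cut = len(qualities)
    for i in reversed(range(len(qualities))):
        running_sum += cutoff - qualities[i]
        if running_sum < 0:
            break
        if running_sum > best_sum:
            best_sum = running_sum
            cut = i
    return cut


def keep_read(sequence, qualities, cutoff=20, min_length=300):
    """Trim the 3' end by quality; return None if the result is too short."""
    trimmed = sequence[:quality_trim_3prime(qualities, cutoff)]
    return trimmed if len(trimmed) >= min_length else None


example_quals = [30, 30, 30, 10, 30, 5, 5]
cut_index = quality_trim_3prime(example_quals, cutoff=20)
```

For the example qualities, the two low-quality bases at the 3' end are removed, while the isolated low-quality base in the middle is kept because it is followed by a high-quality base.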

Method: Adapter trimming

Adapter/primer sequences were trimmed from the reads.

Custom Parameters
  • 5' adapter = GTGCCAGCMGCCGCGGTAA
  • 3' adapter = AAACTYAAAKRAATTGRCGG

Method: Adapter filter

Reads that lack the adapter/primer sequences were discarded.

Method: Ambiguous nucleotide trimming
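The primers contain IUPAC degenerate bases (for example M, Y, K, R), so matching them requires expanding each code into the bases it stands for. The following sketch trims an anchored primer pair and discards reads that lack it. It uses exact regular-expression matching only, so unlike cutadapt it does not allow the 10% error rate; the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

FIVE_PRIME = "GTGCCAGCMGCCGCGGTAA"
THREE_PRIME = "AAACTYAAAKRAATTGRCGG"


def primer_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer)


PATTERN = re.compile(
    "^" + primer_regex(FIVE_PRIME) + "(.*)" + primer_regex(THREE_PRIME) + "$"
)


def trim_primers(read):
    """Return the sequence between the primers, or None if either is missing."""
    match = PATTERN.match(read)
    return match.group(1) if match else None


read = "GTGCCAGCAGCCGCGGTAA" + "ACGTACGT" + "AAACTCAAAGGAATTGACGG"
insert = trim_primers(read)
```

A return value of `None` corresponds to the adapter filter: the read is dropped because the primer pair was not found.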

Ambiguous nucleotides (N) were trimmed from the ends and reads with internal ambiguous nucleotides were discarded.

Custom Parameters
  • max-n = 0
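The two-step handling of ambiguous bases can be sketched as follows: strip N from both ends first, then discard the read if more than `max_n` N bases remain. The function name is hypothetical.

```python
def trim_and_filter_n(read, max_n=0):
    """Trim N bases from both ends; return None if too many N bases remain."""
    trimmed = read.strip("N")
    if trimmed.count("N") > max_n:
        return None
    return trimmed


kept = trim_and_filter_n("NNACGTN")
dropped = trim_and_filter_n("ACNGT")
```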

Figure 1. Trimming results

Table 1. Trimming statistics

Chimeric sequences are artifacts of the PCR process and occur when portions of two separate amplicons fuse during amplification. The RRC analysis pipeline uses a standard chimera checking program to identify chimeric sequences and remove them from the dataset.

Method: VSEARCH reference-based [3] (version: v2.25.0)

Chimeric sequences were identified with the VSEARCH algorithm by comparison against a reference database.

Figure 1. Chimera checking results

Table 1. Chimera checking statistics

Read count complexity is reduced by a number of techniques, e.g. dereplication or sequence clustering.
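Dereplication, the simplest of these techniques, collapses identical reads into unique sequences with an abundance count. The sketch below shows only that step; it is not DADA2's denoising, which additionally uses an error model to decide which unique sequences are real variants. The function name is hypothetical.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, count) pairs, most abundant first."""
    return Counter(reads).most_common()


unique = dereplicate(["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"])
```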

Method: DADA2 amplicon denoising [4] (version: 1.30.0)

Amplicon Sequence Variants (ASVs) were inferred using DADA2.

Figure 1. Comparison of observation and sequence counts in each sample

Figure 2. Comparison of samples and sequence counts for ASVs/OTUs

Table 1. Observation (ASV/OTU) and total sequence counts per sample

Sequences are annotated with taxonomic information.

Method: Naive Bayesian classifier [4,5] (version: dada2_taxa.R --version)

Taxonomic annotations were determined using a Naive Bayesian approach included in the DADA2 package.

Figure 1. Summary of major level 2 taxa

Figure 2. Summary of taxonomic annotation depth

Table 1. Summary of major level 2 taxa

Table 2. Summary of taxonomic annotation depth, grouped by level 2 taxa

Method: Taxonomic filter

Data were filtered to remove read counts associated with particular taxa.

Custom Parameters
  • filter = c__Chloroplast,f__mitochondria,D_4__Mitochondria,D_3__Chloroplast,Chloroplast,Mitochondria,Synthetic_Rhodanobacter_Spike-In
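A minimal sketch of how such a filter could be applied is shown below. It assumes that each ASV carries a semicolon-delimited lineage string and that a filter term must equal one rank label exactly; how the pipeline actually matches terms is not stated in this report, and the function names and table layout are hypothetical.

```python
# Filter terms as listed in the report.
FILTER_TERMS = {
    "c__Chloroplast", "f__mitochondria", "D_4__Mitochondria",
    "D_3__Chloroplast", "Chloroplast", "Mitochondria",
    "Synthetic_Rhodanobacter_Spike-In",
}


def is_filtered(lineage, terms=FILTER_TERMS):
    """Return True if any rank label in the lineage equals a filter term."""
    ranks = [rank.strip() for rank in lineage.split(";")]
    return any(rank in terms for rank in ranks)


def filter_table(table):
    """Drop filtered ASVs; table maps ASV id -> (lineage, per-sample counts)."""
    return {asv: row for asv, row in table.items() if not is_filtered(row[0])}


table = {
    "ASV1": ("Bacteria; Firmicutes; Bacilli", [120, 80]),
    "ASV2": ("Bacteria; Cyanobacteria; Cyanobacteriia; Chloroplast", [15, 40]),
}
filtered = filter_table(table)
```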

Method: Relative sequence abundance

Read counts were normalized as a fraction of the total sequence count in each sample.
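In other words, each count is divided by the sum of counts in its sample, so the values for a sample add up to 1. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def relative_abundance(counts):
    """Convert one sample's read counts (taxon -> count) to fractions."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for samples with no remaining reads.
        return {taxon: 0.0 for taxon in counts}
    return {taxon: count / total for taxon, count in counts.items()}


sample_counts = {"ASV1": 30, "ASV2": 10}  # illustrative values only
fractions = relative_abundance(sample_counts)
```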

Figure 1. Filtering stats

Figure 2. PCA plots of normalized data (relative sequence abundance)

Table 1. Filtering stats


Citations

  1. Zhang J, Kobert K, Flouri T, and Stamatakis A. (2014). PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics, 30(5):614-620
  2. Martin M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1):10-12. doi: 10.14806/ej.17.1.200
  3. Rognes T, Flouri T, Nichols B, Quince C, Mahé F. (2016) VSEARCH: a versatile open source tool for metagenomics. PeerJ 4:e2584. doi: 10.7717/peerj.2584
  4. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA and Holmes SP (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13:581-583. doi: 10.1038/nmeth.3869
  5. Wang Q, Garrity GM, Tiedje JM, and Cole JR. (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology, 73(16):5261-5267. doi: 10.1128/AEM.00062-07