Amplicon sequencing processing

Overview

When you publish manuscripts based on data generated at our facility, we would greatly appreciate an acknowledgement of our efforts. Please cite our facility as follows (for example):

Basic processing of the raw data were performed by the University of Illinois at Chicago Research Informatics Core (UICRIC).

We adhere to a general policy for acknowledgements and authorship as established by the Association for Biomolecular Resource Facilities (ABRF) , and we support the following statement from the ABRF.

The existence of core facilities depends in part on proper acknowledgment in publications. This is an important metric of the value of most core facilities. Proper acknowledgment of core facilities enables them to obtain financial and other support so that they may continue to provide their essential services in the best ways possible. It also helps core personnel to advance in their careers, adding to the overall health of the core facility.

Please contact us for assistance in drafting manuscripts.

This report provides a high-level summary of the basic bioinformatic analysis included with the amplicon sequencing services provided by the UIC Research Resources Center (RRC). The end result of these bioinformatics services is to provide investigators basic information concerning the abundance of taxa present in the samples. The basic bioinformatic analysis includes basic processing of raw sequence data including read merging, adapter & quality trimming, chimeric checking and processing using DADA2 to generate a table of abundance data and associated taxonomic annotations.
There were 20 samples in this project.

File	Description	Type
rep_set_tax_assignments.txt	Taxonomic assignment of master sequences	result
taxa_raw_counts.zip	ZIP archive of taxonomic summaries from phylum to species level result - raw sequence counts	result
taxa_raw_counts.xlsx	Excel spreadsheet of taxonomic summaries from phylum to species level result - raw sequence counts	result
taxa_relative.zip	ZIP archive of taxonomic summaries from phylum to species level result - relative sequence abundance	result
taxa_relative.xlsx	Excel spreadsheet of taxonomic summaries from phylum to species level result - raw sequence counts	result
biom-summary.txt	Summary statistics of ASV table	result
taxa_table.biom	Amplicon Sequence Variant (ASV) table, in BIOM format	result
sequences.zip	ZIP archive of sequences for each sample, after merging, trimming and chimera checking	result
rep_set_sequences.zip	Compressed FASTA file of representative sequences for ASVs	result

Sample list
Sample	OriginalID
Sample10-M10-Fec2	Sample10-M10-Fec2
Sample11-M12-Fec3	Sample11-M12-Fec3
Sample12-M13-Fec1	Sample12-M13-Fec1
Sample13-M14-Fec3	Sample13-M14-Fec3
Sample14-M15-Fec1	Sample14-M15-Fec1
Sample15-M16-Fec2	Sample15-M16-Fec2
Sample16-M18-Fec3	Sample16-M18-Fec3
Sample17-M19-Fec2	Sample17-M19-Fec2
Sample18-M20-Fec3	Sample18-M20-Fec3
Sample19-M23-Fec2	Sample19-M23-Fec2
Sample20-M24-Fec3	Sample20-M24-Fec3
Sample21-M25-Fec2	Sample21-M25-Fec2
Sample22-M26-Fec1	Sample22-M26-Fec1
Sample23-M29-Fec3	Sample23-M29-Fec3
Sample24-M30-Fec3	Sample24-M30-Fec3
Sample25-M32-Fec2	Sample25-M32-Fec2
Sample26-M33-Fec2	Sample26-M33-Fec2
Sample27-M34-Fec3	Sample27-M34-Fec3
Sample28-M35-Fec3	Sample28-M35-Fec3
Sample29-M36-Fec3	Sample29-M36-Fec3

Method: PearJiajie Zhang, Kassian Kobert, Tom Flouri, and Alexandros Stamatakis. (2014) Pear: a fast and accurate illumina paired-end read merger. Bioinformatics, 30(5):614-620 (version: v0.9.11): Forward and reverse reads were merged using PEAR.

Current analysis
Return to top

Table 1 . Sequence merging statistics

Table 1 . Sequence merging statistics
Sample	Assembled reads	Discarded reads	Not assembled reads	Percent passed
Sample10-M10-Fec2	76474	0	1415	98.18%
Sample11-M12-Fec3	64322	2	777	98.80%
Sample12-M13-Fec1	71175	2	2984	95.97%
Sample13-M14-Fec3	64928	0	1052	98.41%
Sample14-M15-Fec1	63916	0	1485	97.73%
Sample15-M16-Fec2	56417	0	3556	94.07%
Sample16-M18-Fec3	58822	0	1372	97.72%
Sample17-M19-Fec2	94815	0	2222	97.71%
Sample18-M20-Fec3	66613	0	4437	93.76%
Sample19-M23-Fec2	58394	0	1613	97.31%
Sample20-M24-Fec3	58382	0	5214	91.80%
Sample21-M25-Fec2	203	0	11	94.86%
Sample22-M26-Fec1	49922	4	10627	82.44%
Sample23-M29-Fec3	50389	0	5304	90.48%
Sample24-M30-Fec3	72858	0	2391	96.82%
Sample25-M32-Fec2	78046	0	7113	91.65%
Sample26-M33-Fec2	47643	2	3398	93.34%
Sample27-M34-Fec3	56241	0	5758	90.71%
Sample28-M35-Fec3	57480	4	1128	98.07%
Sample29-M36-Fec3	58008	0	2808	95.38%

Method: cutadaptMartin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1):10-12. doi:https://doi.org/10.14806/ej.17.1.200 (version: 4.4)

Sequencing trimming using cutadapt

Custom Parameters

-a = ^GTGCCAGCMGCCGCGGTAA...AAACTYAAAKRAATTGRCGG$
--trim-n
--max-n = 0
-q = 20
-m = 300
--trimmed-only
-e = 0.10
--report = minimal

Method: Quality trimming

Quality trimming based on quality threshold and length parameters.

Custom Parameters

min length = 300
p = 20

Method: Adapter trimming

Adapter/primer sequences were trimmed from the reads.

Custom Parameters

5' adapter = GTGCCAGCMGCCGCGGTAA
3' adapter = AAACTYAAAKRAATTGRCGG

Method: Adapter filter

Reads that lack the adpater/primer sequences were discarded.

Method: Abiguous nucleotide trimming

Ambiguous nucleotides (N) were trimmed from the ends and reads with internal ambiguous nucleotides were discarded.

Custom Parameters

max-n = 0

Current analysis
Return to top

Figure 1 . Trimming results

Current analysis
Return to top

Table 1 . Trimming statistics

Table 1 . Trimming statistics
Sample	Trim passed	Ambig dropped	Trim filter dropped	Trim length dropped	Percent passed	Mean Length	Mean Trimmed Length	Quality trimmed bp
Sample10-M10-Fec2	70366	6	5239	863	92.01%	409.8	373.2	609
Sample11-M12-Fec3	59791	7	4073	451	92.96%	409.8	372.8	611
Sample12-M13-Fec1	64900	5	4370	1900	91.18%	407.5	372.6	739
Sample13-M14-Fec3	59719	4	4661	544	91.98%	409.8	373.0	772
Sample14-M15-Fec1	59675	4	3726	511	93.36%	409.7	372.5	374
Sample15-M16-Fec2	50088	4	3881	2444	88.78%	405.4	372.9	448
Sample16-M18-Fec3	54461	7	3722	632	92.59%	409.8	373.0	926
Sample17-M19-Fec2	86736	7	6180	1892	91.48%	408.0	372.5	361
Sample18-M20-Fec3	58293	6	4975	3339	87.51%	404.2	372.8	1033
Sample19-M23-Fec2	53160	4	3624	1606	91.04%	406.2	372.5	418
Sample20-M24-Fec3	48723	5	3407	6247	83.46%	396.9	372.9	264
Sample21-M25-Fec2	70	0	120	13	34.48%	377.1	372.7	0
Sample22-M26-Fec1	34717	1	2758	12446	69.54%	380.7	373.0	725
Sample23-M29-Fec3	42059	6	3170	5154	83.47%	398.1	372.5	731
Sample24-M30-Fec3	66054	0	4639	2165	90.66%	407.1	373.0	473
Sample25-M32-Fec2	67218	9	4671	6148	86.13%	400.8	372.8	945
Sample26-M33-Fec2	41972	1	3099	2571	88.10%	403.2	372.2	783
Sample27-M34-Fec3	49421	10	3255	3555	87.87%	403.1	372.7	384
Sample28-M35-Fec3	53160	2	3880	438	92.48%	408.9	372.1	907
Sample29-M36-Fec3	52292	6	4317	1393	90.15%	407.0	372.5	878

Chimeric sequences are artifacts of the PCR process and occur when portions of two separate amplicons fuse during the amplification process. The RRC analysis pipeline uses a standard chimera checking program to identify chimeric sequences and the remove from the dataset

Method: VSEARCH reference basedRognes T, Flouri T, Nichols B, Quince C, Mahé F. (2016) VSEARCH: a versatile open source tool for metagenomics. PeerJ 4:e2584. doi: 10.7717/peerj.2584 (version: v2.25.0): Chimeric sequences were identified using the VSEARCH algorithm as compared with a reference database.

Current analysis
Return to top

Figure 1 . Chimera checking results

Current analysis
Return to top

Table 1 . Chimera checking statistics

Table 1 . Chimera checking statistics
Sample	Non-chimeras	Chimeras	Percent passed
Sample10-M10-Fec2	64552	5814	84.41%
Sample11-M12-Fec3	54789	5002	85.18%
Sample12-M13-Fec1	59871	5029	84.12%
Sample13-M14-Fec3	55345	4374	85.24%
Sample14-M15-Fec1	55016	4659	86.08%
Sample15-M16-Fec2	45815	4273	81.21%
Sample16-M18-Fec3	49969	4492	84.95%
Sample17-M19-Fec2	79599	7137	83.95%
Sample18-M20-Fec3	53433	4860	80.21%
Sample19-M23-Fec2	49256	3904	84.35%
Sample20-M24-Fec3	45206	3517	77.43%
Sample21-M25-Fec2	64	6	31.53%
Sample22-M26-Fec1	32454	2263	65.01%
Sample23-M29-Fec3	39398	2661	78.19%
Sample24-M30-Fec3	59564	6490	81.75%
Sample25-M32-Fec2	62442	4776	80.01%
Sample26-M33-Fec2	39424	2548	82.75%
Sample27-M34-Fec3	45957	3464	81.71%
Sample28-M35-Fec3	51176	1984	89.03%
Sample29-M36-Fec3	49230	3062	84.87%

Reduce read count complexity by a number of techniques, e.g. dereplication or sequences clustering.

Method: DADA2 amplicon denoisingCallahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA and Holmes SP (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13:581-583. doi: 10.1038/nmeth.3869 (version: 1.30.0 ): Amplicon Sequence Variants (ASVs) were infered using DADA2.

Current analysis
Return to top

Figure 1 . Comparison of observation and sequence counts in each sample

Figure 2 . Comparison of samples and sequence counts for ASVs/OTUs

Current analysis
Return to top

Table 1 . Observation (ASV/OTU) and total sequence counts per sample

Table 1 . Observation (ASV/OTU) and total sequence counts per sample
Sample	Observations	Counts
Sample21.M25.Fec2	4	31
Sample22.M26.Fec1	150	31815
Sample23.M29.Fec3	153	38696
Sample26.M33.Fec2	178	38762
Sample20.M24.Fec3	180	44393
Sample15.M16.Fec2	162	44921
Sample19.M23.Fec2	190	48391
Sample16.M18.Fec3	173	49095
Sample18.M20.Fec3	185	52481
Sample14.M15.Fec1	178	54356
Sample13.M14.Fec3	170	54595
Sample11.M12.Fec3	176	54085
Sample12.M13.Fec1	170	58893
Sample24.M30.Fec3	185	58840
Sample25.M32.Fec2	199	61600
Sample10.M10.Fec2	159	63532
Sample17.M19.Fec2	191	78309
Sample27.M34.Fec3	154	45315
Sample28.M35.Fec3	180	50424
Sample29.M36.Fec3	172	48267

Annotate sequences with taxonomic information

Method: Naive Bayseian classifierCallahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA and Holmes SP (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13:581-583. doi: 10.1038/nmeth.3869Wang Q, Garrity GM, Tiedje JM, and Cole JR. (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and environmental microbiology, 73(16):5261-5267. doi:10.1128/AEM.00062-07 (version: dada2_taxa.R --version): Taxonomic annotations were deteremined using a Naive Bayesian approach included in the DADA2 package.

Current analysis
Return to top

Figure 1 . Summary of major level 2 taxa

Figure 2 . Summary of taxonomic annotation depth

Current analysis
Return to top

Table 1 . Summary of major level 2 taxa

Table 1 . Summary of major level 2 taxa
Taxon	Counts	Percent
Bacteria;Bacillota	534830	54.75%
Bacteria;Bacteroidota	432499	44.28%
Bacteria;Deferribacterota	8675	< 1%
Bacteria;Pseudomonadota	407	< 1%
Bacteria;Actinomycetota	390	< 1%
Unassigned	0	0%

Table 2 . Summary of taxonomic annotation depth - Grouped by level 2 taxa

Table 2 . Summary of taxonomic annotation depth - Grouped by level 2 taxa
Level 2 taxon	Level 5 raw counts	Level 6 raw counts	Level 7 raw counts	Level 5 relative counts	Level 6 relative counts	Level 7 relative counts
Bacillota	17494	503151	14185	1.79%	51.51%	1.45%
Bacteroidota	0	432499	0	-	44.28%	-
Deferribacterota	0	0	8675	-	-	< 1%
Pseudomonadota	0	407	0	-	< 1%	-
Actinomycetota	0	390	0	-	< 1%	-

Method: Taxonomic filter

Data are filtered to remove read counts associated with particular taxa

Custom Parameters

filter = c__Chloroplast,f__mitochondria,D_4__Mitochondria,D_3__Chloroplast,Chloroplast,Mitochondria,Synthetic_Rhodanobacter_Spike-In

Method: Relative sequence abundance

Read counts were normalized as fraction of total sequence counts in each sample

Current analysis
Return to top

Figure 1 . Filtering stats

Figure 2 . PCA plots of normalized data (relative sequence abundance)

Current analysis
Return to top

Table 1 . Filtering stats

Table 1 . Filtering stats
Sample	Original	Filtered	Retained	Percent Retained
Sample10.M10.Fec2	63532	0	63532	100.00 %
Sample11.M12.Fec3	54085	0	54085	100.00 %
Sample12.M13.Fec1	58893	0	58893	100.00 %
Sample13.M14.Fec3	54595	0	54595	100.00 %
Sample14.M15.Fec1	54356	0	54356	100.00 %
Sample15.M16.Fec2	44921	0	44921	100.00 %
Sample16.M18.Fec3	49095	0	49095	100.00 %
Sample17.M19.Fec2	78309	0	78309	100.00 %
Sample18.M20.Fec3	52481	0	52481	100.00 %
Sample19.M23.Fec2	48391	2	48389	100.00 %
Sample20.M24.Fec3	44393	0	44393	100.00 %
Sample21.M25.Fec2	31	0	31	100.00 %
Sample22.M26.Fec1	31815	0	31815	100.00 %
Sample23.M29.Fec3	38696	0	38696	100.00 %
Sample24.M30.Fec3	58840	0	58840	100.00 %
Sample25.M32.Fec2	61600	0	61600	100.00 %
Sample26.M33.Fec2	38762	0	38762	100.00 %
Sample27.M34.Fec3	45315	0	45315	100.00 %
Sample28.M35.Fec3	50424	0	50424	100.00 %
Sample29.M36.Fec3	48267	0	48267	100.00 %