Amplicon sequencing processing
Generated by: George Edward Chlipala
Report date: May 31, 2023
Overview
When you publish manuscripts based on data generated at our facility, we would greatly appreciate an acknowledgement of our efforts. Please cite our facility as follows (for example):
Basic processing of the raw data was performed by the University of Illinois at Chicago Research Informatics Core (UICRIC).
We adhere to a general policy for acknowledgements and authorship as established by the Association for Biomolecular Resource Facilities (ABRF), and we support the following statement from the ABRF.
The existence of core facilities depends in part on proper acknowledgment in publications. This is an important metric of the value of most core facilities. Proper acknowledgment of core facilities enables them to obtain financial and other support so that they may continue to provide their essential services in the best ways possible. It also helps core personnel to advance in their careers, adding to the overall health of the core facility.
Please contact us for assistance in drafting manuscripts.
Output Files
File | Description | Type |
---|---|---|
rep_set_tax_assignments.txt | Taxonomic assignment of master sequences | result |
taxa_raw_counts.zip | ZIP archive of taxonomic summaries from phylum to species level (raw sequence counts) | result |
taxa_raw_counts.xlsx | Excel spreadsheet of taxonomic summaries from phylum to species level (raw sequence counts) | result |
taxa_relative.zip | ZIP archive of taxonomic summaries from phylum to species level (relative sequence abundance) | result |
taxa_relative.xlsx | Excel spreadsheet of taxonomic summaries from phylum to species level (relative sequence abundance) | result |
biom-summary.txt | Summary statistics of ASV table | result |
taxa_table.biom | Amplicon Sequence Variant (ASV) table, in BIOM format | result |
sequences.zip | ZIP archive of sequences for each sample, after merging, trimming and chimera checking | result |
rep_set_sequences.zip | Compressed FASTA file of representative sequences for ASVs | result |
Sample | OriginalID |
---|---|
Sample10_M10_Fec2 | Sample10_M10_Fec2 |
Sample11_M12_Fec3 | Sample11_M12_Fec3 |
Sample12_M13_Fec1 | Sample12_M13_Fec1 |
Sample13_M14_Fec3 | Sample13_M14_Fec3 |
Sample14_M15_Fec1 | Sample14_M15_Fec1 |
Sample15_M16_Fec2 | Sample15_M16_Fec2 |
Sample16_M18_Fec3 | Sample16_M18_Fec3 |
Sample17_M19_Fec2 | Sample17_M19_Fec2 |
Sample18_M20_Fec3 | Sample18_M20_Fec3 |
Sample19_M23_Fec2 | Sample19_M23_Fec2 |
Sample20_M24_Fec3 | Sample20_M24_Fec3 |
Sample21_M25_Fec2 | Sample21_M25_Fec2 |
Sample22_M26_Fec1 | Sample22_M26_Fec1 |
Sample23_M29_Fec3 | Sample23_M29_Fec3 |
Sample24_M30_Fec3 | Sample24_M30_Fec3 |
Sample25_M32_Fec2 | Sample25_M32_Fec2 |
Sample26_M33_Fec2 | Sample26_M33_Fec2 |
Sample27_M34_Fec3 | Sample27_M34_Fec3 |
Sample28_M35_Fec3 | Sample28_M35_Fec3 |
Sample29_M36_Fec3 | Sample29_M36_Fec3 |
Details
- Method: PEAR
  Forward and reverse reads were merged using PEAR.
Figure 1. Merging results
Table 1. Sequence merging statistics
Sample | Assembled reads | Discarded reads | Not assembled reads | Percent passed |
---|---|---|---|---|
Sample10_M10_Fec2 | 76474 | 0 | 1415 | 98.18% |
Sample11_M12_Fec3 | 64322 | 2 | 777 | 98.80% |
Sample12_M13_Fec1 | 71175 | 2 | 2984 | 95.97% |
Sample13_M14_Fec3 | 64928 | 0 | 1052 | 98.41% |
Sample14_M15_Fec1 | 63916 | 0 | 1485 | 97.73% |
Sample15_M16_Fec2 | 56417 | 0 | 3556 | 94.07% |
Sample16_M18_Fec3 | 58822 | 0 | 1372 | 97.72% |
Sample17_M19_Fec2 | 94815 | 0 | 2222 | 97.71% |
Sample18_M20_Fec3 | 66613 | 0 | 4437 | 93.76% |
Sample19_M23_Fec2 | 58394 | 0 | 1613 | 97.31% |
Sample20_M24_Fec3 | 58382 | 0 | 5214 | 91.80% |
Sample21_M25_Fec2 | 203 | 0 | 11 | 94.86% |
Sample22_M26_Fec1 | 49922 | 4 | 10627 | 82.44% |
Sample23_M29_Fec3 | 50389 | 0 | 5304 | 90.48% |
Sample24_M30_Fec3 | 72858 | 0 | 2391 | 96.82% |
Sample25_M32_Fec2 | 78046 | 0 | 7113 | 91.65% |
Sample26_M33_Fec2 | 47643 | 2 | 3398 | 93.34% |
Sample27_M34_Fec3 | 56241 | 0 | 5758 | 90.71% |
Sample28_M35_Fec3 | 57480 | 4 | 1128 | 98.07% |
Sample29_M36_Fec3 | 58008 | 0 | 2808 | 95.38% |
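The Percent passed column in the merging table is consistent with assembled reads divided by the sum of all three read categories. A minimal Python sketch of that arithmetic, using the Sample10_M10_Fec2 row; the function name `percent_passed` is illustrative and not part of the facility's pipeline:

```python
def percent_passed(assembled: int, discarded: int, not_assembled: int) -> float:
    """Share of input read pairs that were merged, as a percentage."""
    total = assembled + discarded + not_assembled
    if total == 0:
        return 0.0  # avoid division by zero for empty samples
    return 100.0 * assembled / total


# Values copied from the Sample10_M10_Fec2 row of the merging table.
sample10 = percent_passed(assembled=76474, discarded=0, not_assembled=1415)
print(f"{sample10:.2f}%")  # prints 98.18%
```

The same function applied to a low-yield row, such as Sample22_M26_Fec1, makes it easy to see how many read pairs failed to overlap.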
Details
- Method: cutadapt
  Sequence trimming using cutadapt.
  - -a = ^GTGCCAGCMGCCGCGGTAA...AAACTYAAAKRAATTGRCGG$
  - --trim-n
  - --max-n = 0
  - -q = 20
  - -m = 300
  - --trimmed-only
  - -e = 0.10
  - --report = minimal
- Method: Quality trimming
  Quality trimming based on quality threshold and length parameters.
  - min length = 300
  - p = 20
- Method: Adapter trimming
  Adapter/primer sequences were trimmed from the reads.
  - 5' adapter = GTGCCAGCMGCCGCGGTAA
  - 3' adapter = AAACTYAAAKRAATTGRCGG
- Method: Adapter filter
  Reads that lack the adapter/primer sequences were discarded.
- Method: Ambiguous nucleotide trimming
  Ambiguous nucleotides (N) were trimmed from the ends, and reads with internal ambiguous nucleotides were discarded.
  - max-n = 0
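The parameters listed above are standard cutadapt command-line options. The sketch below shows one way they might be assembled into a single call from Python. The input and output file names are placeholders, the helper name `build_cutadapt_command` is hypothetical, and the exact invocation used by the pipeline is not shown in this report.

```python
import shlex

# Linked adapter from the report: anchored 5' primer, then anchored 3' primer.
LINKED_ADAPTER = "^GTGCCAGCMGCCGCGGTAA...AAACTYAAAKRAATTGRCGG$"


def build_cutadapt_command(merged_fastq: str, trimmed_fastq: str) -> list[str]:
    """Return a cutadapt argument list using the parameters listed in the report."""
    return [
        "cutadapt",
        "-a", LINKED_ADAPTER,    # trim both primers as a linked pair
        "--trim-n",              # trim N bases from read ends
        "--max-n", "0",          # discard reads with any remaining N
        "-q", "20",              # quality-trim with threshold 20
        "-m", "300",             # discard reads shorter than 300 bp
        "--trimmed-only",        # discard reads in which no adapter was found
        "-e", "0.10",            # maximum error rate for adapter matching
        "--report", "minimal",   # one-line summary report
        "-o", trimmed_fastq,     # placeholder output path
        merged_fastq,            # placeholder input path
    ]


command = build_cutadapt_command("sample.merged.fastq", "sample.trimmed.fastq")
print(shlex.join(command))
```

Building the call as a list rather than a single string avoids shell-quoting problems with the `^`, `$`, and `...` characters in the linked adapter.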
Figure 1. Trimming results
Table 1. Trimming statistics
Sample | Trim passed | Ambig dropped | Trim filter dropped | Trim length dropped | Percent passed | Mean Length | Mean Trimmed Length | Quality trimmed bp |
---|---|---|---|---|---|---|---|---|
Sample10_M10_Fec2 | 70388 | 6 | 5217 | 863 | 92.04% | 409.8 | 373.2 | 609 |
Sample11_M12_Fec3 | 59805 | 7 | 4059 | 451 | 92.98% | 409.8 | 372.8 | 611 |
Sample12_M13_Fec1 | 64909 | 5 | 4361 | 1900 | 91.20% | 407.5 | 372.6 | 739 |
Sample13_M14_Fec3 | 59727 | 4 | 4653 | 544 | 91.99% | 409.8 | 373.0 | 772 |
Sample14_M15_Fec1 | 59679 | 4 | 3722 | 511 | 93.37% | 409.7 | 372.5 | 374 |
Sample15_M16_Fec2 | 50096 | 4 | 3873 | 2444 | 88.80% | 405.4 | 372.9 | 448 |
Sample16_M18_Fec3 | 54471 | 7 | 3712 | 632 | 92.60% | 409.8 | 373.0 | 926 |
Sample17_M19_Fec2 | 86749 | 7 | 6167 | 1892 | 91.49% | 408.0 | 372.5 | 361 |
Sample18_M20_Fec3 | 58302 | 6 | 4966 | 3339 | 87.52% | 404.2 | 372.8 | 1033 |
Sample19_M23_Fec2 | 53168 | 4 | 3616 | 1606 | 91.05% | 406.2 | 372.5 | 418 |
Sample20_M24_Fec3 | 48734 | 5 | 3396 | 6247 | 83.47% | 396.9 | 372.9 | 264 |
Sample21_M25_Fec2 | 70 | 0 | 120 | 13 | 34.48% | 377.1 | 372.7 | 0 |
Sample22_M26_Fec1 | 34723 | 1 | 2752 | 12446 | 69.55% | 380.7 | 373.0 | 725 |
Sample23_M29_Fec3 | 42065 | 6 | 3164 | 5154 | 83.48% | 398.1 | 372.5 | 731 |
Sample24_M30_Fec3 | 66066 | 0 | 4627 | 2165 | 90.68% | 407.1 | 373.0 | 473 |
Sample25_M32_Fec2 | 67228 | 9 | 4661 | 6148 | 86.14% | 400.8 | 372.8 | 945 |
Sample26_M33_Fec2 | 41979 | 1 | 3092 | 2571 | 88.11% | 403.2 | 372.2 | 783 |
Sample27_M34_Fec3 | 49432 | 10 | 3244 | 3555 | 87.89% | 403.1 | 372.7 | 384 |
Sample28_M35_Fec3 | 53167 | 2 | 3873 | 438 | 92.50% | 408.9 | 372.1 | 907 |
Sample29_M36_Fec3 | 52307 | 6 | 4302 | 1393 | 90.17% | 407.0 | 372.5 | 878 |
Details
- Method: USEARCH reference based
  Chimeric sequences were identified using the USEARCH algorithm by comparison with a reference database.
  - Reference sequence database: silva_138.1_16S (Silva v138.1, 16S)
Figure 1. Chimera checking results
Table 1. Chimera checking statistics
Sample | Non-chimeras | Chimeras | Percent passed |
---|---|---|---|
Sample10_M10_Fec2 | 65923 | 4465 | 86.20% |
Sample11_M12_Fec3 | 56098 | 3707 | 87.21% |
Sample12_M13_Fec1 | 60978 | 3931 | 85.67% |
Sample13_M14_Fec3 | 56492 | 3235 | 87.01% |
Sample14_M15_Fec1 | 56061 | 3618 | 87.71% |
Sample15_M16_Fec2 | 47070 | 3026 | 83.43% |
Sample16_M18_Fec3 | 51074 | 3397 | 86.83% |
Sample17_M19_Fec2 | 81171 | 5578 | 85.61% |
Sample18_M20_Fec3 | 54773 | 3529 | 82.23% |
Sample19_M23_Fec2 | 50359 | 2809 | 86.24% |
Sample20_M24_Fec3 | 46278 | 2456 | 79.27% |
Sample21_M25_Fec2 | 64 | 6 | 31.53% |
Sample22_M26_Fec1 | 33102 | 1621 | 66.31% |
Sample23_M29_Fec3 | 40097 | 1968 | 79.57% |
Sample24_M30_Fec3 | 61430 | 4636 | 84.31% |
Sample25_M32_Fec2 | 63827 | 3401 | 81.78% |
Sample26_M33_Fec2 | 40006 | 1973 | 83.97% |
Sample27_M34_Fec3 | 46583 | 2849 | 82.83% |
Sample28_M35_Fec3 | 51697 | 1470 | 89.94% |
Sample29_M36_Fec3 | 49863 | 2444 | 85.96% |
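The Percent passed values in the trimming and chimera tables appear to be cumulative: each matches the reads remaining after that step divided by the assembled reads from the merging step, not by the input to the step itself. A minimal sketch of both calculations for Sample10_M10_Fec2, with illustrative variable and function names:

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole if whole else 0.0


# Values copied from the Sample10_M10_Fec2 rows of the three tables.
assembled = 76474      # merging table, Assembled reads
trim_passed = 70388    # trimming table, Trim passed
non_chimeras = 65923   # chimera table, Non-chimeras
chimeras = 4465        # chimera table, Chimeras

# Cumulative retention relative to the merged reads (matches the report).
cumulative_after_trim = pct(trim_passed, assembled)
cumulative_after_chimera = pct(non_chimeras, assembled)

# Retention for the chimera step alone, relative to its own input.
chimera_step_only = pct(non_chimeras, non_chimeras + chimeras)

print(f"after trimming:        {cumulative_after_trim:.2f}%")
print(f"after chimera removal: {cumulative_after_chimera:.2f}%")
print(f"chimera step alone:    {chimera_step_only:.2f}%")
```

Reading the column as cumulative explains why a sample such as Sample21_M25_Fec2 shows a low percentage in the chimera table even though only 6 of its 70 trimmed reads were flagged as chimeric.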
Details
- Method: DADA2 amplicon denoising
  Amplicon Sequence Variants (ASVs) were inferred using DADA2.
Figure 1. Comparison of observation and sequence counts in each sample
Figure 2. Comparison of samples and sequence counts for ASVs/OTUs
Table 1. Observation (ASV/OTU) and total sequence counts per sample
Sample | Observations | Counts |
---|---|---|
Sample21_M25_Fec2 | 4 | 31 |
Sample22_M26_Fec1 | 153 | 32343 |
Sample23_M29_Fec3 | 157 | 39305 |
Sample26_M33_Fec2 | 180 | 39237 |
Sample15_M16_Fec2 | 163 | 46029 |
Sample20_M24_Fec3 | 193 | 45393 |
Sample19_M23_Fec2 | 196 | 49364 |
Sample16_M18_Fec3 | 177 | 49988 |
Sample18_M20_Fec3 | 194 | 53641 |
Sample11_M12_Fec3 | 184 | 55259 |
Sample13_M14_Fec3 | 176 | 55619 |
Sample14_M15_Fec1 | 183 | 55243 |
Sample12_M13_Fec1 | 173 | 59838 |
Sample10_M10_Fec2 | 168 | 64718 |
Sample24_M30_Fec3 | 198 | 60474 |
Sample25_M32_Fec2 | 206 | 62840 |
Sample17_M19_Fec2 | 198 | 79610 |
Sample27_M34_Fec3 | 158 | 45856 |
Sample29_M36_Fec3 | 175 | 48789 |
Sample28_M35_Fec3 | 181 | 50868 |
Details
- Method: Naive Bayesian classifier
  Taxonomic annotations were determined using a Naive Bayesian approach included in the DADA2 package.
  - Reference sequence database: silva_138.1_16S (Silva v138.1, 16S)
Figure 1. Summary of major level 2 taxa
Figure 2. Summary of taxonomic annotation depth
Table 1. Summary of major level 2 taxa
Taxon | Counts | Percent |
---|---|---|
Bacteria;Firmicutes | 549249 | 55.23% |
Bacteria;Bacteroidota | 435706 | 43.81% |
Bacteria;Deferribacterota | 8689 | < 1% |
Bacteria;Proteobacteria | 407 | < 1% |
Bacteria;Actinobacteriota | 392 | < 1% |
Bacteria;Other | 2 | < 1% |
Unassigned | 0 | 0% |
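The Percent column in the level 2 summary follows from each taxon's count divided by the total across all rows. A short sketch using the counts from the table; the dictionary, function, and variable names are illustrative:

```python
# Counts copied from the level 2 taxa summary table.
level2_counts = {
    "Bacteria;Firmicutes": 549249,
    "Bacteria;Bacteroidota": 435706,
    "Bacteria;Deferribacterota": 8689,
    "Bacteria;Proteobacteria": 407,
    "Bacteria;Actinobacteriota": 392,
    "Bacteria;Other": 2,
    "Unassigned": 0,
}

total = sum(level2_counts.values())
percent = {taxon: 100.0 * count / total for taxon, count in level2_counts.items()}


def format_percent(value: float) -> str:
    """Mimic the report's display: exact zero, '< 1%' for small values, else 2 decimals."""
    if value == 0:
        return "0%"
    if value < 1:
        return "< 1%"
    return f"{value:.2f}%"


for taxon, value in percent.items():
    print(f"{taxon}\t{level2_counts[taxon]}\t{format_percent(value)}")
```

The total used here, 994445, equals the sum of the per-sample counts in the denoising table, so the percentages describe the pooled dataset, not any single sample.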
Table 2. Summary of taxonomic annotation depth - Grouped by level 2 taxa
Level 2 taxon | Level 0 raw counts | Level 4 raw counts | Level 5 raw counts | Level 6 raw counts | Level 7 raw counts | Level 0 relative counts | Level 4 relative counts | Level 5 relative counts | Level 6 relative counts | Level 7 relative counts |
---|---|---|---|---|---|---|---|---|---|---|
Firmicutes | 0 | 66051 | 150860 | 310691 | 21647 | - | 6.64% | 15.17% | 31.24% | 2.18% |
Bacteroidota | 0 | 0 | 435613 | 93 | 0 | - | - | 43.80% | < 1% | - |
Deferribacterota | 0 | 0 | 0 | 0 | 8689 | - | - | - | - | < 1% |
Proteobacteria | 0 | 0 | 2 | 405 | 0 | - | - | < 1% | < 1% | - |
Actinobacteriota | 0 | 0 | 0 | 392 | 0 | - | - | - | < 1% | - |
Unassigned | 2 | 0 | 0 | 0 | 0 | < 1% | - | - | - | - |
Details
- Method: Taxonomic filter
  Data were filtered to remove read counts associated with particular taxa.
  - filter = c__Chloroplast,f__mitochondria,D_4__Mitochondria,D_3__Chloroplast,Chloroplast,Mitochondria,Synthetic_Rhodanobacter_Spike-In
- Method: Relative sequence abundance
  Read counts were normalized as a fraction of the total sequence count in each sample.
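The filtering and normalization steps can be sketched in Python as follows. This is a simplified illustration, not the facility's implementation: it assumes a filter term matches when it equals any rank in a semicolon-delimited taxonomy string, and the example counts and taxonomy strings are invented for demonstration only.

```python
# Filter terms copied from the report.
FILTER_TERMS = {
    "c__Chloroplast", "f__mitochondria", "D_4__Mitochondria", "D_3__Chloroplast",
    "Chloroplast", "Mitochondria", "Synthetic_Rhodanobacter_Spike-In",
}


def is_filtered(taxonomy: str) -> bool:
    """Assumed rule: drop a taxon if any of its ranks is a filter term."""
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    return any(rank in FILTER_TERMS for rank in ranks)


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Remove filtered taxa, then scale the rest to fractions of the sample total."""
    retained = {taxon: n for taxon, n in counts.items() if not is_filtered(taxon)}
    total = sum(retained.values())
    if total == 0:
        return {taxon: 0.0 for taxon in retained}
    return {taxon: n / total for taxon, n in retained.items()}


# Hypothetical counts for a single sample (not taken from the report).
toy_sample = {
    "Bacteria;Firmicutes": 60,
    "Bacteria;Bacteroidota": 40,
    "Bacteria;Cyanobacteria;Cyanobacteriia;Chloroplast": 5,
}

fractions = relative_abundance(toy_sample)
for taxon, fraction in fractions.items():
    print(f"{taxon}\t{fraction:.3f}")
```

Normalizing after filtering means the retained taxa in each sample sum to 1, so removed reads do not dilute the relative abundances.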
Figure 1. Filtering stats
Figure 2. PCA plots of normalized data (relative sequence abundance)
Table 1. Filtering stats
Sample | Original | Filtered | Retained | Percent Retained |
---|---|---|---|---|
Sample10_M10_Fec2 | 64718 | 0 | 64718 | 100.00 % |
Sample11_M12_Fec3 | 55259 | 0 | 55259 | 100.00 % |
Sample12_M13_Fec1 | 59838 | 0 | 59838 | 100.00 % |
Sample13_M14_Fec3 | 55619 | 0 | 55619 | 100.00 % |
Sample14_M15_Fec1 | 55243 | 0 | 55243 | 100.00 % |
Sample15_M16_Fec2 | 46029 | 0 | 46029 | 100.00 % |
Sample16_M18_Fec3 | 49988 | 0 | 49988 | 100.00 % |
Sample17_M19_Fec2 | 79610 | 0 | 79610 | 100.00 % |
Sample18_M20_Fec3 | 53641 | 0 | 53641 | 100.00 % |
Sample19_M23_Fec2 | 49364 | 2 | 49362 | 100.00 % |
Sample20_M24_Fec3 | 45393 | 0 | 45393 | 100.00 % |
Sample21_M25_Fec2 | 31 | 0 | 31 | 100.00 % |
Sample22_M26_Fec1 | 32343 | 0 | 32343 | 100.00 % |
Sample23_M29_Fec3 | 39305 | 0 | 39305 | 100.00 % |
Sample24_M30_Fec3 | 60474 | 0 | 60474 | 100.00 % |
Sample25_M32_Fec2 | 62840 | 0 | 62840 | 100.00 % |
Sample26_M33_Fec2 | 39237 | 0 | 39237 | 100.00 % |
Sample27_M34_Fec3 | 45856 | 0 | 45856 | 100.00 % |
Sample28_M35_Fec3 | 50868 | 0 | 50868 | 100.00 % |
Sample29_M36_Fec3 | 48789 | 0 | 48789 | 100.00 % |