Overview

When you publish manuscripts based on data generated at our facility, we would greatly appreciate an acknowledgement of our efforts. For example:

Basic processing of the raw data was performed by the University of Illinois at Chicago Research Informatics Core (UICRIC).

We follow the general policy on acknowledgements and authorship established by the Association for Biomolecular Resource Facilities (ABRF), and we support the following statement from the ABRF:

The existence of core facilities depends in part on proper acknowledgment in publications. This is an important metric of the value of most core facilities. Proper acknowledgment of core facilities enables them to obtain financial and other support so that they may continue to provide their essential services in the best ways possible. It also helps core personnel to advance in their careers, adding to the overall health of the core facility.

Please contact us for assistance in drafting manuscripts.

This report summarizes the basic bioinformatic analysis included with the amplicon sequencing services provided by the UIC Research Resources Center (RRC). These services give investigators basic information on the abundance of taxa present in their samples. The analysis covers processing of the raw sequence data: read merging, adapter and quality trimming, chimera checking, and denoising with DADA2 to generate a table of abundance data with associated taxonomic annotations.
There were 20 samples in this project.

Output Files

File Description Type
rep_set_tax_assignments.txt Taxonomic assignment of master sequences result
taxa_raw_counts.zip ZIP archive of taxonomic summaries from phylum to species level (raw sequence counts) result
taxa_raw_counts.xlsx Excel spreadsheet of taxonomic summaries from phylum to species level (raw sequence counts) result
taxa_relative.zip ZIP archive of taxonomic summaries from phylum to species level (relative sequence abundance) result
taxa_relative.xlsx Excel spreadsheet of taxonomic summaries from phylum to species level (relative sequence abundance) result
biom-summary.txt Summary statistics of ASV table result
taxa_table.biom Amplicon Sequence Variant (ASV) table, in BIOM format result
sequences.zip ZIP archive of sequences for each sample, after merging, trimming and chimera checking result
rep_set_sequences.zip Compressed FASTA file of representative sequences for ASVs result

Sample OriginalID
Sample10_M10_Fec2 Sample10_M10_Fec2
Sample11_M12_Fec3 Sample11_M12_Fec3
Sample12_M13_Fec1 Sample12_M13_Fec1
Sample13_M14_Fec3 Sample13_M14_Fec3
Sample14_M15_Fec1 Sample14_M15_Fec1
Sample15_M16_Fec2 Sample15_M16_Fec2
Sample16_M18_Fec3 Sample16_M18_Fec3
Sample17_M19_Fec2 Sample17_M19_Fec2
Sample18_M20_Fec3 Sample18_M20_Fec3
Sample19_M23_Fec2 Sample19_M23_Fec2
Sample20_M24_Fec3 Sample20_M24_Fec3
Sample21_M25_Fec2 Sample21_M25_Fec2
Sample22_M26_Fec1 Sample22_M26_Fec1
Sample23_M29_Fec3 Sample23_M29_Fec3
Sample24_M30_Fec3 Sample24_M30_Fec3
Sample25_M32_Fec2 Sample25_M32_Fec2
Sample26_M33_Fec2 Sample26_M33_Fec2
Sample27_M34_Fec3 Sample27_M34_Fec3
Sample28_M35_Fec3 Sample28_M35_Fec3
Sample29_M36_Fec3 Sample29_M36_Fec3

Details

Method: PEAR

Forward and reverse reads were merged using PEAR.

Figure 1. Merging results

Table 1. Sequence merging statistics

Sample Assembled reads Discarded reads Not assembled reads Percent passed
Sample10_M10_Fec2 76474 0 1415 98.18%
Sample11_M12_Fec3 64322 2 777 98.80%
Sample12_M13_Fec1 71175 2 2984 95.97%
Sample13_M14_Fec3 64928 0 1052 98.41%
Sample14_M15_Fec1 63916 0 1485 97.73%
Sample15_M16_Fec2 56417 0 3556 94.07%
Sample16_M18_Fec3 58822 0 1372 97.72%
Sample17_M19_Fec2 94815 0 2222 97.71%
Sample18_M20_Fec3 66613 0 4437 93.76%
Sample19_M23_Fec2 58394 0 1613 97.31%
Sample20_M24_Fec3 58382 0 5214 91.80%
Sample21_M25_Fec2 203 0 11 94.86%
Sample22_M26_Fec1 49922 4 10627 82.44%
Sample23_M29_Fec3 50389 0 5304 90.48%
Sample24_M30_Fec3 72858 0 2391 96.82%
Sample25_M32_Fec2 78046 0 7113 91.65%
Sample26_M33_Fec2 47643 2 3398 93.34%
Sample27_M34_Fec3 56241 0 5758 90.71%
Sample28_M35_Fec3 57480 4 1128 98.07%
Sample29_M36_Fec3 58008 0 2808 95.38%
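
The "Percent passed" column is consistent with assembled reads divided by all read pairs (assembled + discarded + not assembled). This relationship is inferred from the table values rather than taken from PEAR's documentation; a minimal Python check for one row:

```python
def merge_percent_passed(assembled: int, discarded: int, not_assembled: int) -> float:
    """Percent of read pairs that were merged (assembled), inferred from Table 1."""
    total_pairs = assembled + discarded + not_assembled
    return 100.0 * assembled / total_pairs

# Sample10_M10_Fec2 values from the merging table
print(f"{merge_percent_passed(76474, 0, 1415):.2f}%")
```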

Details

Method: cutadapt

Sequence trimming using cutadapt.

Custom Parameters
  • -a = ^GTGCCAGCMGCCGCGGTAA...AAACTYAAAKRAATTGRCGG$
  • --trim-n
  • --max-n = 0
  • -q = 20
  • -m = 300
  • --trimmed-only
  • -e = 0.10
  • --report = minimal
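
The parameters above correspond to a single cutadapt invocation. The sketch below assembles such a command line in Python purely as an illustration; the input and output file names (`merged.fastq`, `trimmed.fastq`) are hypothetical, and the exact command the pipeline runs is not given in this report.

```python
import shlex

def cutadapt_args(in_fastq: str, out_fastq: str) -> list[str]:
    """Build a cutadapt argument list from the custom parameters listed above."""
    linked_primers = "^GTGCCAGCMGCCGCGGTAA...AAACTYAAAKRAATTGRCGG$"
    return [
        "cutadapt",
        "-a", linked_primers,   # anchored 5' primer linked to an anchored 3' primer
        "--trim-n",             # trim N bases from the read ends
        "--max-n", "0",         # discard reads that still contain any N
        "-q", "20",             # quality-trim with a Phred cutoff of 20
        "-m", "300",            # discard reads shorter than 300 nt after trimming
        "--trimmed-only",       # keep only reads in which the primers were found
        "-e", "0.10",           # maximum error rate allowed in primer matches
        "--report", "minimal",  # short tabular report
        "-o", out_fastq,        # hypothetical output file
        in_fastq,               # hypothetical input file (merged reads)
    ]

print(shlex.join(cutadapt_args("merged.fastq", "trimmed.fastq")))
```
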
Method: Quality trimming

Reads were trimmed using a base-quality threshold and then filtered by minimum length.

Custom Parameters
  • min length = 300
  • p = 20
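
Assuming this step corresponds to cutadapt's `-q 20` and `-m 300` (reading `p` as the Phred quality cutoff), the sketch below implements the BWA-style 3' trimming algorithm that cutadapt documents for `-q`, followed by the 300-nt length filter. Qualities are integer Phred scores, and the function names are illustrative.

```python
def quality_trim_index(quals: list[int], cutoff: int = 20) -> int:
    """Return how many bases to keep after BWA-style 3' quality trimming.

    Scanning from the 3' end, accumulate (cutoff - quality); the read is cut
    where this running sum is largest, and scanning stops once it goes negative.
    """
    running = 0
    best = 0
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep

def trim_and_filter(seq: str, quals: list[int], cutoff: int = 20, min_length: int = 300):
    """Quality-trim a read, then drop it (return None) if shorter than min_length."""
    trimmed = seq[:quality_trim_index(quals, cutoff)]
    return trimmed if len(trimmed) >= min_length else None
```
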
Method: Adapter trimming

Adapter/primer sequences were trimmed from the reads.

Custom Parameters
  • 5' adapter = GTGCCAGCMGCCGCGGTAA
  • 3' adapter = AAACTYAAAKRAATTGRCGG
Method: Adapter filter

Reads that lacked the adapter/primer sequences were discarded.
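
The primers contain IUPAC degenerate bases (M = A/C, Y = C/T, K = G/T, R = A/G). The sketch below expands them into regular-expression character classes, keeps the insert between the 5' and 3' primers, and discards reads in which either primer is missing. It matches primers exactly, whereas cutadapt tolerates mismatches up to the `-e 0.10` error rate, so this is a simplification; the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

FWD_PRIMER = "GTGCCAGCMGCCGCGGTAA"   # 5' adapter/primer from the report
REV_PRIMER = "AAACTYAAAKRAATTGRCGG"  # 3' adapter/primer from the report

def iupac_to_regex(primer: str) -> str:
    return "".join(IUPAC[base] for base in primer.upper())

# fullmatch anchors the 5' primer at the start and the 3' primer at the end
LINKED = re.compile(iupac_to_regex(FWD_PRIMER) + "(.*)" + iupac_to_regex(REV_PRIMER))

def trim_primers(read: str):
    """Return the insert between the primers, or None if either primer is missing."""
    match = LINKED.fullmatch(read)
    return match.group(1) if match else None
```
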

Method: Ambiguous nucleotide trimming

Ambiguous nucleotides (N) were trimmed from the ends and reads with internal ambiguous nucleotides were discarded.

Custom Parameters
  • max-n = 0
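
A minimal sketch of this step, following the description above (terminal Ns trimmed first, then reads with any remaining N discarded because max-n = 0). It operates on the sequence string only; a real implementation would trim the quality string to match.

```python
def trim_ambiguous(seq: str, max_n: int = 0):
    """Strip terminal Ns, then drop the read (return None) if too many Ns remain."""
    trimmed = seq.strip("N")         # remove N runs from both ends
    if trimmed.count("N") > max_n:   # internal Ns remain
        return None
    return trimmed

print(trim_ambiguous("NNACGTNN"))  # ACGT
print(trim_ambiguous("ACNGT"))     # None
```
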

Figure 1. Trimming results

Table 1. Trimming statistics

Sample Trim passed Ambig dropped Trim filter dropped Trim length dropped Percent passed Mean Length Mean Trimmed Length Quality trimmed bp
Sample10_M10_Fec2 70388 6 5217 863 92.04% 409.8 373.2 609
Sample11_M12_Fec3 59805 7 4059 451 92.98% 409.8 372.8 611
Sample12_M13_Fec1 64909 5 4361 1900 91.20% 407.5 372.6 739
Sample13_M14_Fec3 59727 4 4653 544 91.99% 409.8 373.0 772
Sample14_M15_Fec1 59679 4 3722 511 93.37% 409.7 372.5 374
Sample15_M16_Fec2 50096 4 3873 2444 88.80% 405.4 372.9 448
Sample16_M18_Fec3 54471 7 3712 632 92.60% 409.8 373.0 926
Sample17_M19_Fec2 86749 7 6167 1892 91.49% 408.0 372.5 361
Sample18_M20_Fec3 58302 6 4966 3339 87.52% 404.2 372.8 1033
Sample19_M23_Fec2 53168 4 3616 1606 91.05% 406.2 372.5 418
Sample20_M24_Fec3 48734 5 3396 6247 83.47% 396.9 372.9 264
Sample21_M25_Fec2 70 0 120 13 34.48% 377.1 372.7 0
Sample22_M26_Fec1 34723 1 2752 12446 69.55% 380.7 373.0 725
Sample23_M29_Fec3 42065 6 3164 5154 83.48% 398.1 372.5 731
Sample24_M30_Fec3 66066 0 4627 2165 90.68% 407.1 373.0 473
Sample25_M32_Fec2 67228 9 4661 6148 86.14% 400.8 372.8 945
Sample26_M33_Fec2 41979 1 3092 2571 88.11% 403.2 372.2 783
Sample27_M34_Fec3 49432 10 3244 3555 87.89% 403.1 372.7 384
Sample28_M35_Fec3 53167 2 3873 438 92.50% 408.9 372.1 907
Sample29_M36_Fec3 52307 6 4302 1393 90.17% 407.0 372.5 878
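
The trimming columns (passed plus the three dropped categories) add up to the assembled-read count from the merging table, so "Percent passed" here is trim-passed reads as a share of merged reads. This is inferred from the numbers; a small check:

```python
def trimming_summary(trim_passed: int, ambig: int, filter_dropped: int,
                     length_dropped: int) -> tuple[int, float]:
    """Return (reads entering trimming, percent that passed trimming)."""
    total = trim_passed + ambig + filter_dropped + length_dropped
    return total, 100 * trim_passed / total

# Sample10_M10_Fec2: PEAR assembled 76474 reads for this sample
total, pct = trimming_summary(70388, 6, 5217, 863)
assert total == 76474  # every merged read is accounted for
print(f"{pct:.2f}%")
```
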

Chimeric sequences are artifacts of PCR that occur when portions of two separate amplicons fuse during amplification. The RRC analysis pipeline uses a standard chimera checking program to identify chimeric sequences and remove them from the dataset.

Details

Method: USEARCH reference based

Chimeric sequences were identified using the USEARCH algorithm as compared with a reference database.

Reference sequence database: silva_138.1_16S

Silva v138.1, 16S
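
Reference-based chimera checking asks whether a query is explained better by stitching together two reference sequences than by any single reference. The toy sketch below illustrates only that idea: it assumes the query and references are already aligned to equal length, uses plain identity scores, and its thresholds (`min_gain`, `min_segment`) are hypothetical. It does not reproduce USEARCH's actual chimera-scoring model.

```python
def matches(a: str, b: str) -> int:
    """Count positions where two equal-length aligned sequences agree."""
    return sum(x == y for x, y in zip(a, b))

def is_chimera(query: str, refs: list[str], min_gain: float = 0.05,
               min_segment: int = 3) -> bool:
    """Flag query if a two-parent (left/right) model beats the best single reference."""
    n = len(query)
    best_single = max(matches(query, r) for r in refs) / n
    best_pair = 0.0
    for i, left in enumerate(refs):
        for j, right in enumerate(refs):
            if i == j:
                continue
            # try every breakpoint that leaves both segments at least min_segment long
            for k in range(min_segment, n - min_segment + 1):
                score = matches(query[:k], left[:k]) + matches(query[k:], right[k:])
                best_pair = max(best_pair, score / n)
    return best_pair - best_single >= min_gain
```

For example, with references `"AAAAAAAAAA"` and `"CCCCCCCCCC"`, the query `"AAAAACCCCC"` is flagged, while `"AAAAGAAAAA"` (a point difference from one parent) is not.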

Figure 1. Chimera checking results

Table 1. Chimera checking statistics

Sample Non-chimeras Chimeras Percent passed
Sample10_M10_Fec2 65923 4465 86.20%
Sample11_M12_Fec3 56098 3707 87.21%
Sample12_M13_Fec1 60978 3931 85.67%
Sample13_M14_Fec3 56492 3235 87.01%
Sample14_M15_Fec1 56061 3618 87.71%
Sample15_M16_Fec2 47070 3026 83.43%
Sample16_M18_Fec3 51074 3397 86.83%
Sample17_M19_Fec2 81171 5578 85.61%
Sample18_M20_Fec3 54773 3529 82.23%
Sample19_M23_Fec2 50359 2809 86.24%
Sample20_M24_Fec3 46278 2456 79.27%
Sample21_M25_Fec2 64 6 31.53%
Sample22_M26_Fec1 33102 1621 66.31%
Sample23_M29_Fec3 40097 1968 79.57%
Sample24_M30_Fec3 61430 4636 84.31%
Sample25_M32_Fec2 63827 3401 81.78%
Sample26_M33_Fec2 40006 1973 83.97%
Sample27_M34_Fec3 46583 2849 82.83%
Sample28_M35_Fec3 51697 1470 89.94%
Sample29_M36_Fec3 49863 2444 85.96%
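
"Percent passed" in this table appears to be cumulative: it equals non-chimeras divided by the assembled reads from the merging table (65923 / 76474 = 86.20% for Sample10_M10_Fec2), not non-chimeras divided by the reads entering chimera checking. This is inferred from the values. A sketch computing both rates:

```python
def chimera_stats(assembled: int, non_chimeras: int, chimeras: int) -> tuple[float, float]:
    """Return (pass rate of the chimera step alone, cumulative rate vs. merged reads), in percent."""
    stage = 100 * non_chimeras / (non_chimeras + chimeras)
    cumulative = 100 * non_chimeras / assembled
    return stage, cumulative

stage, cumulative = chimera_stats(76474, 65923, 4465)  # Sample10_M10_Fec2
print(f"step: {stage:.2f}%, cumulative: {cumulative:.2f}%")
```
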

Reduce read-count complexity using techniques such as dereplication or sequence clustering.

Details

Method: DADA2 amplicon denoising

Amplicon Sequence Variants (ASVs) were inferred using DADA2.
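
Dereplication, one of the complexity-reduction techniques named above, collapses identical reads into one unique sequence with an abundance count; DADA2 likewise works on dereplicated reads before applying its error model to infer ASVs. The sketch shows only the dereplication step, not DADA2's denoising, and the function name is illustrative.

```python
from collections import Counter

def dereplicate(reads: list[str]) -> list[tuple[str, int]]:
    """Collapse identical reads into (unique sequence, abundance), most abundant first."""
    return sorted(Counter(reads).items(), key=lambda item: (-item[1], item[0]))

print(dereplicate(["ACG", "TTT", "ACG"]))  # [('ACG', 2), ('TTT', 1)]
```
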

Figure 1. Comparison of observation and sequence counts in each sample


Figure 2. Comparison of samples and sequence counts for ASVs/OTUs

Table 1. Observation (ASV/OTU) and total sequence counts per sample

Sample Observations Counts
Sample21_M25_Fec2 4 31
Sample22_M26_Fec1 153 32343
Sample23_M29_Fec3 157 39305
Sample26_M33_Fec2 180 39237
Sample15_M16_Fec2 163 46029
Sample20_M24_Fec3 193 45393
Sample19_M23_Fec2 196 49364
Sample16_M18_Fec3 177 49988
Sample18_M20_Fec3 194 53641
Sample11_M12_Fec3 184 55259
Sample13_M14_Fec3 176 55619
Sample14_M15_Fec1 183 55243
Sample12_M13_Fec1 173 59838
Sample10_M10_Fec2 168 64718
Sample24_M30_Fec3 198 60474
Sample25_M32_Fec2 206 62840
Sample17_M19_Fec2 198 79610
Sample27_M34_Fec3 158 45856
Sample29_M36_Fec3 175 48789
Sample28_M35_Fec3 181 50868
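
Reading "Observations" as the number of ASVs with a nonzero count in a sample and "Counts" as that sample's total reads after denoising (an interpretation, not stated in the report), both columns can be derived from one sample's row of the ASV table. A sketch with hypothetical ASV identifiers:

```python
def observation_summary(asv_counts: dict[str, int]) -> tuple[int, int]:
    """Return (observations, total counts) for one sample's ASV counts."""
    observations = sum(1 for count in asv_counts.values() if count > 0)
    return observations, sum(asv_counts.values())

print(observation_summary({"ASV1": 10, "ASV2": 0, "ASV3": 21}))  # (2, 31)
```
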

Annotate sequences with taxonomic information

Details

Method: Naive Bayesian classifier

Taxonomic annotations were determined using the naive Bayesian classifier included in the DADA2 package.

Reference sequence database: silva_138.1_16S

Silva v138.1, 16S
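
DADA2's `assignTaxonomy` implements the RDP naive Bayesian classifier: reference sequences are broken into 8-mer words, word probabilities are estimated per taxon with a word-specific prior, the query is assigned to the taxon with the highest joint probability, and bootstrap resampling of the query's words gives a confidence score. The toy sketch below classifies at a single rank only; the function names, the 1/8 resampling size, and the reference data are illustrative assumptions, not DADA2's code.

```python
import math
import random
from collections import defaultdict

def kmers(seq: str, k: int) -> set[str]:
    """Distinct overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def train(refs: dict[str, list[str]], k: int = 8) -> dict:
    """refs maps each taxon label to its reference sequences."""
    n_total = sum(len(seqs) for seqs in refs.values())
    n_w = defaultdict(int)  # references (any taxon) containing word w
    taxa = {}
    for taxon, seqs in refs.items():
        m_w = defaultdict(int)  # this taxon's references containing word w
        for seq in seqs:
            for w in kmers(seq, k):
                m_w[w] += 1
                n_w[w] += 1
        taxa[taxon] = (m_w, len(seqs))
    return {"k": k, "n_w": n_w, "n_total": n_total, "taxa": taxa}

def log_likelihood(words, taxon: str, model: dict) -> float:
    """log P(words | taxon) using the RDP word-specific prior."""
    m_w, m_total = model["taxa"][taxon]
    score = 0.0
    for w in words:
        prior = (model["n_w"].get(w, 0) + 0.5) / (model["n_total"] + 1)
        score += math.log((m_w.get(w, 0) + prior) / (m_total + 1))
    return score

def classify(seq: str, model: dict, n_boot: int = 100, seed: int = 0):
    """Return (best taxon, bootstrap confidence in [0, 1])."""
    words = sorted(kmers(seq, model["k"]))
    if not words:
        return None, 0.0

    def best(ws):
        return max(model["taxa"], key=lambda t: log_likelihood(ws, t, model))

    call = best(words)
    rng = random.Random(seed)
    size = max(1, len(words) // 8)  # resample ~1/8 of the words, with replacement
    agree = sum(best(rng.choices(words, k=size)) == call for _ in range(n_boot))
    return call, agree / n_boot
```
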

Figure 1. Summary of major level 2 taxa


Figure 2. Summary of taxonomic annotation depth

Table 1. Summary of major level 2 taxa

Taxon Counts Percent
Bacteria;Firmicutes 549249 55.23%
Bacteria;Bacteroidota 435706 43.81%
Bacteria;Deferribacterota 8689 < 1%
Bacteria;Proteobacteria 407 < 1%
Bacteria;Actinobacteriota 392 < 1%
Bacteria;Other 2 < 1%
Unassigned 0 0%
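
The Percent column is relative to all 994,445 reads in this table (the sum of the Counts column, which also equals the total of the per-sample counts after denoising). The display rule below, printing "< 1%" for nonzero shares under one percent, is inferred from the table rather than documented:

```python
def format_percent(count: int, total: int) -> str:
    """Render a share of total reads the way the summary table appears to."""
    if count == 0:
        return "0%"
    pct = 100 * count / total
    return "< 1%" if pct < 1 else f"{pct:.2f}%"

TOTAL_READS = 994445
print(format_percent(549249, TOTAL_READS))  # Firmicutes
print(format_percent(8689, TOTAL_READS))    # Deferribacterota
```
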

Table 2. Summary of taxonomic annotation depth - Grouped by level 2 taxa

Level 2 taxon Level 0 raw counts Level 4 raw counts Level 5 raw counts Level 6 raw counts Level 7 raw counts Level 0 relative counts Level 4 relative counts Level 5 relative counts Level 6 relative counts Level 7 relative counts
Firmicutes 0 66051 150860 310691 21647 - 6.64% 15.17% 31.24% 2.18%
Bacteroidota 0 0 435613 93 0 - - 43.80% < 1% -
Deferribacterota 0 0 0 0 8689 - - - - < 1%
Proteobacteria 0 0 2 405 0 - - < 1% < 1% -
Actinobacteriota 0 0 0 392 0 - - - < 1% -
Unassigned 2 0 0 0 0 < 1% - - - -

Details

Method: Taxonomic filter

Read counts assigned to the taxa listed below (chloroplasts, mitochondria, and a synthetic spike-in) were removed.

Custom Parameters
  • filter = c__Chloroplast,f__mitochondria,D_4__Mitochondria,D_3__Chloroplast,Chloroplast,Mitochondria,Synthetic_Rhodanobacter_Spike-In
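
A sketch of the filter, using the terms listed above. It assumes lineages are semicolon-separated rank labels and that a lineage is removed when any single rank label exactly equals a filter term; the report does not state the actual matching rule, and the lineages in the example are hypothetical.

```python
FILTER_TERMS = {
    "c__Chloroplast", "f__mitochondria", "D_4__Mitochondria", "D_3__Chloroplast",
    "Chloroplast", "Mitochondria", "Synthetic_Rhodanobacter_Spike-In",
}

def filter_taxa(counts: dict[str, int], terms: set[str] = FILTER_TERMS):
    """Split lineage counts into (kept counts, total removed)."""
    kept: dict[str, int] = {}
    removed = 0
    for lineage, n in counts.items():
        # assumption: exact match of any rank label against the filter terms
        if any(rank.strip() in terms for rank in lineage.split(";")):
            removed += n
        else:
            kept[lineage] = n
    return kept, removed
```
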
Method: Relative sequence abundance

Read counts were normalized to the fraction of total sequence counts in each sample.
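
Per-sample normalization divides each taxon's count by that sample's total, so each sample's values sum to 1. A minimal sketch with hypothetical taxon labels:

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert one sample's raw counts to fractions of that sample's total."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}

print(relative_abundance({"taxon_a": 3, "taxon_b": 1}))  # {'taxon_a': 0.75, 'taxon_b': 0.25}
```

Because each sample is scaled independently, Sample21_M25_Fec2 (31 reads) yields fractions on the same 0 to 1 scale as samples with tens of thousands of reads, even though its fractions rest on far fewer observations.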

Figure 1. Filtering stats


Figure 2. PCA plots of normalized data (relative sequence abundance)

Table 1. Filtering stats

Sample Original Filtered Retained Percent Retained
Sample10_M10_Fec2 64718 0 64718 100.00 %
Sample11_M12_Fec3 55259 0 55259 100.00 %
Sample12_M13_Fec1 59838 0 59838 100.00 %
Sample13_M14_Fec3 55619 0 55619 100.00 %
Sample14_M15_Fec1 55243 0 55243 100.00 %
Sample15_M16_Fec2 46029 0 46029 100.00 %
Sample16_M18_Fec3 49988 0 49988 100.00 %
Sample17_M19_Fec2 79610 0 79610 100.00 %
Sample18_M20_Fec3 53641 0 53641 100.00 %
Sample19_M23_Fec2 49364 2 49362 100.00 %
Sample20_M24_Fec3 45393 0 45393 100.00 %
Sample21_M25_Fec2 31 0 31 100.00 %
Sample22_M26_Fec1 32343 0 32343 100.00 %
Sample23_M29_Fec3 39305 0 39305 100.00 %
Sample24_M30_Fec3 60474 0 60474 100.00 %
Sample25_M32_Fec2 62840 0 62840 100.00 %
Sample26_M33_Fec2 39237 0 39237 100.00 %
Sample27_M34_Fec3 45856 0 45856 100.00 %
Sample28_M35_Fec3 50868 0 50868 100.00 %
Sample29_M36_Fec3 48789 0 48789 100.00 %

Citations