, 2006) is a competing software package for detecting and orienting reverse complementary 16S sequences. It works by matching short oligonucleotide sequences at highly conserved positions along the gene, and it offers a user-friendly interface combined with an impressive processing speed. To compare the detection efficiency and reliability of this tool, we processed the bacterial and archaeal full-length, V1-V3 and V1-V2 datasets. The detection efficiency of orientationchecker decreased with sequence length, with detection rates of 100%, 95% and 2% for the full-length,
V1-V3 and V1-V2 datasets, respectively. Although its performance on full-length sequences was broadly similar to that of v-revcomp, orientationchecker failed to detect the correct orientation of 124 full-length sequences and incorrectly flagged 10 as reverse complementary. On V1-V3 sequences, the proportion of undetected sequences was 5% higher than with v-revcomp, and on the shorter V1-V2 sequences the tool failed almost completely. In conclusion, v-revcomp performed better, especially on shorter sequences, and offers a more reliable mechanism because it screens multiple conserved regions at once. Furthermore, HMMs are likely to be more flexible in detecting deviant sequences than simple pattern matching using oligonucleotide
sequences. In addition, the command-line nature of v-revcomp facilitates its incorporation into automated software pipelines (e.g. Barker et al., 2010; Caporaso et al., 2010), which makes it especially well suited for screening HTS datasets.

To assess the prevalence of reverse complementary sequences in public data repositories, we ran v-revcomp on the 1 113 159 bacterial and 58 487 archaeal 16S sequences with a minimum length of 500 bp that were available in GenBank as of 1 July 2010. Sequences were identified as 16S by searching the GenBank definition line for various synonyms of the gene name; consequently, 16S sequences that also included parts of the up- or downstream regions of the gene (e.g.
promoter region, intergenic spacer) were coextracted. In total, v-revcomp reported 1 158 546 sequences (98.9%) to be in the correct orientation, 9067 (0.8%) in the reverse orientation and 185 (0.02%) as uncertain, whereas 3848 (0.3%) showed no HMM detection at all, so that no decision could be made (Fig. 1b). The failure to detect any HMM in these 3848 sequences had the following causes. In 3437 cases (89.3%), only a very small segment was actually identified as 16S, whereas most of the sequence comprised either the intergenic spacer region downstream of the gene (3421 cases) or upstream regions such as promoters (16 cases). In 220 cases (5.7%), the sequences showed only a partial, poor or no match to any GenBank entry as assessed through blast, and are therefore likely to be artefacts created during PCR amplification, sequencing or data processing. In 26 cases (0.