These connected components were then counted to determine the size of the core proteome. It is important to note that the size of the core proteome was defined in terms of the number of orthologous groups, not in terms of the total number of individual proteins (from one specific
organism) in those groups. For example, suppose that we were finding the size of the core proteome for a genus with eight isolates, and that there were 500 orthologous groups containing proteins from all eight of those isolates. Further, suppose that each of these groups actually contained ten individual proteins (say, with six isolates having one protein each, and two isolates having two each). Then the size of the core proteome would be reported as 500, not as 500 × 10 = 5000. Unique proteomes were found in a similar manner, by counting the number of connected components that contained proteins from all members of a particular group but from no members of a second group. Finally, the number of singlets in a particular genus was found by performing orthologue detection on the proteins from that genus (only), and identifying the number of connected components containing
just a single protein.

Most comparisons done in this study involved a fairly small number of isolates (and therefore proteins). For example, finding the core proteome of a particular genus involved performing orthologue detection for the isolates of that genus (between 4 and 31 isolates, depending on the genus), each
of which had a proteome containing around 1000 to 9000 proteins. However, one type of comparison, finding the proteins unique to each genus, required finding orthologues among all proteins in the proteomes of all isolates used in this study. Due to memory constraints, this could not be done using a single orthologue detection comparison. Instead, comparisons were performed between all possible pairs of genera. For example, in finding the proteins unique to genus A, we first determined the list of proteins found in all isolates of genus A but in no isolates of genus B; we then determined the list of proteins found in all isolates of A but in no isolates of C, and so on. Once all lists had been calculated, the proteins that were present in every list were the proteins unique to genus A.

Comparison of proteomic similarity with 16S rRNA gene similarity

To determine 16S rRNA gene percent identities, the 16S rRNA gene was obtained from each sequenced genome used in this study and the RDP10 tool [49] was used to align sequences based on known conserved and variable regions according to the rRNA’s secondary structure. The percent identity of the 16S rRNA gene between pairs of isolates from the same genus was calculated to the nearest 0.01%.
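
The pairwise percent-identity calculation described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study: it assumes the aligned sequences are already available as equal-length strings, and the function names (percent_identity, pairwise_identities) and the isolate labels in the usage note are hypothetical. The text does not state how alignment gaps were treated, so the sketch assumes that any column with a gap in either sequence is skipped; a different gap convention would give different values.

```python
from itertools import combinations


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences, rounded to 0.01%."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        # Assumption (not stated in the text): columns with a gap in
        # either sequence are excluded from the comparison.
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return round(100.0 * matches / compared, 2)


def pairwise_identities(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every pair of isolates within one genus.

    `aligned` maps an isolate label to its aligned 16S rRNA gene sequence.
    """
    return {
        (x, y): percent_identity(aligned[x], aligned[y])
        for x, y in combinations(sorted(aligned), 2)
    }
```

For instance, calling pairwise_identities({"iso1": "ACGT", "iso2": "ACGA", "iso3": "ACGT"}) compares the three possible pairs; iso1 and iso3 are identical at all four columns, while each differs from iso2 at one column.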