The human microbiome refers to the community of microorganisms, including prokaryotes, viruses, and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from an initial reference genome sequencing of 178 microbial genomes. From the 547,968 predicted polypeptides that correspond to the gene complement of these strains, we defined as “novel” those polypeptides that had both an unmasked sequence length > 100 amino acids and no BLASTP match to any non-reference entry in the nr subset. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (~97%) were unique. In addition, this set of microbial genomes allows ~40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms, based on the match criteria used. Insights from pan-genome analysis suggest that we are still far from saturating microbial species genetic datasets. The associated metrics and standards used by the group for quality assurance are also presented.