Metagenomic analysis is a powerful approach for investigating the microbial ecology of diverse environments, and a useful tool for accessing genetic diversity for applications in biotechnology [5, 12]. This genetic diversity can serve as an important source of new biocatalysts with desired characteristics [13, 14], and can provide targets for structure-function analysis that expand our understanding of protein structural space [15, 16]. Environments where biomass turnover rates are high at elevated temperatures, such as compost, are potentially an important source of novel biocatalysts targeted at lignocellulose degradation, a step that remains one of the major barriers to economically viable biofuel production.
Our work in cloning, expressing, and assaying 22 putative CAZy enzymes has validated the targeted metagenomic approach for GH discovery, identifying 7 active hits from 22 screened ORFs. We used a high-throughput approach based on the Gateway system and liquid-handling robots to clone the 22 ORFs into multiple expression vectors and to screen them for expression and activity. This approach allowed us to rapidly identify vector and host combinations that result in well-behaved expression of the ORFs.
Four of the expressed enzymes were purified and characterized in detail. JMC25406, predicted to be a GH43 family bifunctional β-xylosidase/α-arabinofuranosidase, showed robust activities of 7.6 and 14.3 U/mg protein on the model substrates 4-nitrophenyl-α-L-arabinofuranoside and 4-nitrophenyl-β-D-xylopyranoside, respectively, comparable to literature values as well as to commercially available standards. Compared to a previously described GH43 enzyme isolated from compost [17], JMC25406 exhibited a similar pH profile (pH 5.5–7.5) but, surprisingly, a thermal stability that was higher by 10°C. Due to its dual functions, high thermal stability, and ease of purification, JMC25406 is an excellent starting point for further protein engineering aimed at developing enzymatic cocktails for various potential pretreatment strategies. JMC01245, a GH11 family protein, and JMC37744, a GH10 family protein, were demonstrated to possess the predicted endoxylanase activity. Both enzymes had similar properties: pH optima around 6.5 and stability up to 60–70°C. JMC37744 displays activity over a very broad pH range (pH 5–10). JMC09349 was demonstrated to possess the predicted α-fucosidase activity, with a broad pH optimum of approximately 3–8.5 and modest thermostability up to 50°C. Taken together, the pH optima and thermal stability profiles of the characterized enzymes are consistent with the culture conditions of the compost community [8], further highlighting the importance of altering culture conditions in metagenomics experiments to enrich for enzymes with desired properties. These data indicate that it may be possible to isolate enzymes with robust activity under specific operating conditions (temperature, pH, solvents) by tailoring the cultivation conditions to match. Consistent with this, a recent publication on a similar compost community grown under thermophilic conditions found that culture supernatants were enriched for thermotolerant cellulase and hemicellulase activities [18].
Although metagenomic analysis is a powerful tool for enzyme discovery, the limiting factor for success with this approach remains the time-consuming work of soluble protein expression, purification, and biochemical characterization. The development of generalized, higher-throughput methods for enzyme purification and assay will be necessary to fully harness the immense genetic diversity present in natural environments.