Research Article
A subfamily classification to choreograph the diverse activities within glycoside hydrolase family 31

https://doi.org/10.1016/j.jbc.2023.103038
Under a Creative Commons license
open access

The Carbohydrate-Active Enzyme classification groups enzymes that break down, assemble, or decorate glycans into protein families based on sequence similarity. The glycoside hydrolases (GH) are arranged into over 170 enzyme families, some of which are very large and exhibit distinct activities and specificities towards diverse substrates. Family GH31 is a large family that contains more than 20,000 sequences with a wide taxonomic diversity. Fewer than 1% of GH31 members are biochemically characterized, and these exhibit many different activities that include glycosidases, lyases, and transglycosidases. This diversity of activities limits our ability to predict the activities and roles of GH31 family members in their host organisms and our ability to exploit these enzymes for practical purposes. Here, we established a subfamily classification using sequence similarity networks that was further validated by a structural analysis. Although sequence similarity networks provide a purely sequence-based separation, we obtained good segregation between activities among the subfamilies. Our subclassification consists of 20 subfamilies, 16 of which contain at least one characterized member and 11 of which are monofunctional based on the available data. We also report the biochemical characterization of a member of the large subfamily 2 (GH31_2), which previously lacked any characterized members: RaGH31 from Rhodoferax aquaticus is an α-glucosidase with activity on a range of disaccharides including sucrose, trehalose, maltose, and nigerose. Our subclassification provides improved predictive power for the vast majority of uncharacterized proteins in family GH31 and highlights the sequence space that remains to be functionally explored.
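The general idea behind a sequence similarity network can be illustrated with a minimal sketch. It is not the pipeline used in this work. It assumes that pairwise similarity scores are already available; the sequence names, score values, threshold, and the function `ssn_clusters` are all hypothetical. Sequences are nodes, an edge joins two sequences whose score meets the threshold, and candidate groups are the connected components of the resulting graph.

```python
from collections import defaultdict


def ssn_clusters(scores, threshold):
    """Return clusters as the connected components of a thresholded network.

    scores: dict mapping (seq_id_a, seq_id_b) -> pairwise similarity score
            (hypothetical values; higher means more similar).
    threshold: minimum score for an edge to be kept (an assumed cutoff).
    """
    adjacency = defaultdict(set)
    nodes = set()
    for (a, b), score in scores.items():
        nodes.update((a, b))
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    clusters, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first traversal collects every node reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical toy data: five sequences with made-up pairwise scores.
toy_scores = {
    ("seqA", "seqB"): 120.0,
    ("seqB", "seqC"): 95.0,
    ("seqC", "seqD"): 20.0,
    ("seqD", "seqE"): 110.0,
    ("seqA", "seqE"): 15.0,
}
clusters = ssn_clusters(toy_scores, threshold=50.0)
print(clusters)
```

Raising the threshold splits the network into more, tighter groups, while lowering it merges them. Choosing a cutoff at which groups align with known activities is the kind of judgment that a later, independent validation step would need to check.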

Keywords

glycosidase
enzyme
bioinformatics
glycobiology
kinetics

Abbreviations

CAZy, Carbohydrate-Active enzyme
CBM, carbohydrate-binding module
EC, Enzyme Commission
GH, glycoside hydrolase (family)
6GT, 6-α-glucosyltransferase
HMM, hidden Markov model
PDB, Protein Data Bank
SI, sucrase-isomaltase
SSN, sequence similarity network
