
Metagenomic analysis of 150,000 genomes reveals extensive unexplored human microbiome diversity

Pasolli et al.

2019-01-20


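The human microbiome harbors many uncharacterized species. Although it has recently been studied extensively with a variety of culture-independent molecular techniques, most characterizations of these ecosystems still center on easily cultured microbes, especially when it comes to isolates with sequenced genomes.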

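Through large-scale metagenomic assembly of samples from diverse populations, we recovered more than 150,000 microbial genomes from 4,930 species. Many of these species (77%) had never been described before; including them increases the mappability of metagenomes and expands our understanding of the global human microbiome.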

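The body-wide human microbiome plays a role in health, but its full diversity remains uncharacterized, particularly outside the gut and in international populations.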

We leveraged 9,428 metagenomes to reconstruct 154,723 microbial genomes (45% of high quality) spanning body sites, ages, countries, and lifestyles.

We recapitulated these genomes into 4,930 species-level genome bins (SGBs), 77% of which have no genomes in public repositories (unknown SGBs, uSGBs).
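SGBs group genomes into species-level bins. As a rough sketch of the idea, assuming pairwise genome distances have already been computed (for example with Mash, which the key resources table lists), the code below bins genomes by single-linkage at a distance cutoff and then labels each bin using the categories of Figure 1B. The 0.05 cutoff, the toy distances, and the single-linkage rule are illustrative assumptions; the study's actual clustering procedure is described in its STAR Methods and is not reproduced here.

```python
from itertools import combinations


def cluster_genomes(genomes, distance, threshold):
    """Single-linkage binning: genomes linked by any chain of pairwise
    distances <= threshold end up in the same bin (union-find)."""
    parent = {g: g for g in genomes}

    def find(g):
        # Walk to the root, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        if distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    bins = {}
    for g in genomes:
        bins.setdefault(find(g), []).append(g)
    return list(bins.values())


def label_bin(members, reference_ids):
    """kSGB: existing + newly reconstructed genomes; uSGB: only new genomes;
    non-human SGB: only existing genomes (categories as in Figure 1B)."""
    has_ref = any(m in reference_ids for m in members)
    has_new = any(m not in reference_ids for m in members)
    if has_ref and has_new:
        return "kSGB"
    if has_new:
        return "uSGB"
    return "non-human SGB"


# Hypothetical toy distances; any pair not listed is treated as distant.
TOY_DISTANCES = {
    frozenset({"mag1", "mag2"}): 0.02,
    frozenset({"mag2", "ref1"}): 0.03,
    frozenset({"mag3", "mag4"}): 0.01,
}


def toy_distance(a, b):
    return TOY_DISTANCES.get(frozenset({a, b}), 1.0)


genomes = ["mag1", "mag2", "ref1", "mag3", "mag4", "ref2"]
references = {"ref1", "ref2"}
bins = cluster_genomes(genomes, toy_distance, threshold=0.05)
labels = [label_bin(b, references) for b in bins]
print(list(zip(labels, bins)))
```

Note that mag1 and ref1 land in the same bin through mag2 even though their own distance is not listed; that chaining is characteristic of single-linkage, and an average-linkage rule would be more conservative about merging.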

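uSGBs are prevalent (present in 93% of well-assembled samples), expand underrepresented phyla, and are enriched in non-Westernized populations (40% of the total SGBs).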
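The 93% prevalence figure and the per-sample distributions in Figure 1E both reduce to simple counts once every reconstructed genome has been assigned to an SGB. The sketch below assumes such an assignment is already available. Using "samples with at least one reconstructed genome" as the denominator is a simplification, since the study's definition of well-assembled samples is not given in this summary; all names and data are hypothetical.

```python
def usgb_fraction(sample_sgbs, sgb_label):
    """Fraction of one sample's reconstructed genomes that fall in uSGBs.

    sample_sgbs: SGB id of each genome reconstructed from the sample.
    sgb_label: SGB id -> "uSGB" or "kSGB".
    Returns None when nothing was reconstructed from the sample.
    """
    if not sample_sgbs:
        return None
    unknown = sum(1 for sgb in sample_sgbs if sgb_label[sgb] == "uSGB")
    return unknown / len(sample_sgbs)


def usgb_prevalence(samples, sgb_label):
    """Share of samples (among those with >= 1 genome) holding >= 1 uSGB genome."""
    assembled = [sgbs for sgbs in samples.values() if sgbs]
    if not assembled:
        return 0.0
    hits = sum(1 for sgbs in assembled
               if any(sgb_label[s] == "uSGB" for s in sgbs))
    return hits / len(assembled)


# Hypothetical labels and per-sample assignments, for illustration only.
sgb_label = {"SGB1": "kSGB", "SGB2": "uSGB", "SGB3": "uSGB"}
samples = {
    "s1": ["SGB1", "SGB2"],
    "s2": ["SGB1"],
    "s3": ["SGB2", "SGB3", "SGB1", "SGB1"],
    "s4": [],
}
fractions = {s: usgb_fraction(g, sgb_label) for s, g in samples.items()}
prevalence = usgb_prevalence(samples, sgb_label)
print(fractions, round(prevalence, 3))
```

Returning None for an empty sample keeps samples with no assembled genomes out of the per-sample distribution instead of counting them as 0% uSGB.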

We annotated 2.85 M genes in SGBs, many of which are associated with conditions such as infant development (94,000) or Westernization (106,000).

SGBs and uSGBs enable deeper microbiome analyses: the average mappability of metagenomic reads increases from 67.76% to 87.51% in the gut (median 94.26%) and from 65.14% to 82.34% in the mouth.
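Mean and median mappability can differ substantially when a minority of samples map poorly. The toy example below uses hypothetical read counts, not study data, to show how one low-mappability sample pulls the mean well below the median. This is one plausible reading of the gap between the 87.51% average and the 94.26% median reported for the gut, not an analysis of the actual distribution.

```python
from statistics import mean, median


def mappability(mapped_reads, total_reads):
    """Percentage of a sample's reads that align to the genome catalog."""
    return 100.0 * mapped_reads / total_reads


# Hypothetical per-sample (mapped, total) read counts, not data from the study.
samples = [(95, 100), (96, 100), (94, 100), (40, 100), (97, 100)]
rates = [mappability(m, t) for m, t in samples]
print(f"mean={mean(rates):.2f}%  median={median(rates):.2f}%")
```

Here four samples map at 94% to 97% and one at 40%, so the median stays at 95% while the mean drops to about 84%.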

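We thus identify thousands of microbial genomes from yet-to-be-named species, expand the pangenomes of human-associated microbes, and allow better exploitation of metagenomic technologies.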

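Applying large-scale single-sample metagenomic assembly, supported by stringent quality control (including screening based on nucleotide polymorphisms), to a meta-analysis of metagenomic sequencing data, we identified 3,796 species-level clades (comprising 34,205 genomes) for which no whole-genome data were previously available.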
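The nucleotide-polymorphism screen flags genomes whose reads suggest a mixture of strains. The study lists its own tool, CMSeq, for this kind of analysis; the function below is not CMSeq's interface, only a minimal sketch of the idea, assuming the read bases aligned at each genome position (a pileup) are already available. The minimum depth of 10 reads and the 80% dominant-allele cutoff are hypothetical parameters chosen for illustration.

```python
from collections import Counter


def polymorphic_rate(pileup, min_depth=10, dominant_threshold=0.8):
    """Fraction of sufficiently covered positions that look polymorphic.

    pileup: one list of read bases per genome position. A covered position
    counts as polymorphic when its most common base makes up less than
    `dominant_threshold` of the reads. Both cutoffs are assumptions.
    """
    covered = 0
    polymorphic = 0
    for bases in pileup:
        if not bases or len(bases) < min_depth:
            continue  # too few reads to judge this position
        covered += 1
        top_count = Counter(bases).most_common(1)[0][1]
        if top_count / len(bases) < dominant_threshold:
            polymorphic += 1
    return polymorphic / covered if covered else 0.0


# Hypothetical four-position pileup.
pileup = [
    list("AAAAAAAAAA"),  # uniform
    list("CCCCCCCCCT"),  # 90% C: one read disagrees, still dominant
    list("GGGGGTTTTT"),  # 50/50 split: polymorphic
    list("AA"),          # below min_depth, ignored
]
rate = polymorphic_rate(pileup)
passes = rate < 0.005  # the text's HQ cutoff: < 0.5% polymorphic positions
print(round(rate, 3), passes)
```

Only three positions have enough depth and one of them is split evenly, so a third of covered positions are flagged. Real genomes span millions of positions, which is why a cutoff as small as 0.5% is meaningful.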

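This revealed several prevalent but previously undetected taxa, even in well-characterized populations: for example, a genus-level Ruminococcaceae clade, a large number of taxonomically unlabeled species associated with non-Westernized populations, and taxa from undersampled phyla (e.g., Saccharibacteria and Elusimicrobia) in the oral and gut microbiomes.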

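The resulting genome set can thus serve as a basis for future strain-level comparative genomics that links variation in the human microbiome to global environmental exposures and health outcomes.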

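We applied a large-scale metagenomic assembly approach to reconstruct the bacterial and archaeal genomes that populate the human microbiome (see STAR Methods).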

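Using a total of 9,316 metagenomes from 46 datasets spanning multiple populations, body sites, and host ages, plus an additional cohort from Madagascar (STAR Methods), we reconstructed a total of 154,723 genomes with a custom single-sample assembly strategy (see STAR Methods) designed to maximize quality rather than the number of genomes reconstructed from each sample.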
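A single-sample strategy means each metagenome is assembled and binned on its own rather than pooled with others. The sketch below only builds per-sample command lines for tools named in the key resources table (metaSPAdes, Bowtie2, MetaBAT2, CheckM), plus samtools and MetaBAT2's `jgi_summarize_bam_contig_depths` helper. The choice of read mapper, the file layout, the function name, and the use of default parameters are all assumptions for illustration, not the study's published settings, which are in its STAR Methods.

```python
from pathlib import Path


def single_sample_commands(sample, r1, r2, workdir="work"):
    """Build (but do not run) assemble -> map -> bin -> QC commands for ONE
    sample. Hypothetical layout; parameters left at tool defaults."""
    out = Path(workdir) / sample
    asm = out / "assembly"
    contigs = asm / "contigs.fasta"      # metaSPAdes writes contigs here
    index = out / "contigs_index"
    sam = out / "reads.sam"
    bam = out / "reads.sorted.bam"
    depth = out / "depth.txt"
    bins = out / "bins"
    return [
        # 1. Assemble this sample's paired-end reads on their own.
        ["metaspades.py", "-1", r1, "-2", r2, "-o", str(asm)],
        # 2. Map the same reads back to the contigs to get coverage.
        ["bowtie2-build", str(contigs), str(index)],
        ["bowtie2", "-x", str(index), "-1", r1, "-2", r2, "-S", str(sam)],
        ["samtools", "sort", "-o", str(bam), str(sam)],
        # 3. Summarize per-contig depth and bin contigs into genomes.
        ["jgi_summarize_bam_contig_depths", "--outputDepth", str(depth), str(bam)],
        ["metabat2", "-i", str(contigs), "-a", str(depth), "-o", str(bins / "bin")],
        # 4. Estimate completeness/contamination of each bin (.fa files).
        ["checkm", "lineage_wf", "-x", "fa", str(bins), str(out / "checkm")],
    ]


for cmd in single_sample_commands("S1", "S1_R1.fq.gz", "S1_R2.fq.gz"):
    print(" ".join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

Because no step reads another sample's files, samples can be processed independently and in parallel, which is what makes the approach scale to thousands of metagenomes.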

The resulting catalog greatly expands on the roughly 150,000 microbial genomes already publicly available.

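All assembled genomes passed stringent quality control, including assessment of completeness, contamination, and strain heterogeneity, and all exceeded the medium-quality (MQ) threshold defined by the recent guidelines of Bowers et al. (completeness >50%, contamination <5%).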

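The quality of these genomes is comparable to that of isolate sequencing and consistent with what can be achieved by manually curated metagenomic approaches and by co-assembly of time-series or cross-sectional metagenomes.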

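Genomes may include plasmid contigs. More stringent quality control reduced the set to 70,178 near-complete, high-quality (HQ) genomes with completeness above 90% and a lower probability of within-sample strain heterogeneity (<0.5% polymorphic positions; see STAR Methods).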
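The quality tiers described above can be written as a small classification rule. This sketch uses only the thresholds quoted in the text (MQ: completeness >50% and contamination <5%; HQ: additionally completeness >90% and <0.5% polymorphic positions); any further criteria in the STAR Methods are not modeled, and the function name, argument names, and example values are hypothetical. Completeness and contamination would come from a tool such as CheckM, which the key resources table lists.

```python
def quality_tier(completeness, contamination, polymorphic_pct=None):
    """Return "HQ", "MQ", or "below MQ" for one genome (all values in %)."""
    # Medium quality: Bowers et al. thresholds as quoted in the text.
    if not (completeness > 50 and contamination < 5):
        return "below MQ"
    # High quality also needs near-completeness and low within-sample strain
    # heterogeneity; without a polymorphism estimate we cannot promote to HQ.
    if completeness > 90 and polymorphic_pct is not None and polymorphic_pct < 0.5:
        return "HQ"
    return "MQ"


# Hypothetical (completeness, contamination, % polymorphic positions) triples.
examples = [
    (96.4, 1.2, 0.1),   # near-complete, clean, low polymorphism
    (72.0, 3.0, 0.1),   # passes MQ only (completeness <= 90%)
    (95.0, 2.0, 1.2),   # complete but too polymorphic for HQ
    (45.0, 0.5, 0.0),   # too incomplete
    (80.0, 6.0, 0.0),   # too contaminated
]
for c, k, p in examples:
    print(c, k, p, quality_tier(c, k, p))
```

Checking the MQ gate first mirrors the text: HQ genomes are a stricter subset of the MQ set, not a separate category.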

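The main features of the HQ genomes are consistent with, and in some cases better than, those of the reference genomes available in public repositories, and MQ genomes have quality scores similar to those of HQ genomes (modulo completeness; STAR Methods).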

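Our reconstructed genome set (see Data and Software Availability) and the associated 2.85 M functional annotations therefore provide a foundation for deeper analyses of microbial communities.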

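This work expands the collection of microbial genomes associated with the human microbiome: more than 150,000 newly reconstructed genomes more than double the current collection and, in the process, uncover hidden functional and phylogenetic diversity associated with global populations (especially populations with non-Western lifestyles and undersampled non-gut body sites; Figure 1E).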

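For half of the gut microbiomes, more than 94% of metagenomic reads can now be mapped to the expanded genome catalog, allowing a more comprehensive analysis of these communities.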

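The metagenomic assembly strategy adopted here is a scalable approach for integrating metagenomes at large scale. We validated it extensively (STAR Methods; Figures 7 and S7), and it can be applied effectively to additional or non-human-associated metagenomes.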

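These methods are also compatible with emerging technologies such as synthetic or single-molecule long-read sequencing, which will further increase the diversity of recovered microbial genomes.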

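Finally, the results themselves highlight the phylogenetic and functional diversity of understudied organisms, particularly in sample types other than stool, across global human populations, and across the different lifestyles that shape the human microbiome.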

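Even within the current data collection, many results remain to be explored. Some of the metagenomic reads that cannot be mapped to our expanded bacterial and archaeal resource may come from viral and eukaryotic genomes.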

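For example, we detected abundant viruses (phages never found in reference bacterial genomes, at >0.5% relative read depth in 101 samples), the intestinal eukaryotic parasite Blastocystis (>0.5% in 158 samples), and the skin fungus Malassezia (>0.5% in 297 samples).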
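Counts such as "158 samples above 0.5%" follow from a per-sample relative read depth and a cutoff. The sketch assumes relative read depth is the share of a sample's reads assigned to a taxon, expressed as a percentage; how the study assigned reads to phages, Blastocystis, or Malassezia is not described here, and the function names and read counts are invented for illustration.

```python
def relative_depth(taxon_reads, total_reads):
    """Share of a sample's reads assigned to one taxon, in percent."""
    return 100.0 * taxon_reads / total_reads


def samples_above(read_counts, taxon, threshold_pct=0.5):
    """Count samples whose relative depth for `taxon` exceeds threshold_pct.

    read_counts: sample id -> (dict of taxon -> reads, total reads).
    """
    count = 0
    for taxon_reads, total in read_counts.values():
        if relative_depth(taxon_reads.get(taxon, 0), total) > threshold_pct:
            count += 1
    return count


# Hypothetical read counts for four samples of 100,000 reads each.
read_counts = {
    "s1": ({"Blastocystis": 1200, "Malassezia": 30}, 100_000),
    "s2": ({"Blastocystis": 400}, 100_000),
    "s3": ({"Malassezia": 3000}, 100_000),
    "s4": ({}, 100_000),
}
print(samples_above(read_counts, "Blastocystis"), samples_above(read_counts, "Malassezia"))
```

The comparison is strict (">"), matching the "greater than 0.5%" wording, so a sample sitting exactly at 0.5% is not counted.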

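De novo reconstruction of non-bacterial genomes is very challenging and deserves more attention in the future; eukaryotic microbes and viruses may account for some of the remaining unmappable sequences in these data.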

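These results help identify microbes specific to particular populations, environments, or exposures. Most importantly, future studies may be able to capture more easily the specific strains or microbial molecular mechanisms through which the microbiome is linked to human health conditions.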

STAR Methods

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Biological Samples
Stool samples from Madagascar cohort | Golden et al., 2017 | N/A
Stool samples from Ethiopian cohort | This paper | N/A

Critical Commercial Assays
PowerSoil DNA Isolation Kit | MoBio Laboratories, Carlsbad, USA | Catalog No. 12888-50
NexteraXT DNA Library Preparation Kit | Illumina, California, USA | FC-131-1096

Deposited Data
Raw sequencing data (Madagascar cohort) | This paper | NCBI-SRA BioProject: PRJNA485056
Raw sequencing data (Ethiopian cohort) | This paper | NCBI-SRA BioProject: PRJNA504891
Data for all genomes | This paper | http://segatalab.cibio.unitn.it/data/Pasolli_et_al.html
Representative genome for Ca. Cibiobacter qucibialis | This paper | DDBJ/ENA/GenBank accession SAUS00000000

Software and Algorithms
metaSPAdes (version 3.10.1) | Nurk et al., 2017 | https://github.com/ablab/spades/releases
MEGAHIT (version 1.1.1) | Li et al., 2015 | https://github.com/voutcn/megahit
MetaBAT2 (version 2.12.1) | Kang et al., 2015 | https://bitbucket.org/berkeleylab/metabat
CheckM (version 1.0.7) | Parks et al., 2015 | https://github.com/Ecogenomics/CheckM
CMSeq (version 1.0.0) | This study | https://bitbucket.org/CibioCM/cmseq
Mash (version 2.0) | Ondov et al., 2016 | https://github.com/marbl/Mash
MetaPhlAn2 (version 2.0) | Segata et al., 2012b; Truong et al., 2015 | https://bitbucket.org/biobakery/metaphlan2
HUMAnN2 (version 0.7.1) | Franzosa et al., 2018 | https://bitbucket.org/biobakery/humann2/
Bowtie2 (version 2.2.9) | Langmead and Salzberg, 2012 | https://github.com/BenLangmead/bowtie2
Prodigal (version 2.6.3) |  | https://github.com/hyattpd/Prodigal
Pyani (version 0.2.6) | Pritchard et al., 2016 | https://github.com/widdowquinn/pyani
StrainPhlAn (version 2.0.0) | Truong et al., 2017 | https://bitbucket.org/biobakery/metaphlan2
Anvi’o (version 4) | Eren et al., 2015 | https://github.com/merenlab/anvio
BWA (version 0.7.17) | Li and Durbin, 2009 | https://github.com/lh3/bwa
CONCOCT (version 0.5.0) | Alneberg et al., 2014 | https://github.com/BinPro/CONCOCT
RPSBlast | Marchler-Bauer et al., 2003 | ftp://ftp.ncbi.nih.gov/blast/executables/
PhyloPhlAn (version dev, 0.25) | Segata et al., 2013 | https://bitbucket.org/nsegata/phylophlan
Diamond (version 0.9.9.110) | Buchfink et al., 2015 | https://github.com/bbuchfink/diamond
mafft (version 7.310) | Katoh and Standley, 2013 | https://github.com/The-Bioinformatics-Group/Albiorix/wiki/mafft
trimal (version 1.2rev59) | Capella-Gutiérrez et al., 2009 | https://github.com/scapella/trimal
RAxML (version 8.1.15) | Stamatakis, 2014 | https://github.com/stamatak/standard-RAxML
IQ-TREE (version 1.6.6) | Nguyen et al., 2015 | https://github.com/Cibiv/IQ-TREE
Roary (version 3.8) | Page et al., 2015 | https://github.com/sanger-pathogens/Roary
blastn (version 2.6.0+) | Altschul et al., 1990 | ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast
FastTree (version 2.1.9) | Price et al., 2010 | https://github.com/PavelTorgashov/FastTree
ecodist R package | Goslee and Urban, 2007 | https://github.com/cran/ecodist
GraPhlAn (version 1.1.3) | Asnicar et al., 2015 | https://bitbucket.org/nsegata/graphlan/
FigTree (version 1.4.3) | N/A | http://tree.bio.ed.ac.uk/software/figtree/
Prokka (version 1.12) | Seemann, 2014 | https://github.com/tseemann/prokka
EggNOG mapper (version 1.0.3) | Huerta-Cepas et al., 2017 | https://github.com/jhcepas/eggnog-mapper
HMM | Eddy, 2011 | https://github.com/guyz/HMM
Barrnap (version 0.9) | N/A | https://github.com/tseemann/barrnap
RDP (version 2.11) | Cole et al., 2014; Wang et al., 2007 | https://github.com/rdpstaff/classifier

Other
curatedMetagenomicData | Pasolli et al., 2017 | https://github.com/waldronlab/curatedMetagenomicData
UniProt | The UniProt Consortium, 2017 | https://github.com/ebi-uniprot
NCBI GenBank database | NCBI Resource Coordinators, 2013 | https://www.ncbi.nlm.nih.gov/genbank/
RefSeq (viral genomes and plasmids) | Brister et al., 2015; O’Leary et al., 2016 | https://www.ncbi.nlm.nih.gov/refseq/

Figure 1. 4,930 SGBs Assembled from 9,428 Meta-analyzed Body-wide Metagenomes

(A) A human-associated microbial phylogeny of representative genomes from each species-level genome bin (SGB). Figure S3A reports the same phylogeny but including isolate genomes not found in the human-associated metagenomes.

(B) Overlap of SGBs containing both existing microbial genomes (including other metagenomic assemblies) and genomes reconstructed here (kSGBs), SGBs with only genomes reconstructed here and without existing isolate or metagenomically assembled genomes (uSGBs), and SGBs with only existing genomes and no genomes from our metagenomic assembly of human microbiomes (non-human SGBs).

(C) Many SGBs contain no genomes from sequenced isolates or publicly available metagenomic assemblies (uSGBs). Only SGBs containing >10 genomes are shown.

(D) Fraction of uSGBs and kSGBs as a function of the size of the SGBs (i.e., number of genomes in the SGB).

(E) Distribution of the fraction of uSGBs in each sample by age category, body site, and lifestyle.

(F) Distribution of the fraction of uSGBs in each study.

Figure 7. Quality of the Single-Sample Assembled Genomes against Multiple Alternative Genome Reconstruction Approaches

(A) Percentage identity between genomes from isolates (I) and genomes we reconstructed from metagenomes (M) for five Bifidobacterium species from the FerrettiP_2018 dataset (Ferretti et al., 2018). We mark isolates and metagenomes coming from the same specimen (big filled circles) and coming from specimens of the same mother-infant pair (small filled circles). In all cases, our automatic pipeline reconstructs genomes from metagenomes that are almost identical to the genomes of the expected isolated strains.

(B) The strains of S. aureus and P. aeruginosa isolated from three patients are almost perfectly matching the genomes reconstructed from sputum metagenomes sequenced at multiple time points. In the only case in which a S. aureus genome from a metagenome is not matching the strain isolated from a previous time point in the same patient, we verified with MLST typing that a clinical event of strain-replacement from ST45 to ST273 occurred.

(C) In the dataset by Nielsen et al. (2014), we successfully recover at >99.5% identity the strain of a B. animalis subspecies lactis present in a commercial probiotic product that was consumed by the enrolled subjects, even if the probiotic strain was at low relative abundance in the stool microbiome (<0.3% on average [Nielsen et al., 2014]).

(D) Comparison of the 46 manually curated genomes (using anvi’o) with automatically assembled (using metaSPAdes) and binned (using MetaBAT2) genomes.

(E) Example comparison between the set of single-sample assembled genomes and co-assembled genomes for a time series (n = 5) of gut metagenomes from a newborn. Several genomes reconstructed with the two approaches have the same phylogenetic placement, with single-sample assembly retrieving the same (or a very closely related) genome at multiple time points, and both methods retrieving some unique genomes. This is an example of the comprehensive comparison performed in the STAR Methods and reported in Table S2 and Figure S7B.
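The isolate-versus-metagenome comparisons in panels A to C rest on percentage identity between genomes. The captions do not say which tool computed these values; pyani, listed in the key resources table, computes average nucleotide identity from whole-genome alignments (for example with MUMmer or BLAST). The function below is only a minimal sketch of the identity calculation itself on two sequences that are assumed to be already aligned; it performs no alignment, is not pyani's method, and the function name and sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over gap-free columns of two pre-aligned sequences.

    Both sequences must already be aligned to the same length, with '-'
    marking gaps; this function does not perform the alignment itself.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded from the denominator
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments of an isolate genome and a reconstructed genome.
isolate = "ACGTACGTAC"
mag = "ACGTACGTTC"
print(percent_identity(isolate, mag))
print(percent_identity("ACGT-CGTAC", isolate))
```

Excluding gapped columns is one common convention; counting gaps as mismatches would give lower values, so identity figures are only comparable when computed the same way.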


References

Pasolli et al. Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle.