Bin-assembled Escherichia coli genomes from a study in Punjab, Pakistan

dc.contributor.affiliation: University of Helsinki - Mäklin, Tommi
dc.contributor.affiliation: Helsinki University Hospital - Khawaja, Tamim
dc.contributor.author: Mäklin, Tommi
dc.contributor.author: Khawaja, Tamim
dc.date.accessioned: 2025-03-24T15:17:19Z
dc.date.issued: 2024-06-25
dc.description: Bin-assembled Escherichia coli genomes from Punjab, Pakistan

These assemblies are part of a cross-sectional study conducted in Punjab, Pakistan, that investigated E. coli colonisation diversity in healthy carriage using CLED enrichment plates.

About

Version history
v0.1.1 (current version): Added reference to the study.
v0.1.0: Added brief description with a few missing parts.

Distribution
If you use these assemblies in your study, please cite the source as appropriate. These assemblies are made available under a CC-BY 4.0 license.

Citation
Khawaja, T., Mäklin, T., Kallonen, T. et al. Deep sequencing of Escherichia coli exposes colonisation diversity and impact of antibiotics in Punjab, Pakistan. Nature Communications 15, 5196 (2024). https://doi.org/10.1038/s41467-024-49591-5

Methods briefly

Species identification
Sequencing data from the ENA project PRJEB36642 were error-corrected with fastp and pseudoaligned with Themisto against a species-level index (available from https://doi.org/10.5281/zenodo.6656881). Reads were assigned to species using the mSWEEP/mGEMS pipeline as described in https://www.nature.com/articles/s41467-022-35178-5.

Lineage identification
Reads from the species-level bins were again pseudoaligned with Themisto, this time against an E. coli index (will be made available in a later version). Lineage-level assignment was performed using mSWEEP and mGEMS at the level of PopPUNK sequence clusters. The resulting bins were screened with demix_check, and bins that received a score of 1 or 2 were kept. Data in the kept bins were assembled with shovill, and the bin-assembled genomes (BAGs) were quality controlled with checkm, requiring >= 90% completeness and <= 10% contamination. Finally, BAGs shorter than 4 Mb or longer than 6 Mb were removed.

Contact
Tommi Mäklin <tommi'at'maklin.fi>
dc.identifier: https://doi.org/10.5281/zenodo.12166291
dc.identifier.uri: https://hydatakatalogi-test-24.it.helsinki.fi/handle/123456789/10080
dc.rights: Open
dc.rights.license: cc-by-4.0
dc.subject: genome informatics
dc.subject: escherichia coli
dc.subject: antimicrobial resistance
dc.subject: metagenomics
dc.title: Bin-assembled Escherichia coli genomes from a study in Punjab, Pakistan
dc.type: dataset
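
The quality-control criteria listed under "Methods briefly" in the description (demix_check score of 1 or 2, checkm completeness >= 90%, contamination <= 10%, assembly length between 4 Mb and 6 Mb) can be illustrated with a minimal Python sketch. This is not the authors' code: the `BinAssembly` record, its field names, and the `passes_qc` helper are hypothetical, 1 Mb is assumed to mean 1,000,000 bp, and the sequential filtering steps of the study are collapsed into one check for brevity.

```python
from dataclasses import dataclass


@dataclass
class BinAssembly:
    """Hypothetical summary record for one bin-assembled genome (BAG)."""
    name: str
    demix_score: int       # score reported by demix_check for the bin
    completeness: float    # checkm completeness, in percent
    contamination: float   # checkm contamination, in percent
    length_bp: int         # total assembly length, in base pairs


# Thresholds taken from the dataset description; 1 Mb = 1,000,000 bp is assumed.
MIN_LENGTH_BP = 4_000_000
MAX_LENGTH_BP = 6_000_000


def passes_qc(bag: BinAssembly) -> bool:
    """Return True if the BAG meets every criterion stated in the description."""
    return (
        bag.demix_score in (1, 2)
        and bag.completeness >= 90.0
        and bag.contamination <= 10.0
        and MIN_LENGTH_BP <= bag.length_bp <= MAX_LENGTH_BP
    )


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    bags = [
        BinAssembly("example_a", 1, 98.5, 1.2, 5_100_000),
        BinAssembly("example_b", 3, 99.0, 0.5, 5_000_000),
        BinAssembly("example_c", 2, 95.0, 2.0, 3_800_000),
    ]
    kept = [bag.name for bag in bags if passes_qc(bag)]
    print(kept)
```

In the study itself the demix_check screen was applied to the read bins before assembly, so a faithful pipeline would filter in stages rather than in a single pass.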