Dive into genomics & data science: Analyze real soil microbial data from diverse environments.
Uncover secrets of the soil: Learn how microbes impact our health & environment.
Gain cutting-edge skills: Master cloud computing, data analysis, and more.
Join a supportive community: Network with faculty & peers from across the country.
One critical aspect of an undergraduate STEM education is hands-on research. Undergraduate research experiences reinforce what students learn in the classroom and increase their interest in pursuing STEM careers ^1. They can also lead to improved scientific reasoning and better academic performance overall ^2. However, many students at underresourced institutions like community colleges, Historically Black Colleges and Universities (HBCUs), tribal colleges and universities, and Hispanic-serving institutions have limited access to research opportunities compared to their peers at larger four-year colleges and R1 institutions. These students are also more likely to belong to groups that are already underrepresented in STEM disciplines, particularly genomics and data science ^3 ^4.
The BioDIGS (BioDiversity and Informatics for Genomics Scholars) Project aims to be at the intersection of genomics, data science, cloud computing, and education.
Genomics broadly refers to the study of genomes. A genome is an organism’s complete set of DNA, including both genes and non-coding regions. Traditional genomics involves sequencing and analyzing the genome of individual species.
Metagenomics expands genomics to look at the collective genomes of entire communities of organisms in an environmental sample, like soil. It allows researchers to study not just the genes of culturable or isolated organisms, but the entirety of genetic material present in a given environment. By using genomic techniques to survey the soil microbes, we can detect organisms that cannot be grown in the lab, including microbes that no one has identified before.
We are doing both traditional genomics and metagenomics as part of BioDIGS.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. It includes collecting, cleaning, and combining data from multiple databases, exploring data and developing statistical and machine learning models to identify patterns in complex datasets, and creating tools to efficiently store, process, and access large amounts of data.
Cloud computing just means using the internet to get access to powerful computer resources like storage, servers, databases, networking tools, and specialized software programs. Instead of having to buy and maintain their own powerful computers, storage servers, and other systems, users can pay to use them through an internet connection as needed. Users only pay for what they need, when they actually use it, and professionals update and maintain the systems in large data centers. It is a particularly useful tool for researchers and students at smaller institutions with limited computational services, especially when working with complex databases.
It can be challenging to include undergraduates in human genomic and health research, especially in a classroom context. Both human genetic data and human health data are protected data, which limits the sort of information students can access without undergoing specialized ethics training. However, the same sorts of data cleaning and analysis methods used for human genomic data are also used for microbial genomic data, which does not have the same sort of legal protections as human genetic data. This makes it ideal for training undergraduate students at the beginning of their careers and can be used to prepare students for future research in human genomics and health ^5. Additionally, the microbes in the soil can have big impacts on our health ^6.
Human activities that change the landscape can also change what sorts of inorganic and abiotic compounds we find in the soil, particularly increasing the amount of heavy metals ^7. When cars drive on roads, compounds from the exhaust, oil, and other fluids might settle onto the roads and be washed into the soil. When we put salt on roads, parking lots, and sidewalks, the salts themselves will eventually be washed away and enter the ecosystem through both water and soil. Chemicals from factories and other businesses also leach into our environment. Previous research has demonstrated that in areas with more human activity, like cities, soils contain greater concentrations of heavy metals than those found in rural areas with limited human populations ^8 ^9. Increased heavy metal concentrations also disproportionately affect lower-income and predominantly minority areas ^10.
Research suggests that increased heavy metal concentration in soils has major impacts on the soil microbial community. In particular, increased heavy metal concentration is associated with an increase in soil bacteria that have antibiotic resistance markers ^11 ^12 ^13.
1: Russell et al. 2007: https://doi.org/10.1126/science.1140384
2: Buffalari et al. 2020: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8040836/
3: Canner et al. 2017: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5398168/
4: GDSCN 2022: https://doi.org/10.1101/gr.276496.121
5: Jurkowski et al. 2007: https://doi.org/10.1187/cbe.07-09-0075
6: Brevik and Burgess 2004: https://www.nature.com/scitable/knowledge/library/the-influence-of-soils-on-human-health-127878980/
7: Yan et al. 2020: https://doi.org/10.1016/j.scitotenv.2019.136116
8: Khan et al. 2023: https://pubmed.ncbi.nlm.nih.gov/36907936/
9: Wang et al. 2022: https://pubmed.ncbi.nlm.nih.gov/35240153/
10: Jones et al. 2022: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8834334/
11: Gorovtsov et al. 2018: https://doi.org/10.1007/s11356-018-1465-9
12: Nguyen et al. 2019: https://doi.org/10.1007/s11783-019-1129-0
13: Sun et al. 2021: https://doi.org/10.1016/j.jenvman.2021.113754
BioDIGS is run by the Genomic Data Science Community Network (GDSCN). The GDSCN is a consortium of educators who are dedicated to expanding genomic data science and bioinformatics education to students at typically underresourced institutions including community colleges, HBCUs, tribal colleges and universities, and Hispanic-serving institutions. It is supported by Johns Hopkins University, the Fred Hutchinson Cancer Center, and the National Human Genome Research Institute. You can learn more about the GDSCN here.
Soil sampling for BioDIGS was done by both faculty and student volunteers from schools that aren’t traditional R1 research institutions. Information about the schools and programs involved is on the map near each sampling area.
Many of the faculty are also members of the GDSCN.
Funding for this project has been provided by the National Human Genome Research Institute (Contract # 75N92022P00232 awarded to Johns Hopkins University), as well as by donations from PacBio and CosmosID.
Advances in Genome Biology and Technology provided funding support for several team members to attend AGBT 2024.
Computational support has been provided by NHGRI’s AnVIL cloud computing platform and Galaxy.
BioDIGSData R Package
We’ve created a data package to help you easily bring BioDIGS soil data and metadata into R! Learn more about the package or leave us a feature request on GitHub: https://github.com/fhdsl/BioDIGSData.
Install the package by running the following in R. You might need to install the devtools package first, for example with install.packages("devtools").
devtools::install_github("fhdsl/BioDIGSData")
Bring in the data using predefined functions. For example:
# Load soil data
my_data <- BioDIGSData::BioDIGS_soil_data()
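Once the data are loaded, a quick look at the table's shape, column types, and missing values can be helpful before any analysis. The following is a minimal sketch using only base R. It assumes the loaded object is a data frame. Because the actual columns returned by BioDIGS_soil_data() are not listed here, the sketch runs on a small stand-in table whose column names (site_id, As_mg_kg, Pb_mg_kg) and values are made up for illustration. describe_table() is a hypothetical helper, not a function from the BioDIGSData package.

```r
# Hypothetical helper (not part of BioDIGSData): summarize any data frame.
describe_table <- function(df) {
  stopifnot(is.data.frame(df))
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    # First class of each column, e.g. "character" or "numeric"
    col_types = vapply(df, function(col) class(col)[1], character(1)),
    # Number of missing (NA) values in each column
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1))
  )
}

# Stand-in table with made-up columns and values, for illustration only.
example_data <- data.frame(
  site_id = c("A", "B", "C"),
  As_mg_kg = c(1.2, NA, 3.4),
  Pb_mg_kg = c(10, 20, 30),
  stringsAsFactors = FALSE
)

info <- describe_table(example_data)
print(info)

# With the real data, assuming it loads as a data frame:
# info <- describe_table(BioDIGSData::BioDIGS_soil_data())
```

Checking for missing values early matters because many summary functions in R, such as mean(), return NA unless na.rm = TRUE is set.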
The following are materials designed to help you dive into BioDIGS. This could be for your own learning, or for you to implement in your classrooms!
Explore soil data and metadata from the BioDIGS Project! The activities in this guide are written for undergraduate students and beginning graduate students. Some sections require basic understanding of the R programming language, which is indicated at the beginning of each chapter.
Activities leverage NHGRI’s AnVIL cloud computing platform. AnVIL is the preferred computing platform for the GDSCN. However, all of the activities can be done using personal installations of R or using the online Galaxy portal.
If you don’t find the answer to your question here, please contact Natalie Kucher (nkucher3 at jhu.edu) and/or Ava Hoffman (ahoffma2 at fredhutch.org)!
Land that you might want to sample will generally be one of three types: public land (like parks or playing fields), university and college grounds, or private land.
Tip 1: Be specific about what sites you want to sample (they often will give a permit for just those sites).
Tip 2: Be prepared to be passed to several people. The request is unusual, and they don’t always immediately know which staff member should handle it.
Tip 3: If they seem hesitant, assure them that you’ll be taking very little soil and the grass cover will not be visually disturbed, since the samples will be taken from 5 inches below the ground.
For college and university grounds, you should contact the Facilities department. We have found email works best. You can follow the same template email and approach used for public land.
For private land, you’ll need to approach the owner. You may have to approach the owner in person or perhaps by phone. You can follow the same script as in the template email.
Please add a copy of the email or written permission in this folder. If permission was given verbally, please add to this sheet with the details of who you spoke with and when. Please reach out to us if you need access to these files.
We have prepared a protocol to make this easy! Check out the written protocol here. We have also created YouTube videos for you to reference.
It might be a good idea to distribute a flyer. Here’s an example used by one of our faculty.
You should be able to store the soil samples at room temperature. The DNA/RNA Shield chemicals should keep the microbial DNA from degrading before the samples get analyzed in the lab. However, it doesn’t hurt to keep them in a refrigerator! (Don’t freeze them, but a standard refrigerator or a lab refrigerator at 4 °C is fine.)
Try to avoid leaving the samples in hot places or in the sunlight. The DNA/RNA Shield can only do so much to protect the DNA, and heat can still cause degradation.
The soil can be dried at room temperature. Simply spread it out on the aluminum baking sheet out of direct sunlight. There’s no need to bake the soil or use an oven to speed the process up. The goal is to prevent/slow any excess mold or microbial growth caused by excess moisture.
Nope! All shipping costs will be paid for by BioDIGS. When you’re ready to send your samples back, contact Natalie Kucher (nkucher3 at jhu.edu) and she will send you a shipping label.
We expect the process to take several weeks, depending on when we receive the samples and how many we need to process at once.
Sure! While we are happy to handle this, protocols can be found here.