a GDSCN Project

Empowering underrepresented students in STEM through hands-on research

Dive into genomics & data science: Analyze real soil microbial data from diverse environments.

Uncover secrets of the soil: Learn how microbes impact our health & environment.

Gain cutting-edge skills: Master cloud computing, data analysis, and more.

Join a supportive community: Network with faculty & peers from across the country.



About the BioDIGS Project

One critical aspect of an undergraduate STEM education is hands-on research. Undergraduate research experiences enhance what students learn in the classroom and increase their interest in pursuing STEM careers 1. They can also lead to improved scientific reasoning and increased academic performance overall 2. However, many students at underresourced institutions like community colleges, Historically Black Colleges and Universities (HBCUs), tribal colleges and universities, and Hispanic-serving institutions have limited access to research opportunities compared to their peers at larger four-year colleges and R1 institutions. These students are also more likely to belong to groups that are already underrepresented in STEM disciplines, particularly genomics and data science 3 4.

The BioDIGS (BioDiversity and Informatics for Genomics Scholars) Project aims to be at the intersection of genomics, data science, cloud computing, and education.

What is genomics? 🧬

Genomics broadly refers to the study of genomes. A genome is an organism's complete set of DNA, including both genes and non-coding regions. Traditional genomics involves sequencing and analyzing the genome of individual species.

Metagenomics expands genomics to look at the collective genomes of entire communities of organisms in an environmental sample, like soil. It allows researchers to study not just the genes of culturable or isolated organisms, but the entirety of genetic material present in a given environment. By using genomic techniques to survey the soil microbes, we can identify everything in the soil, including microbes that no one has identified before.

We are doing both traditional genomics and metagenomics as part of BioDIGS.

What is data science? 📈

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. It includes collecting, cleaning, and combining data from multiple databases, exploring data and developing statistical and machine learning models to identify patterns in complex datasets, and creating tools to efficiently store, process, and access large amounts of data.

What is cloud computing? ☁️

Cloud computing just means using the internet to get access to powerful computer resources like storage, servers, databases, networking tools, and specialized software programs. Instead of having to buy and maintain their own powerful computers, storage servers, and other systems, users can pay to use them through an internet connection as needed. Users only pay for what they need, when they actually use it, and professionals update and maintain the systems in large data centers. It is a particularly useful tool for researchers and students at smaller institutions with limited computational services, especially when working with complex databases.

Why soil microbes? 🦠

It can be challenging to include undergraduates in human genomic and health research, especially in a classroom context. Both human genetic data and human health data are protected data, which limits the sort of information students can access without undergoing specialized ethics training. However, the same sorts of data cleaning and analysis methods used for human genomic data are also used for microbial genomic data, which does not have the same sort of legal protections as human genetic data. This makes it ideal for training undergraduate students at the beginning of their careers and can be used to prepare students for future research in human genomics and health 5. Additionally, the microbes in the soil can have big impacts on our health 6.

Why are heavy metals important for human health? 🩺

Human activities that change the landscape can also change what sorts of inorganic and abiotic compounds we find in the soil, particularly increasing the amount of heavy metals 7. When cars drive on roads, compounds from the exhaust, oil, and other fluids might settle onto the roads and be washed into the soil. When we put salt on roads, parking lots, and sidewalks, the salts themselves will eventually be washed away and enter the ecosystem through both water and soil. Chemicals from factories and other businesses also leach into our environment. Previous research has demonstrated that in areas with more human activity, like cities, soils contain greater concentrations of heavy metals than soils in rural areas with limited human populations 8 9. Increased heavy metal concentrations also disproportionately affect lower-income and predominantly minority areas 10.

Research suggests that increased heavy metal concentration in soils has major impacts on the soil microbial community. In particular, increased heavy metal concentration is associated with an increase in soil bacteria that have antibiotic resistance markers 11 12 13.


1: Russell et al. 2007: https://doi.org/10.1126/science.1140384

2: Buffalari et al. 2020: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8040836/

3: Canner et al. 2017: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5398168/

4: GDSCN 2022: https://doi.org/10.1101/gr.276496.121

5: Jurkowski et al. 2007: https://doi.org/10.1187/cbe.07-09-0075

6: Brevik and Burgess 2004: https://www.nature.com/scitable/knowledge/library/the-influence-of-soils-on-human-health-127878980/

7: Yan et al. 2020: https://doi.org/10.1016/j.scitotenv.2019.136116

8: Khan et al. 2023: https://pubmed.ncbi.nlm.nih.gov/36907936/

9: Wang et al. 2022: https://pubmed.ncbi.nlm.nih.gov/35240153/

10: Jones et al. 2022: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8834334/

11: Gorovtsov et al. 2018: https://doi.org/10.1007/s11356-018-1465-9

12: Nguyen et al. 2019: https://doi.org/10.1007/s11783-019-1129-0

13: Sun et al. 2021: https://doi.org/10.1016/j.jenvman.2021.113754

The Research Team

BioDIGS is run by the Genomic Data Science Community Network (GDSCN). The GDSCN is a consortium of educators who are dedicated to expanding genomic data science and bioinformatics education to students at typically underresourced institutions including community colleges, HBCUs, tribal colleges and universities, and Hispanic-serving institutions. They are supported by Johns Hopkins University, the Fred Hutchinson Cancer Center, and the National Human Genome Research Institute. You can learn more about the GDSCN here.

Who did the soil sampling?

Soil sampling for BioDIGS was done by both faculty and student volunteers from schools that aren't traditional R1 research institutions. Information about the schools and programs involved is on the map near each sampling area.

Many of the faculty are also members of the GDSCN.

  • Annandale, VA: Northern Virginia Community College
  • Atlanta, GA: Spelman College
  • Baltimore, MD: College of Southern Maryland, Notre Dame of Maryland University, Towson University
  • Bismarck, ND: United Tribes Technical College
  • El Paso, TX: El Paso Community College, The University of Texas at El Paso
  • Fresno, CA: Clovis Community College
  • Greensboro, NC: North Carolina A&T State University
  • Harrisonburg, VA: James Madison University
  • Honolulu, Hawai'i: University of Hawai'i at Mānoa
  • Las Cruces, NM: Doña Ana Community College
  • Montgomery County, MD: Montgomery College, Towson University
  • Nashville, TN: Meharry Medical College
  • New York, NY: Guttman Community College CUNY
  • Petersburg, VA: Virginia State University
  • Seattle, WA: North Seattle College, Pierce College
  • Tsaile, AZ: Diné College
  • you?


Funding for this project has been provided by the National Human Genome Research Institute (Contract # 75N92022P00232 awarded to Johns Hopkins University), as well as by donations from PacBio and CosmosID.

Advances in Genome Biology and Technology provided funding support for several team members to attend AGBT 2024.

Analytical and Computational Support

Computational support has been provided by NHGRI's AnVIL cloud computing platform and Galaxy.

Download data


Note: Arsenic (As_EPA3051) is not detectable below 3.0 mg/kg. Cadmium (Cd_EPA3051) is not detectable below 0.2 mg/kg.


BioDIGSData R Package

We've created a data package to help you easily bring BioDIGS soil data and metadata into R! Learn more about the package or leave us a feature request on GitHub: https://github.com/fhdsl/BioDIGSData.

Using the Package

Install the package by running the following in R. You might need to install the devtools package.
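
A minimal sketch of the install step, assuming the package is installed from the GitHub repository linked above (fhdsl/BioDIGSData) using devtools::install_github(). The exact command recommended by the maintainers may differ.

```r
# Install devtools first if it is not already available.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Assumption: the package is installed straight from the GitHub repository
# named in the link above (fhdsl/BioDIGSData).
devtools::install_github("fhdsl/BioDIGSData")
```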


Bring in the data using predefined functions. For example:

# Load soil data
my_data <- BioDIGSData::BioDIGS_soil_data()
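
The detection limits noted in the Download data section (3.0 mg/kg for As_EPA3051, 0.2 mg/kg for Cd_EPA3051) matter when you summarize the elemental data. The sketch below shows one way to flag non-detects. It uses a small toy data frame with made-up values rather than the real dataset, and it assumes that non-detects appear either as missing values or as numbers below the limit. The site_id column is hypothetical, so check how the actual data encode these cases before adapting it.

```r
# Toy values for illustration only; these are NOT real BioDIGS measurements.
soil <- data.frame(
  site_id    = c("A", "B", "C"),   # hypothetical identifier column
  As_EPA3051 = c(5.1, NA, 3.4),    # arsenic, mg/kg
  Cd_EPA3051 = c(NA, 0.35, 0.2)    # cadmium, mg/kg
)

# Detection limits in mg/kg, taken from the note in the Download data section.
detection_limits <- c(As_EPA3051 = 3.0, Cd_EPA3051 = 0.2)

# Flag a measurement as a non-detect when it is missing or below its limit.
# Assumption: non-detects are stored as NA or as values under the limit.
for (element in names(detection_limits)) {
  values <- soil[[element]]
  soil[[paste0(element, "_nondetect")]] <-
    is.na(values) | values < detection_limits[[element]]
}

print(soil)
```

Keeping the flags in separate columns leaves the original measurements untouched, so you can decide later whether to drop non-detects, treat them as missing, or substitute a value.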



The following are materials designed to help you dive into BioDIGS. This could be for your own learning, or for you to implement in your classrooms!

BioDIGS In the Classroom

Explore soil data and metadata from the BioDIGS Project! The activities in this guide are written for undergraduate students and beginning graduate students. Some sections require a basic understanding of the R programming language; this is indicated at the beginning of each chapter.

Activities leverage NHGRI's AnVIL cloud computing platform. AnVIL is the preferred computing platform for the GDSCN. However, all of the activities can be done using personal installations of R or using the online Galaxy portal.



If you don't find the answer to your question here, please contact Natalie Kucher (nkucher3 at jhu.edu) and/or Ava Hoffman (ahoffma2 at fredhutch.org)!

What should I look for in a sampling site?

  • It needs to be big enough! We recommend a space 10 yards square (you can measure 10 yards by taking 10 big steps).
  • Make sure the area is at least 3 feet away from a road to avoid vehicle contamination of the soil.
  • If possible, try to pick a site that hasn't obviously been disturbed or changed recently, as the microbial community might not have reached a stable equilibrium yet. We recommend waiting at least 1 year after construction or other major activities.
  • Try to pick sites that you think would be interesting! For the pilot sampling, we focused on managed vs unmanaged sites and tried to sample a variety of landscapes. Managed sites are generally those that have been used as lawns, playing fields, or gardens; unmanaged sites include places like meadows and forests.
  • You should pick 5 sites, with 2 replicates at each, for a total of 10 samples.

How do I get permission to sample at a location?

Land that you might want to sample will generally be one of three types: public land (like parks or playing fields), university and college grounds, or private land.

  • For public land, the best approach is to email the appropriate contact person for permission. Public lands are generally managed by county, town, or city Park Services. Contact information can usually be found on the specific park websites. A template email is provided here!

Tip 1: Be specific about what sites you want to sample (they often will give a permit for just those sites).

Tip 2: Be prepared to be passed to several people, as the request is unusual and they don't always immediately know which staff member should handle it.

Tip 3: If they seem hesitant, assure them that you'll be taking very little soil and the grass cover will not be visually disturbed, since the samples will be taken from 5 inches below the ground.

  • For college and university grounds, you should contact the Facilities department. We have found email works best. You can follow the same template email and approach used for public land.

  • For private land, you'll need to approach the owner. You may have to approach the owner in person or perhaps by phone. You can follow the same script as in the template email.

Please add a copy of the email or written permission in this folder. If permission was given verbally, please add the details of who you spoke with and when to this sheet. Please reach out to us if you need access to these files.

How do I collect samples?

We have prepared a protocol to make this easy! Check out the written protocol here. We have also created YouTube videos for you to reference.

What information do I need to collect from students before we go into the field?

  • Make sure you have all participating students sign a liability waiver. Your school may have a specific one they want you to use; check with administration or student services.
  • If you are taking pictures for the GDSCN archives, we ask that you also have students sign a photo release form giving us permission to use their image on the website or in future presentations.
  • A standard liability waiver and photo release are provided here!

How do I store the samples being sent for DNA analysis?

You should be able to store the soil samples at room temperature. The DNA/RNA Shield chemicals should keep the microbial DNA from degrading before the samples get analyzed in the lab. However, it doesn't hurt to keep them in a refrigerator! (Don't freeze them, but a standard refrigerator or a lab refrigerator at 4 °C is fine.)

Try to avoid leaving the samples in hot places or in the sunlight. The DNA/RNA Shield can only do so much to protect the DNA, and heat can still cause degradation.

How do I dry the samples being sent for elemental analysis?

The soil can be dried at room temperature. Simply spread it out on the aluminum baking sheet, out of direct sunlight. There's no need to bake the soil or use an oven to speed the process up. The goal is to prevent or slow mold and microbial growth caused by excess moisture.

Do I have to pay for shipping the samples back to you?

Nope! All shipping costs will be paid for by BioDIGS. When you're ready to send your samples back, contact Natalie Kucher (nkucher3 at jhu.edu) and she will send you a shipping label.

How long will it take for my samples to be analyzed?

We expect the process to take several weeks, depending on when we receive the samples and how many we need to process at once.

The Genomic Data Science Community Network (GDSCN) is supported by an NHGRI contract to Johns Hopkins University.