Summary Iranian Genome Data

Dear friends,

During the festive Norooz season, we here at the Iranian Genome Project have been celebrating by poring over a deluge of data. As our team continues to run its analyses, we wanted to make summary data available to the community. We are also working on putting the de-identified genetic data on a secure server, where it will be available to scientists.

File for download: Iranian Genome File

This file contains allele frequencies and counts at more than 6 million sites in the genome. You can use it to explore your favorite region of the genome and see what we see in 77 Iranian individuals. A typical line looks like this:

1:704367 . 2 154 C:1 T:0 C:154 T:0

The first column is the chromosome and position (chromosome:position). The second column is the rsID, if one is available ("." if it is not). The third column is the number of alleles at the site. The fourth column is the total number of chromosomes examined (154 here: two copies of chromosome 1 for each of the 77 individuals). The fifth column gives the allele frequencies (C = 100% and T = 0% in this case), and the sixth column gives the allele counts (154 C and 0 T).
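If you want to load the file programmatically, a minimal parsing sketch follows. It assumes that fields are separated by whitespace and that the file has no header line. It also assumes the layout of the example line: after the first four fields come one `allele:frequency` token per allele, then one `allele:count` token per allele, in the same order. The names `SiteSummary`, `parse_site`, and `iter_sites` are hypothetical helpers for illustration and are not part of the released file.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class SiteSummary:
    chrom: str
    pos: int
    rsid: Optional[str]      # None when the file shows "."
    n_alleles: int
    n_chroms: int            # total chromosomes examined at this site
    freqs: Dict[str, float]  # allele -> frequency (0 to 1)
    counts: Dict[str, int]   # allele -> number of chromosomes carrying it


def _allele_pairs(tokens: List[str]) -> List[Tuple[str, str]]:
    """Split tokens such as 'C:154' into ('C', '154')."""
    pairs = []
    for tok in tokens:
        allele, value = tok.rsplit(":", 1)
        pairs.append((allele, value))
    return pairs


def parse_site(line: str) -> SiteSummary:
    """Parse one summary line (assumed layout, see lead-in)."""
    fields = line.split()
    chrom, pos = fields[0].rsplit(":", 1)
    rsid = None if fields[1] == "." else fields[1]
    n_alleles = int(fields[2])
    n_chroms = int(fields[3])

    # Assumption: n_alleles frequency tokens followed by n_alleles count tokens.
    allele_tokens = fields[4:]
    if len(allele_tokens) != 2 * n_alleles:
        raise ValueError(
            f"expected {2 * n_alleles} allele fields, got {len(allele_tokens)}"
        )
    freq_pairs = _allele_pairs(allele_tokens[:n_alleles])
    count_pairs = _allele_pairs(allele_tokens[n_alleles:])

    return SiteSummary(
        chrom=chrom,
        pos=int(pos),
        rsid=rsid,
        n_alleles=n_alleles,
        n_chroms=n_chroms,
        freqs={a: float(v) for a, v in freq_pairs},
        counts={a: int(v) for a, v in count_pairs},
    )


def iter_sites(path: str) -> Iterator[SiteSummary]:
    """Yield one SiteSummary per non-blank line (assumes no header line)."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                yield parse_site(line)


site = parse_site("1:704367 . 2 154 C:1 T:0 C:154 T:0")
print(site.freqs, site.counts)  # {'C': 1.0, 'T': 0.0} {'C': 154, 'T': 0}
```

A simple sanity check on each parsed site is that the allele counts add up to the number of chromosomes in the fourth column.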

We look forward to sharing more as the project moves forward!

The Iranian Genome Team

The Importance of the Iranian Genome Project

Genetics plays an important role in understanding what determines health and disease. Major projects in population genetics have studied European, African, and Asian populations. We are the first study to catalogue a large number of genomes from a Middle Eastern population. The Iranian Genome Project not only has scientific value but is also necessary so that Iranians can participate in the development of genomic personalized medicine.

The human genome encodes the “language of life”: it holds the recipe for producing each unique individual. Many processes depend on an individual’s genes, from the enzymes that metabolize drugs to the factors that help blood clot after a cut. Variation in these genes, together with environmental factors, determines a person’s “phenotype”. A phenotype is an observable or measurable trait, such as hair color or the dose of a drug needed for it to be effective.

There is usually a balance between our genes and our environment, though for some characteristics the balance is skewed in one direction or the other. For example, say everyone in your family has high cholesterol. You know high cholesterol runs in your family, so you are diligent about what you eat, exercise every day, and live a generally healthy lifestyle. By reducing your environmental risk, you may be able to avoid high cholesterol (or you might still get it anyway, but let’s be optimistic). This is why understanding genetic predispositions is important: in some cases your fate is not set in stone, and you can do something about it. In other cases, you can’t change what your genes dictate and you will have the phenotype, but knowing your genetic predisposition still empowers you. For instance, suppose you are genetically predisposed to be sensitive to the blood thinner warfarin. In that case, receiving what a doctor considers a “normal dose” of warfarin might cause abnormal bleeding. There’s nothing you can do to change your sensitivity to warfarin. However, if your physician knows that your genes indicate warfarin sensitivity, he or she can make sure you receive a lower, safer dose.

These scenarios might lead you to wonder: how will I know what’s in my genes? How do scientists and doctors figure out which genes cause which diseases or phenotypes? And why does the Iranian Genome Project matter? Through this blog, I hope to answer these questions, and any others that arise over the course of the project.

How will I know what’s in my genes?

Right now, physicians do not routinely order genetic testing unless they suspect a genetic disease. However, direct-to-consumer genetic testing exists (several companies offer it, and we do not endorse any of them), and you can get yourself genotyped for “fun”. I say “fun” because the clinical utility of this testing is still under investigation. Anyone who decides to be genotyped should proceed only after fully understanding what genotyping involves and what the results mean. Genetics is still a work in progress, so individuals who do undergo genotyping should not become unnecessarily alarmed over a genotype whose disease risk is not well characterized. Geneticists imagine a future (hopefully soon) in which we can better interpret an individual’s genetics and everyone has a copy of their full genome for clinical analysis. Then, whenever you go to the doctor, the doctor will know which diseases to check for (for instance, high cholesterol) and which drugs are most effective given your personal genetics.

How do scientists and doctors figure out what genes cause what disease/phenotype?

Currently, one popular approach is the Genome-Wide Association Study (GWAS). In a GWAS, scientists find individuals who do and do not have a certain phenotype, while trying to control for all other variables. For instance, in a GWAS of Crohn’s disease, scientists recruit many individuals with Crohn’s disease (hundreds to thousands) and many without it (hundreds to thousands). Then, everyone in each group is genotyped.

However, sequencing the entire genome of every person in each group is expensive, so scientists genotype only the positions in the genome already known to vary. These positions are called single nucleotide polymorphisms (SNPs). Let’s look at a representation of the genetic code and a “SNP”. The genetic code is made up of four letters: A, T, C, and G. These letters represent the four nucleotide bases of DNA, which a cell’s internal machinery “reads” and translates into the body’s proteins. A SNP is a single position in the code where there is variation between individuals. In humans, more than 99% of the genome is identical from person to person. The small percentage of variation between individuals is what contributes to differences in health and disease risk.

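For instance:

[Figure: DNA sequences from different individuals, varying at the second-to-last position, where the letter is either C or T.]
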
The SNP in this example is at the second-to-last position, where individuals differ: the genotype can have either a “C” or a “T” allele. In a GWAS, scientists check whether a disease occurs more often in individuals with one genotype than in those with another. But, as mentioned before, most of the genome is identical across individuals, so scientists don’t examine every position. Instead, they look only at the positions where variation is known to occur.
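To show what “occurs more often” means in practice, here is a minimal sketch of one common way to test a single SNP in a case/control study: the allelic chi-square test. It compares allele counts between cases and controls in a 2×2 table. This is one of several tests used in GWAS, and nothing here implies that a particular study uses it. The function name `allelic_test` and all of the counts below are hypothetical and exist only for illustration.

```python
import math
from typing import Tuple


def allelic_test(case_counts: Tuple[int, int],
                 control_counts: Tuple[int, int]) -> Tuple[float, float, float]:
    """Allelic association test at one biallelic SNP.

    Each argument is (count of allele 1, count of allele 2), counted over
    chromosomes, so every person contributes two alleles. Returns
    (odds_ratio, chi_square, p_value) for a 2x2 Pearson chi-square test with
    1 degree of freedom and no continuity correction. Assumes no zero counts.
    """
    a, b = case_counts     # cases: allele 1, allele 2
    c, d = control_counts  # controls: allele 1, allele 2
    n = a + b + c + d

    # Odds of allele 1 (vs. allele 2) in cases, divided by the same odds in controls.
    odds_ratio = (a * d) / (b * c)

    # Sum of (observed - expected)^2 / expected, where expected counts
    # come from the row and column totals.
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


# Hypothetical counts at one SNP: 1,000 cases and 1,000 controls
# (2,000 chromosomes per group), given as (T alleles, C alleles).
odds_ratio, chi_square, p_value = allelic_test((700, 1300), (560, 1440))
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.1e}")
```

With these made-up numbers, the T allele is somewhat more common in cases (odds ratio about 1.38), and the p-value is on the order of 10^-6. A GWAS tests hundreds of thousands to millions of SNPs at once, so it uses a much stricter cutoff to call a result genome-wide significant, commonly 5 × 10^-8. A p-value like this one would not meet that cutoff.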

How do scientists know where the variation occurs? Population studies, such as the HapMap project, have examined the genomes of individuals of European, African, and Asian descent to find where in the genome variation occurs. These studies have produced catalogues of variable positions in these populations, and there are tools for “reading” the genetic code at just those positions. So, when doing a GWAS, scientists genotype and study only those positions. However, each population has its own signature of variation. Positions that are SNPs in individuals of European descent may not be SNPs in individuals of African descent, and vice versa.
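The idea that each population has its own signature of variation can be illustrated with a small sketch. In it, a site counts as polymorphic in a population when its minor allele frequency (MAF) exceeds a chosen threshold. The populations, site IDs, and counts are all hypothetical. The `min_maf` parameter is an illustrative assumption; real catalogues apply their own filtering rules.

```python
from typing import Dict, List

# Hypothetical layout: site ID -> allele -> number of chromosomes carrying it.
SiteCounts = Dict[str, Dict[str, int]]


def minor_allele_frequency(counts: Dict[str, int]) -> float:
    """1 minus the frequency of the most common allele (the MAF for a biallelic site)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - max(counts.values()) / total


def polymorphic_sites(sites: SiteCounts, min_maf: float = 0.0) -> List[str]:
    """Sorted site IDs whose minor allele frequency exceeds min_maf."""
    return sorted(site for site, counts in sites.items()
                  if minor_allele_frequency(counts) > min_maf)


# Hypothetical allele counts for two populations at the same three sites.
population_a = {
    "site1": {"C": 150, "T": 4},
    "site2": {"A": 154, "G": 0},
    "site3": {"G": 100, "T": 54},
}
population_b = {
    "site1": {"C": 200, "T": 0},
    "site2": {"A": 180, "G": 20},
    "site3": {"G": 120, "T": 80},
}

snps_a = set(polymorphic_sites(population_a))
snps_b = set(polymorphic_sites(population_b))
print(sorted(snps_a - snps_b))  # ['site1']
print(sorted(snps_b - snps_a))  # ['site2']
print(sorted(snps_a & snps_b))  # ['site3']
```

In this toy example, a genotyping panel built only from population B’s catalogue would miss site1, which varies only in population A.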

Why does the Iranian Genome project matter?

The HapMap project has looked at European, African, and Asian populations. No major study has looked at Middle Eastern genomes, so our knowledge of which SNPs are present in Middle Eastern populations is sorely lacking. As a result, GWAS are not easily or routinely done in Middle Eastern populations. In fact, 96% of GWAS thus far have been conducted in individuals of European descent. Research has shown that GWAS results from one population do not necessarily apply to another. So, if a SNP is responsible for drug sensitivity in individuals of European descent, it does not follow that the same SNP causes drug sensitivity in Iranians. Thus, without an understanding of Iranian genomes, scientists and doctors cannot determine how genes affect health and disease in Iranians. We want to help change that. In the future, genetics will likely help guide medical care: it may be able to assess your risk of a heart attack or identify which high blood pressure drug works best for you. We want to make sure that information on the Iranian population’s genetics exists so that doctors can give Iranian patients the personalized care they need.

Roxana Daneshjou,

Welcome to the Website of the Iranian Genome Project at Stanford

We will be updating information on the Iranian Genome Project through this website.  This project has been funded generously by the PARSA Foundation.  The following announcement has been reprinted with permission from the PARSA Foundation.

PARSA CF Awards $250,000 to the Iranian Genome Project at Stanford University

May 12, 2011

As part of its Mehrgan 2010 Grant Cycle, PARSA Community Foundation announces a $250,000 grant to the Department of Bioengineering at Stanford University for the “Iranian Genome Project,” a one-of-a-kind initiative to better understand the genes of the Iranian population.

Researchers at various universities and companies have been working on genetics and genomics. However, the genetics of the Iranian population has never been studied before. There is no baseline data for evaluating the risk factors of new drugs, as they are being developed, for people of this ethnic background. This population-specific information is crucial to understanding the health benefits and risk factors of certain drugs. It also helps researchers explain the history of a population in more depth.

Although all humans are 99.9% genetically identical, the small differences are very important to uncover. Most genetic research has been conducted on people of European descent. This project promises to shine a bright light on the Iranian people and to make it easier for scientists interested in personalized medicine to focus on genetics relevant to Iranians. After the project is completed, the resulting data will be made available through scientific publications and public databases. These will give pharmaceutical companies and credentialed researchers the ability to do follow-up scientific investigations and to develop drugs that are compatible with this population.

PARSA CF is delighted to provide private and philanthropic support for such bold and pioneering work. First in the world to do so, the Stanford research team is in the process of collecting DNA samples from more than 50 Iranians, representing various ethnicities such as Armenians, Kurds, and Turks, and has extensive plans to analyze the data.

The DNA samples are being obtained from individuals from a diverse set of geographic and ethnic backgrounds, to build a foundation that represents the diversity of Iranian human genetics. The team is currently working with experts on Iranian culture to identify the different groups from Iran that should be included in the sampling. They are also working on the consent documents for participants. An important goal is to ensure the privacy and security of the data, and to minimize the possibility of this genetic information being misused. With all consents in place, the team will collect spit samples, isolate the DNA from them, and use state-of-the-art “next generation” sequencing machines to determine each participant’s whole-genome sequence. These sequences will then be analyzed by computer to tally the location and frequency of genetic differences and to analyze the relationships between different ethnic groups.
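The “tally” step mentioned above can be sketched for a single site. This is not the team’s actual pipeline. It assumes VCF-style diploid genotype calls written like `C/T` (with `|` for phased calls and `.` for a missing call), and it writes its output in the summary-file layout described at the top of this page. The function name `tally_site` is hypothetical. Frequencies are printed with Python’s `g` format (up to six significant digits), which is also an assumption.

```python
from collections import Counter
from typing import Iterable, List


def tally_site(chrom: str, pos: int, rsid: str, alleles: List[str],
               genotypes: Iterable[str]) -> str:
    """Count alleles from diploid genotype calls at one site and return a line:
    chrom:pos, rsid, number of alleles, chromosomes counted, frequencies, counts."""
    counts: Counter = Counter()
    for gt in genotypes:
        calls = gt.replace("|", "/").split("/")
        if "." in calls:
            continue  # skip genotypes with a missing call
        counts.update(calls)

    # Only the listed alleles contribute to the chromosome total.
    n_chroms = sum(counts[a] for a in alleles)
    freq_fields = []
    for a in alleles:
        freq = counts[a] / n_chroms if n_chroms else 0.0
        freq_fields.append(f"{a}:{freq:g}")
    count_fields = [f"{a}:{counts[a]}" for a in alleles]
    return " ".join([f"{chrom}:{pos}", rsid, str(len(alleles)), str(n_chroms)]
                    + freq_fields + count_fields)


# Hypothetical input: 77 individuals, all homozygous C at this site.
print(tally_site("1", 704367, ".", ["C", "T"], ["C/C"] * 77))
# 1:704367 . 2 154 C:1 T:0 C:154 T:0
```

Feeding in 77 hypothetical individuals who are all homozygous for C reproduces the example line from the summary file. The same function would report, for instance, `C:0.5 T:0.5` for a site where half the counted chromosomes carry each allele.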

The project will use the latest advances in genome sequencing through collaboration with other institutions. These collaborations provide access to machines that can sequence entire genomes, covering all of the roughly 6.5 billion DNA bases in a person’s two copies of the genome. The genome sequencing research will produce very large datasets, which will be analyzed by a team with expertise in handling such large amounts of data.

The grant makes it possible for the research team to explore the “family tree” of Iranians and understand how they are related. They will also get a first look at the genetics of the Iranian people. This will serve as a starting point for potential health interventions that use genetics to assess the risk of various diseases and to predict likely drug responses. This project will provide important baseline data about both the similarities and the differences between Iranian genomes and the current human reference genome, which was determined predominantly from people of European descent.

PARSA CF is supporting this project to create not only a basis for the anthropological study of the Iranian population, but also the scientific foundation for the health and well-being of Persian children and generations to come. Although genome sequencing is getting cheaper, it is still expensive, and the cost prevents private institutions from engaging in such ambitious research. At the same time, government funding typically does not apply to such ethnic projects. PARSA CF is in a unique position to offer its donors this opportunity to participate in such a strategic and fundamental initiative.

“It is an honor to receive this funding, to allow this exciting collaboration between Stanford, Harvard and Illumina, Inc., and to explore the genetic roots of the Persian people. We look forward to understanding the ‘family tree’ of Iran, and to using this as a baseline for understanding the health and population history of Iran,” said Russ Altman, Chairman, Department of Bioengineering, Stanford University.