This study was approved by the Yale University Institutional Review Boards. Written informed consent was obtained from each healthy donor, and the consent procedure was also approved by the Yale University Institutional Review Boards. Peripheral blood mononuclear cells were isolated from 5 healthy donors by Ficoll separation. CD4+ T cells were enriched by negative selection (Miltenyi Biotec), and CD4+CD25-CD45RO+CCR6+CD161+, CD4+CD25-CD45RO+CCR6-CD161- and CD4+CD25-CD45RO-CD62L+ naïve T cells were sorted on a FACSAria (BD Biosciences). For Th1 differentiation, naïve T cells were activated with plate-bound anti-CD3 mAb (10 µg/ml) and soluble anti-CD28 mAb (1.0 µg/ml) in serum-free media supplemented with recombinant IL-12 (5 ng/ml) and anti-IL-4 (10 µg/ml) for 7 days. Sorted CD4+ memory T cell subsets were activated with plate-bound anti-CD3 mAb and soluble anti-CD28 mAb in media supplemented with IL-23 (20 ng/ml) and IL-1β (10 ng/ml) for 7 days. Intracellular staining was performed using the Cytofix/Cytoperm buffer set (BD Biosciences) according to the manufacturer’s instructions. Briefly, cells were incubated for 5 hours with PMA (50 ng/ml), ionomycin (500 ng/ml) and GolgiPlug (BD Biosciences, CA), then permeabilized with Cytofix/Cytoperm buffer and stained with FITC-conjugated anti-IFNγ (BD Biosciences) and APC-conjugated anti-IL-17 (eBioscience, CA).
Total RNA Isolation, Microarray, Definition of Immune-mediated Gene Loci, Enrichment and Cluster Analysis
Total RNA was isolated from naïve CD4+ T cells, Th1, Th17-enriched and Th17-negative memory CD4+ T cells using the RNeasy Mini Kit (Qiagen) with DNase treatment. Naïve and Th1 cells were from 4 individuals. Th17-enriched and Th17-negative memory CD4+ T cells were from 5 individuals, 4 of whom were the same individuals as in the naïve and Th1 group. High-quality (RIN > 8.5) RNA was hybridized to Affymetrix GeneChip® Human Gene 1.0 ST expression arrays for gene expression profiling. RMA normalization of the intensity data was performed using the Partek Genomics Suite. All microarray data are MIAME compliant, and the raw data have been deposited in the GEO database under accession number GSE32901.
Immune-mediated disease loci were defined by their inclusion as fine-mapping disease loci on the custom-designed Immunochip. Loci were included if genome-wide significant evidence for disease association was observed in at least one immune-mediated disease (Crohn’s disease, ulcerative colitis, celiac disease, psoriasis, type 1 diabetes mellitus, rheumatoid arthritis, multiple sclerosis, ankylosing spondylitis, systemic lupus erythematosus). All European ancestry 1000 Genomes SNPs within 0.1 cM of the peak association signals were included in the fine-mapping loci, and 1387 transcripts within the fine-mapping boundaries were included in the enrichment analyses. For each possible ordered pair of cell types (cell type 1, cell type 2), where each cell type belongs to the set (naïve, in vitro differentiated Th1, Th17-enriched and Th17-negative), we measured the degree of enrichment between the set of 1387 transcripts, chosen for their proximity to immune-mediated disease-associated loci, and the set of transcripts up-regulated in cell type 1 relative to cell type 2. Fisher’s exact test was used to estimate enrichment P-values using the following steps:
- Calculate the P-values for upregulation in cell type 1 relative to cell type 2 for the 18,524 transcripts that are common to both the RNASeq and microarray data.
- From this set of 18,524 transcripts, find a subset of transcripts, S, which are not related to autoimmune disease loci, with the property that the median gene expression in cell type 1 and cell type 2 for transcripts within S equals the median gene expression within the 1387 immune-mediated disease transcripts. Next, merge S with the 1387 immune-mediated disease transcripts to create a ‘background’ set of transcripts appropriate for the enrichment test.
- Construct a 2 x 2 table of counts which assigns each of the background transcripts to one of 4 cells depending on whether it is among the 5% most differentially expressed transcripts (according to P-value), and whether the transcript is within the 1387 autoimmune transcripts or not.
- Calculate the Fisher’s exact test P-value associated with the 2 x 2 table constructed in the previous step.
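The last two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names and the dictionary-based inputs are hypothetical, the expression-matched background set from the second step is taken as given, and a one-sided (over-representation) test is assumed because the text does not state which alternative was used.

```python
from math import comb


def enrichment_table(pvalues, disease_ids, background_ids, top_fraction=0.05):
    """Build the 2 x 2 table (a, b, c, d) for the enrichment test.

    pvalues: dict mapping transcript id -> upregulation P-value (hypothetical input).
    disease_ids: ids of the immune-mediated disease transcripts.
    background_ids: ids of the background set (disease transcripts plus matched set S).
    """
    background = [t for t in background_ids if t in pvalues]
    ranked = sorted(background, key=lambda t: pvalues[t])
    n_top = max(1, round(top_fraction * len(ranked)))
    top = set(ranked[:n_top])  # most differentially expressed fraction by P-value
    disease = set(disease_ids)
    a = sum(1 for t in background if t in top and t in disease)
    b = sum(1 for t in background if t not in top and t in disease)
    c = sum(1 for t in background if t in top and t not in disease)
    d = len(background) - a - b - c
    return a, b, c, d


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact P-value, P(X >= a), for the table [[a, b], [c, d]].

    Assumption: enrichment is tested as over-representation only.
    """
    row1 = a + b          # disease transcripts in the background
    col1 = a + c          # transcripts in the top fraction
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Toy usage with made-up identifiers: 40 background transcripts, 4 of them "disease".
ids = [f"t{i}" for i in range(40)]
toy_p = {t: i / 40 for i, t in enumerate(ids)}
table = enrichment_table(toy_p, ids[:4], ids)
p_enrich = fisher_exact_greater(*table)
```

Exact integer arithmetic with `math.comb` keeps the sketch dependency-free; for tables of the size described here, a library routine for Fisher’s exact test would serve equally well.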
RNA libraries were prepared according to the manufacturer’s recommended protocol (Illumina, CA). Total RNA samples of naïve, in vitro differentiated Th1, Th17-enriched and Th17-negative CD4+ T cells from 4 individual healthy controls were transcribed to cDNA. cDNA samples were then sheared by nebulization (35 psi, 6 min). Duplexes were blunt ended (large Klenow fragment, T4 polynucleotide kinase and T4 polymerase) and a single 3′ adenosine moiety was added using Klenow exo− and dATP. Illumina adapters, containing primer sites for flow cell surface annealing, amplification and sequencing, were ligated onto the repaired ends of the cDNA. Gel electrophoresis was used to select for DNA constructs 200–250 base pairs in size, which were subsequently amplified by 18 cycles of PCR with Phusion polymerase. These libraries were denatured with sodium hydroxide and diluted to 3.5 pM in hybridization buffer for loading onto a single lane of an Illumina GA flow cell. Cluster formation, primer hybridization and sequencing reactions were performed according to the manufacturer’s protocol. High-throughput sequencing was performed using paired-end, 75 base pair reads. Two flow lanes were used for each cDNA sample, yielding an average of 103.3 million reads per sample.
RNASeq Mapping, Estimates of Differential Expression and Isoform Abundance
TopHat v1.3.3 was used to align the RNASeq reads to the hg19 genome. RNASeq gene expression was measured for each gene from version 59 of the Ensembl database as fragments per kilobase of exon model per million mapped reads (FPKM), calculated via Cufflinks v1.2.1. The Affymetrix annotation for the Human HuGene 1.0 array was downloaded and used to annotate the microarray probes. Subsequently, microarray intensity was calculated as the RMA-normalized log intensity for a single chosen probe in a given gene region. Microarray intensity and FPKM were matched on common gene name for 18,524 genes, and plotted against each other (Figure S1).
We used the program Cuffdiff to test for differential transcript expression in each pair of cell lines. Because samples were paired, it was necessary to analyze each sample individually in Cufflinks before combining the resulting P-values using the Fisher method to obtain a single P-value for differential expression for each gene. We made an exception to this rule for genes where the estimated log fold changes for the 4 samples did not all have the same sign; these genes were assigned a P-value of 1. P-values for differential expression from the microarray data were calculated via paired t-tests. The QQ plot was created by plotting the observed ordered negative log10 P-values for the RNASeq (or microarray) data against the ordered negative log10 P-values that would be expected if no genes were differentially expressed.
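The combination rule described above can be sketched as follows. This is an illustrative sketch, not the code used in the study: the function names are hypothetical, a log fold change of exactly zero is treated as a sign disagreement, and P-values are floored at a tiny positive value to avoid taking the logarithm of zero (both are assumptions). Fisher's method refers the statistic -2 Σ ln(p_i) to a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom.

```python
from math import exp, log


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df (closed form)."""
    k = df // 2
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i      # (x/2)^i / i!
        total += term
    return min(1.0, exp(-half) * total)


def combine_paired(pvalues, log_fold_changes, floor=1e-300):
    """Combine per-sample P-values for one gene with Fisher's method.

    Returns 1.0 when the log fold changes do not all share the same sign,
    mirroring the exception described in the text.
    """
    all_up = all(lfc > 0 for lfc in log_fold_changes)
    all_down = all(lfc < 0 for lfc in log_fold_changes)
    if not (all_up or all_down):
        return 1.0
    statistic = -2.0 * sum(log(max(p, floor)) for p in pvalues)
    return chi2_sf_even_df(statistic, 2 * len(pvalues))


# Toy usage with made-up values for a single gene across 4 paired samples.
p_same_sign = combine_paired([0.04, 0.10, 0.02, 0.07], [1.2, 0.8, 1.5, 0.9])
p_mixed_sign = combine_paired([0.04, 0.10, 0.02, 0.07], [1.2, -0.8, 1.5, 0.9])
```

With consistent signs, the combined P-value is smaller than any of the four individual values in this toy case, while a single disagreeing sample sets the result to 1.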
Quantitative Methylation Studies
Quantitative DNA methylation analysis was performed at the Keck Core Facility at Yale University using the MassARRAY EpiTYPER system (Sequenom, CA) according to the manufacturer’s instructions. Genomic DNA samples were bisulfite treated to convert non-methylated cytosine into uracil using the EZ DNA methylation kit (Zymo Research, CA), followed by PCR amplification using T7-promoter tagged reverse primers. After shrimp alkaline phosphatase treatment, in vitro transcription was performed, and the generated transcript was subjected to enzymatic base-specific cleavage. The resulting fragments differ in size and mass depending on the sequence changes generated through bisulfite treatment. The fragment masses were determined by MALDI-TOF MS, and the EpiTYPER software was used to estimate percentages of methylation at CpG sites for each analyzed fragment.
3′ Rapid Amplification of cDNA Ends (3′ RACE)
3′ RACE was performed with the FirstChoice RLM-RACE Kit (Ambion by Applied Biosystems, CA) according to the manufacturer’s protocol. 3′ RACE outer and inner primers were designed from Exon 6 with Primer3 Plus, and the first and nested PCRs were performed using an annealing temperature of 60°C. The highly expressed fragments after nested PCR were purified with the QIAquick Gel Extraction Kit (Qiagen, CA), and then Sanger sequenced using an ABI 3730XL DNA Analyzer and Sequence Scanner v1.0 from ABI.