Posted by Gregory J Block MSc PhD on Aug 31, 2014
SUMMARY/SPECIFIC AIMS (not changed from the original application):
Facioscapulohumeral muscular dystrophy (FSHD) is caused by incomplete repression of the D4Z4 macrosatellite repeat array on the disease-permissive chromosome 4q that results in aberrant expression of DUX4, the candidate FSHD gene imbedded within the D4Z4 repeat. We hypothesized that ongoing sense and antisense transcription at the D4Z4 region is involved in DUX4 epigenetic silencing mediated by the D4z4-derived small RNAs and that disruption of this silencing may be linked either to deficiencies in small RNA production due to shortened D4Z4 arrays (FSHD1) or to deficiencies in chromatin compaction (FSHD2). Indeed, we have recently shown that Argonaute2 (AGO2) and Dicer1, key compounds of the smal RNA processing machinery, are necessary to maintain DUX4 epigenetic silencing in non-pathogenic condition and that augmenting levels of the D4Z4 small RNAs may induce heterochromatin assembly at D4Z4 and result in the silencing of DUX4 expression in FSHD muscle (described in the progress report on Dr. Jong-Won Lim project). To identify the D4Z4-derived small RNAs that may play a role in DUx4 silencing we proposed to perform deep high-throughput sequencing of small RNAs isolated from FSHD and control muscle in iPS/EB cells. From this small RNA sequencing project, we also expected to identify differentially expressed non-D4Z4 small RNAs that may be involved in FSHD pathogenesis.
STUDIES AND RESULTS:
In spite of the technical difficulties at the Genomics Core (several rounds of troubleshooting of failed small RNA library preparations and multiple efforts to increase sequencing depth resulting in additional sequence runs), most of the work proposed on this project has been completed: (1) we have prepared and validated small RNA libraries from myoblasts and myotubes from three control and three FSHD individuals, as well as from iPS and EBs from two controls and three FSHD patients, (2) we have performed deep high-throughput sequencing of the small RNA libraries using Illumina HiSeq 2000 technology and barcoding for multiplexing of pooled libraries (with 4-6 samples per lane) that allowed to obtain 20-40 millions of sequence reads per sample. Sequence reads were clustered, de-convoluted, trimmed from 3-prime adapters and assessed for quality control and size distribution; finally, the reads were aligned and annotated against the genome; (3) to identify D4Z4-derived small RNAs, the small RNA sequence reads were aligned to Lambda 42 sequence (AF117653) and then filtered to remove all the sequences that aligned somewhere else in the genome; in addition, (4) the small RNA reads from other genomic loci were compared between the groups (MB vs MT and control vs FSHD) to identify small RNAs specific for muscle differentiation and FSHD condition, respectively.
Re-sizing of the small RNA libraries from control and FSHD myoblasts and myotubes.
As we previously reported, our initial sequence analysis of the small RNA libraries that we prepared from control and FSHD myoblasts/myotubes (MB/MT) did not identify any D4Z4-derived small RNAs (Progress report #1). The genome-wide alignment of our RNA sequence reads revealed that more than 90% of the reads corresponded to snoRNAs/rRNAs/tRNAs ranging from ~50 to 80nt in size, whereas most of the small RNAs of interest including miRNAs were ~20-30nt long. Therefore, if we removed larger transcripts and re-sequenced the existing small RNA libraries we could increase the sequencing depth (by ~ 10 fold) necessary to detect D4Z4 small RNAs.
Another consideration we had was that since most of the D4Z4 small RNAs were identified in the chromatin fraction (previously reported analysis of the publically available small RNA datasets in Ameyar-Zazoua, et al., 2012 and Benhamed et al., 2012) and our small RNA libraries were prepared from the total RNA isolation from the whole cell lysate, some of the chromatin-associated D4Z4 small RNAs might not be represented in our libraries. In the end, we have decided to go ahead with the deeper sequencing of our existing small RNA libraries and depending on the sequencing results, consider isolating and sequencing small RNAs associated with the chromatin fraction in the future.
In collaboration with our Genomics Core, we have successfully re-sized our existing small RNA libraries from the FSHD and control muscle cells by pooling the original libraries and capturing 130-160 bp cDNA fragments corresponding to 10-40 nt long small RNAs. Sequencing (50 cycle single read) was done in 2 lanes with expected ~20-25 mln sequence reads per sample. The initial analysis of the output sequences (after de-multiplexing and removal of 3-prime adaptor sequences) confirmed that the size distribution of the sequence reads ranged from 10 to 40 nt this time with most of the reads corresponding to the two major peaks of ~15 and 35nt long small RNAs (Fig. 1A). Row counts were normalized by library sequencing depth. Genome-wide alignment and annotation of different types of small RNAs present in the small RNA libraries showed that miRNAs comprise 30-40% of the total small RNA reads in the resized libraries, which is 10 times higher compared to the original 3-4% (Fig. 1B) with the major peaks of miRNAs at 20-25nt, rRNAs/tRNAs at 30-35nt, and unknown ('other' or 'protein coding') small RNAs contributing to both 25 and 35nt peaks (Fig. 1C).
Identification of the endogenous D4Z4 small RNAs in control and FSHD muscle cells.
The small RNA sequence reads from the resized small RNA libraries were aligned to Lambda 42 sequence (AF117653) to identify D4Z4-derived small RNAs. The D4Z4 matched sequence reads (allowing up to two mismatches) were filtered to remove all the sequences that aligned somewhere else in the genome, including multiple hits corrsponding to simple repeats.
Constent with our previous observations based on the analysis of the publically available small RNA datasets (Ameyar-Zazoua, et al., 2012 and Benhamed et al., 2012), alignment of the identifed D4Z4-derived small RNAs with the L42 sequence (Fig. 2A) and in particular with a single D4Z4 unit (Fig. 2B) revealed that the D4Z4 small RNAs form clusters with the majority of reads mapping to the previously identified cluster (in both chromatin and cytoplasmic fractions) near the DUX4 transcription start site (TSS) and fewer reads corresponding to the clusters of chromatin-associated D4Z4 small RNAs at the DUX4 promoter region and the upstream near Kpnl site region (Fig. 2B). However, a closer analysis revealed that most of the D4Z4 small RNA reads were only 15-17nt long, too short for loading on AGO, suggesting that the functional AGO2- and chromatin-associated small RNAs that are involved in DUX4 silencing (please see Dr. Lim's progress report) may not be enriched in our small RNA libraries prepared from the whole cell lysate total RNA.
Identification of small RNAs differentially expressed during muscle differentiation in control and FSHD cells.
It has been recently reported that miRNA requlation is disrupted in FSHD with most changed affecting cell cycle and muscle specific miRNAs (Dmitriev et al., 2013 and Colangelo et al, 2014). We first perfomed an unsupervised hierarchical clustering analysis of miRNA expression in four control samples (NR135 MB/MT, NR 209 MB/MT) and four FSHD samples (2062 MB/MT, 2305 MB/MT). This unbiased analysis revealed two main clusters corresponding to myoblasts (MB) and myotubes (MT) samples (Fig. 3A), suggesting that the most prominent differences in miRNA expression between these samples are directly attributed to muscle differentiation. Consistent with the previous report (Colangelo et al., 2014), differential expression analysis of miRNA expression in our samples identified fewer miRNAs in which expression was significantly changed, mostly upregulated, during differentiation (MB vs MT) in FSHD cells (39) in comparison to control (78) (p<0.05). Interestingly, all 39 FSHD muscle-specific miRNAs overlapped with the muscle-specific miRNAs identified in control cells, including miR-1, miR133A, miR133B, and miR206 (Fig. 3B,C). In contrast to known miRNAs, differential expression analysis of all small RNAs, including miRNAs, tRNAs, and novel small RNAs, revealed over 100 small RNAs, mostly novel and small RNAs mapped to the protein-coding sequences, that are differentially expressed during muscle differentiation in FSHD cells but not in control cells (Fig. 3B,D).
Identification of small RNAs that are differentially expressed between control and FSHD muscle cells.
We also compared miRNA expression between control and FSHD cells (control MB vs FSHD MB and control MT vs FSHD MT, p<0.05 and lfc>2) to identify control- or FSHD-specific miRNAs (Fig. 4A). Only a small nmber of known miRNAs showed significant expression differences between control MB and FSHD MB (7) and between control MT and FSHD MT (6) with three of them overlapping between MB and MT samples (Fig. 4B, C). In contrast to the previous reports (Dmitriev et al., 2013 and Colangelo et al., 2014), we did not detect any significant differences in expression of the known muscle-specific miRNAs, such as miR-1, miR133A, miR133B, and miR206, between control and FSHD cells,all of these miRNAs were robustly upregulated during mygenesis in both control and FSHD cells (Fig. 3C and also see Progress report #1, Fig. 3). In fact, only 2 out of 10 FSHD-specific miRNAs (control vs FSHD, Fig. 4B) were found to be differentially expressed during muscle differentiation (miR1308 and miR504).
Our differential expression analysis of all small RNAs revealed more small RNAs, mostly novel small RNAs and tRNAs, that were differentially expressed between control and FSHD cells (Fig. 4B, D). Most of these small RNAs (111 out of 142) showed differential expression only in MB. Since all our FSHD lines are from FSHD2 patients carrying SMCHD1 mutations, the observed FSHD-specific changes in tRNA and novel small RNA expression, may be due to SMCHD1-related defects in chromatin compaction and may represent a specific signature for FSHD2 myoblasts. Indeed, hypomethylation of the same tRNA clusters has been documented in FSHD2 muscle cells (S. Tapscott, unpublished data). However, a number of the differentially expressed control- or FSHD-specific small RNAs, mostly novel small RNAs, showed differential expression in both MB and MT (17 oui of 142) or in MT only (14 out of 142) (Fig. 4B),which is unlikely to be related to SMCHD1 mutations since SMCHD1 protein is known to be expressed in MB, but not in MT. An example of such FSHD-specific novel small RNA, located in intron 2 of the EBF2 gene and its read counts for each sample are shown in Fig. 4E.
To test if any of the observed FSHD-specific changes in small RNA expression are due to aberrant DUX4 expression, we prepared two additional control samples for small RNA-seq: control myoblasts transduced either with Lenti-DUX4 (positive control for DUX4 expression) or Lenti-GFP (negative control) (Fig. 5A). Overexpression of the exogenous DUX4 in Lenti-DUX4 sample was confirmed by expression in ZSCAN4, a DUX4 target gene (Fig. 5B). Unexpectedly, our intial differential expression analysis of small RNA expression in these samples did not reveal any differentially expressed miRNAs or tRNAs (data not shown). Although, we are still analyzing these data for expression of novel small RNAs, our initial analysis suggests that at least some changes in small RNA expression between control and FSHD2 MB/MT may not be directly caused by aberrant DUX4 expression.
Identification of small RNAs that are differentially expressed between control and FSHD iPS/EB cells.
DUX4 is expressed in undifferentiated ES/iPS cells and then repressed during embryoid body (EB) differentiation (see original proposal). To identify small RNAs differentially expressed during EB differentiation in control or FSHD condition, which may be involved in D4Z4/DUX4 silencing or may function in cell fate determination and differentiation, we prepared small RNA libraries from four control samples (HFF1 iPS/EB and M83-9 iPS/EB) and six FSHD2 samples (3FSHD iPS/EB, 7FSHD iPS/EB, and 2062 iPS/EB) for small RNA-seq (Fig. 6A). Alignment of the output small RNA-seq reads with Lambda 42/D4Z4 sequence identified candidate D4Z4 small RNAs that, similar to the D4Z4 small RNAs from MB/MT samples, formed clusters corresponding to the peviously identified clusters of D4Z4 small RNAs (Fig. 2B and data not shown). Although the number of the D4Z4 small RNA reads was higher in iPS/EB samples compared to MB/MT, a large fraction of the reads were shorter than the AGO-bound small RNAs (15-17nt long), suggesting that most of the chromatin-associated D4Z4 small RNAs may not be represented in our libraries prepared from the whole cell lysate total RNA.
To further analyze the expression of non-D4Z4 small RNAs in control and FSHD2 iPS/EB samples we perfomed unbiased hierarchical clustering analysis of miRNA expression in all ten iPS/EB samples. This analysis revealed two main clusters corresponding to undifferentiated (iPS) and differentiated (EB) samples (Fig. 6B), suggesting that the most significant differences in miRNA expression between these samples are due to robust iPS to EB differentiation. To validate small RNA libraries from control and FSHD2 iPS/EB samples, we confirmed differential expression of ES-specific miRNAs (miR-367/3 and miR-302b) and EB-specific miRNA (miR-145.5) (Fig. 6C). Our initial differential analysis of iPS/EB samples revealed a number of miRNAs differentially expressed between control and FSHD2 cells. Interestingly, most of these miRNAs (22) were detected in iPS cells, compared to only 5 detected in EBs (Fig. 6D), suggesting that even prior to SMCHD1 activation at day 10 after ES cell differentiation (Gendrel, et al., 2012), there may already be some epigenetic changes in FSHD2 cells that could affect miRNA regulation in early development. We are further analyzing these data for differential expression of novel small RNAs. In addition, since one of our FSHD2 iPS/EB samples (3FSHD), which does not contain a documented SMCHD1 mutation (D. Miller, personal communication), seems to cluster separately from the rest of FSHD2 samples (Fig. 6B), we will re-analyz the two FSHD2 iPS/EB samples with SMCHD1 mutations (7FSHD and 2062) separately.
In summary, we have made significant progress on this project:
(1) We have identified potential D4Z4 derived small RNA reads that were aligned to Lambda 52 sequence (AF117653). In spite of their smaller size (15-17nt), compared to known AGO bound small RNAs/miRNAs, most of them formed clusters in the D4Z4 noncoding regions including the promoter region close to TSS, consistent with our previous analysis of publically available small RNA datasets of chromatin associated small RNAs. Thus, our small RNA-seq data strongly support our hyposthesis on the expression/existence of the D4Z4-derived small RNAs and their potential roles in D4Z4 silencing in control and FSHD muscle cells. However, based on our recent discovery of the role of the D4Z4 small RNAs in AGO2 recruitment and heterochromatin formation at the D4Z4 repeat (see progress report of Dr. Jong-Won Lim), we expect that sequencing of small RNAs associated with AGO/repressive chromatin fraction will allow more efficient identification of D4Z4 small RNAs involved in D4Z4/DUX4 silencing in control and FSHD muscle cells.
(2) We have identified a set of non-D4Z4 small RNAs that are differentially expressed in control and FSHD2 muscle cells. Although FSHD-specific changes in miRNA expression have recently been reported, these reports mostly focused on known miRNAs. In our small RNA-seq project, in addition to known miRNAs, we have identified a large number of novel small RNAs that, together with the tRNa expression changes, seem to represent the small RNA signature of FSHD2 myoblasts. In addition, we have identified a distinct set of novel small RNAs that show highly significant changes in expression between control and FSHD2 cells either in both myoblasts and myotubes or in myotubes only. We expect to determine whether any of the observed FSHD-specific changes in small RNa expression are due to aberrant DUX4 expression and/or SMCHD1-related chromatin compaction defects by analyzing small RNA expression in control myoblasts either with overexpressed DUX4 or with depleted SMCHD1, respectively. Further validation and studying of the newly identified FSHD-specific small RNAs and their roles in relation to FSHD pathogenesis may provide new therapeutic targets for FSHD treatment. If FSHD-specific elevated expression of these small RNAs is confirmed in various FSHD cell lines (and potetially in blood plasma), they may also be useful as potential biomarkers for FSHD diagnosis.