Progress Update: Identification of small RNAs generated from the D4Z4 array by deep high-throughput sequencing

SUMMARY/SPECIFIC AIMS (not changed from the original application):

Facioscapulohumeral muscular dystrophy (FSHD) is caused by incomplete repression of the D4Z4 macrosatellite repeat array on the disease-permissive chromosome 4q that results in aberrant expression of DUX4, the candidate FSHD gene imbedded within the D4Z4 repeat. We hypothesized that ongoing sense and antisense transcription at the D4Z4 region is involved in DUX4 epigenetic silencing mediated by the D4z4-derived small RNAs and that disruption of this silencing may be linked either to deficiencies in small RNA production due to shortened D4Z4 arrays (FSHD1) or to deficiencies in chromatin compaction (FSHD2). Indeed, we have recently shown that Argonaute2 (AGO2) and Dicer1, key compounds of the smal RNA processing machinery, are necessary to maintain DUX4 epigenetic silencing in non-pathogenic condition and that augmenting levels of the D4Z4 small RNAs may induce heterochromatin assembly at D4Z4 and result in the silencing of DUX4 expression in FSHD muscle (described in the progress report on Dr. Jong-Won Lim project). To identify the D4Z4-derived small RNAs that may play a role in DUx4 silencing we proposed to perform deep high-throughput sequencing of small RNAs isolated from FSHD and control muscle in iPS/EB cells. From this small RNA sequencing project, we also expected to identify differentially expressed non-D4Z4 small RNAs that may be involved in FSHD pathogenesis.

STUDIES AND RESULTS:

In spite of the technical difficulties at the Genomics Core (several rounds of troubleshooting of failed small RNA library preparations and multiple efforts to increase sequencing depth resulting in additional sequence runs), most of the work proposed on this project has been completed: (1) we have prepared and validated small RNA libraries from myoblasts and myotubes from three control and three FSHD individuals, as well as from iPS and EBs from two controls and three FSHD patients, (2) we have performed deep high-throughput sequencing of the small RNA libraries using Illumina HiSeq 2000 technology and barcoding for multiplexing of pooled libraries (with 4-6 samples per lane) that allowed to obtain 20-40 millions of sequence reads per sample. Sequence reads were clustered, de-convoluted, trimmed from 3-prime adapters and assessed for quality control and size distribution; finally, the reads were aligned and annotated against the genome; (3) to identify D4Z4-derived small RNAs, the small RNA sequence reads were aligned to Lambda 42 sequence (AF117653) and then filtered to remove all the sequences that aligned somewhere else in the genome; in addition, (4) the small RNA reads from other genomic loci were compared between the groups (MB vs MT and control vs FSHD) to identify small RNAs specific for muscle differentiation and FSHD condition, respectively.

Re-sizing of the small RNA libraries from control and FSHD myoblasts and myotubes.

Small RNA seqt-Figure 1
Figure 1. Increased depth of small RNA sequencing after re-capturing shorter reads from the FSHD and control MB/MT small RNA libraries. (A) Size distribution of the reads after re-sizing and capturing 130-160 nt cDNA fragments (10-40 nt long small RNA reads after trimming the adaptors) from the pooled FSHD and control MB/MT small RNA libraries. (B, C) Genome-wide annotation of different types of small RNAs present in the small RNA libraries shows that miRNAs comprise 30-40% of the total small RNA reads in the resized libraries compared to the original 3-4% (B), with major peaks of miRNAs at 20-25 nt. rRNAs/tRNAs at 30-35 nt, and unknown (other or protein coding) small RNAs contributing to both 25 and 35 nt peaks (C).

As we previously reported, our initial sequence analysis of the small RNA libraries that we prepared from control and FSHD myoblasts/myotubes (MB/MT) did not identify any D4Z4-derived small RNAs (Progress report #1). The genome-wide alignment of our RNA sequence reads revealed that more than 90% of the reads corresponded to snoRNAs/rRNAs/tRNAs ranging from ~50 to 80nt in size, whereas most of the small RNAs of interest including miRNAs were ~20-30nt long. Therefore, if we removed larger transcripts and re-sequenced the existing small RNA libraries we could increase the sequencing depth (by ~ 10 fold) necessary to detect D4Z4 small RNAs.

Another consideration we had was that since most of the D4Z4 small RNAs were identified in the chromatin fraction (previously reported analysis of the publically available small RNA datasets in Ameyar-Zazoua, et al., 2012 and Benhamed et al., 2012) and our small RNA libraries were prepared from the total RNA isolation from the whole cell lysate, some of the chromatin-associated D4Z4 small RNAs might not be represented in our libraries. In the end, we have decided to go ahead with the deeper sequencing of our existing small RNA libraries and depending on the sequencing results, consider isolating and sequencing small RNAs associated with the chromatin fraction in the future.

In collaboration with our Genomics Core, we have successfully re-sized our existing small RNA libraries from the FSHD and control muscle cells by pooling the original libraries and capturing 130-160 bp cDNA fragments corresponding to 10-40 nt long small RNAs. Sequencing (50 cycle single read) was done in 2 lanes with expected ~20-25 mln sequence reads per sample. The initial analysis of the output sequences (after de-multiplexing and removal of 3-prime adaptor sequences) confirmed that the size distribution of the sequence reads ranged from 10 to 40 nt this time with most of the reads corresponding to the two major peaks of ~15 and 35nt long small RNAs (Fig. 1A). Row counts were normalized by library sequencing depth. Genome-wide alignment and annotation of different types of small RNAs present in the small RNA libraries showed that miRNAs comprise 30-40% of the total small RNA reads in the resized libraries, which is 10 times higher compared to the original 3-4% (Fig. 1B) with the major peaks of miRNAs at 20-25nt, rRNAs/tRNAs at 30-35nt, and unknown ('other' or 'protein coding') small RNAs contributing to both 25 and 35nt peaks (Fig. 1C).

Identification of the endogenous D4Z4 small RNAs in control and FSHD muscle cells.

The small RNA sequence reads from the resized small RNA libraries were aligned to Lambda 42 sequence (AF117653) to identify D4Z4-derived small RNAs. The D4Z4 matched sequence reads (allowing up to two mismatches) were filtered to remove all the sequences that aligned somewhere else in the genome, including multiple hits corrsponding to simple repeats.

Small RNA seq-Figure 2
Figure 2. Identification of the andogenous D4Z4/DUX4 small RNAs in control and FSHD muscle cells. Alignment of the small RNA sequencing reads to the L42 sequence shown in (A) and with the DUX4 gene embedded in the distal D4Z4/pLAM sequence shown in (B) identified several D4Z4-derived small RNA reads in control and FSHD MB/MT indicated as black (A) and red (B) vertical marks. The majority of the reads mapped to the previously identified cluster of the D4Z4 small RNAs near the DUX4 TSS (Ameyar-Zazoua et al., 2012) shown in (B) as black (chromatin-associated) and orange (cytoplasmic small RNAs) vertical marks. Note that much fewer reads aligned with the previously identifed clusters of chromatin associated D4Z4 small RNAs at the DUX4 promoter region and the upstream near Kpn1 site region shown by black vertical marks in (B). Numbered vertical marks correspond to the exogenous siRNAs previously tested for DUX4 silencing (see progress report on Dr. Lim's project).

Constent with our previous observations based on the analysis of the publically available small RNA datasets (Ameyar-Zazoua, et al., 2012 and Benhamed et al., 2012), alignment of the identifed D4Z4-derived small RNAs with the L42 sequence (Fig. 2A) and in particular with a single D4Z4 unit (Fig. 2B) revealed that the D4Z4 small RNAs form clusters with the majority of reads mapping to the previously identified cluster (in both chromatin and cytoplasmic fractions) near the DUX4 transcription start site (TSS) and fewer reads corresponding to the clusters of chromatin-associated D4Z4 small RNAs at the DUX4 promoter region and the upstream near Kpnl site region (Fig. 2B). However, a closer analysis revealed that most of the D4Z4 small RNA reads were only 15-17nt long, too short for loading on AGO, suggesting that the functional AGO2- and chromatin-associated small RNAs that are involved in DUX4 silencing (please see Dr. Lim's progress report) may not be enriched in our small RNA libraries prepared from the whole cell lysate total RNA.

Identification of small RNAs differentially expressed during muscle differentiation in control and FSHD cells.

It has been recently reported that miRNA requlation is disrupted in FSHD with most changed affecting cell cycle and muscle specific miRNAs (Dmitriev et al., 2013 and Colangelo et al, 2014). We first perfomed an unsupervised hierarchical clustering analysis of miRNA expression in four control samples (NR135 MB/MT, NR 209 MB/MT) and four FSHD samples (2062 MB/MT, 2305 MB/MT). This unbiased analysis revealed two main clusters corresponding to myoblasts (MB) and myotubes (MT) samples (Fig. 3A), suggesting that the most prominent differences in miRNA expression between these samples are directly attributed to muscle differentiation. Consistent with the previous report (Colangelo et al., 2014), differential expression analysis of miRNA expression in our samples identified fewer miRNAs in which expression was significantly changed, mostly upregulated, during differentiation (MB vs MT) in FSHD cells (39) in comparison to control (78) (p<0.05). Interestingly, all 39 FSHD muscle-specific miRNAs overlapped with the muscle-specific miRNAs identified in control cells, including miR-1, miR133A, miR133B, and miR206 (Fig. 3B,C). In contrast to known miRNAs, differential expression analysis of all small RNAs, including miRNAs, tRNAs, and novel small RNAs, revealed over 100 small RNAs, mostly novel and small RNAs mapped to the protein-coding sequences, that are differentially expressed during muscle differentiation in FSHD cells but not in control cells (Fig. 3B,D).

Small RNA seq-Figure 3
Figure 3. Identifcation of small RNAs differentially expressed during muscle differentiation. (A) Hierarchical clustering heat map of 4 control samples (NR135 MB/MT, NR209 MB/MT) and 4 FSHD samples (2062 MB/MT, 2305 MB/MT). Color code shows degree of similarity among samples (blue: high and red: low). (B) Venn diagram showing overlapping and non-overlapping muscle-specific miRNAs (left) or all small RNAs (right) between control and FSHD samples. (C) Examples of most significantly differentially expressed miRNAs during muscle differentiation. miRNAs with high significance (padj) and log fold changes (lfc) MB vs MT are highlighted in blue for FSHD cells and in green for controls. (D) Summary table showing differentially expressed small RNA reads for each small RNA type for control and FSHD samples during muscle differentiation (MB vs MT).

Identification of small RNAs that are differentially expressed between control and FSHD muscle cells.

We also compared miRNA expression between control and FSHD cells (control MB vs FSHD MB and control MT vs FSHD MT, p<0.05 and lfc>2) to identify control- or FSHD-specific miRNAs (Fig. 4A). Only a small nmber of known miRNAs showed significant expression differences between control MB and FSHD MB (7) and between control MT and FSHD MT (6) with three of them overlapping between MB and MT samples (Fig. 4B, C). In contrast to the previous reports (Dmitriev et al., 2013 and Colangelo et al., 2014), we did not detect any significant differences in expression of the known muscle-specific miRNAs, such as miR-1, miR133A, miR133B, and miR206, between control and FSHD cells,all of these miRNAs were robustly upregulated during mygenesis in both control and FSHD cells (Fig. 3C and also see Progress report #1, Fig. 3). In fact, only 2 out of 10 FSHD-specific miRNAs (control vs FSHD, Fig. 4B) were found to be differentially expressed during muscle differentiation (miR1308 and miR504).

Small RNA seq-Figure 4
Figure 4. Identification of small RNAs differentially expressed between control and FSHD samples. (A) Deseq analysis of differentially expressed miRNAs in control MB vs FSHD MB and control MT vs FSHD MT. Only miRNAs that exhibited significant differential expression (p-value < 0.05, shown as red dots) and log2 fold change (relative to the mean of expression) ≥ 2 were considered for further analysis. (B) Venn diagram showing overlapping and non-overlapping FSHD-specific miRNAs (left) or all small RNAs (right) between MB and MT samples. (C) Most significantly differentially expressed miRNAs in control and FSHD cells (control MB vs FSHD MB and in control MT vs FSHD MT). FSHD-specific miRNAs with high significance (padj) and log fold changes (lfc) are highlighted in blue for MT samples and in green for MB samples. (D) Summary table showing differentially expressed small RNA reads of each small RNA types between control and FSHD cells (control vs FSHD) for MB and MT samples. (E) An example of a novel miRNA differentially expressed in control vs FSHD (FSHD-specific) cells. A gene browser screen shot shows location of the novel miRNA in intron 2 of the EBF2 gene and numbers of reads for control and FSHD samples.

Our differential expression analysis of all small RNAs revealed more small RNAs, mostly novel small RNAs and tRNAs, that were differentially expressed between control and FSHD cells (Fig. 4B, D). Most of these small RNAs (111 out of 142) showed differential expression only in MB. Since all our FSHD lines are from FSHD2 patients carrying SMCHD1 mutations, the observed FSHD-specific changes in tRNA and novel small RNA expression, may be due to SMCHD1-related defects in chromatin compaction and may represent a specific signature for FSHD2 myoblasts. Indeed, hypomethylation of the same tRNA clusters has been documented in FSHD2 muscle cells (S. Tapscott, unpublished data). However, a number of the differentially expressed control- or FSHD-specific small RNAs, mostly novel small RNAs, showed differential expression in both MB and MT (17 oui of 142) or in MT only (14 out of 142) (Fig. 4B),which is unlikely to be related to SMCHD1 mutations since SMCHD1 protein is known to be expressed in MB, but not in MT. An example of such FSHD-specific novel small RNA, located in intron 2 of the EBF2 gene and its read counts for each sample are shown in Fig. 4E.

Small RNA seq-Figure 5
Figure 5. Preparation of positive (Lenti-DUX4) and negative (Lenti-GFP) control samples for small RNA-seq. (A) Expression of DUX4 (immunostaining with anti-DUX4 antibody) and GFP in control myoblasts NR-135 transduced with Lenti-DUX4 and Lenti-GFP, respectively. (B) ZSCAN4, a DUX4 target gene, is significantly induced by Lenti-DUX4 transduction (qRT-PCR).

To test if any of the observed FSHD-specific changes in small RNA expression are due to aberrant DUX4 expression, we prepared two additional control samples for small RNA-seq: control myoblasts transduced either with Lenti-DUX4 (positive control for DUX4 expression) or Lenti-GFP (negative control) (Fig. 5A). Overexpression of the exogenous DUX4 in Lenti-DUX4 sample was confirmed by expression in ZSCAN4, a DUX4 target gene (Fig. 5B). Unexpectedly, our intial differential expression analysis of small RNA expression in these samples did not reveal any differentially expressed miRNAs or tRNAs (data not shown). Although, we are still analyzing these data for expression of novel small RNAs, our initial analysis suggests that at least some changes in small RNA expression between control and FSHD2 MB/MT may not be directly caused by aberrant DUX4 expression.

Identification of small RNAs that are differentially expressed between control and FSHD iPS/EB cells.

Small RNA seq-Figure 6
Figure 6. Small RNA-seq analysis of control and FSHD iPS/EB samples. (A) Study design for analysis of small RNAs in induced pluripotency stem (iPS) cells and embryoid bodies (EB) from control and FSHD2 individuals. Arrows indicate group comparisons. (B) Hierarchical clustering heatmap of miRNA expression in 4 control (M83-9 iPS/EB, HFF1 iPS/EB) and 6 FSHD (3FSHD iPS/EB, 7FSHD iPS/EB, and 2062 iPS/EB) samples. Color code shows degree of similarity among samples (blue: high and red: low). (C) Expression of ES/iPS-specific (miR-367.3 and miR-302b) and EB-specific (miR-145-5) miRNAs in control and FSHD iPS/EB samples. These miRNAs show expected differential expression between iPS and EB in both control and FSHD samples. (D) Summary table showing differentially expressed small RNA reads of each small RNA type between control and FSHD samples (control vs FSHD).

DUX4 is expressed in undifferentiated ES/iPS cells and then repressed during embryoid body (EB) differentiation (see original proposal). To identify small RNAs differentially expressed during EB differentiation in control or FSHD condition, which may be involved in D4Z4/DUX4 silencing or may function in cell fate determination and differentiation, we prepared small RNA libraries from four control samples (HFF1 iPS/EB and M83-9 iPS/EB) and six FSHD2 samples (3FSHD iPS/EB, 7FSHD iPS/EB, and 2062 iPS/EB) for small RNA-seq (Fig. 6A). Alignment of the output small RNA-seq reads with Lambda 42/D4Z4 sequence identified candidate D4Z4 small RNAs that, similar to the D4Z4 small RNAs from MB/MT samples, formed clusters corresponding to the peviously identified clusters of D4Z4 small RNAs (Fig. 2B and data not shown). Although the number of the D4Z4 small RNA reads was higher in iPS/EB samples compared to MB/MT, a large fraction of the reads were shorter than the AGO-bound small RNAs (15-17nt long), suggesting that most of the chromatin-associated D4Z4 small RNAs may not be represented in our libraries prepared from the whole cell lysate total RNA.

To further analyze the expression of non-D4Z4 small RNAs in control and FSHD2 iPS/EB samples we perfomed unbiased hierarchical clustering analysis of miRNA expression in all ten iPS/EB samples. This analysis revealed two main clusters corresponding to undifferentiated (iPS) and differentiated (EB) samples (Fig. 6B), suggesting that the most significant differences in miRNA expression between these samples are due to robust iPS to EB differentiation. To validate small RNA libraries from control and FSHD2 iPS/EB samples, we confirmed differential expression of ES-specific miRNAs (miR-367/3 and miR-302b) and EB-specific miRNA (miR-145.5) (Fig. 6C). Our initial differential analysis of iPS/EB samples revealed a number of miRNAs differentially expressed between control and FSHD2 cells. Interestingly, most of these miRNAs (22) were detected in iPS cells, compared to only 5 detected in EBs (Fig. 6D), suggesting that even prior to SMCHD1 activation at day 10 after ES cell differentiation (Gendrel, et al., 2012), there may already be some epigenetic changes in FSHD2 cells that could affect miRNA regulation in early development. We are further analyzing these data for differential expression of novel small RNAs. In addition, since one of our FSHD2 iPS/EB samples (3FSHD), which does not contain a documented SMCHD1 mutation (D. Miller, personal communication), seems to cluster separately from the rest of FSHD2 samples (Fig. 6B), we will re-analyz the two FSHD2 iPS/EB samples with SMCHD1 mutations (7FSHD and 2062) separately.

SUMMARY/SIGNIFICANCE/FUTURE PLANS

In summary, we have made significant progress on this project:

(1) We have identified potential D4Z4 derived small RNA reads that were aligned to Lambda 52 sequence (AF117653). In spite of their smaller size (15-17nt), compared to known AGO bound small RNAs/miRNAs, most of them formed clusters in the D4Z4 noncoding regions including the promoter region close to TSS, consistent with our previous analysis of publically available small RNA datasets of chromatin associated small RNAs. Thus, our small RNA-seq data strongly support our hyposthesis on the expression/existence of the D4Z4-derived small RNAs and their potential roles in D4Z4 silencing in control and FSHD muscle cells. However, based on our recent discovery of the role of the D4Z4 small RNAs in AGO2 recruitment and heterochromatin formation at the D4Z4 repeat (see progress report of Dr. Jong-Won Lim), we expect that sequencing of small RNAs associated with AGO/repressive chromatin fraction will allow more efficient identification of D4Z4 small RNAs involved in D4Z4/DUX4 silencing in control and FSHD muscle cells.

(2) We have identified a set of non-D4Z4 small RNAs that are differentially expressed in control and FSHD2 muscle cells. Although FSHD-specific changes in miRNA expression have recently been reported, these reports mostly focused on known miRNAs. In our small RNA-seq project, in addition to known miRNAs, we have identified a large number of novel small RNAs that, together with the tRNa expression changes, seem to represent the small RNA signature of FSHD2 myoblasts. In addition, we have identified a distinct set of novel small RNAs that show highly significant changes in expression between control and FSHD2 cells either in both myoblasts and myotubes or in myotubes only. We expect to determine whether any of the observed FSHD-specific changes in small RNa expression are due to aberrant DUX4 expression and/or SMCHD1-related chromatin compaction defects by analyzing small RNA expression in control myoblasts either with overexpressed DUX4 or with depleted SMCHD1, respectively. Further validation and studying of the newly identified FSHD-specific small RNAs and their roles in relation to FSHD pathogenesis may provide new therapeutic targets for FSHD treatment. If FSHD-specific elevated expression of these small RNAs is confirmed in various FSHD cell lines (and potetially in blood plasma), they may also be useful as potential biomarkers for FSHD diagnosis.

See grant Identification of small RNAs generated from the D4Z4 Array