【SureSelect RNA Capture Applications - 4】Cancer Gene Fusions: A New Focus for Tumor Neo-antigens


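Cancer is driven by many types of variants, including single-nucleotide substitutions (SNVs), small insertions and deletions (INDELs), copy-number variations (CNVs, gains or losses of large segments), absence of heterozygosity (AOH, chromosomal regions that have lost heterozygosity) and large structural variants (SVs, large-scale chromosomal rearrangements). Among these, gene fusions (GFs; the term used at the RNA level, whereas the underlying DNA event is a rearrangement such as a translocation) are a downstream consequence of large SVs and play key roles in certain cancer types. Well-known gene fusions such as BCR-ABL and the NTRK fusions are valuable not only for diagnosis but also as major development targets for histology-agnostic therapies.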

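TRON, a German translational oncology center, is a non-profit, independent biopharmaceutical translational research institute located on the campus of Johannes Gutenberg University Mainz (Johannes Gutenberg-Universität Mainz). As WES and RNA-seq increasingly become standard assays for cancer FFPE samples, the research team started from clinical FFPE RNA-seq data, computationally identified RNA fusion sequences and used them to develop personalized immunotherapy. Earlier gene fusion detection algorithms were mostly built on data from high-quality DNA/RNA, which does not reflect the state of real clinical samples, and they did not account for benign SVs common in normal tissue (such as KANSL1-ARL17A/B) or for read-through fusions between adjacent genes (cis-near read-through fusions, such as CTBS-GNG5). The TRON team therefore combined FFPE RNA-seq data with a machine-learning model to develop a new fusion prediction tool, EasyFuse (https://github.com/TRON-Bioinformatics/EasyFuse). They tested the neo-peptides predicted at fusion junctions for CD4+ and CD8+ T cell responses, obtained promising results and published the work in the August 2022 issue of Nature Biotechnology. Their study is summarized below: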


1. Samples and analysis methods

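The authors used several sample sets: MCF7 and SKBR3, two breast cancer cell lines known to carry gene fusions; 14 fresh-frozen (FF) triple-negative breast cancer (TNBC) samples; and 14 FFPE samples covering different cancer types and both primary and metastatic tumors. For every sample, SureSelect V6 RNA libraries were prepared in technical duplicate, and each library was sequenced at 50PE/75M (50 bp paired-end). The sequencing data were analyzed with five commonly used RNA fusion tools (FusionCatcher, InFusion, SOAPfuse, STAR-Fusion and MapSplice2) as well as the authors' own EasyFuse, and the predicted fusions were then validated by RT-qPCR.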


2. Evaluating gene fusion predictions in the MCF7 and SKBR3 cell lines

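Previous studies reported 54 gene fusions in the MCF7 and SKBR3 cell lines (34 of which the authors could confirm by qPCR in their own samples), and each of the five common tools consistently called 29-33 of them in both replicates. Beyond these known fusions, the tools produced dozens, hundreds or even more additional predictions (Fig. 1a). Pooling all predictions, only 12% were reproduced in both replicates (Fig. 1b). The authors then selected 133 predictions for qPCR validation (also checking product size): fusions called by a single tool only (n=33) had a 61% validation rate (Fig. 1c), while fusions called by multiple tools or in both replicates validated slightly more often, at 65-81%. These results suggest that the two cell lines may carry considerably more gene fusions than the literature reports, and that even predictions shared across tools have a substantial failure rate. Therefore, neither the fusions known from the literature nor cross-tool consensus calls can be taken as the true fusion landscape of a sample.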
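
The 12% figure above is a replicate-reproducibility rate. The sketch below shows one way to compute such a rate, assuming each replicate's output is reduced to a set of fusion keys (here hypothetical (5' gene, 3' gene) pairs) and that the denominator is the union of distinct calls from both replicates; the paper may define the fusion key or the denominator differently.

```python
from typing import Hashable, Iterable


def replicate_reproducibility(rep1: Iterable[Hashable],
                              rep2: Iterable[Hashable]) -> float:
    """Fraction of distinct fusion calls (union of both replicates)
    that were called in both technical replicates."""
    calls1, calls2 = set(rep1), set(rep2)
    union = calls1 | calls2
    if not union:
        return 0.0  # no calls at all: report 0 rather than divide by zero
    return len(calls1 & calls2) / len(union)


# Hypothetical placeholder gene names, not data from the study.
rep1 = {("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_E", "GENE_F")}
rep2 = {("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_G", "GENE_H")}
print(replicate_reproducibility(rep1, rep2))  # 0.5
```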


3. Evaluating predictions in fresh-frozen TNBC tissue samples

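Next, the authors predicted gene fusions from the RNA-seq data of the 14 fresh-frozen TNBC tissues, obtaining roughly 302 gene fusions per sample (Fig. 1d). Replicate reproducibility was similar to that of the cell lines, and only 8% of the gene fusions were called by two or more tools. Of 492 predictions tested by qPCR, fusions called by two or more tools were confirmed at 78-100%, higher than in the cell-line data (Fig. 1e). Extrapolating from the validation rates and the number of predictions, about 90% of the true gene fusions may be detected by only a single tool, with only 10% recovered by multiple tools (Fig. 1f). This shows that the common practice of intersecting the outputs of several tools substantially reduces sensitivity.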
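
The estimate in Fig. 1f combines two quantities per tool-support group: how many fusions were predicted and what fraction of the tested ones were confirmed by RT-qPCR. A minimal sketch of that arithmetic, assuming the confirmation rate measured in each group applies to all of its predictions (the inputs below are hypothetical, not the paper's numbers):

```python
def true_positive_share(bins: dict[str, tuple[int, float]]) -> dict[str, float]:
    """bins maps a tool-support label to (number of predictions,
    RT-qPCR confirmation rate). Returns each label's share of the
    estimated true-positive fusions."""
    expected_tp = {label: n * rate for label, (n, rate) in bins.items()}
    total = sum(expected_tp.values())
    if total == 0:
        return {label: 0.0 for label in bins}
    return {label: tp / total for label, tp in expected_tp.items()}


# Hypothetical inputs: many single-tool calls with a modest confirmation
# rate versus few multi-tool calls with a high confirmation rate.
bins = {"one tool": (1000, 0.3), "multiple tools": (100, 0.9)}
share = true_positive_share(bins)
print(share)
```

Because a strict multi-tool intersection keeps only the "multiple tools" group, its sensitivity can be at most that group's share (about 23% with these made-up inputs), even if every retained call is correct.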

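The authors then compared the gene fusions predicted across different tissue samples. Of all gene fusion predictions (n=2,772), 13% (n=425) were called in two or more samples (Fig. 2a), and 71% (n=302) of these recurrent fusions had breakpoints in a cis-near configuration (same chromosome, same strand, within 1 Mb) (Fig. 2b). Such cis-near fusions are most likely fusion RNA signals between adjacent genes produced by read-through transcription. Their qPCR validation rate was similar to the overall rate, so these recurrent adjacent-gene fusions do not appear to be an artifact of the prediction tools.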
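
The configuration categories used in Fig. 2b can be expressed as a small classification function. This is a sketch, assuming each breakpoint is given as (chromosome, position, strand), that a pair on different chromosomes counts as "trans" regardless of strand, and that "within 1 Mb" means a distance strictly below 1,000,000 bp; EasyFuse's own implementation may differ in these details.

```python
from typing import NamedTuple


class Breakpoint(NamedTuple):
    chrom: str
    pos: int     # genomic coordinate
    strand: str  # "+" or "-"


CIS_NEAR_MAX_DIST = 1_000_000  # 1 Mb, per the cis-near definition in the article


def classify_configuration(bp1: Breakpoint, bp2: Breakpoint) -> str:
    """Classify a fusion breakpoint pair as trans, inv, cis-near or cis-far."""
    if bp1.chrom != bp2.chrom:
        return "trans"     # breakpoints on different chromosomes
    if bp1.strand != bp2.strand:
        return "inv"       # same chromosome, different strands
    if abs(bp1.pos - bp2.pos) < CIS_NEAR_MAX_DIST:
        return "cis-near"  # same chromosome and strand, < 1 Mb apart
    return "cis-far"       # same chromosome and strand, farther apart


print(classify_configuration(Breakpoint("chr1", 100, "+"),
                             Breakpoint("chr1", 500_100, "+")))  # cis-near
```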


4. Evaluating the tumor specificity of the predictions

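To assess the tumor specificity of the gene fusion predictions, the authors analyzed 136 unrelated samples from 48 different normal tissues (including 4 breast samples). Gene fusions recurrently called across different breast cancer samples overlapped heavily with those called in normal breast tissue, and 74% of these overlapping predictions had a cis-near configuration (Fig. 2c). Of the recurrent cis-near predictions in the breast cancer samples, 39% were also found in normal breast tissue, and as many as 49% in other normal tissues. By contrast, the corresponding proportions for trans-like gene fusions were only 1% and 5%, showing that most cis-near gene fusion predictions are not tumor-specific.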
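
The tumor-specificity check described above amounts to comparing tumor calls against a panel of normal-tissue calls. A minimal sketch, assuming fusions are matched by a hashable key such as a (5' gene, 3' gene) pair and using a hypothetical `max_normal_hits` cut-off (0 means any occurrence in a normal sample excludes the fusion); the study's actual matching criteria are not described here.

```python
from collections import Counter
from typing import Hashable, Iterable


def normal_panel_counts(normal_samples: Iterable[Iterable[Hashable]]) -> Counter:
    """Count in how many normal samples each fusion key was called."""
    counts: Counter = Counter()
    for calls in normal_samples:
        counts.update(set(calls))  # count each fusion at most once per sample
    return counts


def tumor_specific(tumor_calls: Iterable[Hashable], normal_counts: Counter,
                   max_normal_hits: int = 0) -> set:
    """Keep tumor calls seen in at most `max_normal_hits` normal samples."""
    return {f for f in tumor_calls if normal_counts[f] <= max_normal_hits}


# Hypothetical calls with placeholder gene names.
normal = [{("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")}, {("GENE_A", "GENE_B")}]
tumor = {("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_E", "GENE_F")}
counts = normal_panel_counts(normal)
print(tumor_specific(tumor, counts))  # {('GENE_E', 'GENE_F')}
```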

Figure 1 | Highly diverse GF prediction with different tools. a, GFs (n = 2,361) were predicted with five prediction tools for the MCF7 breast cancer cell line from two sequencing replicates. b, The number of distinct predicted GFs (y axis) in MCF7 is shown for different combinations of tools (x axis) as indicated by dots below the bars. The fraction of GFs identified by one tool (orange) or multiple tools (green) is shown in the pie chart, and the percentage of GFs identified in two sequencing replicates (gray diagonal pattern) for one and multiple tool predictions is shown in the horizontal bar chart. c, 133 GFs were validated by RT–qPCR in MCF7 and SKBR3. Shown is the fraction of positive (blue) and negative (red) tested GFs according to the number of detecting tools and according to identification in one or both sequencing replicates. Labels at the top indicate the number of tested GFs. d, Predicted GFs for 14 primary triple-negative breast cancer (TNBC) samples are shown (n = 4,488; one tool = orange, multiple tools = green). The pie chart shows the fraction of GFs predicted by one tool (orange) and multiple tools (green). e, 492 GFs were validated by RT–qPCR in 14 primary TNBC samples. Shown is the fraction of positive (blue) and negative (red) tested GFs according to the number of detecting tools. Labels at the top indicate the number of tested GFs. f, Considering the confirmation rate from RT–qPCR and the predicted number of GFs according to the number of predicting tools, the number of true-positive GFs was estimated (one tool = light blue, multiple tools = dark blue).

Figure 2 | Recurrent GFs are enriched for cis-near fusions in normal tissue. a, The frequency of recurrence is shown for distinct GFs in 14 TNBC samples (bar chart). The proportion of recurrently (dark gray) and uniquely (light gray) predicted GFs is also depicted (pie chart). b, Recurrent GFs are enriched for cis-near configuration. The number of recurrent and unique GFs is shown according to the configuration type of the breakpoints. 'cis' indicates GFs with breakpoints on the same chromosome, 'trans' on different chromosomes. 'inv' indicates breakpoints on different strands. 'cis-near' indicates GFs with breakpoints on the same chromosome and strand, within 1 Mb distance, whereas 'cis-far' indicates GFs with breakpoints farther apart. Percentages and total numbers of GFs are indicated. c, The overlap between recurrent (dark gray) and unique (light gray) predicted GFs in 14 TNBC samples with four normal breast tissue samples (green) is shown (Venn). The proportion of GFs in 'cis-near' configuration (green) for shared recurrent and unique GFs is shown (pie chart).


5. EasyFuse improves tumor-specific fusion prediction in FFPE samples

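To improve tumor specificity and applicability to clinical FFPE samples, the authors developed the EasyFuse algorithm, which optimizes the mapping workflow and adds a machine-learning filtering core. In the mapping workflow, EasyFuse strengthens the detection of trans-like gene fusions by focusing on discordant read pairs (>200 kb), split reads with soft clipping and unmapped reads, and it does a better job of excluding non-tumor-specific read-through transcription signals. The authors first processed duplicate SureSelect V6 FFPE RNA sequencing data from 14 real clinical FFPE samples of different cancer types (tumor content 20-90%) (Fig. 4a) and validated the resulting gene fusion predictions by qPCR (Fig. 4b, c). Data and validation results from 11 of these samples were used to train a Random Forest model, and the remaining 3 samples were held out for performance evaluation (Fig. 4b). Among the features used by EasyFuse's model, breakpoint type (type), whether the breakpoint uses a known exon splice boundary (exon_boundary) and the number of read pairs spanning the breakpoint (ft_span) carried the greatest predictive weight (Fig. 4d). In addition, gene fusions called in both replicates received markedly higher prediction scores.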
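
Splitting the validated fusions by sample (11 for training, 3 for testing) rather than by individual fusion keeps any one sample from contributing to both sets. A minimal sketch of that split and of turning the three top-ranked features into a numeric matrix; the record layout, the `qpcr_positive` label field and the numeric encoding of `type` are hypothetical assumptions, not EasyFuse's actual data format.

```python
from typing import Any

Record = dict[str, Any]


def split_by_sample(records: list[Record], test_samples: set[str]):
    """Return (train, test) so that no sample contributes to both sets."""
    train = [r for r in records if r["sample"] not in test_samples]
    test = [r for r in records if r["sample"] in test_samples]
    return train, test


def to_xy(records: list[Record]):
    """Hypothetical encoding of the features type, exon_boundary and ft_span."""
    X = [[1.0 if r["type"] == "cis-near" else 0.0,  # assumed binary encoding
          1.0 if r["exon_boundary"] else 0.0,
          float(r["ft_span"])]
         for r in records]
    y = [1 if r["qpcr_positive"] else 0 for r in records]
    return X, y


# Hypothetical validated fusion records.
records = [
    {"sample": "S1", "type": "trans", "exon_boundary": True, "ft_span": 12, "qpcr_positive": True},
    {"sample": "S1", "type": "cis-near", "exon_boundary": False, "ft_span": 1, "qpcr_positive": False},
    {"sample": "S2", "type": "trans", "exon_boundary": True, "ft_span": 5, "qpcr_positive": True},
    {"sample": "S3", "type": "cis-near", "exon_boundary": True, "ft_span": 3, "qpcr_positive": False},
]
train, test = split_by_sample(records, test_samples={"S3"})
X_train, y_train = to_xy(train)
```

`X_train` and `y_train` could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier` (`fit(X, y)`, then `predict_proba`); that library is not used in the sketch itself.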

The authors then analyzed the 3 held-out clinical FFPE datasets with Arriba, the 5 tools used earlier and EasyFuse. EasyFuse showed a better positive predictive value (PPV) and sensitivity (Fig. 4g), and it also performed better overall for both cis-near and trans-like fusions (Fig. 4h).
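
PPV, sensitivity and F1 in Fig. 4g are computed against RT-qPCR-validated fusions. The sketch below shows the unweighted versions, assuming a prediction counts as a true positive when it matches a validated-positive fusion and as a false positive when it matches a validated-negative one, with untested predictions ignored; the paper additionally weights by concordance between tools (described in its Methods), which this sketch does not reproduce.

```python
from typing import Hashable, Iterable


def fusion_metrics(predicted: Iterable[Hashable],
                   validated_pos: Iterable[Hashable],
                   validated_neg: Iterable[Hashable]) -> dict[str, float]:
    """Unweighted PPV, sensitivity and F1 against validated fusions."""
    pred = set(predicted)
    pos, neg = set(validated_pos), set(validated_neg)
    tp = len(pred & pos)  # predicted and confirmed
    fp = len(pred & neg)  # predicted but tested negative
    fn = len(pos - pred)  # confirmed but not predicted
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return {"ppv": ppv, "sensitivity": sens, "f1": f1}


# Hypothetical fusion IDs; "G" was never tested, so it is ignored.
m = fusion_metrics({"A", "B", "E", "G"}, {"A", "B", "C", "D"}, {"E", "F"})
print(m)
```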

Figure 4 | Machine learning contributes to highly specific prediction of GFs in FFPE tumor samples. a, The number of predicted GFs (as unique breakpoint pairs) for 14 tumor samples. Breakpoint pairs that were identified in both replicates are shown in dark gray, whereas those found in single replicates are in light gray. The tumor type, histological tumor content of the sample and whether it is a primary (P) or metastasis (M) sample is indicated below. b, A total of 853 fusion breakpoints detected by EasyFuse across all samples were validated by RT–qPCR (positive in blue and negative in red); validation data were separated by samples into training and test datasets. c, The number and percent of GF breakpoints per sample detected in a single replicate or both replicates (top) as well as the number and percent of fusions validated positive (blue) or negative (red) in these subsets. d, Features used in the machine learning models. The heat map indicates which features (rows) are used in the different models (columns). The relative feature importance in the random forest model is shown. ft refers to the GF, and wt1 and wt2 refer to the corresponding wild-type variants. A more detailed description of all features can be found in Supplementary Table 10. g, Benchmark of fusion prediction performance between EasyFuse with the model 'EF_full' and six other tools on all validated fusion genes in the three test samples. PPV (top), sensitivity (middle) and F1 score (bottom) were calculated by weighting according to the concordance between tools as described in the Methods. h, Performance separately for GFs in cis-near (same chromosome and strand, less than 1 Mb distance) and trans-like configuration. The concordance bins were recalculated on the two subsets of fusions for weighted performance calculation.


6. Fusion neo-antigens for personalized cancer immunotherapy

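Neo-antigens are abnormal proteins resulting from genetic mutations and can serve as antigens for personalized cancer vaccines. To determine whether gene fusions are effective targets for cancer immunotherapy, the authors used EasyFuse to predict gene fusions in 14 FFPE melanoma samples and tested high-likelihood neo-antigens for recognition by the patients' own (autologous) T cells. They first filtered non-coding transcripts and fusions of non-tumor origin out of all gene fusion predictions (Fig. 5a), then predicted neo-epitopes capable of binding the patients' HLA class I or class II molecules (Fig. 5b). The 30 highest-likelihood neo-antigens selected in the end were tested by IFN-γ ELISpot for CD4+ or CD8+ T cell reactivity in patient PBMCs. Of these 30 neo-antigens, 10 elicited positive CD4+ T cell responses and 1 elicited a positive CD8+ T cell response (Fig. 5c). As for response specificity, except for PPP1R12C-CNN2, where the wild-type peptide itself triggered a response, all responses were directed against novel peptides (whether spanning the breakpoint or arising from out-of-frame translation). Among the responsive novel peptides, ZNF417-TSPAN11 elicited both CD4+ and CD8+ T cell responses, driven by two overlapping fusion peptides that span the breakpoint (Fig. 5e). Comparing the prediction parameters with the observed reactivity showed that every response came from neo-epitopes with a predicted HLA binding affinity below 500 nM (Fig. 5f), but there was no clear relationship with the number of RNA-seq reads supporting the fusion.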
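
The epitope step above can be sketched in two parts: enumerating peptides that cross the fusion junction, then keeping those whose predicted HLA binding affinity is below 500 nM. This is a minimal sketch under stated assumptions: it covers an in-frame junction only (out-of-frame fusions also produce novel downstream sequence), the protein sequence is a hypothetical placeholder, and the affinity values are assumed to be precomputed by a predictor such as netMHCpan or netMHCIIpan rather than calculated here.

```python
def breakpoint_spanning_kmers(protein: str, bp: int, k: int) -> list[str]:
    """All length-k peptides that contain at least one residue from each
    fusion partner. `bp` is the 0-based index of the first residue
    contributed by the 3' partner."""
    first = max(0, bp - k + 1)             # earliest start that still reaches bp
    last = min(bp - 1, len(protein) - k)   # latest start that still includes bp-1
    return [protein[s:s + k] for s in range(first, last + 1)]


def predicted_binders(peptides: list[str], affinity_nm: dict[str, float],
                      threshold_nm: float = 500.0) -> list[str]:
    """Keep peptides with a predicted affinity below the threshold;
    peptides without a prediction are dropped."""
    return [p for p in peptides
            if affinity_nm.get(p, float("inf")) < threshold_nm]


# Hypothetical fusion protein: 5' partner "ABCD", 3' partner "EFGH".
peptides = breakpoint_spanning_kmers("ABCDEFGH", bp=4, k=3)
print(peptides)  # ['CDE', 'DEF']
# Hypothetical precomputed affinities in nM.
print(predicted_binders(peptides, {"CDE": 120.0, "DEF": 850.0}))  # ['CDE']
```

In practice k would match the HLA class (for example 9-mers for class I); the toy value k=3 only keeps the example readable.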

Figure 5 | Predicted GFs encode immunogenic neo-antigens eliciting CD4+ and CD8+ T cell responses. a, The EasyFuse machine learning model predicted a median of 46 GFs in 14 patients with melanoma. b, Candidate filtering and epitope prediction resulted in a median of nine GFs per patient that encode at least one MHC class I or class II epitope. c, IFN-γ ELISpots after IVS were carried out for 30 selected fusion peptides with patient-derived PBMCs and resulted in CD4+ T cell responses for ten fusion peptides. CD4+ T cell reactivity against GF targets for PA-043, PA-045, PA-046, uID004 and uID018 (samples indicated by *) were not determined owing to high ELISpot background. d, Quantification of IFN-γ CD4+ T cell responses after IVS (11 days) and re-stimulation with iDCs pulsed with target pool peptides for the respective patient sample. iDCs loaded with irrelevant peptides served as control. Data are shown as background-corrected mean spot count. e, Post-IVS-measured CD4+ and CD8+ T cell response to target peptide-loaded iDCs exemplified for one patient (^ indicates predicted breakpoint position). f, Predicted binding affinity of GF-encoded neo-epitopes, which were tested for CD8+ T cell response (MHC class I) or CD4+ T cell response (MHC class II). Binding affinity is shown in nanomolar (nM) as predicted by netMHCpan for MHC class I epitopes and netMHCIIpan for MHC class II epitopes with patient-specific alleles. The dotted line indicates an affinity of 500 nM. Peptides without predicted epitopes are not shown.


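Cancer research and therapy development are advancing at a remarkable pace, and much information that was once overlooked because of technical limitations is now being captured with new technologies and applied to cell therapy and immunotherapy. Over the past decade, SureSelect technology has helped the scientific community better understand tumor DNA mutations and build pan-cancer mutation databases. Now SureSelect also lets you discover cancer RNA fusions. Whether you use a standard whole-exome design or a custom design focused on your research topic, it can deliver reliable, valuable results and bring new hope for novel therapies across different cancers. And if you have already accumulated extensive cancer DNA mutation data, adding gene fusions gives your research a higher-level, more comprehensive view and lays a foundation for immunotherapy and cell therapy research and development.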



【References】
1. https://www.nature.com/articles/s41587-022-01247-9
