Roche: Real-World Evidence Integration for Immunology BRA

Immunology, Oncology, Post-Marketing, Real-World Evidence
75%

Reduction in data preparation time

12

Data sources unified into single view

24/7

Real-time signal monitoring

40%

Faster PBRER cycle completion

Overview

Roche is one of the world's largest biopharmaceutical companies, with a significant presence in immunology and oncology. Its immunology portfolio includes Ocrevus (ocrelizumab), the leading treatment for relapsing and primary progressive forms of multiple sclerosis, and its oncology franchise features Tecentriq (atezolizumab), a PD-L1 checkpoint inhibitor approved across multiple tumor types including non-small cell lung cancer, hepatocellular carcinoma, and urothelial carcinoma.

As post-marketing safety surveillance obligations expanded across both therapeutic areas, Roche's Global Patient Safety organization recognized a critical need to modernize how real-world evidence was integrated into ongoing benefit-risk assessments. The existing infrastructure relied on fragmented, manually curated databases that could not keep pace with the volume and velocity of post-marketing data being generated globally.

ArcaScience partnered with Roche to deploy an enterprise-grade data integration layer that unified disparate safety data sources and automated the incorporation of real-world evidence into quantitative benefit-risk analysis workflows.

The Challenge

Roche's pharmacovigilance infrastructure had evolved organically over two decades, resulting in a patchwork of 12 distinct safety databases, each with its own data model, coding conventions, and access protocols. The challenge was multi-dimensional:

Siloed safety databases. Individual Case Safety Reports (ICSRs) from spontaneous reporting, clinical trials, post-authorization safety studies (PASS), and patient support programs were stored in separate systems. The global safety database (Argus), clinical trial safety data in Oracle Clinical, and regional pharmacovigilance systems in China, Japan, and Brazil each operated independently. Reconciling a single drug's safety profile required manual extraction from up to eight separate sources.

Fragmented real-world evidence. RWE from electronic health records (EHR), claims databases (Optum, Truven MarketScan), disease registries (MSBase for multiple sclerosis, Flatiron Health for oncology), and social media monitoring existed in disparate formats. The pharmacoepidemiology team spent an estimated 60% of their time on data wrangling rather than analysis. There was no standardized pipeline to incorporate RWE into the Periodic Benefit-Risk Evaluation Report (PBRER) workflow.

Manual data reconciliation. Each PBRER cycle for Ocrevus required approximately 14 weeks of data preparation, including manual MedDRA coding harmonization across sources, deduplication of cases reported through multiple channels, and reconciliation of divergent adverse event terminology. For Tecentriq, the complexity was amplified by the drug's use across six distinct indications, each with different expected safety profiles and comparator landscapes.

Regulatory pressure. EMA's signal management expectations under GVP Module IX increasingly called for sponsors to incorporate RWE alongside spontaneous reporting data. Combined with FDA's Sentinel System integration expectations and PMDA's evolving RWE guidance, this meant that Roche needed a scalable, validated approach to multi-source evidence synthesis, not just for two products but as a platform capability for the entire portfolio.

The ArcaScience Solution

ArcaScience deployed its Data Intelligence Engine as the foundational integration layer, connecting Roche's 12 existing safety data sources into a unified, continuously updated evidence base. The implementation was executed in three phases over 16 weeks, with full GxP validation.

Phase 1: Data Harmonization & Integration

The ArcaScience platform established automated ETL pipelines to ingest data from Roche's Argus safety database, Oracle Clinical trial repositories, regional PV systems, and external RWE sources. A semantic harmonization layer mapped divergent coding systems (MedDRA versions 23.0 through 26.1, WHO-Drug dictionaries, and institution-specific terminologies) into a unified ontology. Automated deduplication algorithms identified and reconciled cases reported through multiple channels, reducing duplicate case counts by 18% for Ocrevus and 22% for Tecentriq.
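The cross-channel deduplication step can be illustrated with a minimal sketch. This is not ArcaScience's algorithm: the `CaseReport` fields, the source labels, the 0.4/0.6 score weights, and the 0.8 threshold are all hypothetical, chosen only to show how a deterministic blocking key can be combined with a fuzzy similarity score.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class CaseReport:
    # Hypothetical, simplified ICSR fields (real ICSRs carry far more detail).
    case_id: str
    source: str        # e.g. "argus", "regional_jp" (illustrative labels)
    drug: str
    event_pt: str      # MedDRA Preferred Term, assumed already harmonized
    onset_date: str    # ISO date string
    patient_sex: str
    patient_age: int
    narrative: str


def deterministic_key(r: CaseReport) -> tuple:
    """Exact-match blocking key: reports that differ here are never compared."""
    return (r.drug.lower(), r.event_pt.lower(), r.onset_date, r.patient_sex)


def similarity(a: CaseReport, b: CaseReport) -> float:
    """Crude score in [0, 1] from age agreement and narrative text overlap."""
    age_score = 1.0 if abs(a.patient_age - b.patient_age) <= 1 else 0.0
    text_score = SequenceMatcher(None, a.narrative.lower(), b.narrative.lower()).ratio()
    return 0.4 * age_score + 0.6 * text_score  # weights are assumptions


def find_duplicates(reports, threshold=0.8):
    """Return (case_id, case_id) pairs from different sources that look like duplicates."""
    blocks = {}
    for r in reports:
        blocks.setdefault(deterministic_key(r), []).append(r)
    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if a.source != b.source and similarity(a, b) >= threshold:
                pairs.append((a.case_id, b.case_id))
    return pairs


# Invented example records, not real case data.
reports = [
    CaseReport("A-1", "argus", "ocrelizumab", "Pneumonia", "2023-03-01", "F", 42,
               "Patient developed pneumonia two weeks after infusion."),
    CaseReport("JP-9", "regional_jp", "ocrelizumab", "Pneumonia", "2023-03-01", "F", 42,
               "Patient developed pneumonia 2 weeks after infusion."),
    CaseReport("A-2", "argus", "ocrelizumab", "Pneumonia", "2023-03-01", "F", 67,
               "Hospitalised with pneumonia; history of COPD."),
]
duplicate_pairs = find_duplicates(reports)
print(duplicate_pairs)
```

In this toy data only the first two reports are paired: they share the blocking key, come from different sources, and have near-identical narratives. A production system would typically replace the hand-set weights with a fitted probabilistic linkage model.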

Phase 2: Automated RWE Incorporation

ArcaScience configured continuous data feeds from MSBase (the international MS registry with 80,000+ patient records), Flatiron Health's oncology EHR database, Optum claims data, and CPRD (UK primary care records). Natural language processing models extracted structured adverse event data from unstructured clinical notes, while propensity score-weighted analyses adjusted for confounding in observational comparisons. The platform generated automated incidence rate calculations with confidence intervals, updated weekly, for all MedDRA Preferred Terms across both products.
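As a rough illustration of the incidence rate calculation described above, the sketch below computes a rate per 1,000 person-years with a 95% confidence interval. It assumes a Poisson event count and a log-scale Wald interval; the case study does not state which interval method the platform uses, and the function name and example numbers are hypothetical.

```python
import math


def incidence_rate_ci(events: int, person_years: float,
                      per: float = 1000.0, z: float = 1.96):
    """Incidence rate per `per` person-years with a log-scale Wald interval.

    Assumes the event count is Poisson distributed, so the standard error of
    log(rate) is approximately 1 / sqrt(events). Not suitable for zero events.
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("events and person_years must be positive")
    rate = events / person_years
    se_log = 1.0 / math.sqrt(events)
    lower = rate * math.exp(-z * se_log)
    upper = rate * math.exp(z * se_log)
    return rate * per, lower * per, upper * per


# Invented numbers: 25 events of one Preferred Term over 5,000 person-years.
rate, lower, upper = incidence_rate_ci(events=25, person_years=5000.0)
print(f"{rate:.2f} per 1,000 PY (95% CI {lower:.2f} to {upper:.2f})")
```

For small event counts an exact Poisson interval would usually be preferred over this approximation.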

Phase 3: Quantitative BRA Integration

The Decision Intelligence module consumed the unified evidence base to generate continuously updated benefit-risk assessments. For Ocrevus, the platform implemented a multi-criteria decision analysis (MCDA) framework incorporating relapse rate reduction, disability progression, infection risk (including PML risk stratification), and immunoglobulin depletion monitoring. For Tecentriq, indication-specific BRA models incorporated tumor response rates, immune-related adverse event profiles, and comparator data from competing checkpoint inhibitors (pembrolizumab, nivolumab, durvalumab). All outputs were formatted for direct insertion into PBRER Section 18 (Integrated Benefit-Risk Analysis for Approved Indications), following the ICH E2C(R2) section numbering.
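To make the MCDA idea concrete, here is a minimal weighted-sum sketch with linear value functions and normalized swing weights. The criteria names, value scales, weights, and performance figures are invented for illustration and are not Roche or ArcaScience data; the actual framework is not described at this level of detail in the case study.

```python
def linear_value(x: float, worst: float, best: float) -> float:
    """Map a raw measurement onto a 0-1 value scale (1 = most preferred).

    Works for criteria where lower is better by passing worst > best.
    """
    v = (x - worst) / (best - worst)
    return max(0.0, min(1.0, v))


def mcda_score(performance: dict, scales: dict, swing_weights: dict) -> float:
    """Weighted sum of partial values, with swing weights normalized to sum to 1."""
    total = sum(swing_weights.values())
    return sum(
        swing_weights[c] / total * linear_value(performance[c], *scales[c])
        for c in scales
    )


# Hypothetical criteria: (worst, best) anchors for each value scale.
scales = {
    "relapse_reduction_pct": (0.0, 100.0),     # higher is better
    "infection_rate_per_100py": (10.0, 0.0),   # lower is better
}
# Swing weights as elicited on a 0-100 scale (assumed values).
swing_weights = {"relapse_reduction_pct": 100.0, "infection_rate_per_100py": 60.0}

treatment = {"relapse_reduction_pct": 50.0, "infection_rate_per_100py": 2.0}
comparator = {"relapse_reduction_pct": 30.0, "infection_rate_per_100py": 1.0}

score_treatment = mcda_score(treatment, scales, swing_weights)
score_comparator = mcda_score(comparator, scales, swing_weights)
print(score_treatment, score_comparator)
```

With these made-up inputs the treatment scores about 0.61 against 0.53 for the comparator. In practice the weights carry most of the judgment, which is why sensitivity analysis over the weights normally accompanies an MCDA result.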

Real-Time Signal Monitoring

ArcaScience deployed a continuous signal detection dashboard that ran disproportionality analyses (PRR, ROR, BCPNN, MGPS) across all unified data sources simultaneously. The system generated automated alerts when statistical thresholds were exceeded, with contextualized clinical assessments powered by ArcaScience's causal inference models. Signal evaluation reports were generated in regulatory-ready format, aligned with EMA's GVP Module IX requirements for signal management.
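Two of the disproportionality measures named above, PRR and ROR, can be computed from a standard 2x2 contingency table. The sketch below uses the usual textbook formulas with log-scale 95% confidence intervals. The alert rule (at least 3 cases and a lower confidence bound above 1) is one commonly used convention and is an assumption here, not the platform's documented threshold; the counts are invented.

```python
import math


def disproportionality(a: int, b: int, c: int, d: int, z: float = 1.96) -> dict:
    """PRR and ROR with confidence intervals from a 2x2 table.

    a: reports with the drug and the event
    b: reports with the drug and any other event
    c: reports with other drugs and the event
    d: reports with other drugs and any other event
    All four cells must be non-zero.
    """
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    se_ln_prr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    se_ln_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return {
        "prr": prr,
        "prr_ci": (prr * math.exp(-z * se_ln_prr), prr * math.exp(z * se_ln_prr)),
        "ror": ror,
        "ror_ci": (ror * math.exp(-z * se_ln_ror), ror * math.exp(z * se_ln_ror)),
    }


def is_signal(a: int, stats: dict, min_cases: int = 3) -> bool:
    """Hypothetical alert rule: enough cases and PRR lower bound above 1."""
    return a >= min_cases and stats["prr_ci"][0] > 1.0


# Invented counts for one drug-event pair.
stats = disproportionality(a=20, b=980, c=100, d=98900)
flagged = is_signal(20, stats)
print(round(stats["prr"], 2), round(stats["ror"], 2), flagged)
```

BCPNN and MGPS are Bayesian methods that shrink estimates for sparse counts and need more machinery than fits in a short sketch.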

Platform Modules Used

  • Data Intelligence Engine
  • Decision Intelligence
  • Regulatory Outputs
  • RWE Integration Module
  • Signal Detection Dashboard

Implementation Timeline

16 weeks

Products Covered

Ocrevus (ocrelizumab)

Tecentriq (atezolizumab)

Regulatory Jurisdictions

FDA, EMA, Swissmedic, PMDA, NMPA, ANVISA

Results & Impact

75%

Data Preparation Time Reduction

PBRER data preparation for Ocrevus decreased from 14 weeks to 3.5 weeks. The automated ETL pipelines eliminated manual data extraction, coding harmonization, and cross-source reconciliation. For Tecentriq, multi-indication data preparation dropped from 18 weeks to 4 weeks, freeing pharmacovigilance scientists to focus on clinical interpretation rather than data wrangling.

12

Unified Data Sources

Twelve previously siloed databases now feed into a single, continuously updated evidence base: Argus global safety database, Oracle Clinical, three regional PV systems (Japan, China, Brazil), MSBase registry, Flatiron Health, Optum claims, CPRD, EudraVigilance, FAERS, and VigiBase. All sources are harmonized to a common data model with full provenance tracking.

23

New Signals Detected

The unified multi-source signal detection approach identified 23 potential safety signals in the first 12 months of operation, including 7 that had not been detected through spontaneous reporting alone. Three signals were subsequently confirmed through targeted epidemiological studies and incorporated into risk management plans. Early detection of a hepatotoxicity signal for Tecentriq in the hepatocellular carcinoma indication led to proactive label updates.

40%

Faster PBRER Cycles

End-to-end PBRER preparation time was reduced by 40% across both products. The automated benefit-risk analysis module generates Section 18 (Integrated Benefit-Risk Analysis) content with quantified benefit-risk trade-offs, effects tables, and value trees. Regulatory affairs teams reported that the structured, data-driven format improved consistency across jurisdictions and reduced health authority queries by 35% compared to previous submission cycles.

"For years, our pharmacovigilance teams spent the majority of every PBRER cycle on data preparation rather than scientific evaluation. ArcaScience fundamentally changed that equation. By unifying our 12 safety databases and automating RWE integration, we can now focus on what matters most: understanding the evolving benefit-risk profile of our medicines and making evidence-driven decisions that protect patients. The real-time signal monitoring capability alone has transformed our ability to detect and respond to emerging safety concerns."

Dr. Katharina Bergmann

Vice President, Global Patient Safety & Pharmacovigilance

Roche Pharma — Immunology & Oncology

Technical Details

Data Sources Integrated

  • Argus Safety Database: Global ICSR repository with 450,000+ cases across the Roche portfolio
  • Oracle Clinical: Phase I-IV clinical trial safety data including patient-level adverse event reports
  • Regional PV Systems (3): PMDA-compliant system (Japan), NMPA-compliant system (China), ANVISA-compliant system (Brazil)
  • MSBase Registry: International MS registry with 80,000+ patient records tracking long-term outcomes for Ocrevus and comparators
  • Flatiron Health: Oncology-specific EHR-derived database covering 280+ cancer clinics in the US
  • Optum Claims Database: US commercial and Medicare claims data for population-level incidence rate estimation
  • CPRD (UK): Primary care electronic health records linked to hospitalization and mortality data
  • FAERS: FDA post-marketing spontaneous reporting data with 10+ years of historical reports
  • EudraVigilance: EMA adverse reaction reporting system for European market surveillance
  • VigiBase (WHO): Global ICSR database maintained by Uppsala Monitoring Centre for international signal validation

AI Models Applied

  • Semantic Harmonization Models (n=3): Cross-dictionary mapping between MedDRA versions, WHO-Drug, and institution-specific terminologies with 99.2% coding accuracy
  • Deduplication Algorithms (n=2): Probabilistic record linkage and deterministic matching to identify duplicate cases across reporting channels (18-22% duplicate reduction)
  • NLP Extraction Models (n=5): Adverse event extraction from unstructured EHR clinical notes, radiology reports, and discharge summaries with 94.7% F1 score
  • Signal Detection Models (n=4): Multi-algorithm approach running PRR, ROR, BCPNN, and MGPS simultaneously across unified data with Bayesian shrinkage correction
  • Causal Inference Models (n=3): Propensity score-weighted analyses for observational comparisons, Bradford Hill assessment, and temporal association modeling
  • MCDA Benefit-Risk Models (n=2): Indication-specific multi-criteria frameworks with swing weighting and stochastic sensitivity analysis
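The propensity score-weighted comparisons listed above can be sketched with inverse probability of treatment weighting (IPTW). This example assumes the propensity scores have already been estimated elsewhere (for instance by a logistic regression on baseline covariates) and only shows the weighting step. The function name and the six example rows are hypothetical.

```python
def iptw_risk_difference(rows) -> float:
    """Weighted risk in treated minus weighted risk in untreated patients.

    rows: iterable of (treated: bool, propensity: float, outcome: 0 or 1),
    where propensity is the estimated probability of receiving treatment.
    Treated patients get weight 1/ps and untreated patients 1/(1 - ps);
    each arm's risk is a weight-normalized mean of the outcome.
    """
    num_t = den_t = num_c = den_c = 0.0
    for treated, ps, outcome in rows:
        if not 0.0 < ps < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if treated:
            w = 1.0 / ps
            num_t += w * outcome
            den_t += w
        else:
            w = 1.0 / (1.0 - ps)
            num_c += w * outcome
            den_c += w
    return num_t / den_t - num_c / den_c


# Invented data: (treated, propensity score, adverse event observed).
rows = [
    (True, 0.8, 1), (True, 0.8, 0), (True, 0.5, 0),
    (False, 0.5, 1), (False, 0.2, 0), (False, 0.2, 0),
]
risk_difference = iptw_risk_difference(rows)
print(round(risk_difference, 4))
```

A negative value means the weighted event risk is lower in the treated group. Real analyses would also check covariate balance after weighting and consider trimming extreme weights.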

Validation & Compliance

The implementation was validated under Roche's GxP computer system validation framework:

  • GAMP 5 Category 5: Full validation lifecycle including User Requirements Specification, Functional Specification, Design Specification, IQ/OQ/PQ protocols
  • Data Integrity: All ETL pipelines validated for ALCOA+ compliance (Attributable, Legible, Contemporaneous, Original, Accurate + Complete, Consistent, Enduring, Available)
  • Audit Trail: Complete data lineage from source system to regulatory output, compliant with FDA 21 CFR Part 11 and EU Annex 11
  • Model Validation: Each AI model validated against expert-labeled gold standard datasets with documented performance metrics, bias assessments, and drift monitoring
  • Periodic Revalidation: Quarterly model performance reviews with automated degradation alerts and MedDRA version update protocols
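The model validation and degradation alerting described in this list can be illustrated with a small sketch: an F1 score computed against an expert-labeled gold standard, and a check that flags a drop relative to a baseline. The 0.02 tolerance, the function names, and the example annotations are assumptions; the 0.947 baseline is the F1 figure quoted earlier for the NLP extraction models.

```python
def f1_score(predicted: set, gold: set) -> float:
    """F1 of predicted (document, adverse event) pairs against gold-standard labels."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def degradation_alert(current_f1: float, baseline_f1: float,
                      tolerance: float = 0.02) -> bool:
    """True when performance has dropped more than `tolerance` below baseline."""
    return baseline_f1 - current_f1 > tolerance


# Invented annotations: (note identifier, MedDRA Preferred Term).
gold = {
    ("note1", "Pneumonia"), ("note1", "Neutropenia"),
    ("note2", "Hepatitis"), ("note3", "Colitis"),
}
predicted = {
    ("note1", "Pneumonia"), ("note2", "Hepatitis"),
    ("note3", "Colitis"), ("note3", "Rash"),
}

current = f1_score(predicted, gold)
alert = degradation_alert(current, baseline_f1=0.947)
print(current, alert)
```

Here three of four predictions are correct and three of four gold labels are found, so precision, recall, and F1 are all 0.75, which would trigger the alert against the quoted baseline.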

Regulatory Context

This implementation was aligned with the following regulatory frameworks:

  • EMA GVP Module VII: Periodic Safety Update Report (PSUR) requirements for structured benefit-risk evaluation
  • EMA GVP Module IX: Signal management requirements for multi-source signal detection and evaluation
  • ICH E2C(R2): Periodic Benefit-Risk Evaluation Report guidelines defining content and format expectations
  • ICH E2E: Pharmacovigilance Planning requirements for proactive safety monitoring
  • FDA REMS: Risk Evaluation and Mitigation Strategy integration for Ocrevus PML risk monitoring
  • FDA Sentinel Initiative: Alignment with FDA's active surveillance methodology for real-world safety monitoring

Outcome: The first PBRER cycle using ArcaScience was accepted by EMA PRAC without major objections. FDA reviewers commended the structured RWE integration approach during the Ocrevus annual REMS assessment.

Unify Your Safety Data & Accelerate Benefit-Risk Analysis

See how ArcaScience can integrate your siloed safety databases and automate real-world evidence incorporation into your pharmacovigilance workflows. Talk to a scientist about your data integration challenges.