Executive Summary

Real-World Evidence (RWE) can supplement and strengthen traditional pharmacovigilance approaches. While spontaneous adverse event reporting remains the backbone of post-marketing safety surveillance, its well-known limitations (under-reporting, reporting biases, and the inability to estimate incidence rates) leave significant gaps in safety monitoring.

This whitepaper explores how real-world data sources including electronic health records (EHRs), insurance claims databases, disease registries, and emerging digital health data can be systematically integrated into signal detection workflows. We examine the methodological approaches, regulatory acceptance landscape, and practical implementation considerations, drawing on ArcaScience's experience deploying RWE-enhanced signal detection across multiple pharmaceutical portfolios.

1. RWE Data Sources for Signal Detection

Real-world data for pharmacovigilance comes from a diverse ecosystem of sources, each offering distinct strengths and limitations. Understanding these characteristics is essential for designing effective RWE-enhanced signal detection programs.

Electronic Health Records (EHRs)

EHR databases capture the longitudinal clinical journey of patients, including diagnoses, procedures, medications, laboratory results, vital signs, and clinical notes. Major EHR-derived research databases include Optum EHR, Flatiron Health (oncology-focused), CPRD (UK), and institutional consortia. EHRs provide rich clinical context that spontaneous reports typically lack, enabling more nuanced assessment of temporal relationships, confounders, and alternative explanations.

Longitudinal data  |  Clinical context  |  Lab values  |  Unstructured notes

Insurance Claims Databases

Claims databases (e.g., Optum Claims, MarketScan, Medicare/Medicaid) capture healthcare utilization for large covered populations, including prescription dispensing, medical encounters, and diagnoses. Their key advantages are large population size (often tens of millions of patients), relatively complete capture of healthcare encounters within covered populations, and standardized coding (ICD-10, NDC, CPT). However, they lack clinical detail and may miss events that do not result in a coded healthcare encounter.

Large populations  |  Prescription data  |  Standardized coding  |  Healthcare utilization

Disease and Product Registries

Registries provide focused, often prospective data collection for specific diseases or products. They typically include detailed clinical outcomes, treatment responses, and long-term follow-up that may not be available in other data sources. Examples include SEER (cancer), Cystic Fibrosis Foundation Registry, and manufacturer-sponsored product registries. Their depth of clinical detail is a significant strength, though they tend to cover smaller populations.

Prospective collection  |  Clinical depth  |  Long-term follow-up  |  Disease-specific

Emerging Digital Health Sources

Patient-generated health data from wearable devices, mobile health apps, patient forums, and social media are increasingly recognized as potential sources for early signal detection. These data capture patient experiences in real time and can detect functional impacts (mobility changes, sleep disruption, mood changes) that may not be reported through traditional channels. While methodological frameworks for their use in pharmacovigilance are still maturing, early evidence suggests value for detecting quality-of-life impacts and patient-perceived adverse effects.

Real-time data  |  Patient-reported  |  Continuous monitoring  |  Functional outcomes

2. Signal Detection Methodologies with RWE

Applying RWE to signal detection requires methodological approaches that account for the non-randomized, observational nature of these data. Several validated study designs are applicable:

2.1 New-User Cohort Designs

New-user (incident user) cohort designs follow patients from the point of treatment initiation, comparing outcomes against an appropriate reference population. This design avoids prevalent user bias and enables estimation of incidence rates and relative risks. Active comparator new-user designs, where the reference group consists of patients initiating an alternative treatment for the same indication, generally provide stronger evidence because they reduce confounding by indication.
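As a rough illustration of the quantities a new-user cohort yields, the sketch below computes crude incidence rates and an incidence rate ratio with a log-scale Wald interval. The function names and the counts are hypothetical, and no confounder adjustment is applied; a real analysis would typically add propensity score or similar adjustment.

```python
import math


def incidence_rate(events, person_years, per=1000.0):
    """Crude incidence rate per `per` person-years of follow-up."""
    return per * events / person_years


def rate_ratio(events_new, py_new, events_ref, py_ref, z=1.96):
    """Incidence rate ratio (new users vs. active comparator) with a
    Wald confidence interval computed on the log scale."""
    if min(events_new, events_ref) == 0:
        raise ValueError("both cohorts need at least one event for a Wald interval")
    irr = (events_new / py_new) / (events_ref / py_ref)
    se_log = math.sqrt(1 / events_new + 1 / events_ref)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical counts, for illustration only.
irr, lower, upper = rate_ratio(events_new=30, py_new=10_000, events_ref=20, py_ref=10_000)
print(f"rate in new users: {incidence_rate(30, 10_000):.1f} per 1,000 person-years")
print(f"IRR = {irr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that spans 1.0, as it does for these illustrative counts, would not on its own support a signal.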

2.2 Self-Controlled Designs

Self-controlled case series (SCCS) and self-controlled risk interval (SCRI) designs compare the rate of an outcome within defined risk periods after drug exposure to the rate during control periods within the same individual. These designs inherently control for time-invariant confounders (genetics, chronic conditions, socioeconomic factors) because each patient serves as their own control. They are particularly powerful for assessing acute-onset adverse events.
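A minimal SCRI sketch follows, assuming every case is fully observed through both windows and that all patients share the same window definitions. The window boundaries, function name, and event days are hypothetical. Under those assumptions the relative incidence is the ratio of per-day event counts in the risk and control windows.

```python
import math


def scri_relative_incidence(days_to_event, risk_window=(1, 14),
                            control_window=(29, 56), z=1.96):
    """Relative incidence from event timing after exposure (SCRI sketch).

    days_to_event: one entry per case, days from exposure to the event.
    Windows are inclusive (start_day, end_day) tuples and must not overlap.
    Events outside both windows do not contribute.
    """
    def in_window(day, window):
        return window[0] <= day <= window[1]

    n_risk = sum(in_window(d, risk_window) for d in days_to_event)
    n_control = sum(in_window(d, control_window) for d in days_to_event)
    if n_risk == 0 or n_control == 0:
        raise ValueError("need at least one event in each window")

    len_risk = risk_window[1] - risk_window[0] + 1
    len_control = control_window[1] - control_window[0] + 1
    ri = (n_risk / len_risk) / (n_control / len_control)
    se_log = math.sqrt(1 / n_risk + 1 / n_control)
    return ri, math.exp(math.log(ri) - z * se_log), math.exp(math.log(ri) + z * se_log)


# Hypothetical event days, for illustration only.
days = [2, 3, 5, 7, 10, 12, 30, 35, 40, 60]
ri, lower, upper = scri_relative_incidence(days)
print(f"relative incidence = {ri:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because only cases contribute and each case is compared with itself, no reference population is needed; a full SCCS would instead fit a conditional Poisson model to accommodate varying observation periods and age effects.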

2.3 Sequence Symmetry Analysis

This method examines whether prescriptions for a drug used to treat a suspected adverse reaction (the marker drug) are more commonly initiated after starting the drug of interest than before. For example, if proton pump inhibitor prescriptions are significantly more likely to follow initiation of a suspected GI-toxic drug than to precede it, this asymmetry provides signal evidence. The method is computationally efficient and can be applied to large claims databases as a screening tool.
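The core of the method can be sketched as a crude sequence ratio: among patients who start both drugs within a fixed window, the count of "marker after index" sequences divided by the count of "marker before index" sequences. The function name, the 365-day window, and the dates below are assumptions for illustration; the sketch also omits the usual adjustment for background prescribing trends.

```python
from datetime import date


def crude_sequence_ratio(pairs, window_days=365):
    """Crude sequence ratio from (index_drug_start, marker_drug_start) date pairs.

    Pairs starting on the same day, or more than `window_days` apart,
    are excluded because their order is uninformative or too distant.
    """
    after = before = 0
    for index_start, marker_start in pairs:
        gap = (marker_start - index_start).days
        if gap == 0 or abs(gap) > window_days:
            continue
        if gap > 0:
            after += 1   # marker drug started after the drug of interest
        else:
            before += 1  # marker drug started before the drug of interest
    if before == 0:
        raise ValueError("no 'marker before index' sequences; ratio undefined")
    return after / before, after, before


# Hypothetical first-prescription dates, for illustration only.
pairs = [
    (date(2023, 1, 10), date(2023, 2, 1)),   # after
    (date(2023, 3, 1), date(2023, 3, 20)),   # after
    (date(2023, 5, 5), date(2023, 4, 1)),    # before
    (date(2023, 6, 1), date(2023, 9, 15)),   # after
    (date(2023, 7, 1), date(2023, 7, 1)),    # same day, excluded
    (date(2021, 1, 1), date(2023, 1, 1)),    # outside window, excluded
]
ratio, n_after, n_before = crude_sequence_ratio(pairs)
print(f"crude sequence ratio = {ratio:.2f} ({n_after} after, {n_before} before)")
```

A ratio well above 1 suggests asymmetry worth following up; with counts this small it would carry no evidential weight.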

2.4 Tree-Based Scan Statistics

Tree-based scan statistics (TreeScan) evaluate thousands of potential adverse outcomes simultaneously, organized hierarchically (e.g., by ICD-10 code hierarchy or MedDRA hierarchy). This approach identifies unexpected safety signals without requiring pre-specification of the outcome of interest, making it valuable for exploratory signal detection in newly marketed products.
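To make the idea concrete, here is a simplified sketch of a tree-based scan under an unconditional Poisson model. It is not the TreeScan software itself. The hierarchy is approximated by treating every prefix of a leaf code as a node, and the codes, counts, and function names are hypothetical. Observed and expected counts are assumed to share the same leaf codes.

```python
import math
import random
from collections import defaultdict


def poisson_llr(observed, expected):
    """Unconditional Poisson log-likelihood ratio; zero unless there is an excess."""
    if expected <= 0 or observed <= expected:
        return 0.0
    return observed * math.log(observed / expected) - (observed - expected)


def node_totals(leaf_values):
    """Sum leaf values into every ancestor node (here: every code prefix)."""
    totals = defaultdict(float)
    for code, value in leaf_values.items():
        for depth in range(1, len(code) + 1):
            totals[code[:depth]] += value
    return totals


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def tree_scan(observed, expected, n_sim=999, seed=0):
    """Return (best_node, llr, monte_carlo_p) for the most unusual node."""
    expected_nodes = node_totals(expected)

    def best_cut(leaf_counts):
        counts = node_totals(leaf_counts)
        return max((poisson_llr(counts.get(node, 0.0), mean), node)
                   for node, mean in expected_nodes.items())

    best_llr, best_node = best_cut(observed)
    rng = random.Random(seed)
    # Null replicates: redraw every leaf from its expected count and record
    # how often the maximum LLR over the whole tree matches or beats ours.
    exceed = sum(
        best_cut({c: poisson_draw(m, rng) for c, m in expected.items()})[0] >= best_llr
        for _ in range(n_sim)
    )
    return best_node, best_llr, (exceed + 1) / (n_sim + 1)


# Hypothetical leaf-level counts, for illustration only.
expected = {"K250": 5.0, "K251": 5.0, "K290": 10.0, "I210": 8.0}
observed = {"K250": 15, "K251": 12, "K290": 11, "I210": 7}
node, llr, p_value = tree_scan(observed, expected)
print(f"most unusual node: {node}, LLR = {llr:.2f}, p = {p_value:.3f}")
```

Taking the maximum over all nodes in each simulated replicate is what adjusts the p-value for the many overlapping outcomes being evaluated at once.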

3. Disproportionality Analysis with RWE

While disproportionality analysis was developed for spontaneous reporting databases, adapted versions can be applied to RWE sources to complement traditional approaches:

Traditional (Spontaneous Reports)

  • Compares reporting proportions for a drug-event pair against the database background
  • Relies on voluntary reporting
  • Cannot estimate incidence
  • Subject to stimulated reporting
  • Limited temporal detail

RWE-Enhanced Approach

  • Compares diagnosis rates in exposed vs. unexposed
  • Near-complete capture of coded encounters within the data source
  • Enables incidence rate estimation
  • Less subject to reporting biases
  • Precise temporal relationships available

ArcaScience's platform applies RWE disproportionality analysis by computing observed-to-expected ratios of diagnoses in drug-exposed patient populations compared to matched reference populations, adjusting for age, sex, comorbidity burden, and healthcare utilization patterns. This approach enables detection of signals that may be obscured in spontaneous reporting by under-reporting or confounding.
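The observed-to-expected calculation can be illustrated with indirect standardization. The sketch below is not ArcaScience's implementation: it stratifies on age band and sex only, uses a simple log-scale approximation for the interval, and all names and numbers are hypothetical. Adjusting for comorbidity burden and healthcare utilization would require additional strata or a model-based expected count.

```python
import math


def observed_to_expected(observed_events, exposed_person_years, reference_rates, z=1.96):
    """Observed-to-expected ratio via indirect standardization.

    exposed_person_years: {stratum: person-years in the drug-exposed group}
    reference_rates:      {stratum: events per person-year in the reference group}
    """
    expected = sum(reference_rates[stratum] * py
                   for stratum, py in exposed_person_years.items())
    if observed_events == 0 or expected <= 0:
        raise ValueError("need observed events and a positive expected count")
    ratio = observed_events / expected
    se_log = 1 / math.sqrt(observed_events)  # Poisson approximation on the log scale
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, expected, lower, upper


# Hypothetical strata and counts, for illustration only.
reference_rates = {("18-64", "F"): 50 / 100_000, ("65+", "F"): 200 / 100_000}
exposed_person_years = {("18-64", "F"): 20_000, ("65+", "F"): 5_000}
ratio, expected_count, lower, upper = observed_to_expected(
    30, exposed_person_years, reference_rates)
print(f"expected = {expected_count:.1f}, O/E = {ratio:.2f} "
      f"(95% CI {lower:.2f} to {upper:.2f})")
```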

4. Case-Control Study Automation

When a potential signal is identified through screening methods, case-control studies provide a rigorous framework for further evaluation. ArcaScience automates the key steps of RWE-based case-control studies:

  1. Case identification: Automated phenotyping algorithms identify cases of the outcome of interest using combinations of diagnosis codes, laboratory values, procedures, and NLP-extracted concepts from clinical notes. Validated phenotyping algorithms are applied where available; when novel phenotypes are required, the platform uses ensemble methods combining rule-based and machine-learning approaches.
  2. Control selection: Risk-set sampling matches controls to cases on index date and key confounders. The platform supports various matching strategies including exact matching, propensity score matching, and disease risk score matching.
  3. Exposure assessment: Drug exposure is determined from prescription records, dispensing data, or administration records, with algorithms to estimate exposure duration, handle gaps and overlaps, and classify current versus past exposure.
  4. Confounder adjustment: High-dimensional propensity score methods or disease risk score approaches automatically identify and adjust for hundreds of potential confounders from the patient's medical history.
  5. Effect estimation: Conditional logistic regression estimates odds ratios with confidence intervals, supplemented by sensitivity analyses varying exposure definitions, washout periods, and confounder sets.
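For the effect estimation step, one special case is simple enough to sketch by hand: with 1:1 matched pairs and a single binary exposure, the conditional logistic regression estimate of the odds ratio reduces to the ratio of discordant pairs. The function name and the pair counts below are hypothetical; multi-covariate or 1:n matched analyses would need a full conditional logistic model.

```python
import math


def matched_pair_odds_ratio(pairs, z=1.96):
    """Conditional odds ratio for 1:1 matched case-control pairs.

    pairs: iterable of (case_exposed, control_exposed) booleans.
    Only discordant pairs carry information about the exposure effect.
    """
    pairs = list(pairs)
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if case_only == 0 or control_only == 0:
        raise ValueError("need discordant pairs in both directions")
    odds_ratio = case_only / control_only
    se_log = math.sqrt(1 / case_only + 1 / control_only)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical matched pairs, for illustration only.
matched = ([(True, False)] * 12 + [(False, True)] * 4
           + [(True, True)] * 5 + [(False, False)] * 20)
odds_ratio, lower, upper = matched_pair_odds_ratio(matched)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Re-running the same calculation under alternative exposure definitions or washout periods is one straightforward way to produce the sensitivity analyses mentioned in step 5.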

5. Regulatory Acceptance of RWE for Signal Detection

Regulatory agencies have increasingly endorsed the use of RWE to support pharmacovigilance activities:

Agency  |  Key Guidance/Initiative  |  RWE Signal Detection Stance
FDA  |  RWE Framework (2018), Sentinel System  |  Active use of RWE for safety surveillance through Sentinel; encourages sponsor use of RWE for post-marketing safety studies
EMA  |  DARWIN EU, ENCePP guidance  |  DARWIN EU network for routine RWE safety analyses; ENCePP provides methodological standards for RWE studies
PMDA  |  MID-NET (Medical Information Database Network)  |  MID-NET enables active surveillance using hospital EHR data for post-marketing safety evaluation
Health Canada  |  Drug Safety and Effectiveness Network  |  DSEN supports post-market studies using RWE to evaluate safety signals
ICH  |  E2E (Pharmacovigilance Planning)  |  Recommends consideration of non-clinical and clinical data sources beyond spontaneous reporting for signal detection

Regulatory Best Practice

When using RWE for signal detection in regulatory submissions, agencies expect transparency about data source selection, study design rationale, potential biases, and the limitations of the analysis. ArcaScience's platform generates methodology documentation that meets these transparency requirements, including data source characterization, study protocol, statistical analysis plan, and detailed limitations discussion.

6. ArcaScience's RWE Platform Capabilities

ArcaScience's platform provides an integrated environment for RWE-enhanced signal detection, combining data access, analytical tools, and regulatory-ready output generation:

6.1 Data Network

6.2 Analytical Toolkit

6.3 Signal Integration Hub
