Science & Methodology

Science-First Approach to Benefit-Risk Analysis

ArcaScience combines established pharmacoepidemiology with purpose-built AI to deliver structured, reproducible, and regulatory-aligned benefit-risk assessments. We start with regulatory science and build technology around it, not the other way around.

24

Proprietary AI Models

100B+

Data Points Harmonized

200+

Regulatory-Grade Outputs Generated

Our Scientific Philosophy

ArcaScience was founded on a simple but uncommon principle in health technology: science must come before software. Our founders spent years conducting benefit-risk assessments manually — building BRAT value trees by hand, reconciling disproportionality analyses across fragmented spreadsheets, and assembling PSURs from dozens of disconnected data sources. They understood firsthand which parts of the process needed automation and, critically, which parts required human scientific judgment that technology should support rather than replace.

This experience shaped a founding conviction: the right way to build an AI platform for regulatory science is to start with the science, not with the AI. Too many health-tech companies begin with a machine learning capability and search for a regulatory application. ArcaScience began with a deep understanding of what regulators need, what pharmacovigilance teams struggle with, and what benefit-risk decisions actually require — and then built AI models purpose-designed to serve those needs.

The Science → Technology → Regulation Framework

Science

Start with regulatory science: pharmacoepidemiology, quantitative BRA frameworks, signal detection methodology. Understand what the science requires before writing a line of code.

Technology

Build AI models and data infrastructure purpose-designed for pharmacovigilance. Domain-specific NLP, validated statistical methods, deterministic and reproducible outputs.

Regulation

Validate every output against regulatory expectations. FDA, EMA, PMDA, Health Canada — outputs are structured, auditable, and submission-ready from the moment they are generated.

This framework governs every product decision at ArcaScience. Every new model, feature, and integration is evaluated first on its scientific validity, then on its technical feasibility, and finally on its regulatory alignment. The result is a platform where AI serves science, and science serves regulators and patients.

Our founding team includes pharmacoepidemiologists, biostatisticians, and regulatory scientists who have collectively contributed to over 100 regulatory submissions across the FDA, EMA, and PMDA. This direct experience with the regulatory process is encoded into every workflow, template, and validation check in the platform.

Our Scientific Advisory Board brings together world-class expertise from academia, regulatory agencies, and pharmaceutical R&D to ensure that every model, framework, and output meets the highest standards of regulatory science. The SAB reviews all major methodological decisions and provides ongoing guidance on emerging regulatory expectations.

BRA Frameworks We Implement

ArcaScience's analytical core is built on established, peer-reviewed quantitative frameworks for structured benefit-risk assessment. These are not theoretical implementations — they are operationalized, AI-assisted workflows that produce regulatory-grade outputs. Every assessment produced by the platform is grounded in one or more of these frameworks, selected based on the product's lifecycle stage, therapeutic area, and regulatory jurisdiction.

The platform automates the labor-intensive portions of each framework — data gathering, evidence synthesis, quantitative scoring, and structured output generation — while preserving the scientific judgment points where human expertise is essential.

BRAT Framework (Benefit-Risk Action Team)

The BRAT framework was developed by the Benefit-Risk Action Team of the Pharmaceutical Research and Manufacturers of America (PhRMA) as a structured approach to moving benefit-risk assessment from qualitative to quantitative. It provides a systematic process for defining the decision context and for identifying, sourcing, and evaluating the benefits and risks of medical products within a transparent decision-making framework.

How ArcaScience operationalizes BRAT:

Define the decision context: AI-assisted stakeholder identification and decision-framing templates pre-populated from the product's regulatory history and therapeutic area context
Identify outcomes: Automated systematic evidence review using NLP models to extract benefit and risk endpoints from clinical trial publications, CSRs, and regulatory documents
Source data: Automated population of BRAT value trees and effects tables from the AS Profiling Base 100B® integrated evidence dataset, spanning FAERS, EudraVigilance, clinical trials, and published literature
Customize the framework: Interactive weighting and visualization tools that allow teams to explore how different stakeholder perspectives affect the benefit-risk balance
Assess and communicate: Automated generation of effects tables, forest plots, and structured narratives in FDA-compatible and EMA-compatible formats

The platform reduces the manual effort of a complete BRAT assessment from 8–12 weeks to under 2 weeks, while maintaining full traceability from every data point to its source document.

MCDA (Multi-Criteria Decision Analysis)

Multi-Criteria Decision Analysis provides a rigorous quantitative method for evaluating trade-offs between multiple benefit and risk criteria. Recommended by the IMI PROTECT consortium, which the EMA coordinated, and increasingly adopted by FDA reviewers, MCDA translates complex multi-dimensional benefit-risk profiles into transparent, weighted scoring models that expose the assumptions underlying every regulatory decision.

ArcaScience MCDA capabilities:

Weighted scoring models: Configurable criteria weighting with swing weighting, direct rating, and discrete choice methods for eliciting stakeholder preferences from clinicians, regulators, patients, and payers
Sensitivity analysis: One-way, two-way, and probabilistic sensitivity analyses that show how the benefit-risk conclusion changes under different weighting assumptions — essential for demonstrating robustness to regulators
Stochastic MCDA: Monte Carlo simulation incorporating uncertainty from data heterogeneity, sample size limitations, and between-study variability to produce probabilistic benefit-risk characterizations
Scenario modeling: Side-by-side comparison of benefit-risk profiles across subpopulations, dosing regimens, and comparator therapies with interactive visualization

The platform generates interactive visualizations that allow decision-makers to explore how different weighting assumptions affect the overall benefit-risk balance, with full export to regulatory-ready formats.
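To make the weighted scoring and stochastic sensitivity analysis concrete, here is a minimal sketch of a linear additive MCDA model. The criteria, weights, and scores are invented for illustration and are not platform values, and the weight-perturbation scheme is one simple assumption among several possible ways to represent preference uncertainty.

```python
import random

# Hypothetical criteria. Weights sum to 1; scores sit on a 0-1 value scale
# (1 = best), so risk criteria are expressed as "absence of harm".
WEIGHTS = {"efficacy": 0.5, "serious_ae": 0.3, "discontinuation": 0.2}
SCORES = {
    "drug": {"efficacy": 0.80, "serious_ae": 0.60, "discontinuation": 0.70},
    "comparator": {"efficacy": 0.60, "serious_ae": 0.80, "discontinuation": 0.75},
}


def weighted_score(scores, weights):
    """Linear additive MCDA value: sum of weight * score over all criteria."""
    return sum(weights[c] * scores[c] for c in weights)


def perturbed_weights(weights, rng, spread=0.2):
    """Scale each weight by a random factor in [1-spread, 1+spread], then renormalize."""
    raw = {c: w * rng.uniform(1 - spread, 1 + spread) for c, w in weights.items()}
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()}


def probability_preferred(a, b, weights, n_draws=10_000, seed=42):
    """Share of weight draws in which alternative a outscores alternative b."""
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    wins = 0
    for _ in range(n_draws):
        w = perturbed_weights(weights, rng)
        if weighted_score(a, w) > weighted_score(b, w):
            wins += 1
    return wins / n_draws


base_drug = weighted_score(SCORES["drug"], WEIGHTS)        # 0.72 with base weights
base_comp = weighted_score(SCORES["comparator"], WEIGHTS)  # 0.69 with base weights
p = probability_preferred(SCORES["drug"], SCORES["comparator"], WEIGHTS)
print(f"drug={base_drug:.2f} comparator={base_comp:.2f} P(drug preferred)={p:.3f}")
```

Seeding the random generator is what lets a stochastic analysis like this be rerun with identical results, which matters when the output has to be reproduced for a reviewer.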

PrOACT-URL (Structured Decision Framework)

PrOACT-URL is a structured decision framework developed for complex benefit-risk problems where multiple stakeholders, competing objectives, and deep uncertainty make traditional analytical approaches insufficient. The framework decomposes benefit-risk decisions into eight manageable elements, each addressed systematically:

Problem — Define the decision context: what product, what indication, what lifecycle stage, what regulatory jurisdiction
Objectives — Identify what matters: efficacy endpoints, safety concerns, patient-reported outcomes, quality of life measures
Alternatives — Define the comparators: placebo, active comparator, standard of care, no treatment
Consequences — Quantify the effects: treatment effects, adverse event rates, time-to-event data
Trade-offs — Evaluate the balance: how do benefits weigh against risks for different patient populations
Uncertainty — Characterize what we do not know: confidence intervals, data gaps, model uncertainty
Risk tolerance — Incorporate stakeholder risk attitudes: regulator, prescriber, and patient risk preferences
Linked decisions — Consider downstream consequences: label restrictions, REMS requirements, post-marketing commitments

ArcaScience implements PrOACT-URL as a guided workflow within the platform, with each step supported by AI-assisted data retrieval, quantitative analysis, and structured documentation. This framework is particularly valuable for advisory committee preparation and PRAC referral responses.

NNT/NNH Analysis (Number Needed to Treat/Harm)

NNT (Number Needed to Treat) and NNH (Number Needed to Harm) provide intuitive, clinically meaningful metrics for quantifying treatment effects. These measures translate relative risk reductions and increases into absolute patient-level impact, making benefit-risk communication accessible to clinicians, regulators, and patients alike.

Platform implementation:

Automated NNT/NNH computation: Calculated from clinical trial data, meta-analyses, and real-world evidence with confidence intervals and subgroup stratification
Likelihood of Being Helped or Harmed (LHH): NNH/NNT ratios computed across all major benefit-risk endpoint pairs to provide a single summary metric of the benefit-risk balance; values above 1 indicate that a patient is more likely to be helped than harmed
Population-adjusted estimates: NNT and NNH values adjusted for baseline risk in specific patient populations using real-world prevalence and incidence data from the AS Profiling Base
Time-horizon modeling: Duration-specific NNT/NNH calculations that show how the benefit-risk balance evolves over treatment duration — critical for chronic therapies

NNT/NNH analyses are automatically integrated into effects tables and benefit-risk summaries, providing the quantitative backbone for structured regulatory communication in CTD Module 2.5 and PBRER documents.
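The arithmetic behind these metrics is short enough to show directly. The sketch below assumes a hypothetical two-arm trial with invented counts: NNT and NNH are the reciprocals of the absolute risk differences for the benefit and harm endpoints, and LHH is NNH divided by NNT. The Wald interval is used for simplicity and is only one of several interval methods.

```python
import math


def risk_difference(events_t, n_t, events_c, n_c):
    """Absolute risk difference (treatment minus control) with a 95% Wald CI."""
    p_t, p_c = events_t / n_t, events_c / n_c
    rd = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return rd, rd - 1.96 * se, rd + 1.96 * se


def number_needed(rd):
    """NNT or NNH: reciprocal of the absolute risk difference."""
    if rd == 0:
        return math.inf  # no difference between arms
    return 1 / abs(rd)


# Hypothetical trial with 1000 patients per arm.
# Benefit endpoint: 400 vs 300 responders. Harm endpoint: 50 vs 30 serious AEs.
benefit_rd, b_lo, b_hi = risk_difference(400, 1000, 300, 1000)
harm_rd, h_lo, h_hi = risk_difference(50, 1000, 30, 1000)

nnt = number_needed(benefit_rd)  # about 10 patients treated per extra responder
nnh = number_needed(harm_rd)     # about 50 patients treated per extra serious AE
lhh = nnh / nnt                  # about 5 patients helped for each one harmed

# The CI for NNT is the reciprocal of the risk-difference CI, which is only
# meaningful when that interval excludes zero.
nnt_ci = (1 / b_hi, 1 / b_lo) if b_lo > 0 else None
print(f"NNT={nnt:.1f} NNH={nnh:.1f} LHH={lhh:.1f} NNT 95% CI={nnt_ci}")
```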

Effects Tables and Value Trees

Effects tables and value trees are the primary visual outputs recommended by both the FDA and EMA for structured benefit-risk communication. ArcaScience auto-generates both formats from the integrated evidence base:

Effects tables: Summarize key outcomes with effect sizes, confidence intervals, NNT/NNH values, strength of evidence grading, and data source provenance for each endpoint
Value trees: Hierarchical decomposition of the benefit-risk balance into constituent criteria, with quantitative weights and scores at each node
Multi-format output: Generated simultaneously in FDA-compatible (Structured Benefit-Risk Framework) and EMA-compatible (PROTECT) layouts from a single evidence base

These structured outputs are directly aligned with the FDA's Structured Benefit-Risk Assessment Framework (PDUFA V commitment), the recommendations of the EMA's Benefit-Risk Methodology Project and the IMI PROTECT consortium, and ICH M4E(R2) guidance on CTD structure.
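A value tree with weights and scores at each node can be represented as a small recursive structure. The sketch below is illustrative only: the `Node` class, the criteria, and every number are assumptions made for this example, and the roll-up uses a plain weighted sum.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    weight: float                  # weight relative to siblings (siblings sum to 1)
    score: Optional[float] = None  # leaf value on a 0-1 scale; None for branches
    children: List["Node"] = field(default_factory=list)

    def value(self) -> float:
        """Leaf: its own score. Branch: weighted sum of its children's values."""
        if not self.children:
            if self.score is None:
                raise ValueError(f"leaf {self.name!r} has no score")
            return self.score
        return sum(child.weight * child.value() for child in self.children)


# Hypothetical tree: two branches, two leaf criteria under each.
benefits = Node("Benefits", 0.6, children=[
    Node("Response rate", 0.7, score=0.8),
    Node("Quality of life", 0.3, score=0.5),
])
risks = Node("Risks", 0.4, children=[
    Node("Serious adverse events", 0.6, score=0.6),
    Node("Discontinuation", 0.4, score=0.7),
])
tree = Node("Benefit-risk balance", 1.0, children=[benefits, risks])

print(f"benefits={benefits.value():.2f} risks={risks.value():.2f} "
      f"overall={tree.value():.3f}")
```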

FDA and EMA Methodological Alignment

ArcaScience's frameworks are specifically designed to meet the methodological expectations articulated in current regulatory guidance. Every framework implementation in the platform maps directly to regulatory requirements:

FDA: Structured Benefit-Risk Assessment Framework (PDUFA V/VI commitment), incorporating the dimensions of Analysis of Condition, Current Treatment Options, Benefit, Risk, and Risk Management. Platform outputs map directly to the five-dimension framework used by FDA review divisions.
EMA: PROTECT (Pharmacoepidemiological Research on Outcomes of Therapeutics by a European Consortium) recommendations for quantitative benefit-risk methods, including MCDA and structured expert elicitation approaches endorsed by PRAC.
PMDA: Japan's Pharmaceuticals and Medical Devices Agency benefit-risk assessment expectations, including J-PSUR and RMP requirements with Japanese-language output support.
Health Canada: Alignment with Health Canada's benefit-risk framework and Summary Basis of Decision requirements.
ICH: M4E(R2) guidance on the Common Technical Document structure for benefit-risk presentation, E2C(R2) for PBRER, and E2E for pharmacovigilance planning.

AI Model Architecture

ArcaScience's 24 proprietary AI/ML models are purpose-built for pharmacovigilance and benefit-risk analysis. Unlike general-purpose large language models applied to healthcare data, these models are trained on domain-specific corpora with pharmacovigilance-specific ontologies, validated against regulatory-grade benchmarks, and designed to produce auditable, traceable, deterministic outputs.

Every model in the platform carries a validation dossier documenting training data composition, performance metrics, known limitations, and ongoing monitoring results. Models are organized into four functional categories, each addressing a distinct layer of the benefit-risk analysis workflow.

Data Intelligence

Category 1

NLP Models

Natural language processing for unstructured biomedical text:

Biomedical text mining from 2.4M+ pharmacovigilance documents
Automated MedDRA coding (96.2% accuracy at PT level)
Case narrative analysis and adverse event extraction
Systematic literature screening and data extraction
Multilingual support (EN, FR, DE, JP, ZH)

Decision Intelligence

Category 2

Statistical Models

Disproportionality analysis and Bayesian inference:

PRR, ROR, BCPNN, EBGM signal detection methods
Bayesian hierarchical models for sparse data scenarios
Time-to-onset Weibull distribution fitting
Stratified analysis (age, sex, geography, reporting year)
Multi-drug interaction signal detection
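For readers less familiar with disproportionality analysis, the two frequentist measures listed above can be computed from a 2x2 contingency table of spontaneous reports. This is a minimal sketch with invented counts. The flagging rule (lower 95% bound above 1 and at least three reports) is a commonly used heuristic assumed here for illustration, not a statement of the platform's thresholds.

```python
import math
from dataclasses import dataclass


@dataclass
class Contingency:
    a: int  # target drug, target event
    b: int  # target drug, all other events
    c: int  # all other drugs, target event
    d: int  # all other drugs, all other events


def prr(t):
    """Proportional reporting ratio with a 95% CI computed on the log scale."""
    est = (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))
    se = math.sqrt(1 / t.a - 1 / (t.a + t.b) + 1 / t.c - 1 / (t.c + t.d))
    return est, est * math.exp(-1.96 * se), est * math.exp(1.96 * se)


def ror(t):
    """Reporting odds ratio with a 95% CI computed on the log scale."""
    est = (t.a * t.d) / (t.b * t.c)
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    return est, est * math.exp(-1.96 * se), est * math.exp(1.96 * se)


def is_signal(t, min_reports=3):
    """Heuristic flag: enough reports and a PRR lower bound above 1."""
    _, lower, _ = prr(t)
    return t.a >= min_reports and lower > 1.0


# Hypothetical counts for one drug-event pair and for a pair with no excess.
table = Contingency(a=20, b=980, c=100, d=98900)
null_table = Contingency(a=10, b=990, c=1000, d=99000)

prr_est, prr_lo, prr_hi = prr(table)
ror_est, ror_lo, ror_hi = ror(table)
print(f"PRR={prr_est:.1f} ({prr_lo:.1f}-{prr_hi:.1f}) "
      f"ROR={ror_est:.1f} ({ror_lo:.1f}-{ror_hi:.1f}) signal={is_signal(table)}")
```

Bayesian measures such as BCPNN and EBGM shrink these ratios toward the null when counts are small, which is why they are usually preferred for sparse drug-event pairs.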

Decision Intelligence

Category 3

Predictive Models

Forecasting and trajectory modeling for proactive safety:

Signal forecasting with 89% precision in triage classification
Benefit-risk trajectory modeling across treatment duration
Population-level outcome prediction for subgroups
Comparative effectiveness forecasting
Regulatory action probability modeling

Automated Outputs

Category 4

Validation Models

Quality assurance and cross-source reconciliation:

Automated data quality scoring per data source
Cross-source consistency checks and reconciliation
Deduplication across spontaneous reporting databases
Missing data pattern detection and imputation
Output audit and internal consistency verification

Training Data and Model Governance

ArcaScience maintains rigorous standards for model training, validation, and ongoing governance:

Training Data Sources:

Publicly available pharmacovigilance databases (FAERS, EudraVigilance quarterly extracts, VigiBase aggregate data)
Published biomedical literature (PubMed/MEDLINE corpus, Embase, Cochrane Library systematic reviews)
Clinical trial registries and results databases (ClinicalTrials.gov, EU Clinical Trials Register)
Regulatory submission archives and label change histories across FDA, EMA, and PMDA

Model Governance:

Validation against known signals: Every signal detection model is validated against a reference set of historically confirmed and refuted safety signals across 12 therapeutic areas
Regular retraining: Quarterly model retraining cycles with updated data, followed by full revalidation before production deployment
Bias monitoring: Continuous monitoring for demographic bias, geographic bias, and temporal drift in model performance, with automated alerts for statistically significant performance degradation
Complete lineage: Version-controlled model artifacts, training data snapshots, hyperparameter configurations, and validation results for every model in production

Performance Metrics:

Signal detection sensitivity: 92.4% for confirmed safety signals in retrospective validation (vs. 71.8% for traditional disproportionality alone)
Signal detection specificity: 87.1%, reducing false-positive burden by 40% compared to unaugmented statistical methods
Positive predictive value: 68.3% for signal triage classification, validated across oncology, immunology, neurology, and cardiology therapeutic areas
NLP extraction accuracy: 96.2% MedDRA coding accuracy at Preferred Term level; 94.7% for drug-event relation extraction from unstructured narratives
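When reading metrics like these, it helps to remember how sensitivity, specificity, and positive predictive value relate. The sketch below uses invented counts rather than the validation data. It also shows that PPV depends on the share of true signals in the reference set, so a PPV figure cannot be carried over to a setting with a different prevalence.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and PPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV implied by fixed sensitivity and specificity at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical reference set: 100 confirmed signals and 200 refuted ones.
m = classification_metrics(tp=90, fp=30, tn=170, fn=10)
print(m)  # sensitivity 0.90, specificity 0.85, ppv 0.75

# Same classifier in a setting where only 10% of candidates are true signals.
print(round(ppv_at_prevalence(0.90, 0.85, 0.10), 3))  # 0.4
```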

Distinction from General-Purpose LLMs

ArcaScience's AI approach is fundamentally different from applying general-purpose large language models to pharmacovigilance tasks. This distinction is not academic — it is the difference between outputs that regulators can trust and outputs that they cannot:

Deterministic outputs: For signal detection and quantitative analysis, models produce reproducible numerical results, not probabilistic text generation. The same input data always produces the same output.
Full traceability: Every model output links back to specific input data points with a complete provenance chain, enabling the audit trails required by 21 CFR Part 11 and GVP Module IX.
No hallucination risk: Quantitative models do not generate synthetic facts. NLP extraction models are constrained to entities present in source documents with provenance tracking. There is no generative component producing unsourced claims.
Regulatory validation: Each model undergoes formal validation per GAMP 5 Category 5 software validation principles, with documented IQ/OQ/PQ protocols and ongoing performance qualification.

Data Science Pipeline

Proprietary Dataset

AS Profiling Base 100B®

Our proprietary harmonized dataset integrates, normalizes, and continuously updates over 100 billion data points from the world's most comprehensive pharmacovigilance, clinical trial, and real-world evidence sources — creating a unified analytical environment for benefit-risk analysis.

The AS Profiling Base is not a raw data lake. It is a curated, harmonized, quality-scored evidence base where every data point is coded to standard ontologies, linked to its source, and enriched with metadata that enables rigorous cross-source analysis.

100B+

Harmonized data points

40+

Integrated data sources

Meaningful benefit-risk analysis requires the synthesis of heterogeneous data sources, each with different structures, quality characteristics, and inherent biases. The Data Intelligence Engine is purpose-built to harmonize these sources while preserving the metadata necessary for rigorous bias characterization and data provenance tracking.

Data Sources

The AS Profiling Base integrates data across the full spectrum of pharmacovigilance and clinical evidence:

Spontaneous reporting databases: FDA FAERS (quarterly data extracts from 2004 onward, including legacy AERS files), EMA EudraVigilance (EVWEB access and line listings), WHO VigiBase (aggregate and case-level data via UMC), plus national databases including MHRA Yellow Card, ANSM, and BfArM
Clinical trial databases: ClinicalTrials.gov (400,000+ registered studies), EU Clinical Trials Register, individual patient-level data from sponsor datasets integrated via CDISC SDTM/ADaM standards
Published literature: PubMed/MEDLINE (36M+ citations), Embase, Cochrane Library, with automated systematic review and meta-analysis capabilities
Real-world evidence: Claims databases, electronic health records, patient registries, and structured social media pharmacovigilance monitoring for early signal detection
Regulatory intelligence: Label changes, Dear Healthcare Provider letters, risk communications, REMS programs, and risk management plans across FDA, EMA, PMDA, and Health Canada

Harmonization and Standardization

Raw data from heterogeneous sources is harmonized through a multi-stage pipeline designed for both accuracy and auditability:

MedDRA coding: Adverse events standardized to MedDRA (Medical Dictionary for Regulatory Activities) at all hierarchy levels — SOC, HLGT, HLT, PT, and LLT — using AI-assisted coding validated against expert manual coding
WHODrug mapping: Drug substances identified and normalized using WHODrug Global (formerly WHO Drug Dictionary Enhanced), resolving brand names, generics, active substances, and common misspellings
Temporal normalization: Alignment across databases with different reporting cadences, lag times, and date format conventions to enable valid cross-source temporal analyses
CDISC compliance: Clinical trial data integrated via CDISC SDTM and ADaM standards, ensuring interoperability with sponsor datasets and regulatory submission formats
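Temporal normalization is a good example of why this stage needs care: sources report dates in different layouts and at different precisions. The sketch below assumes a handful of illustrative input formats (compact, ISO, day/month/year, and partial dates) and converts them to ISO 8601 while recording the precision, rather than inventing a day or month that was never reported. The function name and format list are assumptions for this example.

```python
import re
from datetime import date


def normalize_date(raw):
    """Return (iso_string, precision) for a raw date string.

    Raises ValueError for unknown layouts or impossible dates.
    """
    s = raw.strip()

    m = re.fullmatch(r"(\d{4})-?(\d{2})-?(\d{2})", s)  # 20240315 or 2024-03-15
    if m:
        y, mo, d = map(int, m.groups())
        return date(y, mo, d).isoformat(), "day"

    m = re.fullmatch(r"(\d{2})/(\d{2})/(\d{4})", s)    # assumed day/month/year
    if m:
        d, mo, y = map(int, m.groups())
        return date(y, mo, d).isoformat(), "day"

    m = re.fullmatch(r"(\d{4})-?(\d{2})", s)           # partial date: year and month
    if m:
        y, mo = map(int, m.groups())
        date(y, mo, 1)                                 # validates the month
        return f"{y:04d}-{mo:02d}", "month"

    if re.fullmatch(r"\d{4}", s):                      # partial date: year only
        return s, "year"

    raise ValueError(f"unrecognized date layout: {raw!r}")


for raw in ["20240315", "2024-03-15", "15/03/2024", "202403", "2024"]:
    print(raw, "->", normalize_date(raw))
```

Keeping the precision alongside the value lets downstream time-to-onset analyses exclude or down-weight partial dates explicitly.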

Data Quality and Bias Characterization

Every data source carries inherent biases and quality limitations that must be characterized for responsible interpretation. The platform applies automated quality assessment at every stage:

Automated Quality Scoring:

Completeness scoring: percentage of required fields populated, weighted by analytical importance
Consistency checks: cross-field validation, temporal plausibility, and dosage-indication coherence
Data quality scorecards generated for each analysis, documenting the evidence quality underpinning every conclusion
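A weighted completeness score of the kind described above can be expressed in a few lines. The field names and weights below are hypothetical and chosen only to illustrate the idea that fields essential to causality assessment count for more than demographic fields.

```python
# Hypothetical field weights reflecting analytical importance.
FIELD_WEIGHTS = {
    "suspect_drug": 3.0,
    "event_term": 3.0,
    "onset_date": 2.0,
    "age": 1.0,
    "sex": 1.0,
}


def completeness(record, weights=FIELD_WEIGHTS):
    """Weighted share of required fields that are populated (0.0 to 1.0)."""
    populated = sum(
        w for name, w in weights.items() if record.get(name) not in (None, "")
    )
    return populated / sum(weights.values())


# Illustrative report with no onset date and an empty age field.
report = {"suspect_drug": "metformin", "event_term": "Lactic acidosis",
          "onset_date": None, "age": "", "sex": "F"}
print(f"completeness={completeness(report):.2f}")  # 0.70
```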

Deduplication:

Probabilistic record linkage algorithms identify and reconcile duplicate reports across FAERS, EudraVigilance, and VigiBase
Follow-up report consolidation preserves the most complete version of each case while maintaining full case history
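Probabilistic record linkage is often formulated in the Fellegi-Sunter style, where each field contributes a log-likelihood weight depending on whether it agrees between two reports. The sketch below assumes that formulation. The m/u probabilities, the threshold, and the field names are invented for illustration and would need to be estimated from data in practice.

```python
import math

# Hypothetical per-field probabilities:
# m = P(field agrees | same case), u = P(field agrees | different cases).
FIELD_PROBS = {
    "sex": (0.95, 0.50),
    "age": (0.90, 0.02),
    "onset_date": (0.85, 0.001),
    "drug": (0.95, 0.05),
    "event_pt": (0.90, 0.02),
}
THRESHOLD = 10.0  # illustrative cut-off in bits, not a calibrated value


def match_weight(rec_a, rec_b, probs=FIELD_PROBS):
    """Sum of log2 likelihood ratios over the fields present in both records."""
    total = 0.0
    for name, (m, u) in probs.items():
        va, vb = rec_a.get(name), rec_b.get(name)
        if va is None or vb is None:
            continue  # a missing value carries no evidence either way
        if va == vb:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def is_probable_duplicate(rec_a, rec_b):
    return match_weight(rec_a, rec_b) >= THRESHOLD


# Invented reports: the first two differ only in onset date by one day.
report_1 = {"sex": "F", "age": 54, "onset_date": "2023-05-10",
            "drug": "metformin", "event_pt": "Lactic acidosis"}
report_2 = dict(report_1, onset_date="2023-05-11")
report_3 = {"sex": "F", "age": 31, "onset_date": "2021-01-02",
            "drug": "ibuprofen", "event_pt": "Gastric ulcer"}

print(is_probable_duplicate(report_1, report_2))  # True
print(is_probable_duplicate(report_1, report_3))  # False
```

Because agreement on a rarely matching field such as onset date carries far more weight than agreement on sex, a pair can still be linked when one high-value field disagrees slightly.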

Bias Characterization:

Reporting bias: Weber effect modeling, notoriety bias detection, and stimulated reporting adjustment for spontaneous report data
Selection bias: Demographic profiling against expected population distributions with stratified subgroup analysis
Confounding: Stratification, multivariate adjustment, and propensity score methods where individual-level data is available
Publication bias: Funnel plot analysis and Egger's regression for systematic review components
Missing data: Pattern analysis (MCAR/MAR/MNAR classification) with multiple imputation and sensitivity analysis where appropriate
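As one concrete case from the list above, Egger's regression checks funnel-plot asymmetry by regressing each study's standardized effect (effect divided by its standard error) on its precision (one divided by the standard error). An intercept far from zero suggests small-study effects such as publication bias. This minimal sketch uses invented study data and plain least squares; a formal test would also need the intercept's standard error and a t distribution, which are omitted here.

```python
def egger_regression(effects, std_errors):
    """Return (intercept, slope) of standardized effect regressed on precision."""
    x = [1 / se for se in std_errors]                   # precision
    y = [e / se for e, se in zip(effects, std_errors)]  # standardized effect
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


std_errors = [0.1, 0.2, 0.3, 0.4]

# Symmetric case: every study estimates the same effect, so the line passes
# through the origin and the slope recovers the common effect.
sym_intercept, sym_slope = egger_regression([0.5, 0.5, 0.5, 0.5], std_errors)

# Asymmetric case: less precise studies report larger effects.
asym_intercept, asym_slope = egger_regression([0.2, 0.4, 0.6, 0.8], std_errors)

print(f"symmetric:  intercept={sym_intercept:.3f} slope={sym_slope:.3f}")
print(f"asymmetric: intercept={asym_intercept:.3f} slope={asym_slope:.3f}")
```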

Regulatory-Grade Validation

ArcaScience's outputs are designed to meet the standards of major global regulatory authorities from the moment they are generated. This is not an afterthought — regulatory compliance is an architectural requirement that shapes every aspect of the platform, from data ingestion to final document output.

The platform has generated more than 200 regulatory-grade outputs for FDA, EMA, PMDA, and Health Canada submissions, with a 100% acceptance rate for platform-generated PSUR/PBRER and CTD Module 2.5 documents.

GAMP 5 / CSV Approach

The platform is validated under GAMP 5 (Good Automated Manufacturing Practice) guidelines for Category 5 (custom) software applications:

Documented IQ/OQ/PQ protocols for every module
Risk-based approach to Computer System Validation
Change control procedures with impact assessment
Periodic revalidation on quarterly cycles

21 CFR Part 11 Compliance

Full compliance with FDA 21 CFR Part 11 requirements for electronic records and electronic signatures:

Immutable audit trails for all data access and modifications
Electronic signature support with identity verification
Role-based access control with least-privilege enforcement
System security controls including encryption at rest and in transit

Audit Trail & Reproducibility

Every analysis in the platform is fully reproducible and auditable:

Complete data lineage from source to output for every data point
Timestamped, immutable logs of all analytical operations
One-click analysis reproduction with identical parameters and data snapshots
Version-controlled analytical configurations for longitudinal comparison
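One common way to make a log tamper-evident is to chain entries by hash, so that altering any earlier record invalidates everything after it. The sketch below illustrates that general technique and is not a description of the platform's implementation. The entry fields and function names are assumptions for this example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log, timestamp, operation, params):
    """Append an entry whose hash covers its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"timestamp": timestamp, "operation": operation,
            "params": params, "prev_hash": prev_hash}
    log.append({**body, "hash": _digest(body)})


def verify(log):
    """Recompute every hash and check that each entry points at its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


# Illustrative entries with explicit timestamps so the example is deterministic.
audit_log = []
append_entry(audit_log, "2025-01-15T09:00:00Z", "load_snapshot", {"snapshot": "2024Q4"})
append_entry(audit_log, "2025-01-15T09:05:00Z", "run_prr", {"threshold": 2.0})
print(verify(audit_log))  # True
```

Storing the parameters and data snapshot identifier in each entry is what allows an analysis to be rerun later with identical inputs.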

Model Interpretability

Regulators require explainable AI. Every model output includes:

Feature importance rankings for predictive model outputs
Confidence scores with calibrated uncertainty bounds
Source attribution linking every conclusion to supporting evidence
Plain-language model explanation summaries suitable for regulatory reviewers

Multi-Authority Regulatory Alignment

The platform encodes regulatory guidance from multiple authorities directly into its analytical workflows, enabling teams to generate jurisdiction-specific outputs from a single integrated evidence base:

FDA: Structured Benefit-Risk Assessment Framework (PDUFA V/VI), 21 CFR, ICH E2C(R2) implementation for PBRER, IND Safety Reporting (21 CFR 312.32)
EMA: GVP Modules V (Risk Management), VII (PSUR/PBRER), IX (Signal Management), PRAC assessment workflows, PROTECT quantitative BRA recommendations
PMDA: Japanese pharmacovigilance requirements, J-PSUR format compliance, RMP generation in Japanese, PMDA-specific safety reporting thresholds
Health Canada: Summary Basis of Decision requirements, Canadian Vigilance Program compatibility, Risk Management Plan standards
ICH harmonization: M4E(R2) for CTD structure, E2E for pharmacovigilance planning, E2C(R2) for PBRER, CIOMS IV/VIII/X/XI methodological standards

Publications & Research

ArcaScience's methodology is grounded in peer-reviewed research and validated through academic collaboration and regulatory engagement. Our team publishes regularly in leading pharmacovigilance and regulatory science journals, contributing to the broader advancement of quantitative benefit-risk methodology.

Drug Safety, 2025

Automated Quantitative Benefit-Risk Assessment Using Domain-Specific AI Models: A Validation Study Across 12 Therapeutic Areas

Dupont C, Tessier M, et al. Demonstrates 92.4% sensitivity for confirmed safety signals using ArcaScience's ensemble approach across oncology, immunology, neurology, cardiology, and 8 additional therapeutic areas.

Read abstract

Pharmacoepidemiology & Drug Safety, 2024

AI-Augmented Signal Detection in Spontaneous Reporting Databases: Comparative Performance of BCPNN, MGPS, and Ensemble Machine Learning

Tessier M, Dupont C, et al. Head-to-head comparison showing 40% reduction in false-positive rates when AI-augmented signal prioritization is applied alongside traditional disproportionality analysis.

Read abstract

Clinical Pharmacology & Therapeutics, 2024

Natural Language Processing for Pharmacovigilance Case Narrative Analysis: A Multi-Language Validation Study

Tessier M, et al. Validation of NLP models for adverse event extraction across English, French, German, Japanese, and Chinese regulatory documents, achieving 96.2% MedDRA coding accuracy at Preferred Term level.

Read abstract

Therapeutic Innovation & Regulatory Science, 2023

Structured Benefit-Risk Communication for Regulatory Submissions: Aligning BRAT and MCDA Frameworks with FDA and EMA Expectations

Dupont C, Clement R, et al. Practical guidance on operationalizing BRAT and MCDA frameworks within automated platforms, with case studies from 15 regulatory submissions across 6 therapeutic areas.

Read abstract

Frontiers in Pharmacology, 2023

Multi-Source Data Integration for Real-Time Benefit-Risk Monitoring: Architecture and Validation of the AS Profiling Base

ArcaScience Research Team. Describes the technical architecture and validation methodology for the AS Profiling Base 100B dataset, including harmonization pipeline performance and data quality metrics.

Read abstract

Value in Health, 2024

Integrating NNT/NNH Analysis with MCDA for Transparent Benefit-Risk Communication in HTA Submissions

Dupont C, et al. Demonstrates how combining NNT/NNH metrics with multi-criteria decision analysis creates more transparent and clinically interpretable benefit-risk profiles for health technology assessment bodies.

Read abstract

Conference Presentations

"Quantitative Benefit-Risk Analysis at Scale: Lessons from 50+ Regulatory Submissions." International Society for Pharmacoepidemiology (ISPE) Annual Meeting, 2025. Oral presentation.
"Domain-Specific AI for Pharmacovigilance Signal Detection: Moving Beyond Disproportionality." Drug Information Association (DIA) Annual Meeting, 2025. Symposium presentation.
"Automated PBRER Generation with Integrated Benefit-Risk Evaluation: A Platform Validation Study." European Medicines Agency Pharmacovigilance Stakeholder Forum, 2024. Poster presentation.
"NLP-Based Adverse Event Extraction from Multilingual Regulatory Documents: Performance Benchmarking." Society for Clinical Data Management (SCDM) Annual Conference, 2024. Oral presentation.
"Multi-Criteria Decision Analysis in Regulatory Benefit-Risk Assessment: Practical Implementation and Stakeholder Engagement." CIOMS Seminar on Benefit-Risk Assessment, 2023. Invited lecture.

View all publications and presentations →

Scientific Advisory Board

ArcaScience's Scientific Advisory Board brings together world-class expertise from academia, regulatory agencies, and pharmaceutical R&D. Our SAB members have collectively served on FDA advisory committees, EMA scientific committees, and WHO working groups, and have authored foundational guidance documents on benefit-risk methodology.

The SAB plays an active role in shaping the platform's scientific direction. Every major methodological decision — from the implementation of a new BRA framework to the validation protocol for a new AI model — is reviewed by the SAB before production deployment. This ensures that every model, framework, and output meets the highest standards of regulatory science.

Academic Experts

Professors and researchers in pharmacoepidemiology, biostatistics, and regulatory science from leading universities

Former Regulators

Former senior officials from FDA, EMA, and PMDA with deep understanding of regulatory expectations and review processes

Pharma R&D Leaders

Current and former heads of pharmacovigilance, drug safety, and regulatory affairs at top-20 pharmaceutical companies

Meet Our Scientific Advisory Board →

See the Science in Action

Schedule a technical session with our pharmacoepidemiology team to see how ArcaScience's AI models, quantitative BRA frameworks, and integrated evidence base work in practice — or download our methodology whitepaper for a detailed technical overview.

Request a Demo

Download Methodology Whitepaper →