Enrollment Analysis Tutorial

This tutorial covers enrollment validation, gap calculation, and censoring for survival analysis.

Overview

Enrollment analysis ensures patients have observable data during the study period and determines when patients are censored due to enrollment gaps.

from alx_heor.enrollment import (
    get_enrollment,
    calculate_enrollment_gaps,
    filter_continuous_enrollment,
    get_censor_dates,
    analyze_enrollment,
)

When to Use This Module

Integrated with Cohort

If you're using get_cohort() with EnrollmentCriteria, enrollment analysis is automatic. Use this module directly for:

  • Custom enrollment requirements
  • Post-hoc enrollment analysis
  • Survival analysis censoring

Step 1: Get Enrollment Data

from alx_heor.database import RedshiftConnection
from alx_heor.enrollment import get_enrollment

conn = RedshiftConnection().connect()

# Get enrollment for specific patients
df_enroll = get_enrollment(
    conn,
    source="iqvia",
    schema="iqvia_pharmetrics_2024q3",
    patient_ids=df_cohort["pat_id"].tolist(),
    start_year=2015,
    end_year=2024,
)

The result contains monthly enrollment indicators:

pat_id  month_id  enrolled
123456  202001    1
123456  202002    1
123456  202003    0
123456  202004    1

Step 2: Calculate Enrollment Gaps

Gaps are calculated relative to each patient's index date and study window.

from alx_heor.enrollment import calculate_enrollment_gaps

df_gaps = calculate_enrollment_gaps(
    df_enroll,
    df_cohort,  # Must have pat_id and index_date columns
    patient_id_col="pat_id",
    index_date_col="index_date",
    months_pre=6,   # 6 months before index
    months_post=12, # 12 months after index
)
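To make the study window concrete, the sketch below lists the month_id values a window would cover for one patient. It is an illustration only, not the library's implementation: the helper name window_month_ids is hypothetical, and it assumes the index month counts as the first follow-up month.

```python
from datetime import date

def window_month_ids(index_date, months_pre, months_post):
    """Return YYYYMM ids for the study window around index_date.

    Assumption: the window is months_pre months before the index month,
    followed by months_post months starting at the index month.
    """
    # Count months since year 0 so that offsets cross year boundaries cleanly
    index_ordinal = index_date.year * 12 + (index_date.month - 1)
    ids = []
    for offset in range(-months_pre, months_post):
        year, month0 = divmod(index_ordinal + offset, 12)
        ids.append(year * 100 + month0 + 1)
    return ids

ids = window_month_ids(date(2020, 3, 15), months_pre=6, months_post=12)
print(ids[0], ids[-1], len(ids))
```

Under that assumption, a patient indexed on 2020-03-15 has a window running from 201909 through 202102, 18 months in total.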

Output columns:

Column                 Description
pat_id                 Patient identifier
max_gap_months         Largest consecutive gap in study window
total_enrolled_months  Total months with enrollment
expected_months        Expected months (6 + 12 = 18)
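The output columns can be understood through a small, dependency-free sketch that derives them from one patient's monthly indicators. This is not how calculate_enrollment_gaps is necessarily implemented; the function name summarize_enrollment and the sample window are hypothetical.

```python
from itertools import groupby

def summarize_enrollment(flags):
    """Summarize chronologically ordered monthly 0/1 enrollment flags."""
    max_gap = 0
    # groupby yields runs of identical consecutive values; a run of 0s is a gap
    for enrolled, run in groupby(flags):
        if enrolled == 0:
            max_gap = max(max_gap, len(list(run)))
    return {
        "max_gap_months": max_gap,
        "total_enrolled_months": sum(flags),
        "expected_months": len(flags),
    }

# Hypothetical 18-month window containing a 1-month gap and a 2-month gap
window = [1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
summary = summarize_enrollment(window)
print(summary)
```

Here the longest run of unenrolled months is 2, so this patient would pass a filter with max_gap=2 but fail one with max_gap=1.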

Step 3: Filter by Continuous Enrollment

from alx_heor.enrollment import filter_continuous_enrollment

# Keep patients with max gap <= 1 month
df_continuous = filter_continuous_enrollment(
    df_gaps,
    max_gap=1,  # Allow up to 1-month gaps
)

print(f"Continuous enrollment: {len(df_continuous)} patients")

Step 4: Calculate Censoring Dates

For survival analysis, each patient is censored at whichever of the following comes first:

  1. The first enrollment gap that exceeds the threshold
  2. The study end date

from alx_heor.enrollment import get_censor_dates

df_censor = get_censor_dates(
    df_gaps,
    study_end="2024-03-31",
    max_gap_for_censor=3,  # Censor at gaps > 3 months
    patient_id_col="pat_id",
    index_date_col="index_date",
)

Output:

pat_id  censor_date  is_censored_by_gap
123456  2024-03-31   False
234567  2022-08-15   True
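The censoring rule can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the logic of get_censor_dates: the helper find_censor_date is hypothetical, and it assumes the censor date is the first day of the first month in a qualifying gap. The library may place the censor date differently.

```python
from datetime import date

def find_censor_date(monthly_flags, study_end, max_gap_for_censor):
    """Return (censor_date, is_censored_by_gap) for one patient.

    monthly_flags: chronologically ordered (month_start, enrolled) pairs
    starting at the index month.
    Assumption: censoring happens at the start of the first gap that is
    longer than max_gap_for_censor months.
    """
    gap_start = None
    gap_len = 0
    for month_start, enrolled in monthly_flags:
        if month_start > study_end:
            break  # ignore months after the study ends
        if enrolled:
            gap_start, gap_len = None, 0  # the gap, if any, has closed
            continue
        if gap_start is None:
            gap_start = month_start
        gap_len += 1
        if gap_len > max_gap_for_censor:
            return gap_start, True
    return study_end, False

# Hypothetical patients followed through calendar year 2023
months = [date(2023, m, 1) for m in range(1, 13)]
study_end = date(2023, 12, 31)
long_gap = list(zip(months, [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1]))
short_gap = list(zip(months, [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]))

long_gap_result = find_censor_date(long_gap, study_end, max_gap_for_censor=3)
short_gap_result = find_censor_date(short_gap, study_end, max_gap_for_censor=3)
print(long_gap_result, short_gap_result)
```

The 4-month gap exceeds the threshold of 3, so that patient is censored when the gap begins; the 3-month gap does not, so that patient is followed to the study end.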

All-in-One Analysis

Use analyze_enrollment() for a streamlined workflow:

from alx_heor.enrollment import analyze_enrollment

result = analyze_enrollment(
    conn,
    df_cohort,
    source="iqvia",
    schema="iqvia_pharmetrics_2024q3",
    months_pre=6,
    months_post=12,
    max_gap=1,
    study_end="2024-03-31",
)

# Access results
df_enrolled = result.df_enrolled   # Patients with continuous enrollment
df_censor = result.df_censor       # Censoring dates
df_gaps = result.df_gaps           # Gap analysis

Survival Analysis Setup

Combine cohort and censoring for survival analysis:

import pandas as pd

# Merge cohort with censoring
df_survival = df_cohort.merge(
    df_censor[["pat_id", "censor_date", "is_censored_by_gap"]],
    on="pat_id",
)

# Calculate follow-up time
df_survival["follow_up_days"] = (
    pd.to_datetime(df_survival["censor_date"]) -
    pd.to_datetime(df_survival["index_date"])
).dt.days

# Event indicator: 1 = event observed (e.g., death), 0 = censored
df_survival["event"] = 0  # Placeholder; replace with actual event data

# Ready for survival analysis
print(df_survival[["pat_id", "index_date", "censor_date", "follow_up_days", "event"]].head())

Output:

pat_id  index_date  censor_date  follow_up_days  event
123456  2020-03-15  2024-03-31   1477            0
234567  2019-08-22  2022-08-15   1089            0
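Once real event dates are available, the follow-up time and event indicator are usually derived together: follow-up ends at the event if it occurs on or before the censor date, and at the censor date otherwise. The sketch below shows that rule for a single patient; the helper survival_row and the event dates are hypothetical and not part of alx_heor.

```python
from datetime import date

def survival_row(index_date, censor_date, event_date=None):
    """Return (follow_up_days, event) for one patient.

    event is 1 only if the event occurs on or before the censor date;
    events after censoring are unobserved and treated as censored.
    """
    if event_date is not None and event_date <= censor_date:
        return (event_date - index_date).days, 1
    return (censor_date - index_date).days, 0

# No event recorded: followed until the censor date
no_event = survival_row(date(2020, 3, 15), date(2024, 3, 31))

# Hypothetical event one year after index, before censoring
with_event = survival_row(date(2020, 3, 15), date(2024, 3, 31), date(2021, 3, 15))

# Hypothetical event after the censor date: not observable
late_event = survival_row(date(2019, 8, 22), date(2022, 8, 15), date(2023, 1, 1))

print(no_event, with_event, late_event)
```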

Understanding Gap Thresholds

max_gap  Meaning
0        Strictly continuous (no gaps allowed)
1        Allow 1-month gaps (typical for claims data)
2-3      More permissive, allows short lapses

Recommendation

Use max_gap=1 for standard HEOR analyses. Claims data often has 1-month gaps due to processing delays.
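A quick way to judge the effect of a threshold is to count how many patients survive each setting. The sketch below does this for a hypothetical set of max_gap_months values; the patient labels and the helper retained are illustrative only.

```python
# Hypothetical max_gap_months values for five patients
max_gap_by_patient = {"A": 0, "B": 1, "C": 1, "D": 2, "E": 5}

def retained(gaps, max_gap):
    """Return the patients whose largest gap does not exceed max_gap."""
    return sorted(p for p, gap in gaps.items() if gap <= max_gap)

for threshold in (0, 1, 3):
    kept = retained(max_gap_by_patient, threshold)
    print(f"max_gap={threshold}: {len(kept)} patients retained {kept}")
```

Running the same comparison on a real df_gaps table shows how much sample size each stricter threshold costs before committing to one.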

Common Patterns

Different Windows for Baseline/Follow-up

# Strict baseline, permissive follow-up
df_gaps_baseline = calculate_enrollment_gaps(
    df_enroll, df_cohort,
    months_pre=6, months_post=0,
)

df_gaps_followup = calculate_enrollment_gaps(
    df_enroll, df_cohort,
    months_pre=0, months_post=12,
)

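# Filter baseline strictly (no gaps) and follow-up permissively
# (max_gap=3 is an example threshold; choose one that suits the study)
df_strict_baseline = filter_continuous_enrollment(df_gaps_baseline, max_gap=0)
df_permissive_followup = filter_continuous_enrollment(df_gaps_followup, max_gap=3)

# Keep patients who pass both filters
eligible_patients = set(df_strict_baseline["pat_id"]) & set(df_permissive_followup["pat_id"])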

Variable Follow-up

For studies without fixed follow-up requirements:

# Get all available follow-up per patient
df_censor = get_censor_dates(
    df_gaps,
    study_end="2024-03-31",
    max_gap_for_censor=3,
)

# Calculate available follow-up in days
# (convert to datetime first in case the columns are stored as strings)
df_cohort_with_followup = df_cohort.merge(df_censor, on="pat_id")
df_cohort_with_followup["available_followup"] = (
    pd.to_datetime(df_cohort_with_followup["censor_date"])
    - pd.to_datetime(df_cohort_with_followup["index_date"])
).dt.days

# Filter to minimum required
df_min_followup = df_cohort_with_followup[
    df_cohort_with_followup["available_followup"] >= 180  # At least 6 months
]

Next Steps