Enrollment Analysis Tutorial

This tutorial covers enrollment validation, gap calculation, and censoring for survival analysis.

Overview

Enrollment analysis verifies that patients have observable data during the study period and determines when they are censored because of enrollment gaps.

from alx_heor.enrollment import (
    get_enrollment,
    calculate_enrollment_gaps,
    filter_continuous_enrollment,
    get_censor_dates,
    analyze_enrollment,
)

When to Use This Module

Integrated with Cohort

If you're using get_cohort() with EnrollmentCriteria, enrollment analysis is automatic. Use this module directly for:

Custom enrollment requirements
Post-hoc enrollment analysis
Survival analysis censoring

Step 1: Get Enrollment Data

from alx_heor.database import RedshiftConnection
from alx_heor.enrollment import get_enrollment

conn = RedshiftConnection().connect()

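# Get enrollment for specific patients.
# df_cohort is the cohort DataFrame (for example, from get_cohort()); it must have a pat_id column.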
df_enroll = get_enrollment(
    conn,
    source="iqvia",
    schema="iqvia_pharmetrics_2024q3",
    patient_ids=df_cohort["pat_id"].tolist(),
    start_year=2015,
    end_year=2024,
)

The result contains monthly enrollment indicators, one row per patient per month (`month_id` is in YYYYMM form; `enrolled` is 1 or 0):

pat_id	month_id	enrolled
123456	202001	1
123456	202002	1
123456	202003	0
123456	202004	1
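
To make the layout concrete, the sketch below reads rows shaped like the table above and lists each patient's unenrolled months. It uses plain Python rather than `alx_heor`, and it assumes `month_id` is an integer in YYYYMM form, which the sample values suggest but the source does not state. The helper names `split_month_id` and `unenrolled_months` are hypothetical.

```python
from collections import defaultdict

# Rows shaped like the get_enrollment() result: (pat_id, month_id, enrolled).
# Assumption: month_id is an integer in YYYYMM form.
rows = [
    (123456, 202001, 1),
    (123456, 202002, 1),
    (123456, 202003, 0),
    (123456, 202004, 1),
]


def split_month_id(month_id):
    """Split a YYYYMM integer into (year, month)."""
    return divmod(month_id, 100)


def unenrolled_months(rows):
    """Map each pat_id to the sorted (year, month) pairs where enrolled == 0."""
    gaps = defaultdict(list)
    for pat_id, month_id, enrolled in rows:
        if enrolled == 0:
            gaps[pat_id].append(split_month_id(month_id))
    return {pat_id: sorted(months) for pat_id, months in gaps.items()}


print(unenrolled_months(rows))  # {123456: [(2020, 3)]}
```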

Step 2: Calculate Enrollment Gaps

Gaps are calculated relative to each patient's index date and study window.

from alx_heor.enrollment import calculate_enrollment_gaps

df_gaps = calculate_enrollment_gaps(
    df_enroll,
    df_cohort,  # Must have pat_id and index_date columns
    patient_id_col="pat_id",
    index_date_col="index_date",
    months_pre=6,   # 6 months before index
    months_post=12, # 12 months after index
)

Output columns:

Column	Description
`pat_id`	Patient identifier
`max_gap_months`	Longest run of consecutive unenrolled months in the study window
`total_enrolled_months`	Total enrolled months in the study window
`expected_months`	Expected months in the study window (`months_pre` + `months_post`; here 6 + 12 = 18)
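
As an illustration of what these columns measure, here is a minimal single-patient sketch in plain Python. It is not the `alx_heor` implementation. It assumes the window is the `months_pre` months before the index month followed by `months_post` months starting with the index month, and it counts months with no enrollment record as unenrolled. The names `month_index` and `gap_metrics` are hypothetical.

```python
from datetime import date


def month_index(year, month):
    """Place a month on a single integer axis so month arithmetic is simple."""
    return year * 12 + (month - 1)


def gap_metrics(enrolled_months, index_date, months_pre, months_post):
    """Summarize enrollment in the window around index_date.

    enrolled_months: set of (year, month) pairs with enrollment.
    Assumed window: months_pre months before the index month, then
    months_post months starting with the index month.
    """
    start = month_index(index_date.year, index_date.month) - months_pre
    expected = months_pre + months_post
    enrolled = {month_index(y, m) for y, m in enrolled_months}

    max_gap = current_gap = total_enrolled = 0
    for m in range(start, start + expected):
        if m in enrolled:
            total_enrolled += 1
            current_gap = 0
        else:
            current_gap += 1
            max_gap = max(max_gap, current_gap)

    return {
        "max_gap_months": max_gap,
        "total_enrolled_months": total_enrolled,
        "expected_months": expected,
    }


# Enrolled for all of 2020 except March and April.
months = {(2020, m) for m in range(1, 13)} - {(2020, 3), (2020, 4)}
print(gap_metrics(months, date(2020, 7, 15), months_pre=6, months_post=6))
# {'max_gap_months': 2, 'total_enrolled_months': 10, 'expected_months': 12}
```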

Step 3: Filter by Continuous Enrollment

from alx_heor.enrollment import filter_continuous_enrollment

# Keep patients with max gap <= 1 month
df_continuous = filter_continuous_enrollment(
    df_gaps,
    max_gap=1,  # Allow up to 1-month gaps
)

print(f"Continuous enrollment: {len(df_continuous)} patients")

Step 4: Calculate Censoring Dates

For survival analysis, each patient is censored at whichever of these comes first:

The first enrollment gap that exceeds the threshold
The study end date

from alx_heor.enrollment import get_censor_dates

df_censor = get_censor_dates(
    df_gaps,
    study_end="2024-03-31",
    max_gap_for_censor=3,  # Censor at gaps > 3 months
    patient_id_col="pat_id",
    index_date_col="index_date",
)

Output:

pat_id	censor_date	is_censored_by_gap
123456	2024-03-31	False
234567	2022-08-15	True
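
The "whichever comes first" rule can be sketched for a single patient as follows. This is an illustration under stated assumptions, not the logic of `get_censor_dates()`: it assumes follow-up starts in the index month and that a gap-censored patient is censored on the last day of the last enrolled month before the gap. The library may place the date differently (the sample output above shows a mid-month date). The name `censor_for_patient` is hypothetical.

```python
import calendar
from datetime import date


def censor_for_patient(enrolled_months, index_date, study_end, max_gap_for_censor):
    """Return (censor_date, is_censored_by_gap) for one patient.

    Assumption: the patient is censored on the last day of the last enrolled
    month before the first run of unenrolled months longer than
    max_gap_for_censor; otherwise at study_end.
    """
    year, month = index_date.year, index_date.month
    last_enrolled = None  # most recent (year, month) with enrollment
    gap = 0               # length of the current run of unenrolled months

    while (year, month) <= (study_end.year, study_end.month):
        if (year, month) in enrolled_months:
            last_enrolled = (year, month)
            gap = 0
        else:
            gap += 1
            if gap > max_gap_for_censor:
                if last_enrolled is None:
                    return index_date, True  # no enrollment from index onward
                y, m = last_enrolled
                return date(y, m, calendar.monthrange(y, m)[1]), True
        # Advance one calendar month.
        month += 1
        if month > 12:
            year, month = year + 1, 1

    return study_end, False


# Enrolled January through August 2022, then never again.
enrolled = {(2022, m) for m in range(1, 9)}
print(censor_for_patient(enrolled, date(2022, 1, 10), date(2024, 3, 31), 3))
# (datetime.date(2022, 8, 31), True)
```

A gap of exactly `max_gap_for_censor` months does not trigger censoring in this sketch, which matches the "gaps > 3 months" comment in the call above.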

All-in-One Analysis

Use analyze_enrollment() to run the steps above in a single call:

from alx_heor.enrollment import analyze_enrollment

result = analyze_enrollment(
    conn,
    df_cohort,
    source="iqvia",
    schema="iqvia_pharmetrics_2024q3",
    months_pre=6,
    months_post=12,
    max_gap=1,
    study_end="2024-03-31",
)

# Access results
df_enrolled = result.df_enrolled   # Patients with continuous enrollment
df_censor = result.df_censor       # Censoring dates
df_gaps = result.df_gaps           # Gap analysis

Survival Analysis Setup

Combine cohort and censoring for survival analysis:

import pandas as pd

# Merge cohort with censoring
df_survival = df_cohort.merge(
    df_censor[["pat_id", "censor_date", "is_censored_by_gap"]],
    on="pat_id",
)

# Calculate follow-up time
df_survival["follow_up_days"] = (
    pd.to_datetime(df_survival["censor_date"]) -
    pd.to_datetime(df_survival["index_date"])
).dt.days

# Event indicator: 1 = event observed (e.g., death), 0 = censored
df_survival["event"] = 0  # Placeholder: replace with actual event data

# Ready for survival analysis
print(df_survival[["pat_id", "index_date", "censor_date", "follow_up_days", "event"]].head())

pat_id	index_date	censor_date	follow_up_days	event
123456	2020-03-15	2024-03-31	1477	0
234567	2019-08-22	2022-08-15	1089	0
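
Once real event data is available, the placeholder can be replaced by comparing each patient's event date with the censor date. The sketch below shows that rule for one patient in plain Python. The `event_date` input and the function name `time_to_event` are hypothetical and not part of `alx_heor`, and the event dates in the example calls are invented for illustration.

```python
from datetime import date


def time_to_event(index_date, censor_date, event_date=None):
    """Return (follow_up_days, event) for one patient.

    event is 1 when the event occurs on or before the censor date;
    otherwise it is 0 and follow-up ends at the censor date.
    """
    if event_date is not None and event_date <= censor_date:
        return (event_date - index_date).days, 1
    return (censor_date - index_date).days, 0


# No event observed: follow-up runs to the censor date.
print(time_to_event(date(2020, 3, 15), date(2024, 3, 31)))  # (1477, 0)

# Event before censoring: follow-up stops at the event.
print(time_to_event(date(2019, 8, 22), date(2022, 8, 15), date(2021, 8, 22)))  # (731, 1)
```

Events dated after the censor date are treated as unobserved, since the patient was no longer under observation when they occurred.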

Understanding Gap Thresholds

max_gap	Meaning
0	Strictly continuous (no gaps allowed)
1	Allow 1-month gaps (typical for claims data)
2-3	More permissive, allows short lapses

Recommendation

Use max_gap=1 for standard HEOR analyses. Claims data often has 1-month gaps due to processing delays.
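
Before settling on a threshold, it can help to see how many patients each value retains. The sketch below applies the same "largest gap must not exceed `max_gap`" rule to a handful of made-up `max_gap_months` values. The patient IDs, gap values, and the helper `retained` are hypothetical.

```python
# Hypothetical max_gap_months per patient, shaped like the
# calculate_enrollment_gaps() output.
max_gap_by_patient = {101: 0, 102: 1, 103: 1, 104: 2, 105: 5}


def retained(max_gap_by_patient, max_gap):
    """Return the sorted patient IDs whose largest gap does not exceed max_gap."""
    return sorted(p for p, gap in max_gap_by_patient.items() if gap <= max_gap)


for threshold in (0, 1, 3):
    kept = retained(max_gap_by_patient, threshold)
    print(f"max_gap={threshold}: {len(kept)} of {len(max_gap_by_patient)} patients kept")
# max_gap=0: 1 of 5 patients kept
# max_gap=1: 3 of 5 patients kept
# max_gap=3: 4 of 5 patients kept
```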

Common Patterns

Different Windows for Baseline/Follow-up

# Strict baseline, permissive follow-up
df_gaps_baseline = calculate_enrollment_gaps(
    df_enroll, df_cohort,
    months_pre=6, months_post=0,
)

df_gaps_followup = calculate_enrollment_gaps(
    df_enroll, df_cohort,
    months_pre=0, months_post=12,
)

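# Filter baseline strictly, follow-up permissively (example thresholds)
df_strict_baseline = filter_continuous_enrollment(df_gaps_baseline, max_gap=0)
df_permissive_followup = filter_continuous_enrollment(df_gaps_followup, max_gap=1)

# Keep patients who pass both filters
eligible_patients = set(df_strict_baseline["pat_id"]) & set(df_permissive_followup["pat_id"])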

Variable Follow-up

For studies without fixed follow-up requirements:

import pandas as pd

# Get all available follow-up per patient
df_censor = get_censor_dates(
    df_gaps,
    study_end="2024-03-31",
    max_gap_for_censor=3,
)

# Calculate available follow-up (convert to datetime in case the columns hold strings)
df_cohort_with_followup = df_cohort.merge(df_censor, on="pat_id")
df_cohort_with_followup["available_followup"] = (
    pd.to_datetime(df_cohort_with_followup["censor_date"])
    - pd.to_datetime(df_cohort_with_followup["index_date"])
).dt.days

# Filter to minimum required
df_min_followup = df_cohort_with_followup[
    df_cohort_with_followup["available_followup"] >= 180  # At least 180 days (about 6 months)
]

Next Steps

Medication Analysis Tutorial - Treatment patterns
API Reference: Enrollment - Complete documentation