Enrollment Analysis Tutorial¶
This tutorial covers enrollment validation, gap calculation, and censoring for survival analysis.
Overview¶
Enrollment analysis ensures patients have observable data during the study period and determines when patients are censored due to enrollment gaps.
from alx_heor.enrollment import (
    get_enrollment,
    calculate_enrollment_gaps,
    filter_continuous_enrollment,
    get_censor_dates,
    analyze_enrollment,
)
When to Use This Module¶
Integrated with Cohort
If you're using get_cohort() with EnrollmentCriteria, enrollment analysis is automatic. Use this module directly for:
- Custom enrollment requirements
- Post-hoc enrollment analysis
- Survival analysis censoring
Step 1: Get Enrollment Data¶
from alx_heor.database import RedshiftConnection
from alx_heor.enrollment import get_enrollment
conn = RedshiftConnection().connect()
# Get enrollment for specific patients
# (df_cohort is an existing cohort DataFrame with a pat_id column, e.g. from get_cohort())
df_enroll = get_enrollment(
    conn,
    source="iqvia",
    schema="iqvia_pharmetrics_2024q3",
    patient_ids=df_cohort["pat_id"].tolist(),
    start_year=2015,
    end_year=2024,
)
The result contains monthly enrollment indicators:
| pat_id | month_id | enrolled |
|---|---|---|
| 123456 | 202001 | 1 |
| 123456 | 202002 | 1 |
| 123456 | 202003 | 0 |
| 123456 | 202004 | 1 |
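To work with these rows outside the library, the month identifiers usually need to be mapped to calendar months. The following is a minimal sketch in plain Python, assuming `month_id` is an integer in `YYYYMM` form (as the sample rows suggest). The helper `month_id_to_date` is hypothetical and not part of `alx_heor`.

```python
from datetime import date

def month_id_to_date(month_id):
    """Convert a YYYYMM integer (assumed format) to the first day of that month."""
    year, month = divmod(month_id, 100)
    return date(year, month, 1)

# Sample rows mirroring the table above: (pat_id, month_id, enrolled)
rows = [
    (123456, 202001, 1),
    (123456, 202002, 1),
    (123456, 202003, 0),
    (123456, 202004, 1),
]

# Months in which the patient was not enrolled
unenrolled = [month_id_to_date(m) for _, m, e in rows if e == 0]
print(unenrolled)  # [datetime.date(2020, 3, 1)]
```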
Step 2: Calculate Enrollment Gaps¶
Gaps are calculated relative to each patient's index date and study window.
from alx_heor.enrollment import calculate_enrollment_gaps
df_gaps = calculate_enrollment_gaps(
    df_enroll,
    df_cohort,  # Must have pat_id and index_date columns
    patient_id_col="pat_id",
    index_date_col="index_date",
    months_pre=6,  # 6 months before index
    months_post=12,  # 12 months after index
)
Output columns:
| Column | Description |
|---|---|
| pat_id | Patient identifier |
| max_gap_months | Largest consecutive gap in study window |
| total_enrolled_months | Total months with enrollment |
| expected_months | Expected months (6 + 12 = 18) |
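To make these columns concrete, here is a rough sketch of how they could be derived from monthly indicators in plain Python. It is not the library's implementation. It assumes `YYYYMM` month identifiers and a window made of the `months_pre` months before the index month plus `months_post` months starting at the index month; the actual window boundaries used by `calculate_enrollment_gaps` may differ. The names `window_month_ids` and `gap_metrics` are hypothetical.

```python
from datetime import date

def window_month_ids(index_date, months_pre, months_post):
    """List YYYYMM ids for the study window.

    Assumption: the window covers `months_pre` months before the index
    month plus `months_post` months starting at the index month.
    """
    base = index_date.year * 12 + (index_date.month - 1)  # absolute month count
    ids = []
    for offset in range(-months_pre, months_post):
        year, month0 = divmod(base + offset, 12)
        ids.append(year * 100 + month0 + 1)
    return ids

def gap_metrics(enrolled_months, window):
    """Summarize enrollment over the window from a set of enrolled YYYYMM ids."""
    max_gap = current_gap = 0
    for month_id in window:
        if month_id in enrolled_months:
            current_gap = 0
        else:
            current_gap += 1
            max_gap = max(max_gap, current_gap)
    total = sum(1 for m in window if m in enrolled_months)
    return {
        "max_gap_months": max_gap,
        "total_enrolled_months": total,
        "expected_months": len(window),
    }

window = window_month_ids(date(2020, 3, 15), months_pre=6, months_post=12)
# Hypothetical patient with a 2-month lapse and a separate 1-month lapse
enrolled = set(window) - {202006, 202007, 202011}
metrics = gap_metrics(enrolled, window)
print(metrics)
# {'max_gap_months': 2, 'total_enrolled_months': 15, 'expected_months': 18}
```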
Step 3: Filter by Continuous Enrollment¶
from alx_heor.enrollment import filter_continuous_enrollment
# Keep patients with max gap <= 1 month
df_continuous = filter_continuous_enrollment(
    df_gaps,
    max_gap=1,  # Allow up to 1-month gaps
)
print(f"Continuous enrollment: {len(df_continuous)} patients")
Step 4: Calculate Censoring Dates¶
For survival analysis, each patient is censored at whichever of the following comes first:
- The first enrollment gap exceeding the threshold
- The study end date
from alx_heor.enrollment import get_censor_dates
df_censor = get_censor_dates(
    df_gaps,
    study_end="2024-03-31",
    max_gap_for_censor=3,  # Censor at gaps > 3 months
    patient_id_col="pat_id",
    index_date_col="index_date",
)
Output:
| pat_id | censor_date | is_censored_by_gap |
|---|---|---|
| 123456 | 2024-03-31 | False |
| 234567 | 2022-08-15 | True |
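The censoring rule can be illustrated with a small sketch in plain Python. It is an assumption-laden illustration, not the behavior of `get_censor_dates`: it works on monthly flags starting at the index month and censors at the first day of the first run of un-enrolled months longer than the threshold. The library's date convention evidently differs (the sample output above shows a mid-month date). The function name `censor_date` is hypothetical.

```python
from datetime import date

def censor_date(monthly_flags, study_end, max_gap_for_censor):
    """Return (censor_date, censored_by_gap).

    monthly_flags: list of (first_day_of_month, enrolled) tuples in
    chronological order, starting at the index month.
    Assumption: censoring happens at the start of the first run of
    un-enrolled months that is longer than `max_gap_for_censor`.
    """
    run_start = None
    run_length = 0
    for month_start, enrolled in monthly_flags:
        if month_start > study_end:
            break  # nothing after study end matters
        if enrolled:
            run_start, run_length = None, 0
            continue
        if run_start is None:
            run_start = month_start
        run_length += 1
        if run_length > max_gap_for_censor:
            return run_start, True
    return study_end, False

study_end = date(2024, 3, 31)
# Hypothetical patient A: 4-month lapse (May to August 2023)
flags_a = [(date(2023, m, 1), m not in (5, 6, 7, 8)) for m in range(1, 13)]
# Hypothetical patient B: 2-month lapse (May to June 2023)
flags_b = [(date(2023, m, 1), m not in (5, 6)) for m in range(1, 13)]

result_a = censor_date(flags_a, study_end, max_gap_for_censor=3)
result_b = censor_date(flags_b, study_end, max_gap_for_censor=3)
print(result_a)  # (datetime.date(2023, 5, 1), True)
print(result_b)  # (datetime.date(2024, 3, 31), False)
```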
All-in-One Analysis¶
Use analyze_enrollment() for a streamlined workflow:
from alx_heor.enrollment import analyze_enrollment
result = analyze_enrollment(
    conn,
    df_cohort,
    source="iqvia",
    schema="iqvia_pharmetrics_2024q3",
    months_pre=6,
    months_post=12,
    max_gap=1,
    study_end="2024-03-31",
)
# Access results
df_enrolled = result.df_enrolled # Patients with continuous enrollment
df_censor = result.df_censor # Censoring dates
df_gaps = result.df_gaps # Gap analysis
Survival Analysis Setup¶
Combine cohort and censoring for survival analysis:
import pandas as pd

# Merge cohort with censoring
df_survival = df_cohort.merge(
    df_censor[["pat_id", "censor_date", "is_censored_by_gap"]],
    on="pat_id",
)
# Calculate follow-up time in days from index date to censor date
df_survival["follow_up_days"] = (
    pd.to_datetime(df_survival["censor_date"])
    - pd.to_datetime(df_survival["index_date"])
).dt.days
# Event indicator: 1 = event observed (e.g. death), 0 = censored
df_survival["event"] = 0  # Placeholder; replace with actual event data
# Ready for survival analysis
print(df_survival[["pat_id", "index_date", "censor_date", "follow_up_days", "event"]].head())
| pat_id | index_date | censor_date | follow_up_days | event |
|---|---|---|---|---|
| 123456 | 2020-03-15 | 2024-03-31 | 1477 | 0 |
| 234567 | 2019-08-22 | 2022-08-15 | 1089 | 1 |
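With `follow_up_days` and `event` in place, a survival curve can be estimated. The sketch below implements the Kaplan-Meier product-limit estimator in plain Python to show what those two columns feed into; in practice a dedicated survival analysis library would normally be used. The input values are hypothetical, and `kaplan_meier` is not part of `alx_heor`.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient (e.g. follow_up_days)
    events: 1 if the event was observed, 0 if the patient was censored
    """
    observations = sorted(zip(durations, events))
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        time = observations[i][0]
        deaths = removed = 0
        # Group all patients sharing this follow-up time
        while i < len(observations) and observations[i][0] == time:
            deaths += observations[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((time, survival))
        # Both events and censored patients leave the risk set
        at_risk -= removed
    return curve

# Hypothetical follow-up data for five patients
follow_up_days = [100, 200, 200, 300, 400]
event = [1, 0, 1, 1, 0]

curve = kaplan_meier(follow_up_days, event)
for time, probability in curve:
    print(time, round(probability, 3))
# 100 0.8
# 200 0.6
# 300 0.3
```

Censored patients contribute to the risk set until their censor date and then drop out without lowering the curve, which is why accurate censoring dates matter.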
Understanding Gap Thresholds¶
| max_gap | Meaning |
|---|---|
| 0 | Strictly continuous (no gaps allowed) |
| 1 | Allow 1-month gaps (typical for claims data) |
| 2-3 | More permissive, allows short lapses |
Recommendation
Use max_gap=1 for standard HEOR analyses. Claims data often contain 1-month gaps caused by processing delays rather than true loss of coverage.
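Before settling on a threshold, it can help to check how many patients each setting retains. A small sketch in plain Python, using hypothetical `max_gap_months` values and the rule from Step 3 (keep patients whose largest gap is at most `max_gap`); the helper `retained` is illustrative only:

```python
# Hypothetical max_gap_months values for eight patients
max_gap_months = [0, 0, 1, 1, 1, 2, 3, 5]

def retained(max_gaps, max_gap):
    """Count patients whose largest gap does not exceed max_gap."""
    return sum(1 for gap in max_gaps if gap <= max_gap)

counts = {threshold: retained(max_gap_months, threshold) for threshold in (0, 1, 2, 3)}
for threshold, count in counts.items():
    print(f"max_gap={threshold}: {count} of {len(max_gap_months)} patients retained")
```

In this made-up sample, moving from `max_gap=0` to `max_gap=1` retains 5 patients instead of 2, while further loosening adds only one patient per step.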
Common Patterns¶
Different Windows for Baseline/Follow-up¶
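# Strict baseline, permissive follow-up
df_gaps_baseline = calculate_enrollment_gaps(
    df_enroll, df_cohort,
    months_pre=6, months_post=0,
)
df_gaps_followup = calculate_enrollment_gaps(
    df_enroll, df_cohort,
    months_pre=0, months_post=12,
)
# Filter baseline strictly (no gaps allowed)
df_strict_baseline = filter_continuous_enrollment(df_gaps_baseline, max_gap=0)
# Filter follow-up more permissively (the threshold shown is an example)
df_permissive_followup = filter_continuous_enrollment(df_gaps_followup, max_gap=1)
# Keep patients who pass both filters
eligible_patients = set(df_strict_baseline["pat_id"]) & set(df_permissive_followup["pat_id"])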
Variable Follow-up¶
For studies without fixed follow-up requirements:
import pandas as pd

# Get all available follow-up per patient
df_censor = get_censor_dates(
    df_gaps,
    study_end="2024-03-31",
    max_gap_for_censor=3,
)
# Calculate available follow-up (convert to datetime in case the columns are strings)
df_cohort_with_followup = df_cohort.merge(df_censor, on="pat_id")
df_cohort_with_followup["available_followup"] = (
    pd.to_datetime(df_cohort_with_followup["censor_date"])
    - pd.to_datetime(df_cohort_with_followup["index_date"])
).dt.days
# Filter to minimum required
df_min_followup = df_cohort_with_followup[
    df_cohort_with_followup["available_followup"] >= 180  # At least 180 days (about 6 months)
]
Next Steps¶
- Medication Analysis Tutorial - Treatment patterns
- API Reference: Enrollment - Complete documentation