Getting Started
This guide walks you through installing alx-heor and building your first patient cohort.
Installation
# Basic install
pip install -e .
# With development tools (testing + docs)
pip install -e ".[dev,docs]"
Environment Setup
Create a .env file in your project root with your database credentials:
# Redshift (IQVIA, Optum)
REDSHIFT_HOST=your-cluster.redshift.amazonaws.com
REDSHIFT_DATABASE=your_database
REDSHIFT_USER=your_user
REDSHIFT_PASSWORD=your_password
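The guide doesn't say how alx-heor reads the .env file. If you need to load it yourself, the KEY=VALUE format above can be parsed with the standard library alone. The following is a minimal sketch under that assumption; load_env_file is a hypothetical helper for illustration, not part of alx-heor.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Load KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper for illustration; not part of alx-heor.
    Returns a dict of the variables found in the file, with the
    values that are in effect after loading.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments such as "# Redshift (IQVIA, Optum)",
        # and anything that is not a KEY=VALUE pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Variables already set in the shell take precedence over the file.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded
```

Call load_env_file() once at the top of a script, before opening the connection. Keep the .env file out of version control, since it holds a password.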
Your First Cohort
The primary use case for alx-heor is building patient cohorts with complex inclusion/exclusion criteria. Here's a complete example:
1. Connect to the Database
Open a database connection before building the cohort. The later steps assume it is bound to the name conn. The Connection Issues section under Troubleshooting shows the calls: conn = RedshiftConnection(), then conn.connect().
2. Define Your Cohort Criteria
from alx_heor.cohort import (
    get_cohort,
    CohortCriteria,
    DiagnosisCriteria,
    ProcedureCriteria,
    MedicationCriteria,
    EnrollmentCriteria,
)

criteria = CohortCriteria(
    # Primary diagnosis: gMG with 2+ claims, 30 days apart
    primary_diagnosis=DiagnosisCriteria(
        codes=["G700", "G7000", "G7001"],
        min_count=2,
        days_apart=30,
        label="gMG diagnosis",
    ),
    # Require C5 inhibitor treatment post-index
    required_medications=[
        MedicationCriteria(
            generic_names=["eculizumab", "ravulizumab"],
            window_start=0,  # On or after index date
            label="C5 inhibitor",
        ),
    ],
    # Exclude patients with prior thymectomy
    excluded_procedures=[
        ProcedureCriteria(
            codes=["60520", "60521", "60522"],
            window_end=-1,  # Before index date
            label="Prior thymectomy",
        ),
    ],
    # Continuous enrollment requirements
    enrollment=EnrollmentCriteria(
        months_before=6,
        months_after=12,
        max_gap_months=1,
    ),
    # Demographics
    min_age=18,
    valid_sex_only=True,
)
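The comments in the criteria above suggest that window_start and window_end are day offsets relative to the index date, and that min_count and days_apart require repeated claims spread over time. The sketch below illustrates that reading in plain Python. It is not alx-heor's implementation: in_window and meets_claim_rule are hypothetical helpers, and treating "30 days apart" as "earliest and latest claim at least 30 days apart" is an assumption.

```python
from datetime import date


def in_window(event_date, index_date, window_start=None, window_end=None):
    """Return True if the event falls inside a day-offset window.

    Offsets are days relative to index_date; None means unbounded.
    Hypothetical helper, assuming both bounds are inclusive.
    """
    offset = (event_date - index_date).days
    if window_start is not None and offset < window_start:
        return False
    if window_end is not None and offset > window_end:
        return False
    return True


def meets_claim_rule(claim_dates, min_count=2, days_apart=30):
    """Return True if there are at least min_count distinct claim dates
    and the earliest and latest are at least days_apart days apart.

    Hypothetical helper; the spacing rule is an assumed interpretation.
    """
    unique_dates = sorted(set(claim_dates))
    if len(unique_dates) < min_count:
        return False
    return (unique_dates[-1] - unique_dates[0]).days >= days_apart


index = date(2020, 3, 15)

# window_start=0: the index date itself qualifies, the day before does not.
print(in_window(date(2020, 3, 15), index, window_start=0))
print(in_window(date(2020, 3, 14), index, window_start=0))

# window_end=-1: only events strictly before the index date qualify.
print(in_window(date(2020, 3, 14), index, window_end=-1))
print(in_window(date(2020, 3, 15), index, window_end=-1))

# Two claims 45 days apart satisfy min_count=2, days_apart=30.
print(meets_claim_rule([date(2020, 1, 1), date(2020, 2, 15)]))
```

Under this reading, window_start=0 on the medication criterion keeps treatment on or after the index date, and window_end=-1 on the procedure criterion restricts the thymectomy exclusion to procedures before it.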
3. Build the Cohort
result = get_cohort(
    conn,
    source="iqvia",
    schema="iqvia_pharmetrics_2024q3",
    criteria=criteria,
    start_year=2015,
    end_year=2024,
)
4. Review Results
Output:
Attrition Table
============================================================
gMG diagnosis: 45,231
Adults (18+): 42,105 (-3,126, 93.1%)
Valid sex: 41,892 (-213, 99.5%)
C5 inhibitor: 8,234 (-33,658, 19.7%)
Prior thymectomy (excluded): 7,891 (-343, 95.8%)
Continuous enrollment: 6,891 (-1,000, 87.3%)
============================================================
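Each line of the attrition output shows the patients remaining after a step, the number dropped, and the percentage retained from the previous step. The sketch below reproduces that arithmetic from a dict of step counts. It assumes the attrition attribute maps step labels to counts in step order, which the guide implies but does not spell out; attrition_rows is a hypothetical helper, not an alx-heor function.

```python
def attrition_rows(counts):
    """Return (label, count, dropped, pct_retained) for each step.

    counts: dict mapping step label -> patients remaining, in step order.
    The first step has no previous step, so dropped and pct are None.
    Hypothetical helper for illustration.
    """
    rows = []
    previous = None
    for label, count in counts.items():
        if previous is None:
            rows.append((label, count, None, None))
        else:
            dropped = previous - count
            pct = round(100 * count / previous, 1) if previous else 0.0
            rows.append((label, count, dropped, pct))
        previous = count
    return rows


# Counts copied from the example output above.
counts = {
    "gMG diagnosis": 45231,
    "Adults (18+)": 42105,
    "Valid sex": 41892,
    "C5 inhibitor": 8234,
    "Prior thymectomy (excluded)": 7891,
    "Continuous enrollment": 6891,
}

for label, count, dropped, pct in attrition_rows(counts):
    if dropped is None:
        print(f"{label}: {count:,}")
    else:
        print(f"{label}: {count:,} ({-dropped:,}, {pct}%)")
```

With the counts from the example, the computed drops and percentages match the listing, which confirms that each percentage is relative to the preceding step rather than to the starting population.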
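5. Access the Data
The final cohort is available as result.df_cohort, with one row per patient. Its first rows look like this: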
| pat_id | index_date | der_yob | der_sex | age_at_index |
|---|---|---|---|---|
| 123456 | 2020-03-15 | 1965 | F | 55 |
| 234567 | 2019-08-22 | 1978 | M | 41 |
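The age_at_index values in the sample rows are consistent with index year minus year of birth (2020 - 1965 = 55, 2019 - 1978 = 41). The sketch below shows that calculation on plain dicts. Whether alx-heor derives the column this way is an assumption, and age_at_index here is a hypothetical function named after the column.

```python
from datetime import date


def age_at_index(index_date, year_of_birth):
    """Approximate age in years: index year minus year of birth.

    Only a birth year is available (der_yob), so no month/day adjustment
    is possible. Hypothetical helper for illustration.
    """
    return index_date.year - year_of_birth


# Sample rows copied from the table above.
rows = [
    {"pat_id": 123456, "index_date": date(2020, 3, 15), "der_yob": 1965},
    {"pat_id": 234567, "index_date": date(2019, 8, 22), "der_yob": 1978},
]
for row in rows:
    row["age_at_index"] = age_at_index(row["index_date"], row["der_yob"])

# The min_age=18 criterion would keep both sample patients.
adults = [row for row in rows if row["age_at_index"] >= 18]
print([row["age_at_index"] for row in adults])
```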
What's in CohortResult?
| Attribute | Description |
|---|---|
| df_cohort | Final cohort with demographics and index dates |
| df_claims | All diagnosis claims for cohort patients |
| attrition | Dict of step-by-step patient counts |
| df_enrollment | Enrollment data (if enrollment criteria applied) |
| df_censor | Censoring dates for survival analysis |
| df_payer | Payer type classification |
Next Steps
- Cohort Building Tutorial - Deep dive into criteria options
- Enrollment Analysis - Continuous enrollment and censoring
- Medication Analysis - Treatment patterns and adherence
- Data Sources Reference - Column mappings by data source
Troubleshooting
Connection Issues
conn = RedshiftConnection()
conn.connect()
tables = conn.get_tables(schema="your_schema")
print(f"Found {len(tables)} tables")
Missing Environment Variables
If credentials aren't loading from the .env file, pass them explicitly.
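Before passing credentials by hand, it helps to confirm which variables are actually missing. The sketch below checks the four variable names from Environment Setup using only the standard library. missing_redshift_vars and collect_credentials are hypothetical helpers, and the guide does not show which arguments RedshiftConnection accepts, so the sketch stops at collecting the values.

```python
import os

# Variable names taken from the Environment Setup section.
REQUIRED_VARS = (
    "REDSHIFT_HOST",
    "REDSHIFT_DATABASE",
    "REDSHIFT_USER",
    "REDSHIFT_PASSWORD",
)


def missing_redshift_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper; checks os.environ unless a mapping is given.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


def collect_credentials(environ=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if environ is None else environ
    missing = missing_redshift_vars(env)
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    missing = missing_redshift_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All Redshift variables are set")
```

If the check reports missing names, the usual causes are a .env file outside the directory the script runs from, or a typo in a variable name.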