Getting Started

This guide walks you through installing alx-heor and building your first patient cohort.

Installation

# Basic install (editable; run from the repository root)
pip install -e .

# With development tools (testing + docs)
pip install -e ".[dev,docs]"

Environment Setup

Create a .env file in your project root with your database credentials:

# Redshift (IQVIA, Optum)
REDSHIFT_HOST=your-cluster.redshift.amazonaws.com
REDSHIFT_DATABASE=your_database
REDSHIFT_USER=your_user
REDSHIFT_PASSWORD=your_password
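
This guide does not say how alx-heor reads the .env file. If you need to load it yourself, the following is a minimal sketch using only the standard library. The helper name load_env_file is hypothetical and not part of alx-heor; it handles only simple KEY=VALUE lines and leaves variables that are already set untouched.

```python
import os
from pathlib import Path


def load_env_file(path=".env", environ=os.environ):
    """Copy KEY=VALUE pairs from a .env file into `environ`.

    Hypothetical helper for illustration; not part of alx-heor.
    Variables that already exist in `environ` are not overwritten.
    Returns a dict of every pair found in the file.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        loaded[key] = value
        environ.setdefault(key, value)
    return loaded
```

Calling load_env_file() before creating the connection makes the REDSHIFT_* values available through os.environ.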

Your First Cohort

The primary use case for alx-heor is building patient cohorts with complex inclusion/exclusion criteria. Here's a complete example:

1. Connect to the Database

from alx_heor.database import RedshiftConnection

conn = RedshiftConnection().connect()

2. Define Your Cohort Criteria

from alx_heor.cohort import (
    get_cohort,
    CohortCriteria,
    DiagnosisCriteria,
    ProcedureCriteria,
    MedicationCriteria,
    EnrollmentCriteria,
)

criteria = CohortCriteria(
    # Primary diagnosis: gMG with 2+ claims, 30 days apart
    primary_diagnosis=DiagnosisCriteria(
        codes=["G700", "G7000", "G7001"],
        min_count=2,
        days_apart=30,
        label="gMG diagnosis",
    ),

    # Require C5 inhibitor treatment post-index
    required_medications=[
        MedicationCriteria(
            generic_names=["eculizumab", "ravulizumab"],
            window_start=0,  # On or after index date
            label="C5 inhibitor",
        ),
    ],

    # Exclude patients with prior thymectomy
    excluded_procedures=[
        ProcedureCriteria(
            codes=["60520", "60521", "60522"],
            window_end=-1,  # Before index date
            label="Prior thymectomy",
        ),
    ],

    # Continuous enrollment requirements
    enrollment=EnrollmentCriteria(
        months_before=6,
        months_after=12,
        max_gap_months=1,
    ),

    # Demographics
    min_age=18,
    valid_sex_only=True,
)
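
The exact rules alx-heor applies for min_count, days_apart, window_start, and window_end are not spelled out here. The sketch below shows one plausible reading, assuming that days_apart means the earliest and latest qualifying claims are at least that many days apart, and that windows are inclusive day offsets relative to the index date. Both function names are hypothetical.

```python
from datetime import date


def meets_count_and_spacing(claim_dates, min_count=2, days_apart=30):
    """Assumed rule: at least `min_count` distinct claim dates, with the
    earliest and latest at least `days_apart` days apart."""
    unique_dates = sorted(set(claim_dates))
    if len(unique_dates) < min_count:
        return False
    return (unique_dates[-1] - unique_dates[0]).days >= days_apart


def in_window(event_date, index_date, window_start=None, window_end=None):
    """Assumed rule: the event's day offset from the index date lies in
    [window_start, window_end]; None leaves that side unbounded."""
    offset = (event_date - index_date).days
    if window_start is not None and offset < window_start:
        return False
    if window_end is not None and offset > window_end:
        return False
    return True


index_date = date(2020, 3, 15)

# Two claims 36 days apart satisfy min_count=2, days_apart=30.
print(meets_count_and_spacing([date(2020, 3, 15), date(2020, 4, 20)]))  # True

# window_start=0 accepts events on or after the index date.
print(in_window(date(2020, 6, 1), index_date, window_start=0))  # True

# window_end=-1 accepts events strictly before the index date.
print(in_window(index_date, index_date, window_end=-1))  # False
```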

3. Build the Cohort

result = get_cohort(
    conn,
    source="iqvia",
    schema="iqvia_pharmetrics_2024q3",
    criteria=criteria,
    start_year=2015,
    end_year=2024,
)

4. Review Results

print(result.summary())

Output:

Attrition Table
============================================================
gMG diagnosis: 45,231
Adults (18+): 42,105 (-3,126, 93.1%)
Valid sex: 41,892 (-213, 99.5%)
C5 inhibitor: 8,234 (-33,658, 19.7%)
Prior thymectomy (excluded): 7,891 (-343, 95.8%)
Continuous enrollment: 6,891 (-1,000, 87.3%)
============================================================
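
Each line of the table shows the patients remaining, the number dropped at that step, and the percentage retained relative to the previous step. The sketch below reproduces that arithmetic from the counts above. It assumes the attrition data can be treated as an ordered mapping of step label to remaining patients; the actual structure of result.attrition may differ, and attrition_rows is a hypothetical helper.

```python
def attrition_rows(counts):
    """Return (label, remaining, dropped, pct_retained) for each step.

    `counts` maps step label -> patients remaining, in step order.
    The first step has nothing dropped and 100% retained.
    """
    rows = []
    previous = None
    for label, remaining in counts.items():
        if previous is None:
            rows.append((label, remaining, 0, 100.0))
        else:
            pct = round(100.0 * remaining / previous, 1) if previous else 0.0
            rows.append((label, remaining, previous - remaining, pct))
        previous = remaining
    return rows


# Counts copied from the attrition table above.
counts = {
    "gMG diagnosis": 45231,
    "Adults (18+)": 42105,
    "Valid sex": 41892,
    "C5 inhibitor": 8234,
    "Prior thymectomy (excluded)": 7891,
    "Continuous enrollment": 6891,
}

for i, (label, remaining, dropped, pct) in enumerate(attrition_rows(counts)):
    if i == 0:
        print(f"{label}: {remaining:,}")
    else:
        print(f"{label}: {remaining:,} (-{dropped:,}, {pct:.1f}%)")
```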

5. Access the Data

df_cohort = result.df_cohort
print(df_cohort.head())

Output:

pat_id  index_date  der_yob  der_sex  age_at_index
123456  2020-03-15  1965     F        55
234567  2019-08-22  1978     M        41

df_claims = result.df_claims
df_censor = result.df_censor
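
The sample rows are consistent with age_at_index being the index year minus the year of birth, although the guide does not state how the library derives it. A minimal sketch of that assumed calculation, using the two rows shown above and plain Python rather than the library's DataFrame:

```python
from datetime import date

# The two sample rows from the df_cohort preview above.
rows = [
    {"pat_id": 123456, "index_date": date(2020, 3, 15), "der_yob": 1965, "der_sex": "F"},
    {"pat_id": 234567, "index_date": date(2019, 8, 22), "der_yob": 1978, "der_sex": "M"},
]


def age_at_index(index_date, year_of_birth):
    """Assumed derivation: index year minus year of birth.

    Only a birth year is available (der_yob), so no month/day
    adjustment is possible.
    """
    return index_date.year - year_of_birth


ages = {row["pat_id"]: age_at_index(row["index_date"], row["der_yob"]) for row in rows}
print(ages)  # {123456: 55, 234567: 41}
```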

What's in CohortResult?

Attribute      Description
df_cohort      Final cohort with demographics and index dates
df_claims      All diagnosis claims for cohort patients
attrition      Dict of step-by-step patient counts
df_enrollment  Enrollment data (if enrollment criteria applied)
df_censor      Censoring dates for survival analysis
df_payer       Payer type classification

Next Steps

Troubleshooting

Connection Issues

To check that the connection works, list the tables in a schema you can access:

conn = RedshiftConnection()
conn.connect()
tables = conn.get_tables(schema="your_schema")
print(f"Found {len(tables)} tables")

Missing Environment Variables

If credentials aren't loading, pass them explicitly:

conn = RedshiftConnection(
    host="your-cluster.redshift.amazonaws.com",
    database="your_database",
    user="your_user",
    password="your_password",
)
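
Before hard-coding credentials, it can help to see which variables are actually missing. The sketch below is a hypothetical helper, not part of alx-heor. It maps the keyword arguments shown above to the REDSHIFT_* names from the .env example and raises an error that names any variable that is unset or empty.

```python
import os

# Keyword argument -> environment variable, as used in this guide.
REQUIRED_VARS = {
    "host": "REDSHIFT_HOST",
    "database": "REDSHIFT_DATABASE",
    "user": "REDSHIFT_USER",
    "password": "REDSHIFT_PASSWORD",
}


def redshift_kwargs(environ=os.environ):
    """Build connection keyword arguments from the environment.

    Raises RuntimeError listing every variable that is unset or empty.
    """
    missing = [var for var in REQUIRED_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: environ[var] for arg, var in REQUIRED_VARS.items()}
```

The result can then be passed on explicitly, for example RedshiftConnection(**redshift_kwargs()).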