Cohort
The cohort module is the primary entry point for building patient cohorts. It provides a declarative way to specify complex inclusion/exclusion criteria and returns structured results with attrition tracking.
When to Use
Use the cohort module when you need to:
- Identify patients matching diagnosis, procedure, or medication criteria
- Apply demographic filters (age, sex)
- Enforce continuous enrollment requirements
- Track attrition at each selection step
- Get a single cohort DataFrame ready for downstream analysis
Quick Example
from alx_heor.database import RedshiftConnection
from alx_heor.cohort import (
    get_cohort, CohortCriteria, DiagnosisCriteria, EnrollmentCriteria
)

conn = RedshiftConnection().connect()
result = get_cohort(
    conn,
    source="iqvia",
    schema="iqvia_pharmetrics_2024q3",
    criteria=CohortCriteria(
        primary_diagnosis=DiagnosisCriteria(
            codes=["G700", "G7000", "G7001"],
            min_count=2,
            days_apart=30,
        ),
        enrollment=EnrollmentCriteria(months_before=6, months_after=12),
        min_age=18,
    ),
    start_year=2015,
    end_year=2024,
)
print(result.summary())
df_cohort = result.df_cohort
Common Patterns
Diagnosis with Time Windows
# Baseline malignancy (to exclude)
DiagnosisCriteria(
    codes=["C00", "C01", "C02"],  # Cancer codes
    window_start=-365,  # 1 year before index
    window_end=0,  # Up to index
    label="Malignancy in baseline",
)
Treatment-Naive Patients
# Exclude patients with prior biologic use
CohortCriteria(
    primary_diagnosis=...,
    excluded_medications=[
        MedicationCriteria(
            generic_names=["rituximab", "eculizumab"],
            window_end=-1,  # Before index date
            label="Prior biologic",
        ),
    ],
)
Multiple Required Criteria
CohortCriteria(
    primary_diagnosis=DiagnosisCriteria(codes=["G700"]),
    required_diagnoses=[
        # G70.01 = myasthenia gravis with (acute) exacerbation, written without the dot
        DiagnosisCriteria(codes=["G7001"], label="MG crisis"),
    ],
    required_medications=[
        MedicationCriteria(generic_names=["pyridostigmine"]),
    ],
)
Related Modules
- claims - Lower-level claims data access
- enrollment - Enrollment gap analysis
- medications - Medication lookup tables
cohort
Cohort identification with comprehensive inclusion/exclusion criteria.
This module provides a unified, high-level interface for identifying patient cohorts in retrospective healthcare database studies. Cohort identification is the foundation of any Real-World Evidence (RWE) study - it defines the patient population being studied based on clinical criteria.
Key Concepts:
Inclusion criteria: Conditions that MUST be met to enter the cohort (e.g., diagnosis of gMG, age ≥18, continuous enrollment).
Exclusion criteria: Conditions that REMOVE patients from the cohort (e.g., malignancy in baseline, pregnancy, prior use of study drug).
Index date: The anchor point for each patient's study timeline. Usually the first (or second) qualifying diagnosis date.
Baseline period: Time before index date (e.g., 6 months) used to assess patient characteristics and exclusion criteria.
Follow-up period: Time after index date for outcome assessment.
Attrition table: Tracking how many patients are lost at each selection step (essential for study transparency and reproducibility).
Supported Criteria Types:
- Diagnosis-based: ICD-9/ICD-10 codes with count and time requirements
- Procedure-based: CPT/HCPCS codes for surgical/medical procedures
- Medication-based: NDC codes, J-codes, or generic drug names
- Demographic: Age at index, sex (M/F)
- Provider specialty: Exclude diagnoses from certain specialties
- Enrollment: Continuous enrollment requirements pre/post index
Why Use This Module?
The get_cohort() function automates the entire cohort identification workflow that would otherwise require multiple manual steps:
1. Query claims → 2. Apply diagnosis criteria → 3. Calculate index dates → 4. Add demographics → 5. Apply exclusions → 6. Check enrollment → 7. Track attrition
Each step is tracked in an attrition table, providing full transparency.
Example
Build a gMG cohort with standard RWE criteria:
>>> from alx_heor import RedshiftConnection
>>> from alx_heor.cohort import get_cohort, CohortCriteria, DiagnosisCriteria
>>>
>>> conn = RedshiftConnection().connect()
>>> criteria = CohortCriteria(
...     primary_diagnosis=DiagnosisCriteria(
...         codes=["G700", "G7000", "G7001"],  # gMG ICD-10 codes
...         min_count=2,  # Require 2+ diagnoses
...         days_apart=30,  # At least 30 days apart
...         label="gMG ≥2 Dx, 30 days apart",
...     ),
...     min_age=18,
...     exclude_specialties=["OPHTHAL", "OPTOMTRY"],  # Exclude ocular-only MG
... )
>>> result = get_cohort(
...     conn,
...     source="iqvia",
...     schema="iqvia_pharmetrics_2024q3",
...     criteria=criteria,
...     start_year=2015,
...     end_year=2024,
... )
>>> print(result.summary())
Attrition Table
============================================================
≥1 diagnosis claim: 89,123
gMG ≥2 Dx, 30 days apart: 45,678 (-43,445, 51.3%)
Age ≥18: 42,103 (-3,575, 92.2%)
Valid sex (M/F): 41,892 (-211, 99.5%)
Non-excluded specialty: 38,456 (-3,436, 91.8%)
See Also
- claims.get_claims : Lower-level function for querying claims
- claims.get_index_dates : Lower-level function for index dates
- enrollment.analyze_enrollment : Detailed enrollment analysis
- medications.lookup_medications : Medication lookup utilities
Notes
- For studies spanning Oct 2015, include BOTH ICD-9 and ICD-10 codes
- The "2+ Dx 30 days apart" criterion is standard to reduce false positives
- Provider specialty exclusion addresses ocular MG misclassification
- Always verify attrition percentages against protocol expectations
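The first note above can be illustrated with a small, self-contained sketch of exact code matching across the ICD-9/ICD-10 transition. The helper names `normalize_code` and `matches_any` are hypothetical and not part of alx_heor; stripping the decimal point before comparing is an assumption about how codes are stored, and the ICD-9-CM codes 358.00/358.01 are included only as an example of a pre-October-2015 equivalent for myasthenia gravis.

```python
def normalize_code(code: str) -> str:
    """Upper-case a diagnosis code and drop the decimal point (assumed storage format)."""
    return code.replace(".", "").strip().upper()


def matches_any(claim_code: str, criteria_codes: list[str]) -> bool:
    """Exact match: a claim code qualifies only if it equals one of the listed codes."""
    wanted = {normalize_code(c) for c in criteria_codes}
    return normalize_code(claim_code) in wanted


# Codes for a study window that crosses October 2015:
# ICD-9-CM 358.00/358.01 plus the ICD-10 codes used throughout this page.
GMG_CODES = ["35800", "35801", "G700", "G7000", "G7001"]

print(matches_any("G70.01", GMG_CODES))  # True
print(matches_any("358.00", GMG_CODES))  # True
print(matches_any("G70", GMG_CODES))     # False: no prefix matching
```

Because nothing is matched by prefix in this sketch, every billable code has to be listed explicitly, which is why mixed ICD-9/ICD-10 lists tend to grow long.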
DiagnosisCriteria (dataclass)
Diagnosis-based inclusion or exclusion criteria.
This dataclass defines criteria for selecting patients based on ICD diagnosis codes. It supports sophisticated requirements common in RWE studies, such as requiring multiple diagnoses over time (to reduce false positives from rule-out testing) and time-windowed criteria (e.g., checking for malignancy in baseline period only).
The "2+ diagnoses 30 days apart" pattern is the industry standard for reducing false positives. A single diagnosis may represent rule-out testing, while repeated diagnoses over time indicate a confirmed condition.
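The rule can be sketched for a single patient as follows. This is an illustration only, not the module's implementation: the function name `meets_dx_requirement` is hypothetical, and it assumes that occurrences are counted as distinct diagnosis dates and that `days_apart` is measured between the first and last of those dates.

```python
from datetime import date


def meets_dx_requirement(dx_dates: list[date], min_count: int = 1, days_apart: int = 0) -> bool:
    """Return True if the patient has enough diagnoses spread over enough time."""
    unique_dates = sorted(set(dx_dates))  # assumption: count distinct dates
    if len(unique_dates) < min_count:
        return False
    if min_count > 1:
        span = (unique_dates[-1] - unique_dates[0]).days
        return span >= days_apart
    return True


single = [date(2020, 1, 5)]
too_close = [date(2020, 1, 5), date(2020, 1, 20)]   # 15 days apart
confirmed = [date(2020, 1, 5), date(2020, 2, 10)]   # 36 days apart

print(meets_dx_requirement(single, min_count=2, days_apart=30))     # False
print(meets_dx_requirement(too_close, min_count=2, days_apart=30))  # False
print(meets_dx_requirement(confirmed, min_count=2, days_apart=30))  # True
```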
Use Cases:
- Primary inclusion: Define the condition of interest.
  DiagnosisCriteria(codes=["G700"], min_count=2, days_apart=30)
- Baseline exclusion: Exclude patients with certain conditions before index. The range "C00-C96" is shorthand here; because codes are matched exactly, list each code explicitly in practice.
  DiagnosisCriteria(codes=["C00-C96"], window_start=-365, window_end=-1)
- Follow-up requirement: Require certain events after index.
  DiagnosisCriteria(codes=["F32"], window_start=0, window_end=365)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| codes | list[str] | ICD-9 or ICD-10 diagnosis codes to match. Include BOTH versions for studies spanning Oct 2015 (US ICD-10 transition). Examples: ['G700', 'G7000', 'G7001'] for gMG, ['G35'] for MS. | required |
| min_count | int | Minimum number of diagnosis occurrences required. 1: any single diagnosis (more sensitive, more false positives); 2: standard RWE criterion (fewer false positives); 3+: very restrictive (use for high-frequency conditions). | 1 |
| days_apart | int | Minimum days between first and last diagnosis when min_count > 1. Common values: 0 (any 2+ dx), 30 (standard), 60 (restrictive). | 0 |
| window_start | int | Days relative to index date for window start. Negative = before index. Example: -365 means "up to 1 year before index". None means no lower bound (all-time). | None |
| window_end | int | Days relative to index date for window end. Negative = before index. Example: -1 means "up to 1 day before index" (baseline only). None means no upper bound (all-time). | None |
| diagnosis_position | str | Which diagnosis positions to check. "any": all diagnosis fields (diag1-12, diag_admit); "primary": only primary diagnosis (diag1); "admit": only admitting diagnosis (diag_admit). | "any" |
| require_inpatient | bool | If True, only count diagnoses from inpatient encounters. Useful for more specific criteria. | False |
| require_outpatient | bool | If True, only count diagnoses from outpatient encounters. | False |
| label | str | Human-readable label for the attrition table. Example: "gMG ≥2 Dx, 30 days apart" or "Malignancy in baseline". | "" |
See Also
- CohortCriteria : Container that holds DiagnosisCriteria objects
- ProcedureCriteria : Similar criteria for procedures
- MedicationCriteria : Similar criteria for medications
Notes
- Codes are matched EXACTLY - 'G70' will NOT match 'G700'
- Use window_start/window_end for baseline/follow-up criteria
- Always provide a descriptive label for clear attrition reporting
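The window semantics described for window_start and window_end can be sketched as a day-offset check. This is a minimal illustration, assuming both bounds are inclusive and that None leaves that side unbounded; the function name `in_window` is hypothetical.

```python
from datetime import date
from typing import Optional


def in_window(event_date: date, index_date: date,
              window_start: Optional[int] = None,
              window_end: Optional[int] = None) -> bool:
    """Check whether an event falls inside a window given in days relative to index."""
    offset = (event_date - index_date).days  # negative = before index
    if window_start is not None and offset < window_start:
        return False
    if window_end is not None and offset > window_end:
        return False
    return True


index = date(2020, 6, 1)

# Baseline-only window: the year before index, excluding the index day itself.
print(in_window(date(2020, 5, 31), index, -365, -1))  # True
print(in_window(index, index, -365, -1))              # False
# With window_end=0 the index day counts as baseline.
print(in_window(index, index, -365, 0))               # True
# More than a year before index falls outside the window.
print(in_window(date(2019, 5, 1), index, -365, -1))   # False
```

The difference between window_end=-1 and window_end=0 is whether events on the index date itself count toward the baseline criterion.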
Examples:
Primary inclusion - gMG with 2+ diagnoses 30 days apart:
>>> primary = DiagnosisCriteria(
...     codes=["G700", "G7000", "G7001"],
...     min_count=2,
...     days_apart=30,
...     label="gMG ≥2 Dx, 30 days apart",
... )
Baseline exclusion - malignancy in year before index:
>>> exclude_cancer = DiagnosisCriteria(
...     codes=["C00", "C01", "C02"],  # Add all cancer codes
...     window_start=-365,  # 1 year before
...     window_end=-1,  # Up to day before index
...     label="Malignancy in baseline",
... )
Baseline exclusion - pregnancy:
>>> exclude_pregnancy = DiagnosisCriteria(
...     codes=["O00", "O26", "Z33", "Z34"],
...     window_start=-270,  # ~9 months
...     window_end=0,
...     label="Pregnancy",
... )
Post-index requirement - depression diagnosis within 1 year:
>>> require_depression = DiagnosisCriteria(
...     codes=["F32", "F33"],
...     window_start=0,
...     window_end=365,
...     label="Depression post-index",
... )
Source code in alx_heor\cohort\__init__.py
ProcedureCriteria (dataclass)
Procedure-based inclusion or exclusion criteria.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| codes | list[str] | CPT/HCPCS procedure codes to match. | required |
| min_count | int | Minimum number of procedure occurrences required. | 1 |
| window_start | int | Days relative to index date for window start. | None |
| window_end | int | Days relative to index date for window end. | None |
| label | str | Human-readable label for attrition reporting. | "" |
MedicationCriteria (dataclass)
Medication-based inclusion or exclusion criteria.
Can specify medications by generic name, NDC code, or procedure code (J-codes). At least one of generic_names, ndc_codes, or procedure_codes must be provided.
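The "at least one of" rule can be sketched with a dataclass that validates itself on construction. `MedicationCriteriaSketch` is a hypothetical stand-in that mirrors the documented fields; it is not the real class, and lower-casing generic names is only one assumed way to achieve case-insensitive matching.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MedicationCriteriaSketch:
    """Hypothetical stand-in mirroring the documented MedicationCriteria fields."""
    generic_names: Optional[list[str]] = None
    ndc_codes: Optional[list[str]] = None
    procedure_codes: Optional[list[str]] = None
    min_count: int = 1
    window_start: Optional[int] = None
    window_end: Optional[int] = None
    label: str = ""

    def __post_init__(self) -> None:
        # Reject a criterion that names no medication at all.
        if not (self.generic_names or self.ndc_codes or self.procedure_codes):
            raise ValueError(
                "Provide at least one of generic_names, ndc_codes, or procedure_codes"
            )
        # Assumed normalization so that name matching is case-insensitive.
        if self.generic_names:
            self.generic_names = [name.lower() for name in self.generic_names]


ok = MedicationCriteriaSketch(generic_names=["Eculizumab"], label="C5 inhibitor")
print(ok.generic_names)  # ['eculizumab']

try:
    MedicationCriteriaSketch(label="empty")
except ValueError as exc:
    print(f"Rejected: {exc}")
```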
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| generic_names | list[str] | Generic drug names to match (case-insensitive). E.g., ['eculizumab', 'ravulizumab']. | None |
| ndc_codes | list[str] | NDC codes to match. | None |
| procedure_codes | list[str] | HCPCS/J-codes to match (e.g., ['J1300', 'J1303']). | None |
| min_count | int | Minimum number of medication claims required. | 1 |
| window_start | int | Days relative to index date for window start. | None |
| window_end | int | Days relative to index date for window end. | None |
| label | str | Human-readable label for attrition reporting. | "" |
Examples:
Any C5 inhibitor post-index:
>>> MedicationCriteria(
...     generic_names=["eculizumab", "ravulizumab"],
...     window_start=0,  # On or after index
...     label="C5 inhibitor post-index",
... )
Treatment-naive (exclude prior biologics):
>>> MedicationCriteria(
...     generic_names=["eculizumab", "ravulizumab", "rituximab"],
...     window_start=-365,
...     window_end=-1,  # Up to day before index
...     label="Prior biologic use",
... )
EnrollmentCriteria (dataclass)
Continuous enrollment requirements.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| months_before | int | Required months of continuous enrollment before index date. | 0 |
| months_after | int | Required months of continuous enrollment after index date. | 0 |
| max_gap_months | int | Maximum allowed gap in enrollment (in months). | 1 |
| label | str | Human-readable label for attrition reporting. | "" |
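How these three parameters might interact can be sketched with month-level enrollment data. Everything here is an assumption made for illustration: the helper names are hypothetical, enrollment is represented as a set of month indices, the window runs from months_before months before the index month through months_after months after it, and a gap is a run of consecutive months without coverage.

```python
from datetime import date


def month_index(d: date) -> int:
    """Map a date to a running month number so months can be compared arithmetically."""
    return d.year * 12 + (d.month - 1)


def is_continuously_enrolled(enrolled_months: set[int], index_date: date,
                             months_before: int = 0, months_after: int = 0,
                             max_gap_months: int = 1) -> bool:
    """True if no run of unenrolled months in the window exceeds max_gap_months."""
    idx = month_index(index_date)
    gap = 0
    for month in range(idx - months_before, idx + months_after + 1):
        if month in enrolled_months:
            gap = 0
        else:
            gap += 1
            if gap > max_gap_months:
                return False
    return True


# Enrolled July 2019 through June 2021, with a one-month lapse in March 2020.
start = month_index(date(2019, 7, 1))
enrolled = {start + k for k in range(24)} - {month_index(date(2020, 3, 1))}
index = date(2020, 1, 15)

print(is_continuously_enrolled(enrolled, index, 6, 12, max_gap_months=1))  # True
print(is_continuously_enrolled(enrolled, index, 6, 12, max_gap_months=0))  # False
```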
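Examples:
6 months baseline, 12 months follow-up:
>>> EnrollmentCriteria(
...     months_before=6,
...     months_after=12,
...     label="6m baseline + 12m follow-up",
... )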
CohortCriteria (dataclass)
Complete specification of cohort inclusion and exclusion criteria.
This dataclass is the "study protocol in code" - it defines all the criteria that determine which patients enter your cohort. By specifying criteria declaratively, you get reproducible cohort definitions that can be version controlled, shared, and audited.
The criteria are applied in a specific order:
1. Primary diagnosis (identifies initial population)
2. Additional required diagnoses
3. Required procedures
4. Required medications
5. Excluded diagnoses (removes patients)
6. Excluded procedures
7. Excluded medications
8. Age filter
9. Sex filter
10. Provider specialty filter
11. Continuous enrollment
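Applying criteria in a fixed order while recording how many patients remain can be sketched as a list of labelled filter steps. This is a toy illustration over in-memory dictionaries, not how get_cohort works internally; `apply_steps` and the patient fields are hypothetical.

```python
from typing import Callable

Patient = dict
Step = tuple[str, Callable[[Patient], bool]]


def apply_steps(patients: list[Patient], steps: list[Step]) -> tuple[list[Patient], dict[str, int]]:
    """Apply each filter in order and record the remaining patient count per step."""
    attrition: dict[str, int] = {"Starting population": len(patients)}
    remaining = patients
    for label, keep in steps:
        remaining = [p for p in remaining if keep(p)]
        attrition[label] = len(remaining)
    return remaining, attrition


patients = [
    {"id": 1, "dx_count": 3, "age": 45, "sex": "F"},
    {"id": 2, "dx_count": 1, "age": 52, "sex": "M"},
    {"id": 3, "dx_count": 2, "age": 16, "sex": "F"},
    {"id": 4, "dx_count": 2, "age": 70, "sex": "U"},
]
steps: list[Step] = [
    ("≥2 diagnoses", lambda p: p["dx_count"] >= 2),
    ("Age ≥18", lambda p: p["age"] >= 18),
    ("Valid sex (M/F)", lambda p: p["sex"] in {"M", "F"}),
]

cohort, attrition = apply_steps(patients, steps)
print(attrition)
# {'Starting population': 4, '≥2 diagnoses': 3, 'Age ≥18': 2, 'Valid sex (M/F)': 1}
```

Because each step only sees the survivors of the previous one, reordering the steps changes the per-step counts even though the final cohort stays the same.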
Clinical Considerations
Why exclude specialties? Some conditions like Myasthenia Gravis (MG) have subtypes (ocular MG vs generalized MG). Diagnoses from ophthalmology/optometry may represent ocular-only MG, which has different treatment patterns.
Why require 2+ diagnoses? A single diagnosis may be rule-out testing. The patient presents with symptoms, gets tested, but doesn't have the condition. Requiring 2+ diagnoses separated by time increases diagnostic confidence.
Why check enrollment? Patients must be observable for the study period. A patient who drops enrollment can't be followed for outcomes.
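The specialty rationale above can be sketched as a per-patient check: a patient is kept only if at least one qualifying diagnosis came from a provider outside the excluded specialties. The function name `has_confirming_specialty` is hypothetical, and whether the real filter is applied to all diagnoses or only the index diagnosis is not specified here, so treat this as one possible reading.

```python
def has_confirming_specialty(dx_specialties: list[str], exclude_specialties: list[str]) -> bool:
    """True if any diagnosis was recorded by a provider outside the excluded specialties."""
    excluded = {s.upper() for s in exclude_specialties}
    return any(s.upper() not in excluded for s in dx_specialties)


EXCLUDED = ["OPHTHAL", "OPTOMTRY"]

# Diagnosed only by eye-care providers: likely ocular-only MG, so not confirmed.
print(has_confirming_specialty(["OPHTHAL", "OPTOMTRY"], EXCLUDED))  # False
# At least one diagnosis from neurology confirms the patient.
print(has_confirming_specialty(["OPHTHAL", "NEUROLOGY"], EXCLUDED))  # True
```

The specialty code "NEUROLOGY" is a made-up placeholder; real specialty values depend on the data source.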
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| primary_diagnosis | DiagnosisCriteria | Primary diagnosis criteria for cohort identification (required). This defines the target condition. | required |
| required_diagnoses | list[DiagnosisCriteria] | Additional diagnosis criteria that must be met. Patients must have ALL of these in addition to the primary diagnosis. | [] |
| required_procedures | list[ProcedureCriteria] | Procedure criteria that must be met (e.g., require thymectomy). | [] |
| required_medications | list[MedicationCriteria] | Medication criteria that must be met (e.g., require C5 inhibitor). | [] |
| excluded_diagnoses | list[DiagnosisCriteria] | Diagnosis criteria for exclusion (e.g., malignancy, pregnancy). Patients meeting ANY of these are removed. | [] |
| excluded_procedures | list[ProcedureCriteria] | Procedure criteria for exclusion. | [] |
| excluded_medications | list[MedicationCriteria] | Medication criteria for exclusion (e.g., prior biologic use). | [] |
| min_age | int | Minimum age at index date. None to skip age filter. 18 is standard for adult-only studies. | 18 |
| max_age | int | Maximum age at index date. None means no upper limit. | None |
| valid_sex_only | bool | If True, exclude patients with unknown/missing sex ('U'). Set False for sensitivity analyses. | True |
| exclude_specialties | list[str] | Provider specialties to exclude from index diagnosis. Common: ['OPHTHAL', 'OPTOMTRY'] for gMG studies. | None |
| require_specialty_confirmation | bool | If True, require at least one diagnosis from a non-excluded specialty. | False |
| enrollment | EnrollmentCriteria | Continuous enrollment requirements (baseline + follow-up months). | None |
| index_date_method | str | How to determine index date. "first_dx": first diagnosis date (most common); "second_dx": second diagnosis date (useful for 2+ Dx criterion); "first_rx": first qualifying medication date (for treatment studies). | "first_dx" |
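The three index_date_method options can be sketched as a small selector over a patient's event dates. The function name `pick_index_date` is hypothetical, and treating "second_dx" as the second distinct diagnosis date is an assumption.

```python
from datetime import date
from typing import Optional


def pick_index_date(dx_dates: list[date], rx_dates: list[date],
                    method: str = "first_dx") -> Optional[date]:
    """Pick an index date from qualifying diagnosis or medication dates."""
    dx = sorted(set(dx_dates))
    rx = sorted(set(rx_dates))
    if method == "first_dx":
        return dx[0] if dx else None
    if method == "second_dx":
        return dx[1] if len(dx) >= 2 else None
    if method == "first_rx":
        return rx[0] if rx else None
    raise ValueError(f"Unknown index_date_method: {method!r}")


dx = [date(2020, 3, 10), date(2020, 1, 5), date(2020, 6, 1)]
rx = [date(2020, 4, 2)]

print(pick_index_date(dx, rx, "first_dx"))   # 2020-01-05
print(pick_index_date(dx, rx, "second_dx"))  # 2020-03-10
print(pick_index_date(dx, rx, "first_rx"))   # 2020-04-02
```

The choice matters downstream because every window_start/window_end offset and enrollment requirement is measured from the date selected here.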
See Also
- get_cohort : Function that applies these criteria to build a cohort
- DiagnosisCriteria : Detailed diagnosis criteria specification
- EnrollmentCriteria : Continuous enrollment requirements
Notes
- Criteria are applied sequentially, with attrition tracked at each step
- Use descriptive labels in each criterion for clear attrition tables
- Test with smaller date ranges first to verify criteria before full run
Examples:
Basic gMG cohort (adults, 2+ Dx, exclude ocular specialists):
>>> criteria = CohortCriteria(
...     primary_diagnosis=DiagnosisCriteria(
...         codes=["G700", "G7000", "G7001"],
...         min_count=2,
...         days_apart=30,
...         label="gMG ≥2 Dx, 30 days apart",
...     ),
...     min_age=18,
...     exclude_specialties=["OPHTHAL", "OPTOMTRY"],
... )
Treatment-naive cohort with enrollment requirements:
>>> criteria = CohortCriteria(
...     primary_diagnosis=DiagnosisCriteria(
...         codes=["G700", "G7000", "G7001"],
...         min_count=2,
...         days_apart=30,
...         label="gMG ≥2 Dx",
...     ),
...     excluded_medications=[
...         MedicationCriteria(
...             generic_names=["eculizumab", "ravulizumab"],
...             window_start=-365,
...             window_end=-1,
...             label="Prior C5 inhibitor",
...         ),
...     ],
...     enrollment=EnrollmentCriteria(
...         months_before=6,
...         months_after=12,
...         label="6m baseline + 12m follow-up",
...     ),
...     min_age=18,
... )
Complex criteria with multiple exclusions:
>>> criteria = CohortCriteria(
...     primary_diagnosis=DiagnosisCriteria(codes=["G700"], min_count=2, days_apart=30),
...     excluded_diagnoses=[
...         DiagnosisCriteria(codes=["C00-C96"], window_start=-365, window_end=-1, label="Malignancy"),
...         DiagnosisCriteria(codes=["O00-O99", "Z33"], window_start=-270, window_end=0, label="Pregnancy"),
...         DiagnosisCriteria(codes=["N185", "N186"], label="ESRD"),  # undotted, like the other codes
...     ],
...     min_age=18,
...     max_age=89,  # Cap for data quality
... )
CohortResult (dataclass)
Results from cohort identification with attrition tracking.
Attributes:
| Name | Type | Description |
|---|---|---|
| df_cohort | DataFrame | Final cohort with patient demographics and index dates. |
| df_claims | DataFrame | All diagnosis claims for the cohort (for downstream analysis). |
| attrition | dict[str, int] | Patient counts at each step of the selection process. |
| criteria | CohortCriteria | The criteria used to generate this cohort. |
| df_enrollment | DataFrame | Enrollment data (if enrollment criteria were applied). |
| df_censor | DataFrame | Censoring dates (if enrollment criteria were applied). |
| df_payer | DataFrame | Payer type classification (if enrollment criteria were applied). |
summary
Generate attrition table as formatted string.
Returns:
| Type | Description |
|---|---|
| str | Formatted attrition table showing patient counts and percentage retained at each step. |
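The arithmetic behind such a table can be sketched from an ordered dict of step counts like the attrition attribute. This is an illustrative formatter, not the actual summary() implementation; `format_attrition` is a hypothetical name, and the layout imitates the sample output shown on this page, with the percentage taken relative to the previous step.

```python
def format_attrition(attrition: dict[str, int]) -> str:
    """Render step counts with patients dropped and percent retained vs. the prior step."""
    lines = ["Attrition Table", "=" * 60]
    previous = None
    for label, count in attrition.items():
        if previous is None:
            lines.append(f"{label}: {count:,}")
        else:
            dropped = previous - count
            retained = 100 * count / previous if previous else 0.0
            lines.append(f"{label}: {count:,} (-{dropped:,}, {retained:.1f}%)")
        previous = count
    return "\n".join(lines)


counts = {
    "≥1 diagnosis claim": 89_123,
    "gMG ≥2 Dx, 30 days apart": 45_678,
    "Age ≥18": 42_103,
}
print(format_attrition(counts))
# Attrition Table
# ============================================================
# ≥1 diagnosis claim: 89,123
# gMG ≥2 Dx, 30 days apart: 45,678 (-43,445, 51.3%)
# Age ≥18: 42,103 (-3,575, 92.2%)
```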
get_cohort
get_cohort(conn: RedshiftConnection, source: str, schema: str, criteria: CohortCriteria, start_year: int, end_year: int, study_start: str | None = None, study_end: str | None = None, include_claims: bool = True) -> CohortResult
Identify a patient cohort with comprehensive inclusion/exclusion criteria.
This is the primary high-level function for cohort identification in RWE studies. It automates the entire workflow of querying claims, applying inclusion/exclusion criteria, calculating index dates, filtering by demographics, checking enrollment, and tracking attrition at each step.
The function applies criteria in a deterministic order, allowing you to specify
your study protocol once and reproduce results consistently. The returned
CohortResult includes an attrition table showing how many patients were
excluded at each step - essential for study transparency.
Workflow (Automated by this Function)
- Query claims matching primary diagnosis codes
- Apply min_count and days_apart criteria
- Calculate index dates
- Add demographics (age, sex)
- Apply required diagnosis/procedure/medication criteria
- Apply excluded diagnosis/procedure/medication criteria
- Filter by age and sex
- Apply provider specialty filter
- Check continuous enrollment requirements
- Generate attrition table
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| conn | RedshiftConnection | Active database connection. Must be connected before calling. | required |
| source | str | Data source name: 'iqvia', 'optum', 'komodo'. Determines column mappings and table patterns via config. | required |
| schema | str | Database schema (e.g., 'iqvia_pharmetrics_2024q3'). Use | required |
| criteria | CohortCriteria | Complete specification of inclusion and exclusion criteria. See CohortCriteria documentation for all available options. | required |
| start_year | int | First year of claims data to query (e.g., 2015). | required |
| end_year | int | Last year of claims data to query (e.g., 2024). | required |
| study_start | str | Study period start date (e.g., '2015-01-01'). If provided, excludes claims before this date. Useful for aligning with protocol dates. | None |
| study_end | str | Study period end date (e.g., '2024-03-31'). Used for censoring and excluding claims after this date. | None |
| include_claims | bool | If True, include full claims data in result for downstream analysis. Set False to save memory when only cohort demographics are needed. | True |
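The effect of study_start and study_end on claim selection can be sketched as a date-bounds check. This is a minimal illustration, assuming ISO-formatted date strings and inclusive bounds; `within_study_period` is a hypothetical helper, and the censoring role of study_end is not modelled.

```python
from datetime import date
from typing import Optional


def within_study_period(service_date: date,
                        study_start: Optional[str] = None,
                        study_end: Optional[str] = None) -> bool:
    """Keep a claim only if it falls inside the optional study period bounds."""
    if study_start is not None and service_date < date.fromisoformat(study_start):
        return False
    if study_end is not None and service_date > date.fromisoformat(study_end):
        return False
    return True


claims = [date(2014, 12, 31), date(2015, 1, 1), date(2024, 3, 31), date(2024, 4, 1)]
kept = [d for d in claims if within_study_period(d, "2015-01-01", "2024-03-31")]
print(kept)  # [datetime.date(2015, 1, 1), datetime.date(2024, 3, 31)]
```

start_year and end_year bound which yearly claims are queried, while the study period dates trim the result to protocol-exact dates within those years.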
Returns:
| Type | Description |
|---|---|
| CohortResult | Object containing: df_cohort: final filtered cohort (one row per patient); df_claims: all diagnosis claims for cohort patients; attrition: dict tracking patient counts at each step; criteria: the CohortCriteria used (for reproducibility); df_enrollment: enrollment data (if enrollment criteria applied); df_censor: censoring dates (if enrollment criteria applied); df_payer: payer classification (if enrollment criteria applied). |
See Also
- CohortCriteria : Specification of all inclusion/exclusion criteria
- DiagnosisCriteria : Diagnosis-based criteria
- EnrollmentCriteria : Continuous enrollment requirements
- claims.get_claims : Lower-level function if you need custom queries
- enrollment.analyze_enrollment : Detailed enrollment analysis
Notes
- Execution time varies by cohort size (rare diseases: minutes, common: 30+ min)
- Memory usage can be high for large cohorts (use include_claims=False if needed)
- Always check attrition percentages against protocol expectations
- The function uses gc.collect() internally to manage memory
- For debugging, try with start_year=end_year first to reduce data volume
Examples:
Basic gMG cohort with standard RWE criteria:
>>> from alx_heor.cohort import (
...     get_cohort, CohortCriteria, DiagnosisCriteria,
...     MedicationCriteria, EnrollmentCriteria,
... )
>>>
>>> criteria = CohortCriteria(
...     primary_diagnosis=DiagnosisCriteria(
...         codes=["G700", "G7000", "G7001"],
...         min_count=2,
...         days_apart=30,
...         label="gMG ≥2 Dx, 30 days apart",
...     ),
...     min_age=18,
...     exclude_specialties=["OPHTHAL", "OPTOMTRY"],
... )
>>>
>>> result = get_cohort(
...     conn,
...     source="iqvia",
...     schema="iqvia_pharmetrics_2024q3",
...     criteria=criteria,
...     start_year=2015,
...     end_year=2024,
... )
>>>
>>> print(result.summary())
Attrition Table
============================================================
≥1 diagnosis claim: 89,123
gMG ≥2 Dx, 30 days apart: 45,678 (-43,445, 51.3%)
Age ≥18: 42,103 (-3,575, 92.2%)
...
>>>
>>> # Access the final cohort
>>> df_cohort = result.df_cohort
>>> print(f"Final cohort: {len(df_cohort):,} patients")
Cohort with enrollment requirements and exclusions:
>>> criteria = CohortCriteria(
...     primary_diagnosis=DiagnosisCriteria(
...         codes=["G700", "G7000", "G7001"],
...         min_count=2,
...         days_apart=30,
...     ),
...     excluded_diagnoses=[
...         DiagnosisCriteria(
...             codes=["C00", "C01", "C02"],  # Malignancy codes
...             window_start=-365,
...             window_end=0,
...             label="Malignancy in baseline",
...         ),
...     ],
...     excluded_medications=[
...         MedicationCriteria(
...             generic_names=["eculizumab", "ravulizumab"],
...             window_start=-365,
...             window_end=-1,
...             label="Prior C5 inhibitor",
...         ),
...     ],
...     enrollment=EnrollmentCriteria(
...         months_before=6,
...         months_after=12,
...     ),
...     min_age=18,
... )
>>>
>>> result = get_cohort(conn, source="iqvia", ...)
Quick test with limited data (for debugging):
>>> # Test with one year first
>>> result_test = get_cohort(
...     conn, source="iqvia", schema="iqvia_pharmetrics_2024q3",
...     criteria=criteria, start_year=2024, end_year=2024,  # Single year
... )
>>> print(f"Test cohort: {len(result_test.df_cohort)} patients")