Scoring Pipeline

Overview

A leaderboard request can pass through three phases: Ingest, Score, and Serve. A phase is skipped when the data it would produce is already fresh enough.

Phase 1 — Ingest (`runIngest`)

Entry point: runIngest(ctx) in lib/leaderboard/ingest.ts


ctx = { octokit, organization, organizationId, installationId, filters }

Steps

1. Acquire Redis lock (key leaderboard:lock:{stableIngestScopeId}). If the lock is already held, skip; another ingest is running for this scope.
2. Fetch: fetchOrgScoringDataGraphQL(octokit, org, { since, repos, forceRefresh }). GraphQL, paginated. Per-repo limits apply (see Reference → Ingest limits).
3. Normalize: normalizeGitHubData(rawData) → Signal[]. Each signal gets a SHA-256 content hash (first 32 chars) derived from event_type + user + repo + timestamp + identifier for deduplication.
4. Store: upsertSignals(installationId, signals, repoNameToId). Writes to the signals table. The unique index on (user_id, type, repository_id, event_timestamp, content_hash) makes this idempotent.
5. Record completion: setLastSuccessfulIngestAt() stores timestamp, preset name, repo_count, and signal_count in ingest_log.
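
The content hash from the Normalize step can be sketched as follows. This is a minimal illustration, not the project's implementation: the HashableSignal shape, the NUL separator between fields, and the choice of a hex digest are all assumptions; the source only states that the hash is SHA-256, truncated to 32 characters, and derived from the five listed fields.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape; field names mirror the list in the Normalize step.
interface HashableSignal {
  eventType: string;
  user: string;
  repo: string;
  timestamp: string; // ISO 8601 string
  identifier: string; // e.g. a commit SHA or a PR number
}

// Assumption: fields are joined with a NUL separator so that
// ("ab", "c") and ("a", "bc") do not produce the same hash input.
function contentHash(signal: HashableSignal): string {
  const material = [
    signal.eventType,
    signal.user,
    signal.repo,
    signal.timestamp,
    signal.identifier,
  ].join("\u0000");
  // Assumption: hex encoding; keep the first 32 characters of the digest.
  return createHash("sha256").update(material).digest("hex").slice(0, 32);
}
```

Because the hash depends only on the signal's own fields, re-ingesting the same GitHub event yields the same value, which is what lets the unique index reject the duplicate row.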

What normalizeGitHubData does

Converts raw GitHub commits/PRs/issues/reviews/comments into Signal[].
Splits code additions/deletions proportionally across co-authors.
Extracts boolean flags: isSelfReview, isSelfMerge, isInMergedPR, hasLinkedIssue, isBot.
Bot detection: author.type === 'Bot' OR login.endsWith('[bot]').
Stores per-signal metadata JSONB (max 2048 bytes per row — enforced by DB CHECK constraint).
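
Two of the rules above are simple enough to sketch directly. The bot check follows the stated condition; the metadata check is only an approximation, since it assumes the limit applies to the UTF-8 byte length of the serialized JSON, which may differ from how the DB CHECK constraint measures size. The Author shape and the function names are hypothetical.

```typescript
// Hypothetical author shape; only `type` and `login` are taken from the text.
interface Author {
  type?: string;
  login: string;
}

// Bot detection as described: GitHub account type "Bot" or a "[bot]" login suffix.
function isBot(author: Author): boolean {
  return author.type === "Bot" || author.login.endsWith("[bot]");
}

const MAX_METADATA_BYTES = 2048;

// Assumption: size is measured as UTF-8 bytes of JSON.stringify output.
function metadataFits(metadata: Record<string, unknown>): boolean {
  const bytes = new TextEncoder().encode(JSON.stringify(metadata)).length;
  return bytes <= MAX_METADATA_BYTES;
}
```

Checking the size before the write lets the caller trim or drop metadata instead of having the insert fail on the constraint.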

Phase 2 — Score (`scoreActivePresetForEntity`)

Entry point: scoreActivePresetForEntity(ctx, repos, teamMemberships, entityType) in lib/leaderboard/score.ts

Steps

1. Load signals: getSignalsForOrg(installationId, repos). Reads entirely from the signals table. Zero GitHub API calls.
2. Loop over all 6 time periods: SCORE_TIME_PERIODS = ['today', 'week', 'month', 'quarter', 'half_year', 'all_time'] (defined in lib/constants/time.ts). Per period:
   - getDateRangeForScoreTimePeriod(period) → { from, to } (see table below)
   - isSignalInRange(signal, from, to) filters signals in-memory
   - aggregateLeaderboard(signals, rules, entityType, teamMemberships) → ranked entries
   - Acquire Redis advisory lock, then writePresetComputedScores(presetId, scopeType, org, team, repo, entityType, period, entries) → writes to computed_scores + leaderboard_materializations
3. Background fill: scoreRemainingEntityTypes(ctx, repos, teamMemberships, primaryEntityType) scores the other two entity types asynchronously.

Time period → date range mapping

Period	from	to
`today`	00:00:00 UTC today	now
`week`	now − 7 days	now
`month`	now − 30 days	now
`quarter`	now − 90 days	now
`half_year`	now − 180 days	now
`all_time`	`null` (no filter)	`null` (no filter)

all_time is fully implemented. getSignalsForOrg() called without a date filter returns all rows in the signals table. Freshness TTL is 30 minutes.
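
The mapping in the table can be expressed as a short function. The sketch below reuses the names getDateRangeForScoreTimePeriod and isSignalInRange for readability, but the signatures are assumptions: the extra `now` parameter exists only to make the example deterministic, the eventTimestamp field is hypothetical, and treating both bounds as inclusive is a guess.

```typescript
type ScoreTimePeriod = "today" | "week" | "month" | "quarter" | "half_year" | "all_time";

interface DateRange {
  from: Date | null;
  to: Date | null;
}

const DAY_MS = 86_400_000;

// Rolling windows from the table, in days.
const ROLLING_DAYS = { week: 7, month: 30, quarter: 90, half_year: 180 } as const;

function getDateRangeForScoreTimePeriod(period: ScoreTimePeriod, now: Date = new Date()): DateRange {
  if (period === "all_time") return { from: null, to: null };
  if (period === "today") {
    // Midnight UTC of the current day.
    const midnightUtc = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
    return { from: midnightUtc, to: now };
  }
  return { from: new Date(now.getTime() - ROLLING_DAYS[period] * DAY_MS), to: now };
}

// A null bound means "no filter" on that side (assumption: bounds are inclusive).
function isSignalInRange(signal: { eventTimestamp: string }, from: Date | null, to: Date | null): boolean {
  const t = Date.parse(signal.eventTimestamp);
  return (from === null || t >= from.getTime()) && (to === null || t <= to.getTime());
}
```

Note that only `today` is calendar-aligned; the other periods are rolling windows measured back from the current instant.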

Phase 3 — Serve (`serveLeaderboard`)

Entry point: serveLeaderboard(ctx) in lib/leaderboard/serve.ts

Two paths depending on whether the request includes a custom date range:

Standard request (no from/to)

servePrecomputedScores(presetId, timePeriod) performs a single DB lookup against computed_scores joined to leaderboard_materializations. No scoring or aggregation runs at request time.

Active preset: resolved via getActiveScoringRulesPreset() which queries scoring_presets WHERE is_active = true.

Custom date range (from/to present)

computeCustomDateRange(ctx):

1. getSignalsForOrg(installationId, repos, { since: from, until: to }) reads signals from the DB within the date range.
2. aggregateLeaderboard() aggregates them in-memory.
3. The result is returned immediately and is not written to computed_scores.
4. The result is still written to the Redis response cache.

Custom ranges are not persisted to the database, so an identical request is recomputed unless the Redis response cache still holds the earlier result.
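
The two serve paths can be sketched with injected dependencies. Everything here is illustrative: serveSketch, ServeDeps, the cache key format, and the simplified signatures are hypothetical, and a Map stands in for the Redis response cache. The sketch also assumes a request counts as a custom range only when both from and to are present.

```typescript
interface ServeRequest {
  presetId: string;
  timePeriod: string;
  from?: string;
  to?: string;
}

type LeaderboardEntry = { entity: string; score: number };

// Hypothetical dependency bundle; the real functions take different arguments.
interface ServeDeps {
  servePrecomputedScores(presetId: string, timePeriod: string): Promise<LeaderboardEntry[]>;
  computeCustomDateRange(from: string, to: string): Promise<LeaderboardEntry[]>;
  responseCache: Map<string, LeaderboardEntry[]>; // in-memory stand-in for Redis
}

async function serveSketch(req: ServeRequest, deps: ServeDeps): Promise<LeaderboardEntry[]> {
  // Standard request: read precomputed rows, nothing is aggregated here.
  if (req.from === undefined || req.to === undefined) {
    return deps.servePrecomputedScores(req.presetId, req.timePeriod);
  }
  // Custom range: try the response cache first, otherwise compute in memory.
  const key = `custom:${req.presetId}:${req.from}:${req.to}`;
  const cached = deps.responseCache.get(key);
  if (cached !== undefined) return cached;
  const entries = await deps.computeCustomDateRange(req.from, req.to);
  deps.responseCache.set(key, entries); // cached, but never written to computed_scores
  return entries;
}
```

With this shape, a second identical custom-range request is answered from the cache and computeCustomDateRange runs only once until the cached entry is evicted.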

Freshness enforcement (`evaluateIngestFreshness`)

Module: lib/leaderboard/ingest-freshness.ts


evaluateIngestFreshness(lastSuccessfulIngestAt, filters):
  ingestAgeMs = lastSuccessfulIngestAt ? Date.now() - Date.parse(lastSuccessfulIngestAt) : null   // null = never ingested
  ttlMs = INGEST.TTL_MS[requested time period]
  cooldownActive = (ingestAgeMs != null) AND (ingestAgeMs < INGEST.COOLDOWN_MS)
  ingestStale = (ingestAgeMs == null) OR (ingestAgeMs > ttlMs) OR forceRefresh

24-hour cooldown

INGEST.COOLDOWN_MS = 86_400_000 (24 hours). Once a successful ingest completes, no new full ingest is triggered for 24 hours regardless of staleness. This prevents hammering the GitHub API.

Manual sync request during cooldown → 429 Too Many Requests.
Non-manual request during cooldown → skip INGEST, serve from cache or precomputed scores.

Per-period freshness TTLs (`INGEST.TTL_MS` in `lib/constants/time.ts`)

Period	TTL
`today`	60,000 ms (1 minute)
`week`	300,000 ms (5 minutes)
`month`	900,000 ms (15 minutes)
`quarter`	900,000 ms (15 minutes)
`half_year`	1,800,000 ms (30 minutes)
`all_time`	1,800,000 ms (30 minutes)

If ingestStale AND !cooldownActive, INGEST is triggered before SCORE and SERVE proceed.
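
The freshness rules above can be written out as a runnable TypeScript sketch. The constant values come from the tables in this section; the FreshnessFilters shape (timePeriod, forceRefresh), the injectable nowMs parameter, and the shouldIngest field on the result are assumptions added for illustration.

```typescript
type ScoreTimePeriod = "today" | "week" | "month" | "quarter" | "half_year" | "all_time";

const INGEST = {
  COOLDOWN_MS: 86_400_000, // 24 hours
  TTL_MS: {
    today: 60_000,
    week: 300_000,
    month: 900_000,
    quarter: 900_000,
    half_year: 1_800_000,
    all_time: 1_800_000,
  } as Record<ScoreTimePeriod, number>,
};

// Hypothetical filter shape; only the fields needed for this check.
interface FreshnessFilters {
  timePeriod: ScoreTimePeriod;
  forceRefresh?: boolean;
}

interface Freshness {
  ingestAgeMs: number | null; // null means never ingested
  cooldownActive: boolean;
  ingestStale: boolean;
  shouldIngest: boolean;
}

function evaluateIngestFreshness(
  lastSuccessfulIngestAt: string | null,
  filters: FreshnessFilters,
  nowMs: number = Date.now(),
): Freshness {
  const ingestAgeMs = lastSuccessfulIngestAt === null ? null : nowMs - Date.parse(lastSuccessfulIngestAt);
  const cooldownActive = ingestAgeMs !== null && ingestAgeMs < INGEST.COOLDOWN_MS;
  const ttlMs = INGEST.TTL_MS[filters.timePeriod];
  const ingestStale = ingestAgeMs === null || ingestAgeMs > ttlMs || filters.forceRefresh === true;
  // Ingest runs only when data is stale and the cooldown has expired.
  return { ingestAgeMs, cooldownActive, ingestStale, shouldIngest: ingestStale && !cooldownActive };
}
```

For example, with a last ingest 10 minutes ago and the `today` period, the data is stale (10 minutes exceeds the 1-minute TTL) but the cooldown is still active, so shouldIngest is false and the request is served from existing data.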

Ingest limits

These constants are defined in lib/github/ingest-limits.ts:

Constant	Value
`MAX_COMMITS_PER_REPO`	100
`MAX_PULLS_PER_REPO`	50
`MAX_ISSUES_PER_REPO`	50
`REPO_CONCURRENCY`	6
`DEFAULT_LOOKBACK_DAYS`	90
`PAGE_SAFETY_CAP`	20
