What gets checked.
And why it matters.
Every criterion is evidence-based: the underlying data is collected automatically from GitLab, not gathered by manual inspection. Below is a complete guide to what each criterion measures, how it is collected, and what earns each grade band.
Process Quality
How students work — commit discipline, PR workflow, issue tracking, milestone delivery. Process criteria measure engineering habits, not just outcomes.
Commit History Quality
How consistently and thoughtfully commits are made across the project timeline.
Why it matters
Daily, incremental commits mirror professional engineering discipline. Commit dumps the night before a deadline signal learning-by-cramming, not development mastery.
How it is measured
GitLab API /repository/commits — frequency, message format, branching discipline, author distribution.
Band descriptors
Exemplary commit discipline. Atomic commits with clear scope and context. Branch-per-feature workflow. Active across entire milestone.
Consistent daily/weekly commits. Scoped messages follow a convention. Feature branches used.
Regular commits across most of the milestone. Messages describe intent, basic branching used.
Some commits across the timeline, but irregular. Messages vague or auto-generated.
Sparse commits concentrated at deadlines. No evidence of ongoing development.
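As a minimal sketch of how deadline-concentrated commits can be detected: the timestamps would come from the `created_at` field of `GET /projects/:id/repository/commits`, and the 48-hour window is illustrative, not the graded threshold.

```python
from datetime import datetime, timedelta

def deadline_concentration(commit_times, deadline, window_hours=48):
    """Fraction of commits landing in the final `window_hours` before
    the deadline. `commit_times` would be parsed from the `created_at`
    field of GET /projects/:id/repository/commits."""
    if not commit_times:
        return 0.0
    window_start = deadline - timedelta(hours=window_hours)
    rushed = sum(1 for t in commit_times if window_start <= t <= deadline)
    return rushed / len(commit_times)

deadline = datetime(2024, 5, 31, 23, 59)
# One commit per week, plus two near the deadline.
commits = [datetime(2024, 5, day, 12, 0) for day in (2, 9, 16, 23, 30, 31)]
share = deadline_concentration(commits, deadline)  # 2 of 6 commits in the last 48h
```

A low value here pairs with the "active across entire milestone" bands; a value near 1.0 matches the bottom band.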
Pull Request Quality
How merge requests are described, reviewed, and managed.
Why it matters
Code review is the primary feedback loop in professional teams. Meaningful MR descriptions and review engagement show collaborative engineering maturity.
How it is measured
GitLab API /merge_requests — description completeness, review threads, approval counts, time-to-merge.
Band descriptors
Thorough MR descriptions referencing issues and design decisions. Multi-reviewer workflow. Approval gatekeeping enforced. Review comments acted upon.
Consistent MR descriptions with context and test evidence. Meaningful reviews with constructive feedback.
MRs with basic descriptions. At least one reviewer per MR. Some inline comments.
Some MRs created but descriptions are empty or trivial. Minimal review.
No MRs, or all changes pushed directly to main. No review activity.
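One of the listed signals, time-to-merge, can be computed directly from the `created_at` and `merged_at` timestamps that `GET /projects/:id/merge_requests` returns; this sketch assumes GitLab's ISO-8601 timestamp format.

```python
from datetime import datetime

def hours_to_merge(mr):
    """Hours from MR creation to merge, from the `created_at` and
    `merged_at` timestamps of GET /projects/:id/merge_requests.
    Returns None for unmerged MRs."""
    if not mr.get("merged_at"):
        return None
    fmt = "%Y-%m-%dT%H:%M:%S"
    # Truncate to seconds to sidestep fractional-second variations.
    created = datetime.strptime(mr["created_at"][:19], fmt)
    merged = datetime.strptime(mr["merged_at"][:19], fmt)
    return (merged - created).total_seconds() / 3600

mr = {"created_at": "2024-05-01T10:00:00.000Z",
      "merged_at": "2024-05-02T16:00:00.000Z"}
```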
Issue Traceability
How well tasks, features, and bugs are tracked through GitLab issues and linked to code.
Why it matters
Traceability — from user story to issue to branch to MR — is a core professional practice. It makes work visible, reversible, and auditable.
How it is measured
GitLab API /issues + MR references — issues linked to commits/branches/MRs, label usage, assignment.
Band descriptors
Full traceability: every feature/bug has an issue, branch, and MR. Labels and milestones used systematically. Issue boards reflect actual workflow.
All significant work tracked in issues. Issues linked to MRs/branches consistently. Milestone coverage complete.
Most issues reference an MR or branch. Milestone assignment used for some issues.
Issues created but not systematically linked to branches or MRs.
No issues or issues unconnected to any code. Work happens outside the issue tracker.
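The issue-to-code links above rely on GitLab's `#123` cross-reference syntax; a sketch of how references might be extracted from a commit message or MR description:

```python
import re

# GitLab cross-references issues with a hash prefix, e.g. "#123".
ISSUE_REF = re.compile(r"#(\d+)")

def referenced_issues(text):
    """Issue IIDs cross-referenced in a commit message or MR
    description, deduplicated and sorted."""
    return sorted({int(iid) for iid in ISSUE_REF.findall(text or "")})

refs = referenced_issues("Fix login timeout (closes #12, relates to #7)")
```

Mapping these references back to the issues returned by `GET /projects/:id/issues` is what makes a feature traceable end to end.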
Milestone Delivery
How well the team delivers planned scope by each milestone.
Why it matters
Milestone delivery is the core signal of planning and execution discipline. Teams that consistently deliver on schedule demonstrate professional project management.
How it is measured
GitLab API /milestones — open/closed issue ratio per milestone, overdue issue count, milestone state.
Band descriptors
Exemplary delivery: issues closed incrementally, not in a final rush. All milestones met or scoped down intentionally. Proactive issue management.
Consistent delivery: ≥75% of issues closed by milestone due date across multiple milestones.
Most issues closed on time for at least one milestone. Some planning evidence.
Milestones set, but fewer than half the issues closed by the due date.
No milestones defined, or milestones consistently missed with most issues open at due date.
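The open/closed ratio driving these bands reduces to a simple calculation over one milestone's issue list; the `state` field values are those GitLab's issues API actually returns.

```python
def milestone_completion(issues):
    """Closed/total ratio for one milestone's issues, each dict shaped
    like GET /projects/:id/issues?milestone=<title> output with a
    `state` of 'opened' or 'closed'."""
    if not issues:
        return 0.0
    closed = sum(1 for issue in issues if issue["state"] == "closed")
    return closed / len(issues)
```

A ratio of 0.75 or higher at the due date, sustained across milestones, corresponds to the "consistent delivery" band above.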
Product Quality
What students produce — requirements, code, tests, CI/CD, and release management. Product criteria measure the quality and completeness of deliverables.
Requirements Quality
Whether user stories or acceptance criteria are defined and maintained.
Why it matters
Engineering without documented requirements produces unmaintainable software. Acceptance criteria define "done" before code is written — a professional habit.
How it is measured
Repository file scan — presence and quality of user stories and acceptance criteria in docs/, the wiki, or issues.
Band descriptors
Living requirements document updated across milestones. Acceptance criteria testable and linked to test cases. Stories reflect real user needs.
Complete user stories with well-formed acceptance criteria. Consistent format used throughout.
User stories exist with basic acceptance criteria. Linked to issues or milestones.
Brief requirements exist but lack structure. No acceptance criteria.
No requirements documentation. No user stories or acceptance criteria anywhere in the repo.
Repository Setup
Whether the repository has a proper README, .gitignore, and project structure.
Why it matters
A well-configured repository is the foundation of professional collaboration. It signals that the team thinks about maintainability from day one.
How it is measured
File scan — README quality, .gitignore completeness, directory structure clarity, license file.
Band descriptors
Professional-grade README with badges, architecture diagram, contribution guide, and API docs. Project structure aligns with the chosen framework conventions.
Detailed README with architecture overview, setup, and usage. .gitignore comprehensive. Clear project structure.
README with setup instructions. .gitignore covers common generated files. Logical directory layout.
Minimal README. Basic .gitignore. No clear directory structure.
No README. No .gitignore. Repo is a flat dump of files.
CI/CD Configuration
Whether automated pipelines are configured, maintained, and producing passing builds.
Why it matters
Continuous integration is industry standard. A team that lets its pipeline stay red is shipping blind. A maintained pipeline is the team's commitment to code quality.
How it is measured
GitLab CI config file presence + API /pipelines — pipeline run count, pass/fail rate, stage coverage.
Band descriptors
Full pipeline: test → lint → build → optional deploy. Near-100% pass rate. Pipeline protected; merges blocked on red builds.
Pipeline with test, lint, and build stages. High pass rate. Failures addressed promptly.
Pipeline runs consistently. At least one test stage. Some failures but addressed.
Pipeline exists but frequently broken. No automated tests in pipeline.
No CI/CD configured, or pipeline never runs.
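The pass/fail rate can be derived from the `status` field of `GET /projects/:id/pipelines`; this sketch counts only finished runs, which is one reasonable choice among several.

```python
def pipeline_pass_rate(pipelines):
    """Share of finished pipelines that succeeded. `status` values
    include 'success', 'failed', 'canceled', and 'running'; unfinished
    or canceled runs are ignored here."""
    finished = [p for p in pipelines if p["status"] in ("success", "failed")]
    if not finished:
        return 0.0
    passed = sum(1 for p in finished if p["status"] == "success")
    return passed / len(finished)
```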
Code Quality (SAST)
Static analysis findings on the codebase — security issues, code smells, and anti-patterns.
Why it matters
Automated code quality tools surface issues that manual review misses. Using them is a professional practice; ignoring them is a liability.
How it is measured
Semgrep / SonarQube analysis output — severity distribution of findings, trend over milestones.
Band descriptors
Exemplary code quality: near-zero findings across all severities. Quality gates block merges on new violations. Trend improving across milestones.
Clean SAST report: zero high/critical findings. Medium findings triaged. Quality gate enforced.
No critical/high findings. Medium findings acknowledged. SAST integrated in pipeline.
High-severity findings present. SAST tool configured but findings ignored.
Critical security findings or no SAST tooling present.
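The severity distribution mentioned above can be tallied from Semgrep's JSON output, where each result carries its severity at `result["extra"]["severity"]`; SonarQube reports would need a different accessor.

```python
def severity_counts(results):
    """Tally findings by severity. `results` follows the shape of
    Semgrep's JSON output ('ERROR', 'WARNING', or 'INFO')."""
    counts = {}
    for result in results:
        severity = result["extra"]["severity"]
        counts[severity] = counts.get(severity, 0) + 1
    return counts
```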
Testing Practice
Whether automated tests exist and are maintained alongside the codebase.
Why it matters
Tests are the first signal of engineering discipline. Code without tests is unmaintainable in the long run — and a team that skips tests is borrowing against future velocity.
How it is measured
CI pipeline artifacts — test run results, coverage percentage, test count over time.
Band descriptors
Exemplary test suite: ≥85% coverage, comprehensive edge cases, tests serve as documentation. TDD evidence or test-first discipline visible in commit history.
High coverage (≥70%). Tests written alongside features, not as afterthought. Coverage trend positive.
Reasonable test coverage (≥40%). Tests run in CI. Coverage reported.
A few tests exist but coverage is minimal (<20%). Tests often broken.
No automated tests. No test files in the repository.
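The coverage trend implied by the upper bands can be sketched from pipeline records; this assumes the shape of `GET /projects/:id/pipelines/:pipeline_id`, whose `coverage` field is a string percentage (or null when no coverage was reported).

```python
def coverage_trend(pipelines):
    """Change in reported coverage from oldest to newest pipeline,
    skipping runs that reported no coverage. Positive means improving."""
    readings = [float(p["coverage"]) for p in pipelines
                if p.get("coverage") is not None]
    if len(readings) < 2:
        return 0.0
    return readings[-1] - readings[0]
```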
Test Case Quality
Whether tests are well-designed — meaningful assertions, edge case coverage, test isolation.
Why it matters
Coverage without quality is a false signal. A test that always passes regardless of behavior is worse than no test — it creates false confidence.
How it is measured
Code review of test files — assertion quality, edge case coverage, use of mocks/stubs, test naming clarity.
Band descriptors
Exemplary test quality: property-based or mutation-tested, parameterized test cases, isolated and deterministic. Tests serve as living specification.
Thorough test design: boundary conditions, error paths, and integration scenarios. Tests document intended behavior.
Mix of happy path and error case tests. Meaningful assertions. Tests are readable.
Tests cover happy paths only. Assertions verify output but miss boundary conditions.
Tests exist but assertions are trivial or always pass. No edge cases tested.
Release Management
Whether releases are versioned, tagged, and aligned with milestones.
Why it matters
A team that cannot release reliably cannot ship. Release management demonstrates end-to-end delivery capability — not just code writing.
How it is measured
GitLab API /releases + /repository/tags — release count, semver tag usage, release note quality, milestone alignment.
Band descriptors
Professional release process: semver, changelog, release notes referencing issues closed. Automated release via pipeline. Consistent cadence.
Semver-tagged releases with meaningful release notes. Releases aligned to milestone deliverables.
Tagged releases per milestone. Basic release notes present.
Some tags exist, but inconsistent. No release notes.
No tags or releases. No versioning of any kind.
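Semver tag usage can be checked with a simple pattern over tag names from `GET /projects/:id/repository/tags`; the regex below covers the plain `MAJOR.MINOR.PATCH` form and ignores pre-release/build suffixes, so treat it as a first approximation.

```python
import re

# A SemVer-style tag, optionally prefixed with "v" (e.g. v1.2.0).
SEMVER_TAG = re.compile(r"^v?\d+\.\d+\.\d+$")

def semver_ratio(tag_names):
    """Share of tags that parse as plain semantic versions; names come
    from the `name` field of GET /projects/:id/repository/tags."""
    if not tag_names:
        return 0.0
    valid = sum(1 for name in tag_names if SEMVER_TAG.match(name))
    return valid / len(tag_names)
```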
Individual Contribution & Team Collaboration
Per-student metrics that separate individual contribution from team output. These criteria ensure that no one coasts and that collaboration skills are rewarded.
Individual Commit Volume
Per-person commit count and spread across the milestone timeline.
Why it matters
Commit frequency indicates active participation. One team member making 90% of commits signals contribution imbalance — a risk for team projects.
How it is measured
GitLab API /repository/contributors — commit count per author, weekly distribution, % of total commits.
Band descriptors
High commit volume with excellent spread. Active every week of the milestone. Proportional contribution to team total.
Consistent commits across the milestone. Contributing meaningfully to team output.
Reasonable commit count, spread across most of the milestone.
Low commit count (<15% of team total) or highly concentrated near deadline.
Fewer than 5 commits, or all commits in the final 48 hours.
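The "% of total commits" signal reduces to a per-author share over the contributors list; the `name` and `commits` fields are those returned by `GET /projects/:id/repository/contributors`.

```python
def commit_shares(contributors):
    """Per-author share of total commits, from the `name` and `commits`
    fields of GET /projects/:id/repository/contributors."""
    total = sum(c["commits"] for c in contributors)
    if total == 0:
        return {}
    return {c["name"]: c["commits"] / total for c in contributors}
```

A share below 0.15 of the team total falls into the second-lowest band above.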
Individual Code Contribution
Per-person lines added/deleted and files touched.
Why it matters
Contribution size (with context) indicates how much of the codebase an individual actually built. Low contribution may signal free-riding.
How it is measured
GitLab API /repository/contributors — additions, deletions, net lines, file count per author.
Band descriptors
Exemplary contribution: significant lines of meaningful code, spread across features and layers of the stack.
Above-average contribution. Files spread across features, not concentrated in one area.
Average contribution relative to team size. Files changed across the codebase.
Below-average contribution. Evidence of minimal individual work.
Negligible code contribution (<5% of team total).
PR Review Participation
How actively each individual reviews others' code.
Why it matters
Code review is a learnable skill. Students who review code actively develop critical thinking about software design — and contribute to team quality.
How it is measured
GitLab API /merge_requests/:iid/approvals and MR notes — review count, inline comments, approvals per member.
Band descriptors
Exemplary reviewer: detailed comments that demonstrably improve code quality, identifies issues not caught by others, approvals backed by engagement.
Consistent reviewer: meaningful inline comments, constructive feedback, engages with revisions.
Regular reviews with some inline comments. Engages with most MRs.
One or two reviews, no inline comments. Approvals given without engagement.
No reviews given. No evidence of reviewing any MR.
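Review activity per member can be counted from MR notes; this assumes the shape of `GET /projects/:id/merge_requests/:iid/notes`, whose `system` flag marks auto-generated status messages that should not count as review engagement.

```python
def review_note_counts(notes, team):
    """Human review notes per member across MRs. `notes` is the
    combined output of GET /projects/:id/merge_requests/:iid/notes;
    system notes are excluded."""
    counts = {member: 0 for member in team}
    for note in notes:
        author = note["author"]["username"]
        if not note.get("system") and author in counts:
            counts[author] += 1
    return counts
```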
Task Assignment Completion
The ratio of assigned issues closed by the milestone due date.
Why it matters
Committing to tasks and delivering them is the baseline of professional accountability. Issues assigned and never closed represent broken commitments.
How it is measured
GitLab API /issues?assignee_id= — ratio of closed vs total assigned issues per member, late close rate.
Band descriptors
Exemplary execution: all assigned issues closed on time or early. Proactive — re-scopes or escalates rather than letting issues go stale.
High completion (≥85%). Issues closed incrementally, not in a rush before deadline.
Good completion rate (≥70%). Most issues closed before or on the due date.
Most assigned issues eventually closed, but often late.
Fewer than half of assigned issues closed. Many issues abandoned or transferred.
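The on-time completion rate behind these bands can be sketched over one assignee's issue list from `GET /issues?assignee_id=<id>`; the date comparison here is day-granular, which is an assumption, not necessarily how the grader rounds.

```python
from datetime import date

def on_time_rate(issues, due_date):
    """On-time close rate for one assignee's issues. An issue counts
    when it is closed and its `closed_at` date is on or before the
    milestone due date; open and late issues count against the rate."""
    if not issues:
        return 0.0
    on_time = 0
    for issue in issues:
        closed_at = issue.get("closed_at")
        if issue["state"] == "closed" and closed_at:
            if date.fromisoformat(closed_at[:10]) <= due_date:
                on_time += 1
    return on_time / len(issues)
```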
Qualitative Peer Feedback
Anonymous peer-rated score from teammates on collaboration, contribution, and professionalism.
Why it matters
Peer feedback surfaces soft skills and team dynamics that automated metrics cannot detect. A high technical score with a low peer rating is a signal worth investigating.
How it is measured
External peer evaluation survey (imported via CSV) — averaged score per student, rater count.
Band descriptors
Score 4.1+. Consistently high across all raters. Peers describe exemplary collaboration.
Score 3.5–4.0. Above average. Peers describe positive contributions.
Score 3.0–3.4. Around team average. Mixed feedback.
Score 2.0–2.9. Below team average. Some negative themes.
Score below 2.0 / 5. Consistent negative feedback from multiple raters.
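Averaging the imported survey might look like the sketch below; the `rater,ratee,score` column names are assumptions about the CSV layout and would need adapting to the actual export format.

```python
import csv
import io

def peer_averages(csv_text):
    """Average peer score per student from an imported survey CSV.
    Column names ('rater', 'ratee', 'score') are illustrative."""
    totals, counts = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ratee = row["ratee"]
        totals[ratee] = totals.get(ratee, 0.0) + float(row["score"])
        counts[ratee] = counts.get(ratee, 0) + 1
    return {student: round(totals[student] / counts[student], 2)
            for student in totals}
```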
Ready to see how your repository measures up?