Landing Page Improvement Eval Baseline

This repository is a testbed for evaluating AI agent capabilities on frontend design tasks.

The starting point (index.html + styles.css) is intentionally bad:

- weak visual hierarchy and inconsistent typography
- dated/unpolished UI decisions
- accessibility issues
- fixed-width layout and poor responsive behavior (min-width: 1460px)

The goal is to use this baseline to compare how different models/agents improve the same page under the same constraints.
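
The fixed-width problem noted above can be checked mechanically before and after a model run. The following is a minimal sketch, not part of the repo: the helper name `find_fixed_min_widths` and the 1024px threshold are assumptions chosen for illustration, and the regex-based scan is a rough heuristic rather than a real CSS parser.

```python
import re

# Assumed threshold: a min-width at or above this many pixels is treated as a
# sign of a fixed-width desktop layout. The value is illustrative only.
FIXED_WIDTH_THRESHOLD_PX = 1024

# Matches declarations such as "min-width: 1460px" (pixel units only).
MIN_WIDTH_DECL = re.compile(
    r"(?<![\w-])min-width\s*:\s*(\d+(?:\.\d+)?)px", re.IGNORECASE
)


def find_fixed_min_widths(css_text, threshold_px=FIXED_WIDTH_THRESHOLD_PX):
    """Return pixel values of min-width declarations at or above the threshold."""
    # Drop @media preludes so breakpoint conditions such as
    # "@media (min-width: 768px)" are not mistaken for declarations.
    without_media_preludes = re.sub(r"@media[^{]*\{", "{", css_text)
    widths = [
        float(match.group(1))
        for match in MIN_WIDTH_DECL.finditer(without_media_preludes)
    ]
    return [width for width in widths if width >= threshold_px]


# Inline sample standing in for the contents of styles.css.
sample_css = """
body { min-width: 1460px; }
@media (min-width: 1500px) { .card { min-width: 200px; } }
"""
print(find_fixed_min_widths(sample_css))  # prints [1460.0]
```

An empty result on a model's output suggests the fixed-width declaration was removed, but it does not prove that the page is responsive; a visual check at several viewport widths is still needed.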

Why This Repo Exists

Use this repo to benchmark model behavior on practical frontend redesign work:

- taking a low-quality landing page to production-level quality
- preserving product message while improving UX and conversion structure
- making the page responsive and more accessible
- comparing design quality differences between models from a shared baseline

Baseline Files

- index.html: intentionally weak landing page structure/content
- styles.css: intentionally poor styling and non-responsive CSS

Suggested Evaluation Flow

1. Start from the repo as-is.
2. Run the same prompt (or prompt set) against multiple models.
3. Compare outputs on:
   - visual design quality
   - responsive behavior
   - accessibility and semantics
   - conversion-focused structure (hero, CTA, pricing, trust)
4. Record qualitative notes and any measurable differences.
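
One way to keep the recorded notes comparable across models is a small structured record per run. This is only a sketch under stated assumptions: the `RunRecord` class, its field names, the criterion keys, and the 1-5 scale are all hypothetical and not defined anywhere in this repo, and the scores in the example are placeholders rather than observed results.

```python
from dataclasses import dataclass
from statistics import mean

# The four comparison criteria from the flow above; the short keys are made up.
CRITERIA = (
    "visual_design",
    "responsive",
    "accessibility",
    "conversion_structure",
)


@dataclass
class RunRecord:
    """One model's result for one prompt. All field names are hypothetical."""

    model: str
    prompt_id: str
    scores: dict  # criterion key -> score on an assumed 1-5 scale
    notes: str = ""

    def __post_init__(self):
        # Refuse incomplete records so averages stay comparable.
        missing = set(CRITERIA) - set(self.scores)
        if missing:
            raise ValueError(f"missing scores for: {sorted(missing)}")

    def overall(self):
        """Unweighted mean across the four criteria."""
        return mean(self.scores[criterion] for criterion in CRITERIA)


def rank(records):
    """Sort records from highest to lowest overall score."""
    return sorted(records, key=lambda record: record.overall(), reverse=True)


# Placeholder values for illustration only, not measured results.
example_runs = [
    RunRecord("model-b", "prompt-1", dict.fromkeys(CRITERIA, 2), "placeholder"),
    RunRecord("model-a", "prompt-1", dict.fromkeys(CRITERIA, 4), "placeholder"),
]
for record in rank(example_runs):
    print(record.model, record.overall(), record.notes)
```

An unweighted mean is the simplest choice; if one criterion matters more for a given comparison, a weighted score would be a reasonable variation.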

Prompt Examples And Model Behavior

These are prompts already tested in this repo, with observed behavior:

Prompt 1

/frontend-design improve the design of this website

Claude: works well
Codex (gpt5.3, gpt5.2): works poorly; output feels too dull

Prompt 2

improve the design of this website. use a similar design system to airbnb.

Claude: works well
Codex: works okay, but still too dull

Prompt 3 (more constrained)

Improve an existing SaaS landing page into a production-quality, conversion-focused page while keeping the same core product message.

Constraints:
- Use plain HTML/CSS/JS (no framework build step).
- Keep it easy to run locally by opening index.html.
- Preserve semantic HTML and accessibility.
- Mobile-first responsive behavior.
- Keep load lightweight (no heavy animation libraries).

Required improvements:
1. Visual hierarchy and typography system.
2. Better color system and contrast compliance.
3. Stronger hero section with clear CTA hierarchy.
4. Features section redesign (cards, spacing, icons/visual cues).
5. Social proof/testimonial or trust strip.
6. Pricing/CTA block with clear user flow.
7. Accessible navigation and footer.

Observed result:

Claude Code (Opus 4.5/4.6): output is still much better and more professional than Codex's

Note: Codex performs marginally better on frontend tasks when instructed to use Tailwind, but it still lags far behind Opus 4.5/4.6 in quality.
