PhD Research Project – Quality Benchmarking for LLM-Based Qualitative Analysis

Research · Stockholm (Hybrid) · Paid Research Project / PhD Collaboration (part-time)


Nava Insights is building an AI-native qualitative user research platform where AI conducts real voice interviews and produces insights traditionally delivered by senior human researchers.

Our hardest challenge is not generating insights, but evaluating their quality in a rigorous, defensible, and repeatable way. In qualitative reasoning there is no ground truth, no single correct answer, and no obvious metric. Yet decisions depend on getting this right.

This research project tackles one of the most difficult open problems in applied AI today:
How do we evaluate, compare, and safeguard the quality of AI-generated qualitative reasoning over time?


Research Context

Large language models are increasingly used to analyse interviews, synthesise themes, and generate insights. However:

  • Qualitative synthesis has no objective ground truth
  • Models can produce outputs that sound plausible but are subtly wrong
  • Standard evaluation metrics do not apply
  • Even evaluation systems themselves can drift or bias results
  • Quality must be preserved across model updates, prompt changes, and system evolution

Most AI systems avoid this problem entirely.
At Nava Insights, we are building infrastructure to solve it.


Research Objective

Design and validate a statistical and methodological framework for evaluating qualitative insight quality produced by AI systems.

The project will result in:

  • A formal Gold Standard definition for qualitative findings
  • A benchmark framework that allows comparison across:
    • Different AI models
    • Different prompting and agent strategies
    • Human versus AI analysis
  • A regression testing approach that ensures insight quality does not degrade as systems evolve

Research Questions (Examples)

  • How can qualitative insight quality be operationalised without ground truth?
  • Which dimensions of insight quality are stable across evaluators?
  • How can uncertainty, confidence calibration, and evidence grounding be measured?
  • How do different AI systems fail in qualitative synthesis, and how can these failures be detected early?
  • How can statistical methods support evaluation of inherently judgment-based outputs?

What You Will Work On

  • Formalising Nava Insights' Gold Standard – Findings (v1.0) from expert human analyses
  • Defining evaluation dimensions for insights, themes, and behavioural archetypes
  • Designing benchmark datasets based on real qualitative studies
  • Developing scoring and aggregation methods that handle subjective judgments
  • Comparing:
    • Generic LLM-based analysis
    • Method- and judgment-driven AI agent systems
  • Building a reusable evaluation and regression framework for ongoing model updates

Ideal Background

This project is well suited for a PhD student or advanced MSc student in:

  • Statistics
  • Machine Learning
  • Data Science
  • Applied Mathematics
  • Computer Science with strong statistical focus
  • Human–AI interaction with quantitative methods

You likely:

  • Are comfortable working with uncertainty and imperfect data
  • Enjoy problems without clean labels or ground truth
  • Have experience with statistical evaluation, experimental design, or measurement theory
  • Are interested in AI evaluation, model comparison, or AI safety
  • Can work independently and define structure in open-ended research problems

Prior experience with qualitative research is a plus, but not required.


Why This Project Is Academically Strong

  • Tackles a real, unsolved problem in AI evaluation
  • Combines statistics, AI systems, and human judgment
  • Produces results applicable beyond Nava Insights
  • Suitable for academic publication or thesis work
  • Bridges theory and production systems

What Nava Insights Provides

  • Access to real-world qualitative datasets and AI systems
  • Close collaboration with product, engineering, and research leaders
  • A problem space with direct industrial relevance
  • Opportunity to see research results deployed in production
  • Flexible setup suitable for PhD collaboration or thesis alignment

Practical Details

  • Paid research project or PhD collaboration
  • Flexible duration (typically 3-6 months depending on scope)
  • Hybrid setup in Stockholm
  • Possibility to align with university supervision

If you are interested in working on AI evaluation where correctness cannot be reduced to accuracy, we would love to talk.


Apply today! Interviews are conducted on an ongoing basis. If you have any questions about the position or our recruitment process, please feel free to contact us or send your application to amin@navainsights.io.

