PhD Research Project – Quality Benchmarking for LLM-Based Qualitative Analysis

Research · Stockholm (Hybrid) · Paid Research Project / PhD Collaboration (part-time)


Nava Insights is building an AI-native qualitative user research platform where AI conducts real voice interviews and produces insights traditionally delivered by senior human researchers.

Our hardest challenge is not generating insights, but evaluating their quality in a rigorous, defensible, and repeatable way. In qualitative reasoning there is no ground truth, no single correct answer, and no obvious metric. Yet decisions depend on getting this right.

This research project tackles one of the most difficult open problems in applied AI today:
How do we evaluate, compare, and safeguard the quality of AI-generated qualitative reasoning over time?


Research Context

Large language models are increasingly used to analyse interviews, synthesise themes, and generate insights. However:

  • Qualitative synthesis has no objective ground truth
  • Models can produce outputs that sound plausible but are subtly wrong
  • Standard evaluation metrics do not apply
  • Even evaluation systems themselves can drift or bias results
  • Quality must be preserved across model updates, prompt changes, and system evolution

Most AI systems avoid this problem entirely.
At Nava Insights, we are building infrastructure to solve it.


Research Objective

Design and validate a statistical and methodological framework for evaluating qualitative insight quality produced by AI systems.

The project will result in:

  • A formal Gold Standard definition for qualitative findings
  • A benchmark framework that allows comparison across:
    • Different AI models
    • Different prompting and agent strategies
    • Human versus AI analysis
  • A regression testing approach that ensures insight quality does not degrade as systems evolve

Research Questions (Examples)

  • How can qualitative insight quality be operationalised without ground truth?
  • Which dimensions of insight quality are stable across evaluators?
  • How can uncertainty, confidence calibration, and evidence grounding be measured?
  • How do different AI systems fail in qualitative synthesis, and how can these failures be detected early?
  • How can statistical methods support evaluation of inherently judgment-based outputs?

What You Will Work On

  • Formalising Nava Insights' Gold Standard – Findings (v1.0) from expert human analyses
  • Defining evaluation dimensions for insights, themes, and behavioural archetypes
  • Designing benchmark datasets based on real qualitative studies
  • Developing scoring and aggregation methods that handle subjective judgments
  • Comparing:
    • Generic LLM-based analysis
    • Method- and judgment-driven AI agent systems
  • Building a reusable evaluation and regression framework for ongoing model updates

Ideal Background

This project is well suited for a PhD student or advanced MSc student in:

  • Statistics
  • Machine Learning
  • Data Science
  • Applied Mathematics
  • Computer Science with strong statistical focus
  • Human–AI interaction with quantitative methods

You likely:

  • Are comfortable working with uncertainty and imperfect data
  • Enjoy problems without clean labels or ground truth
  • Have experience with statistical evaluation, experimental design, or measurement theory
  • Are interested in AI evaluation, model comparison, or AI safety
  • Can work independently and define structure in open-ended research problems

Prior experience with qualitative research is a plus, but not required.


Why This Project Is Academically Strong

  • Tackles a real, unsolved problem in AI evaluation
  • Combines statistics, AI systems, and human judgment
  • Produces results applicable beyond Nava Insights
  • Suitable for academic publication or thesis work
  • Bridges theory and production systems

What Nava Insights Provides

  • Access to real-world qualitative datasets and AI systems
  • Close collaboration with product, engineering, and research leaders
  • A problem space with direct industrial relevance
  • Opportunity to see research results deployed in production
  • Flexible setup suitable for PhD collaboration or thesis alignment

Practical Details

  • Paid research project or PhD collaboration
  • Flexible duration (typically 3-6 months depending on scope)
  • Hybrid setup in Stockholm
  • Possibility to align with university supervision

If you are interested in working on AI evaluation where correctness cannot be reduced to accuracy, we would love to talk.


Apply today! Interviews are conducted on an ongoing basis. If you have any questions about the position or our recruitment process, please feel free to contact us or send your application to amin@navainsights.io.

