Title: Quality Assurance Engineer
Company: InData Labs
Location: European Union
Role Overview
We are looking for a highly technical AI Quality & Validation Engineer to build the framework for testing and certifying our AI agents and applications. This isn't a traditional "Automation QA" role; you will be responsible for the scientific and statistical validation of non-deterministic systems. You will design how we measure "intelligence," safety, and reliability in Multi-Agent Systems and complex Chain-of-Thought workflows.
Core Responsibilities:
- Advanced AI Validation Frameworks: Design and implement end-to-end testing systems for AI agents. Go beyond simple assertions to validate complex logic, decision-making sequences, and multi-agent orchestrations.
- Statistical & Prompt Evaluation: Develop methodologies for prompt evaluation. Use statistical testing to measure model drift, hallucination rates, and output variance across large-scale datasets.
- Metric Definition (Regline & Guardrails): Define and monitor sophisticated quality metrics (e.g., Faithfulness, Relevancy, Answer Correctness, Semantic Similarity). Establish "guardrails" to ensure agents operate within safety and regulatory boundaries.
- LLM Ops Integration: Integrate validation layers into the LLM Ops lifecycle. Automate versioning, monitoring, and performance benchmarking at scale within the CI/CD pipeline.
- Input/Output Engineering: Build systems to validate structured inputs and outputs, ensuring that the interface between the LLM and the application layer is robust and type-safe.
Technical Requirements:
- AI/ML Domain Expertise: Deep understanding of how LLMs work "under the hood." Experience with frameworks such as LangChain, LlamaIndex, CrewAI, or AutoGPT.
- Advanced Programming: Expert-level Python skills. Ability to read, understand, and contribute to complex AI/ML codebases.
- AI Evaluation Toolkit: Hands-on experience with evaluation tools such as RAGAS, DeepEval, LangSmith, TruLens, or Promptfoo.
- Statistical Testing Mastery: Knowledge of how to test non-deterministic outputs using statistical hypothesis testing and model-based evaluation (using an LLM to judge an LLM).
- Engineering Infrastructure: Strong experience with Docker, CI/CD pipelines, and experiment-tracking and monitoring tools (e.g., MLflow, Weights & Biases).
- AI-Native Workflow: Proficiency in using and configuring AI coding agents and assistants (e.g., Cursor, GitHub Copilot) to automate internal testing and development tasks.
What Success Looks Like
You aren't just looking for "bugs" in the code; you are proving, statistically and systematically, that our AI agents will perform reliably at massive scale. You understand Chain-of-Thought reasoning, you can debug a Multi-Agent loop, and you know how to turn "subjective quality" into "hard data."
Key Tech Stack:
- Languages: Python, SQL.
- AI Frameworks: LangChain, AutoGPT, OpenAI/Anthropic APIs.
- Validation: RAGAS, DeepEval, Statistical Sampling.
- Ops: GitHub Actions, Kubernetes, LLM Monitoring.