Role Description
- Responsible for building production-ready backend services for agentic workflow components aligned to solution architecture and platform standards.
- Implement solution designs by building Python services, worker processes, and reusable libraries following defined architecture, patterns, and standards.
- Develop agentic workflow components: tool connectors, orchestration steps, state management modules, retrieval components, and approval/escalation flows.
- Build reliable LLM interaction layers: tool/function calling, schema-validated structured outputs, guardrails, safe tool execution boundaries, and fallback behaviors.
- Implement robust backend patterns: async execution, job queues, retries/idempotency, compensating actions, and failure isolation for long-running workflows.
- Deliver production readiness: logging/tracing, metrics, decision logs, run replay support, performance profiling, and cost/latency controls.
- Write clean, maintainable, testable code with strong review discipline: unit/integration tests and regression testing for prompts/agents where applicable.
- Collaborate closely with the Agentic AI Architect and technical leads; support delivery across DEV/UAT/PROD, including defect triage and operational support.
Location
- The role supports one of our top-tier banking clients in London (Canary Wharf) and requires a minimum of three days on-site presence.
- This is a permanent position based in the UK. We will only consider applicants who are eligible to work in the UK. We do NOT offer visa sponsorship for this role.
Experience Requirements & Qualifications
Core Experience
- 4+ years in Python backend development, including building production APIs/services and/or worker-based processing systems.
- Demonstrable experience implementing generative AI or LLM-enabled features and systems (agentic workflows, RAG, tool calling, evaluation/monitoring) is preferred.
- Strong capability in backend fundamentals: service boundaries, API contracts, async execution, retries/idempotency, error handling, and performance optimization.
- Advanced Python engineering skills: clean architecture, modularity, testability, packaging, secure coding, and maintainability at team scale.
- Strong experience building API-first services (FastAPI or equivalent) and RESTful APIs, including auth patterns (OAuth2/JWT/API keys), versioning, and backwards compatibility.
- Experience integrating and managing relational and vector databases.
- Strong schema/data contract practice using typed models and validation (e.g., Pydantic-style patterns), including strict structured outputs and schema evolution.
- Experience with version control tools such as GitHub (branching, PR reviews, release tagging, CI-friendly workflows).
- Strong experience with context grounding methods and context engineering when working with LLMs (RAG, evidence capture, context selection, prompt/context structuring).
- Experience using automation tools and integrating with external applications (API-based integrations, workflow triggers/actions, third-party systems).
- Experience building integration-heavy systems: consuming/producing APIs, handling enterprise data formats, and creating maintainable connectors.
- Working knowledge of distributed execution patterns: background jobs, scheduling, worker pools, and stateful workflows.
- Ability to work with ambiguity, break down requirements, and deliver reliably with strong ownership and communication.
Nice to Have
- Experience with agent orchestration frameworks (e.g., LangGraph-like patterns) and LLM observability/evaluation tools (Langfuse-like capabilities).
- Experience integrating enterprise-hosted LLMs (including Vertex AI or managed equivalents) and working with provider-agnostic abstraction layers (routing, fallback, cost-aware selection).
- Experience with job queues, distributed tracing, dashboards/alerts, and runbook-driven operational practices.
- Experience supporting regulated enterprise delivery: audit-friendly logging, change controls, secure configuration, and controlled deployments.
- Platform/DevOps awareness (preferred): Docker basics; Kubernetes/OpenShift fundamentals; logging/monitoring patterns; secrets management and environment separation (DEV/UAT/PROD).
Main Tasks and Responsibilities
1) Build Python Services and Agentic Components
- Develop production-grade backend services and worker processes aligned to the defined solution architecture.
- Implement orchestration components: job queues, scheduling, state management/state machines, retries, idempotency, and compensating actions.
- Build and maintain tool connectors/integrations with enterprise systems (APIs, databases, files), following safe execution boundaries and permission controls.
- Contribute reusable libraries and shared components to accelerate delivery across multiple client solutions.
2) Implement Reliable LLM and RAG Capabilities
- Integrate LLM capabilities into services using tool/function calling, structured outputs, and strict schema validation.
- Develop and maintain RAG pipelines: ingestion, indexing, retrieval, grounding, and evidence capture/citations where required.
- Apply context engineering practices: selecting/structuring context, minimizing irrelevant context, maintaining traceability, and improving response determinism.
- Implement guardrails and safety controls: input validation/sanitization, output validation, refusal/fallback handling, and policy-aligned tool usage.
3) Testing, Quality, and Release Discipline
- Build and maintain test suites: unit, integration, and regression testing (including prompt/agent regression where applicable).
- Participate in code reviews and follow engineering standards for maintainability, security, and correctness.
- Use GitHub-based workflows effectively: PR hygiene, branching strategies, code owner reviews, and CI/CD integration.
- Support release processes with strong documentation, configuration discipline, and readiness checks.
4) Observability, Performance, and Operational Readiness
- Implement logging, tracing, metrics, and decision logs for services and agent runs; support run replay and incident investigation.
- Profile performance bottlenecks and optimize latency, throughput, and cost across critical paths.
- Contribute to dashboards, alerts, runbooks, and operational procedures to maintain stable production systems.
5) Security, Compliance, and Enterprise Delivery
- Implement secure coding practices, secrets handling, and least-privilege patterns in tool execution and integrations.
- Follow enterprise governance expectations: audit-friendly logs, change controls, environment separation, and controlled deployments.
- Collaborate closely with the Agentic AI Architect, infrastructure teams, and the COE to deliver compliant, production-ready solutions.


