Role Description
- The Technical Analyst – Data Lineage will support a Data Governance, Controls, and Reporting program for a top-tier banking client.
- Responsible for establishing and validating end-to-end lineage across critical datasets used in operational and regulatory reporting.
- Translate governance and reporting requirements into actionable lineage deliverables (source-to-target mapping, lineage diagrams, metadata standards, and audit evidence).
- Work with Data Platform, Data Engineering, Architecture, Risk, Compliance, and Security teams to define lineage standards, metadata capture, and control points.
- Maintain lineage artifacts for Critical Data Elements (CDEs), key reports, and priority data products.
- Support control design to ensure traceability from source systems → transformations → curated layers → consumption (dashboards/reports/APIs).
- Actively participate from discovery workshops through to implementation and continuous improvement.
- Ensure traceability from data definition → transformation logic → lineage evidence → audit readiness.
Location
- The role supports one of our top-tier banking clients in London (Canary Wharf) and requires a minimum of three days on-site presence.
- This is a permanent, UK-based position. We will only consider applicants who are eligible to work in the UK; visa sponsorship is not offered for this role.
Experience Requirements & Qualifications
- Minimum of 3 years' relevant experience in data governance, data lineage, metadata management, data quality, reporting controls, or data transformation programs, preferably within financial services/banking.
Core Skills & Experience
- Strong understanding of data lineage concepts: technical lineage, business lineage, column-level lineage, impact analysis, and provenance.
- Hands-on experience with data lineage/metadata tooling in enterprise environments (e.g., Collibra, Alation, Informatica EDC/IDMC, IBM InfoSphere, Microsoft Purview, Apache Atlas, Amundsen, DataHub, or similar).
- Proven ability to build lineage for complex platforms: data lakes, warehouses, marts, and distributed processing (Spark-based pipelines).
- Strong proficiency in SQL for tracing transformations and validating mappings across layers.
- Working knowledge of ETL/ELT patterns, data modeling (dimensional + normalized), and batch scheduling dependencies.
- Ability to interpret data transformation logic from pipelines (Spark SQL / PySpark / Hive queries / orchestration configs).
- Strong documentation capability: source-to-target mappings, lineage diagrams, data dictionaries, metadata standards, and control evidence packs.
Technical Skills
- Strong proficiency in Python (data analysis/automation for metadata extraction, validation scripts, rule checks).
- Hands-on experience with PySpark and Spark SQL in production environments.
- Solid knowledge of Hive, Impala, HDFS, and Parquet.
- Advanced SQL skills; experience with Oracle databases is preferred.
- Working knowledge of job schedulers and orchestrators such as AutoSys and Apache Airflow.
- Experience with CI/CD and deployment tooling: Git, Harness, UrbanCode Deploy (UCD), and Red Hat OpenShift.
- Familiarity with AWS S3 for large-scale data storage.
- Exposure to Tableau (understanding data sources, extracts, dependencies) is a plus.
Nice-to-Have
- Experience with regulatory reporting data domains (risk, liquidity, capital, finance, BCBS 239 alignment, etc.).
- Knowledge of data governance operating models: CDEs, data ownership, stewardship, data quality dimensions.
- Experience creating audit-ready documentation and participating in audit walkthroughs.
- Experience working in Agile/Scrum delivery models.
- Familiarity with monitoring and alerting tools for data pipelines.
Key Responsibilities
- Conduct discovery workshops to identify priority reports, data products, and Critical Data Elements (CDEs).
- Build and maintain end-to-end lineage across systems, including column-level mappings where required.
- Produce and maintain Source-to-Target Mapping (STTM) documentation and metadata standards.
- Validate lineage accuracy by tracing logic through SQL/Spark transformations and pipeline configurations.
- Support impact analysis for proposed changes (upstream/downstream dependencies, report impact, control impact).
- Partner with engineers and platform teams to improve metadata capture and lineage automation (where possible).
- Define lineage-related control points and produce audit-ready evidence (diagrams, mappings, query proofs, run evidence).
- Support UAT by validating that reported numbers can be traced and explained back to trusted sources.
- Maintain the lineage backlog and track changes across releases to ensure artifacts remain current.