Senior Software Engineer (Grafana Databases, Managed Services)

Grafana Labs

Location

Germany

Work Type

Remote

Open to applicants in

Germany

Seniority

senior

Posted

July 3, 2026

Total Compensation

€133,250

Yearly Savings (Comfortable)

€71,000

Job Description

The Managed Services team is a newly formed squad within the Databases department. It owns and operates shared, production-critical infrastructure that powers Grafana Cloud’s next-generation database products (Mimir, Loki, and Tempo). Today, this includes operating 100+ WarpStream clusters across multiple cloud providers and regions, with further growth expected. WarpStream acts as the streaming backbone for ingestion and read/write decoupling across databases. It sits directly on the hot path for metrics, logs, and traces, handling high-throughput, multi-consumer workloads at massive scale.
In addition to streaming infrastructure, the team works closely with high-volume analytical and storage systems that power query-heavy and aggregation-heavy workloads, where latency, compression behavior, storage layout, and scaling characteristics matter deeply.
As a Senior Engineer on Managed Services, you will take ownership of running these systems in production. This involves:
Operating and evolving 100+ multi-cloud streaming clusters and related database infrastructure
Diagnosing and eliminating cross-layer failure modes (e.g., object storage latency, noisy neighbors, control-plane bottlenecks, and query performance regressions)
Designing safe upgrade and rollout strategies at scale
Improving observability, automation, and operational ergonomics
Partnering closely with database and platform teams to ensure safe scaling, partitioning, consumer fan-out, and query performance
Working directly with distributed systems behavior, Kubernetes scheduling dynamics, storage engines, and compression trade-offs
Serving as a primary escalation point and on-call for relevant incidents
Owning the relationship with all system vendors, including WarpStream Labs and others
We are remote-first, and our engineering organization is largely remote. We provide guidance and meet regularly over video calls, so an independent attitude and good communication skills are a must.
This role blends deep distributed systems work with the opportunity to influence how the team approaches reliability, scaling, and operational excellence.
Regular 1:1s with your manager and close collaboration with teammates across regions
Reviewing and defining SLOs for shared database infrastructure, and proactively reducing error-budget burn through improvements to monitoring, automation, scaling strategies, and system design
Improving the diagnosability of core streaming and database systems in production, where possible
Implementing solutions that ensure reliability, scalability, and performance of high-throughput, multi-cloud infrastructure
Developing fault-tolerant patterns that account for distributed system realities such as storage latency, partition imbalance, noisy neighbors, and control-plane dependencies
Planning and executing safe upgrades and rollouts across dozens of production clusters
Collaborating with database and platform engineering leaders to influence architecture, roadmap priorities, and long-term strategy
Participating in PR review and contributing to design documents, automation, tooling, and code improvements that reduce operational risk
Sharing best practices and distributed systems knowledge with partner teams
Participating in incident response, from investigation through resolution and post-incident reviews (PIRs)

Benefits

Vacation: Balance is key. Our team enjoys 30 days of paid vacation each year on top of national holidays, parental leave, and sick leave. We also take a breather on a number of Grafana Shutdown Days each year
Healthcare: We’re proud to provide health coverage or stipends for our colleagues in the US, UK, Canada, the Netherlands, Sweden, Singapore, and India
Retirement planning: There’s no time like the present to start saving for your future. We make employer contributions into the pension pots of our team members in the US, UK, Canada, the Netherlands, Sweden, and Germany
Professional development: On top of a $1,500 annual learning and development stipend, Grafanistas have thousands of on-demand courses at their fingertips to help them grow professionally. Want to attend a conference or training? Go ahead. Just pass on what you learned
Work location: The vast majority of our roles are fully remote, so we can hire the best talent and let you work from the comfort of your home. If you fancy a change of scene, we’ll also reimburse you up to $175 a month for a personal co-working space
Choice of tech: There’s no one-size-fits-all when it comes to the tech required to do your job. Choose the laptop and accessories you need when you join us, and we’ll refresh them every three years
Mindfulness: When you join the team, you can sign up for a complimentary subscription to Headspace to take advantage of the benefits of mindfulness and meditation. Our wellbeing resource group also organizes sessions run by fellow Grafanistas or external trainers
Global Employee Assistance Program: We offer all team members a 100% confidential support service with round-the-clock, year-round access to professionally qualified counsellors and specialists
Paid parental leave: Grafana offers paid parental leave to all eligible new parents. This gives Grafanistas time to bond with and care for their children in the first year after birth or adoption

You may not meet every requirement, and that’s okay. If this role excites you, we’d love you to raise your hand for what could be a truly career-defining opportunity.
Proficiency in at least one programming language (Go preferred, but not required)
Clear communicator who can collaborate across teams and work autonomously
Strong Kubernetes experience in AWS, GCP, or Azure, and familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.)
Solid understanding of distributed systems design and large-scale system trade-offs
Working knowledge of Linux internals, networking, cloud storage, and performance/scaling behavior
Experience operating distributed systems in production (e.g., streaming systems, analytical databases, large-scale storage backends). Examples of these include Kafka, Redpanda, WarpStream, Postgres, ClickHouse, Snowflake, or Cassandra
6+ years of engineering experience, including meaningful time in SRE, platform engineering, production engineering, infrastructure engineering, or distributed systems roles
Experience participating in blameless incident response and writing high-quality post-incident reviews
Curious, pragmatic, action-oriented, and kind (this is important!)
