Posted 3 weeks ago

SRE and Observability Lead

Company: Parts Town
Category: IT

Location: Addison, Hybrid

Type: Full Time

Parts Town seeks an SRE and Observability Lead to join its team. This newly established position is responsible for developing and leading the company’s enterprise observability and reliability capability. The SRE and Observability Lead will collaborate across multiple teams to ensure comprehensive monitoring of every component in the environment. This role will establish Dynatrace as the system of record for platform health and apply SRE practices to improve availability, performance, and incident outcomes across applications, infrastructure, and integrations.
The salary range for this role is $99,133.63 – $133,784.51, based on factors including, but not limited to, qualifications, experience, and geographic location. The benefits package includes health, dental, and vision insurance, 401(k) with match, employee assistance programs, paid time off, paid sick time off, paid holidays, paid parental leave, and professional development opportunities.
Responsibilities:

  • Own enterprise observability using Dynatrace across cloud, on-prem, ERP, WMS, eCommerce, APIs, and integrations.
  • Design service topology, dashboards, alerts, and health indicators that reflect business impact.
  • Apply SRE principles (SLIs, SLOs, error budgets where appropriate) to reduce incidents and improve resilience.
  • Accelerate incident detection and root-cause analysis, and lead post-incident reviews focused on systemic fixes.
  • Identify reliability, performance, and capacity risks before they impact the business.
  • Define observability and SRE standards and enable teams to use them effectively.