Program Manager - Cloud Reliability Engineering - Cloud Enablement

Date:  Sep 7, 2022
Location: Remote, MA, US

Onsite or Remote:  Remote
Company Name:  EBSCO Information Services

EBSCO Information Services (EIS) provides a complete and optimized research solution comprised of e-journals, e-books, and research databases - all combined with the most powerful discovery service to support the information needs and maximize the research experience of our end-users. Headquartered in Ipswich, MA, EIS employs more than 2,700 people worldwide, most now working hybrid or remotely. We are the leader in our field due to our cutting-edge technology, forward-thinking philosophy, and outstanding team. EIS is a company that will motivate you, inspire you, and allow you to grow. Our mission is to transform lives by providing relevant and reliable information when, where, and how people need it. We are looking for bright and creative individuals whose unique differences will allow us to achieve this inclusive mission around the world.

 

Reliability Engineering Domain Leader

EIS is currently seeking a Reliability Engineering Domain Lead to drive sustainable reliability and availability outcomes for our next-generation products and services. This role will evangelize and enable industry-leading best practices in the field of Reliability Engineering at an enterprise scale, involving many products, applications, services, and engineering teams. Working with various stakeholders in Executive Leadership, Technology, Product Management, and Customer Satisfaction, this domain will help define and prioritize work that focuses on improving the “Operational Excellence”, “Reliability”, and “Performance Efficiency” pillars of the AWS Well-Architected Framework. As a leader of this domain, you will be responsible for managing and executing the overall program, which improves the capability of DevOps teams to operate the software they create, meet service level objectives, and eliminate the root causes of incidents.

 

Responsibilities:

  • Partner with engineering and application/domain leads to establish and maintain service level objectives (SLOs), recovery time objectives (RTOs), and recovery point objectives (RPOs), and build roadmaps to meet them
  • Aggregate and represent reliability efforts that are being executed across EIS
  • Partner with cloud enablement domain owners to develop and maintain a strategic roadmap that enhances platform capabilities while aligning to business objectives
  • Work closely with Incident Command to analyze issues reported by customers, communicate the root causes of issues, and ensure backlogs have work to address them
  • Partner with engineering teams to define and implement service-level indicators (SLIs)
  • Own and maintain the Enterprise Reliability Risk Register
  • Evangelize reliability practices across all engineering groups and enable them to adopt those practices
  • Partner with relevant stakeholders to create and execute the strategy and vision for incident management, disaster recovery, performance testing, chaos engineering, etc.
  • Participate in all pertinent SAFe ceremonies for the Cloud Platform Operations
  • Work with various stakeholders to help prioritize reliability-related work across various ART teams, ensuring that the highest business value is being achieved
  • Use, modify, create, and maintain tools, analytics, and reporting to summarize and highlight Reliability Domain key performance indicators (KPIs), which include utilization, platform health, and platform costs
  • Apply a data-driven, pragmatic approach to understand how changes in reliability features impact customers and business outcomes
  • Understand current policies related to various compliance requirements and regulations, and drive audits to ensure we are complying where applicable. Where we are not, create a plan and prioritize work to ensure we comply

 

Skills 

Requirements:

  • 10+ years of experience in systems/business analysis, software engineering or related field.
  • 3+ years of site reliability or reliability engineering domain experience
  • 5+ years of working as an individual contributor in a technical role (SRE, software/test engineer, architect, DevOps, etc.)
  • Prior experience working with executive stakeholders.
  • Strong communication and presentation skills.
  • Ability to communicate findings and recommendations to non-technical audiences.

Preferred Qualifications:

  • Experience with industry-leading observability tools
  • Prior experience as a consultant
  • Prior experience with managing incidents/incident command.
  • Prior experience managing cross-functional programs which involve adoption of new practices, standards, etc.
  • Prior experience with change management
  • 3+ years of prior technology management experience
  • Experience with authoring Features / Capabilities / Epics and helping with their lifecycle management.
  • AWS
  • SAFe Practitioner

 

Target Annual Salary Range: $138,720-$198,170. This target pay range includes a bonus target of 5% of base salary. The actual salary offer will carefully consider a wide range of factors, including your skills, qualifications, education, training, and experience, as well as the position’s work location. EBSCO provides a generous benefits program including medical, dental, vision, life, and disability insurance, flexible spending accounts, a retirement savings plan, paid parental leave, holidays, and paid time off (PTO), as well as tuition reimbursement. View more about EBSCO’s benefits here: https://www.ebsco.com/about/benefits

 

We are an equal opportunity employer and comply with all applicable federal, state, and local fair employment practices laws. We strictly prohibit and do not tolerate discrimination against employees, applicants, or any other covered persons because of race, color, sex, pregnancy status, age, national origin or ancestry, ethnicity, religion, creed, sexual orientation, gender identity, status as a veteran, disability, or membership in any other class protected under federal, state, or local law. This policy applies to all terms and conditions of employment, including, but not limited to, hiring, training, promotion, discipline, compensation, benefits, and termination of employment. We comply with the Americans with Disabilities Act (ADA), as amended by the ADA Amendments Act, and all applicable state or local law.

