Engineering Manager, Cloud Reliability-SRE (remote)

Date:  Nov 15, 2021
Location: Remote, MA, US

Company Name:  EBSCO Information Services

EBSCO Information Services (EIS) provides a complete and optimized research solution comprising e-journals, e-books, and research databases, all combined with the most powerful discovery service to support the information needs and maximize the research experience of our end users. Headquartered in Ipswich, MA, EIS employs more than 3,300 people worldwide. We are the leader in our field due to our cutting-edge technology, forward-thinking philosophy, and top-notch workforce. EIS, a division of EBSCO Industries Inc., based in Birmingham, AL, is ranked in the top 200 of the nation’s largest privately held corporations according to Forbes magazine. EBSCO is a company that will motivate you, inspire you, and allow you to grow. We are looking for the best. If you are too, we encourage you to explore our unique opportunities.

Cloud Reliability Engineering Manager (SRE)

As the Cloud Reliability Engineering Manager, you will play a key role in helping EBSCO become a cloud-first company. You will give our internal engineering teams the support and resources they need to learn and adopt reliability engineering best practices for cloud-based workloads that run in AWS. The Cloud Reliability Engineering Team is responsible for providing Incident Management and Reliability Engineering Consultation across the organization, as well as troubleshooting assistance on high-impact incidents that the application teams are unable to solve.

What you’ll do

  • Provide thought leadership to the evolution of reliability engineering at EBSCO
  • Reduce toil for development teams and SRE in any way you can
  • Provide consultation to development teams to help them build reliable and scalable services, and resolve any production issues as quickly as possible
  • Help development teams establish and report on SLAs, SLOs, and SLIs
  • Stay abreast of the latest SRE methodologies, and skillfully adopt the appropriate ones
  • Foster innovation within the team, and join others in shaping the emerging SRE discipline
  • Assist with debugging production issues involving sophisticated, large-scale distributed systems
  • Provide leadership to the team of incident managers, ensuring successful and repeatable management of production incidents
  • Facilitate blameless postmortems and document outage impact and lessons learned to ensure problems don’t repeat
  • Work with leaders to prioritize root cause resolution, automation, and projects that improve customer experience and the teamwork environment
  • Partner with Cloud Platform Engineering to identify and implement automation opportunities to improve observability and operations
  • Mentor and coach team members on a regular basis, help plan and support career growth, and recruit top engineering talent

Skills or experience you need

  • Master’s or Bachelor’s degree in computer science or an engineering-related field, or equivalent work experience
  • Experience partnering with senior leaders as part of a broad organizational product development leadership team that focuses on ensuring and delivering customer value
  • Strong expertise in managing production incidents with mission-critical software, including driving resolution and communicating with stakeholders during incidents
  • Experience with SRE tools, metrics, process, and culture
  • Prior experience leading SRE teams in a SaaS environment
  • Strong software engineering and operations background
  • Experience leveraging and extending modern observability tooling (Prometheus, Grafana, Logstash, etc.)
  • A keen interest in Cloud Transformation
  • Ability to work independently and as part of a team
  • Ability to multi-task and prioritize deadlines
  • Excellent verbal and written communication skills with great attention to detail and accuracy

Skills or experience that will set you apart

  • Experience leading through an SRE transformation: guiding application development teams to develop engineering capabilities that improve their ability to deliver at scale
  • Prior experience as a systems/software architect
  • AWS Solutions Architect certification
  • A solid grasp of development, automation, and monitoring tools and practices (CloudFormation, Terraform, Atlantis, Grafana, Ansible, Puppet, PowerShell, Python, GitHub Actions) as well as an expert understanding of cloud networking technologies
  • Thorough understanding of AWS infrastructure and services including but not limited to EKS, ECS, EC2, EBS, S3, CloudWatch, CloudTrail, API Gateway, ALB, Route 53, Transit Gateway, AWS VPN, AWS Direct Connect, IAM, and AWS Config
  • Experience leading strong personalities by influence
  • Sound judgement and the ability to make wise decisions despite ambiguity, identifying root causes rather than just treating symptoms

Who you are

  • A self-directed, rapid learner
  • Collaborative by choice
  • Good at sorting through ambiguity
  • Fun and reasonable
  • Prepared and responsible
  • Empowered through humility
  • Goal and detail oriented
  • Able to make good choices, and to accept and address mistakes
  • Effective written and face-to-face communicator
  • Intensely curious, asking thoughtful questions to evolve your thinking

EBSCO Industries, Inc. is an equal opportunity employer and complies with all applicable federal, state, and local fair employment practices laws. EBSCO strictly prohibits and does not tolerate discrimination against employees, applicants, or any other covered persons because of race, color, sex (including pregnancy), age, national origin or ancestry, ethnicity, religion, creed, sexual orientation, gender identity, status as a veteran, disability, or any other federal, state, or local protected class. This policy applies to all terms and conditions of employment, including, but not limited to, hiring, training, promotion, discipline, compensation, benefits, and termination of employment.

EBSCO complies with the Americans with Disabilities Act (ADA), as amended by the ADA Amendments Act, and all applicable state or local law.


Job Segment: Cloud, Engineer, Manager, Developer, Computer Science, Technology, Engineering, Management