Sr. DevOps Engineer

Date:  May 11, 2022
Location: Remote, MA, US

Onsite or Remote:  Remote
Company Name:  EBSCO Information Services

EBSCO Information Services (EIS) provides a complete and optimized research solution comprised of e-journals, e-books, and research databases - all combined with the most powerful discovery service to support the information needs and maximize the research experience of our end-users. Headquartered in Ipswich, MA, EIS employs more than 2,700 people worldwide, most now working hybrid or remotely. We are the leader in our field due to our cutting-edge technology, forward-thinking philosophy, and outstanding team. EIS is a company that will motivate you, inspire you, and allow you to grow. Our mission is to transform lives by providing relevant and reliable information when, where, and how people need it. We are looking for bright and creative individuals whose unique differences will allow us to achieve this inclusive mission around the world.

Focus

Leading with an automation-first mindset, the Sr. DevOps Engineer serves as an important strategic and technical resource within the Incident Command team, which is responsible for supporting and automating our incident response process to ensure high availability, redundancy, recoverability, and capacity. This position is a key contributor to and enabler of the business information systems. Opportunities include working with both on-prem and cloud infrastructure in coordination with executive and architectural teams.

Primary Responsibilities

· Play a critical role on the Incident Command (IC) team, with a heavy emphasis on automation and high-fidelity rules that carry enough context to streamline every step end-to-end

· Take ownership of innovating the incident command process by spearheading initiatives to identify and eliminate all forms of manual toil from our workflows, permanently removing the need for human intervention

· Collect data insights from incident trends affecting business-critical uptime metrics, and prepare and communicate incident reports for a broad audience ranging from individual contributors to C-suite executives

· Automate analytical processes such as system and network log analysis to reassemble and replay incident event history for root cause analysis and impact cost assessment

· Strategically determine which details are needed to accurately and proactively measure outage frequency, volume, downtime costs, and reliability threats

· Reduce operational inefficiencies in the incident management process to ensure the fastest path to SREs through automation and continuous process improvement

· Ensure resolution of product/service defects, and drive process improvements and documentation enhancements, to address live incidents and catch failures before customers need to report them

Requirements

· 5+ years of experience managing incidents and taking ownership of the end-to-end incident management workflow, preferably in enterprise-class, loosely coupled environments

· 3+ years of demonstrated ability to work across the full technical stack while using the right tool for the job, including but not limited to Kubernetes, Spinnaker, Secrets Manager, and IaC tools

· Advanced knowledge of ephemeral services in a complex cloud environment; ability to evaluate new technologies and industry trends, develop proofs of concept, and present findings to the engineering organization

· Strong programming/scripting skills (Python, Go, Shell, YAML, etc.), with a focus on creating clear documentation that leads to repeatable actions/SOPs and, ultimately, to automation

· Analytical and creative problem-solving experience with large-scale programs, including the ability to identify bottlenecks in complex systems, with close attention to the quality and detail of all deliverables

· Excellent ability to communicate in every direction across the org chart, and an understanding of the importance of laying out context for the team instead of prescribing solutions

Preferred Skills

· Creative problem solving and thinking outside the box (how do we make our current solutions better?)

· Intermediate to expert knowledge of infrastructure automation tools (Terraform, Ansible, Buddy, etc.)

· Advanced understanding of application performance monitoring, resiliency testing & chaos engineering

· Experience with managing private, hybrid & public cloud environments (AWS preferred)

· Experience working in a structured change control and configuration management environment

· Demonstrable experience with distributed systems design and integration architectures of business apps using microservices, containers and cloud infrastructure

COVID VACCINATION REQUIREMENT: As directed by Executive Order 14042: Ensuring Adequate COVID Safety Protocols for Federal Contractors, all current and newly-hired EIS employees in the United States are required to be fully vaccinated by January 18, 2022 or by their date of hire.

We are an equal opportunity employer and comply with all applicable federal, state, and local fair employment practices laws. We strictly prohibit and do not tolerate discrimination against employees, applicants, or any other covered persons because of race, color, sex, pregnancy status, age, national origin or ancestry, ethnicity, religion, creed, sexual orientation, gender identity, status as a veteran, disability, or any other federal, state, or local protected class. This policy applies to all terms and conditions of employment, including, but not limited to, hiring, training, promotion, discipline, compensation, benefits, and termination of employment. We comply with the Americans with Disabilities Act (ADA), as amended by the ADA Amendments Act, and all applicable state or local law.


Job Segment: Engineer, Information Systems, Business Process, Engineering, Technology, Management