System Reliability Operations Engineer
Job ID: 10050587
Location: Lake Buena Vista, Florida, United States
Business: The Walt Disney Company (Corporate)
Date posted: May 24, 2023
Job Summary:
Within Disney Enterprise Technology, the Disney Technology Operations Command Center (DTOC) is a 24x7x365 critical services operation center responsible for service availability, with a main focus on rapidly responding to, correlating, and reducing the impact of outages. We are accountable for identifying and facilitating the resolution of service-impacting events, and for collaborating with other technology teams to prevent future impact through proactive event management and incident and problem analysis. DTOC drives the execution of the major incident process, which includes communicating with executives and key partners and owning and implementing Crisis Management plans and processes. DTOC also provides ongoing first- and second-level technical support of requests, performs validation procedures for routine system/service checks, and fulfills proactive monitoring of significant business events.
System Reliability Operations (SRO) Engineers ensure all processes and functions within our environment operate correctly and efficiently – monitoring, identifying, and coordinating with other technologists across segments to fine-tune system operations and resolve service interruptions. This role is responsible for the end-to-end reliability and operations of IT services and for providing consultation and training to other clients and segments across Disney. SROs consistently and reliably triage reported or automated incidents, apply recovery procedures, and engage domain experts to restore steady-state operations. Additionally, this position will drive service improvement initiatives through proactive monitoring and enhancement actions that address gaps identified through analytics and problem management.
Responsibilities:
Supervise the performance and availability of enterprise applications, systems, and infrastructure, ensuring they meet or exceed established service level objectives (SLOs)
Proactively identify, diagnose, and resolve infrastructure, application, and IT operations issues in collaboration with other IT support teams
Develop, implement, and maintain automation tools and scripts to improve the efficiency and reliability of IT operations and infrastructure
Implement and maintain technology observability and alerting solutions to provide real-time insights into system health, performance, and compliance
Effectively apply Problem & Incident Analysis techniques during an incident and post-incident
Address outages promptly, keeping work streams moving toward resolution in line with department procedures while communicating business impacts
Analyze and publish operational utilization and service performance metrics
Identify and pursue service availability improvement opportunities by promoting leading practices
Ensure that all DTOC services are designed to deliver the levels of availability required by the business
Perform DR/BCP activities for critical events and emergency onsite response
Identify service improvement opportunities through trend analysis, proactive techniques, and after-action reviews
Required
2+ years of experience supporting converged infrastructure stacks including application, compute, storage, and networking
2+ years of incident recovery experience, including demonstrated experience with Service and Event Management tools
Proficiency in one or more scripting/automation languages (e.g., Python, PowerShell, Bash, Ruby)
Experience with network technologies (WAN/LAN, wireless infrastructure, DNS/DHCP, Load-Balancers, Accelerators)
Solid understanding of observability, monitoring, and alerting tools (e.g., Splunk, New Relic, Grafana, ELK Stack, Datadog)
Demonstrated experience in systems integration, application infrastructure support, and middleware operations
Experience with hands-on support of cloud operations (AWS, Google Cloud, Azure)
Experience with x86 hardware technology, Windows, Linux, RISC operating systems, P-Series hardware, SAN, NAS, and data protection technologies
Experience in enterprise IT operations including system administration, application platforms, infrastructure, networking fundamentals, and IT service management
Experience working in a 24x7 IT operations environment
Strong technology problem-solving and analytical skills, with the ability to quickly diagnose and resolve technical issues
BA/BS in Computer Science, Engineering or related field; or equivalent work experience
Preferred
Master’s degree in a technical field
Certification(s) in Kepner-Tregoe, ITIL Foundations (V3), operating systems, visualization, and/or hardware platforms
About The Walt Disney Company (Corporate):
At Disney Corporate you can see how the businesses behind the Company’s powerful brands come together to create the most innovative, far-reaching and admired entertainment company in the world. As a member of a corporate team, you’ll work with world-class leaders driving the strategies that keep The Walt Disney Company at the leading edge of entertainment. See and be seen by other innovative thinkers as you enable the greatest storytellers in the world to create memories for millions of families around the globe.
About The Walt Disney Company:
The Walt Disney Company, together with its subsidiaries and affiliates, is a leading diversified international family entertainment and media enterprise with the following business segments: Disney Entertainment, ESPN, and Disney Parks, Experiences and Products. From humble beginnings as a cartoon studio in the 1920s to its preeminent name in the entertainment industry today, Disney proudly continues its legacy of creating world-class stories and experiences for every member of the family. Disney’s stories, characters and experiences reach consumers and guests from every corner of the globe. With operations in more than 40 countries, our employees and cast members work together to create entertainment experiences that are both universally and locally cherished.
This position is with Disney Worldwide Services, Inc., which is part of a business we call The Walt Disney Company (Corporate).
Disney Worldwide Services, Inc. is an equal opportunity employer. Applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Disney fosters a business culture where ideas and decisions from all people help us grow, innovate, create the best stories and be relevant in a rapidly changing world.