Sr Site Reliability Engineer
Burbank, California, United States
Job ID: 772860BR
Location: Burbank, California, United States
Business: Disney Media & Entertainment Distribution
Date posted: Feb. 01, 2021
Job Summary:
As a Senior Site Reliability Engineer, you will use your deep skill and experience to set up, monitor, optimize, and maintain our various applications and infrastructure on AWS, and in so doing keep all of our services and production systems running smoothly. You will be an important part of a motivated team, where we’ll look to you to collaborate with teammates to apply engineering best practices, influence a strong culture of operational discipline, and drive automation and repeatability in everything we do. We want someone who has grown up through the system administration ranks and has embraced the cloud and modern infrastructure engineering practices. You should possess 7 or more years of experience supporting the deployment and operations of highly scalable, secure, performant, and mission-critical web applications, services, and infrastructure. The ideal candidate will possess deep skills in infrastructure engineering and automation and expertise with orchestration and containerization platforms (Docker and Kubernetes), coupled with hands-on AWS infrastructure experience.
While The Walt Disney Company is a large company, ideally you will have some previous experience in a U.S. startup environment and be comfortable performing multiple roles. We strive to get things done quickly, with high quality, and we are committed to the agile methodology. Our work is guided by lean principles (looking at value and looking for waste; not doing anything ‘for the sake of doing it’).
We’re looking for a passionate, impactful, senior technologist who wants to come here to do their very best work, make their mark, and add their chapter to the long and storied history of The Walt Disney Company. Someone who holds themselves and their teammates accountable in a professional, collaborative manner. A teammate who seeks to bring out the best in themselves and those around them. Someone who can provide a fresh perspective and innovative insight to our initiatives.
Responsibilities:
- Ownership of our current cloud infrastructure, including Kubernetes clusters - install, upgrade, and maintain all necessary middleware components within Kubernetes (EKS, AKS, on-premise) and other cloud services necessary for application and service operations
- Maintain CI/CD Pipeline Ecosystem - maintain and expand upon the CI/CD ecosystem
- Uptime and Availability - implement the necessary facilities to maintain uptime and availability of the Development and Production environments (Kubernetes clusters and other cloud services) as well as the CI/CD ecosystem
- Change Management - implement the facilities necessary to allow software engineering teams to perform progressive rollouts (blue/green, canary, dark launch, etc.) of new versions of software applications
- Auditing - work to ensure the auditability of the Kubernetes clusters and CI/CD ecosystem, so that every change to the infrastructure is captured
- Security - work to ensure that the Kubernetes clusters, cloud services, and CI/CD ecosystem are secure against potential threats from outside and within the Disney network
- Monitoring - put monitoring facilities in place to actively monitor infrastructure, CI/CD ecosystem, and application health
- Observability - implement and provide facilities to enable the observability of cloud services, including Kubernetes clusters and CI/CD ecosystem, so that when an incident occurs, SREs and the software engineers are able to effectively collaborate and develop a solution in a timely manner
- Incident Response - work to compile a runbook that identifies all known potential risks and incidents, with well-defined procedures to mitigate or eliminate each risk if it occurs. This role has on-call and incident resolution responsibilities
- Problem Management - conduct regular post-mortem sessions in conjunction with the relevant software engineering teams for any critical incidents that occur in production systems
- Collaboration - work with the software engineering teams to continually optimize the design of the software and infrastructure to minimize the need for emergency response
- Proactive Testing - conduct chaos exercises to test the emergency response readiness of software, infrastructure and response team members to critical incidents
- Capacity Planning - engage in capacity planning in conjunction with software engineering management and technical leadership
Qualifications:
- Have 7+ years of experience in system administration, SRE or software engineering in a large enterprise environment
- Be able to demonstrate significant experience working with AWS cloud infrastructure and services, k8s, Helm, Docker, Terraform, CloudFormation, Istio, AWS networking, IAM roles, scripting in Perl/Python/Shell
- Have a strong interest in advancing technology whenever possible/practical
- Possess good communication skills and enjoy mentoring and helping others to succeed as a team
- Care about your craft and have opinions about the “right” way to do things with technology
- Previous work experience in Ad Technology a plus
- Experience working with vendor teams to deliver high quality results
- Strong curiosity about how Disney delivers the Magic and a desire to be a part of it
- Hold a Bachelor’s degree in Computer Science, Computer Information Systems, Engineering, or another technical field
- Master's degree in Computer Science, Software Engineering or related technical discipline is highly desirable
About Disney Media & Entertainment Distribution:
Comprised of the Company’s international business units and various direct-to-consumer streaming services, Disney Media & Entertainment Distribution (DMED) aligns technology, media distribution and advertising sales into a single business segment to create and deliver personalized entertainment experiences to consumers around the world.
About The Walt Disney Company:
The Walt Disney Company, together with its subsidiaries and affiliates, is a leading diversified international family entertainment and media enterprise with the following business segments: media networks, parks and resorts, studio entertainment, consumer products and interactive media. From humble beginnings as a cartoon studio in the 1920s to its preeminent name in the entertainment industry today, Disney proudly continues its legacy of creating world-class stories and experiences for every member of the family. Disney’s stories, characters and experiences reach consumers and guests from every corner of the globe. With operations in more than 40 countries, our employees and cast members work together to create entertainment experiences that are both universally and locally cherished.
This position is with Disney Streaming Technology LLC, which is part of a business segment we call Disney Media & Entertainment Distribution.
Disney Streaming Technology LLC is an equal opportunity employer. Applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status or any other basis prohibited by federal, state or local law. Disney fosters a business culture where ideas and decisions from all people help us grow, innovate, create the best stories and be relevant in a rapidly changing world.