Discovery Communications, LLC
Director Site Reliability Engineering
at Discovery Communications, LLC
- Requisition ID
- Career Category: IT & Technical Operations
- Company Employee Full-Time
Reporting to the VP Technology Operations, this position is critical in leading the Service & Engineering Improvement team as part of the Technology Operations Group. The post holder will lead a small team of Service Reliability, Monitoring, Automation and Data Reporting specialists who support an effective operations group. The Director of Site Reliability Engineering is a self-starter willing to take the initiative. To succeed in this role the post holder needs to be creative, clever and passionate, and love building and running teams.
Highlights of the role
This team supports our two global Technology Operations Centers. As a function these are the 24/7 Command & Control Hub for all our Distribution and IT support services. The position is key to ensuring organizational improvements, consistently improving and maintaining our availability and uptime, and establishing effective automation and monitoring to deliver successes and identify areas of opportunity.
This role will partner with engineering and workforce technology teams to advocate sensible, scalable systems design as well as building the best tools to diagnose, resolve and prevent issues. Although this is not necessarily a hands-on operations role, as an engineering leader and the voice of Technology Operations, the Director Site Reliability Engineering should be able to lead technical discussions, challenging or supporting them as needed. The post holder is an ambassador for Service Reliability Engineering and good design within GT&O and so should be a great communicator and enthusiastic champion of Technology Operations.
This position is a member of the leadership team for Technology Operations and will guide the development of the team and communicate the direction of the organization. The post holder is expected to work regular office hours but during large events should expect to work outside these hours, occasionally including weekends and nights.
1. Collaborate with engineering and product teams to provide a path to live operations that support development objectives
2. Partner with relevant GT&O and Digital leadership teams on technology implementation
3. Ensure impacts on the department are understood and that mechanisms are in place to manage these impacts and ensure service continuity
4. Deliver an overall path to live operations that forms a standard platform for engineering and product teams
5. Leads the development of a KPI and Dashboard roadmap across all of Global Infrastructure Services.
6. Leads the performance review cycle for technical services within Global Infrastructure Services
7. Track & implement corrective actions around achieving 99.995% availability
8. Collaborates with Architects and Engineers to improve the resilience of Discovery systems
9. Conducts formal operational readiness reviews of proposed engineering designs, controls, and test plans.
10. Drives continuous improvement to monitoring and tooling platforms used in Technology Operations
11. Ensures the delivery of real time and meaningful data and service reporting to support mature decision making
12. Exploits automation to optimise operational effectiveness. Develops sound business cases to support the drive to data and automation.
13. Designs and implements a governance framework to ensure that event, incident, major incident, problem and knowledge management processes are working effectively.
14. Identifies failings and delivers improvements in internal workflows, and partners with business stakeholders on their workflows that exploit technology
15. Leads root cause analysis reports and is accountable for any remediation planning that results from lessons learnt. Performs incident analysis and provides recommendations, including pushing for delivery
16. Ensures that knowledge bases including the Known-Error Database are maintained by the team and are up to date
17. Ensure service levels and targets are adhered to and corrective measures are in place to maintain performance targets
18. Maintains a skills and career path framework and ensures it is in place for all staff
19. Motivate other teams during stressful situations through effective and proactive leadership techniques
20. Guide and mentor GT&O and the user base in the service reliability and availability framework. Ensures these are in place for all staff
21. Lead and deliver small to mid-size projects or organisational change within operations centre scope
22. Responsible for implementing a team culture based on collaboration, best practices, standards, efficiency, and commitment to effective service delivery and responsiveness to the needs of the business
23. Ensures communications are accurate, timely and appropriately messaged for multiple audiences
24. Develop and maintain strong working relationships with key business leads and senior stakeholders within the customer base
25. Develop and maintain strong working relationships across all IT disciplines
26. Develop and maintain strong working relationships across GT&O
27. Develop and maintain strong working relationships with 3rd party suppliers and outsourced service partners
28. Deputises for VP Technology Operations as required
* Bachelor’s degree in IT Management, Software or Broadcast Engineering, or equivalent work experience
* 3+ years direct management experience in an IT, Broadcast or Digital Support function
* 10+ years’ experience in an enterprise-level support environment, including experience in a service delivery environment and an understanding of technical support processes and workflow. Breadth of experience, with a background in both operations and technology architecture, design, and development. Can demonstrate through experience the impact that upstream engineering and architecture decisions have on operations
* Strong background in System Administration/architecture
* Strong background in configuration and management of large-scale platforms (Virtualization, Cloud, Unix, Linux, Java, SQL, Oracle…)
* Demonstrable expertise in monitoring and logging of large-scale platforms (SolarWinds, Nagios, Splunk…)
* Proven experience of implementing change to enforce high availability on large-scale platforms
* Understanding of Agile/Scrum and deep understanding of DevOps practice within a linear and digital environment
* Working knowledge of ITIL required. Foundation certification expected. Must be able to effectively communicate with owners of ITIL Disciplines (Incident, Problem, Change, Release, and Configuration) to provide effective IT support to the end-users.
* Excellent verbal, written, interpersonal communication and customer service skills
* Strong organizational and conceptual skills combined with proven critical thinking, analytic, problem solving, and decision-making abilities
* Ability to multitask within related functions
* Demonstrated ability to recruit, develop, and retain staff
* Strong ability to demonstrate and execute professional communication skills to all levels of management
* Project management experience desired
* Solutions-oriented problem solver with the ability to proactively communicate areas of opportunity to senior leadership
* Positive attitude and experience with motivating a team
* Able to demonstrate a high degree of flexibility, including flexibility in working hours to support employees and customers across multiple time zones.
* Experience of working for a Media Company/Broadcast is desirable but not essential
* Knowledge of local employment laws is beneficial.
* Must have the legal right to work in the US
Sterling, Virginia