Connect the world with us

Director, Site Reliability Engineering

San Francisco

The Incident Response Team is a newly formed team that reports directly to the VP of Engineering, Security and Infrastructure and will be at the heart of Global Operations at MuleSoft. This team will be responsible for the initial response to and triage of all operational incidents and will champion the full lifecycle of these incidents, working directly with Engineering Managers to groom work backlogs and prioritize high-impact fixes.

The Director, Site Reliability Engineering will build and lead the Incident Response Team responsible for making sure our services maintain the highest availability. You will lead Incident Retrospectives across the engineering teams to identify failures in people, process, and technology that lead to incidents, develop corrective actions, and track them through to completion. This will involve communicating incident status to the business and supporting outbound communication to customers. You will lead, own, develop, and refine the Change Management Process, the overall Cost Management Initiatives, and the Change Control Review Board (CCRB), and develop statistical measures of success for the CCRB. You will own the end-to-end Incident Management and Problem Management Processes, build the policies and procedures for responding to incidents in line with business needs, and partner with various groups.

Goals for your first three months:

30 days:

  • Collaborate with the Engineering and DevOps teams to start to understand the environments and staffing requirements for operating a 24/7 team to respond to incidents
  • Build the overall Incident and Event Management Policies and Process
  • Work with various stakeholders in the organization to build requirements and identify gaps in documents and runbooks
  • Start to hire a team in SF, BA, or ORD (the team doesn’t need to be 24/7 to start)

60 days:

  • Establish and exercise the incident response plan for operational issues
  • Build metrics around SLAs, MTTx (Mean Time to X), and other core KPIs for the team, and start to own the statistical reporting and data management functions for Incident Management (SLAs, MTTx calculations), Change Management (Change-Induced Incident Minutes, etc.), and Problem Management (Actions, Completion %)
  • Work with engineering teams to make sure that we have full coverage of operational issues across all services
  • Start to build end-to-end knowledge and instrumentation of the system so that issues can be identified as they occur

90 days:

  • Establish the cadence of the team and have the foundational set of policies and procedures in place
  • Have buy-in from all engineering management and leadership for the direction of the team
  • Have the team off the ground and handling incidents, the RCA process, and change management

The ideal candidate will have:

  • Senior leadership experience with incident, change, and problem management in a software engineering organization with dozens of stakeholders and conflicting priorities, and the ability to build a team from the ground up
  • PMO, PGM, Jira, and Agile experience
  • Experience building and presenting SLA and other technical data to executive management
  • Certifications involving disaster, security, incident, and problem management (GIAC, SANS, ITIL, CERT, FEMA, etc.) are helpful but not required

What you’ll get from us:

We realize exceptional people don’t choose jobs based solely on benefits, but we do our best to make sure that you’re set up for success so you can do your best work. As a Muley, you’ll be based in our downtown Union Square HQ and receive comprehensive health benefits, life insurance, paid parental leave, 401(k), equity, and flexible vacation time. Plus the fun stuff, like a fully stocked kitchen, catered lunches, volunteer opportunities, onsite happy hours, free yoga classes, an annual rafting trip and other offsite activities, and MeetUp, our annual all-company offsite in California. Check out our Life at MuleSoft page to learn more!

About Us

Our mission is to help organizations change and innovate faster by making it easy to connect the world’s applications, data, and devices. Companies like Spotify, Salesforce, McDonald’s, and Unilever rely on MuleSoft to stay agile, deliver faster, and make the most of their IT investment with API-led connectivity. Hiring exceptional people who want to build a great company together is our number one priority, and we’re committed to providing an equal opportunity workplace where everyone is supported and inspired to do their best work. We work tirelessly to build this culture, and we’re proud to have been named the #1 Top Workplace in the Bay Area and a best place to work 5 years in a row. We’re growing fast, and there’s plenty of opportunity for you to make an impact—join us!

To all recruitment agencies: MuleSoft does not accept unsolicited agency resumes. Please do not forward resumes to MuleSoft employees or to any other company location. MuleSoft is not responsible for any fees related to unsolicited resumes and will not pay fees to any third-party agency or company that does not have a signed agreement with the company.

MuleSoft provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state and local laws. MuleSoft complies with applicable state and local laws governing non-discrimination in employment in every location in which the company has facilities. This policy applies to all terms and conditions of employment, including, but not limited to, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training. MuleSoft expressly prohibits any form of unlawful employee harassment based on race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status. Improper interference with the ability of MuleSoft employees to perform their expected job duties is absolutely not tolerated.

