MuleSoft careers

Director, Site Reliability Engineering

San Francisco

The Incident Response Team is a newly formed team that reports directly to the VP of Engineering, Security and Infrastructure and will be at the heart of Global Operations at MuleSoft. This team will be responsible for the initial response to and triage of all operational incidents and will champion the lifecycle of these incidents, working directly with Engineering Managers to groom work backlogs and prioritize high-impact fixes.

The Director, Site Reliability Engineering will build and lead the Incident Response Team, which is responsible for making sure our services maintain the highest availability. You will lead Incident Retrospectives across the engineering teams to identify the failures in people, process, and technology that lead to incidents, develop corrective actions, and track them through to completion. This will involve communicating the status of incidents to the business and supporting outbound communication to customers. You will lead, own, develop, and refine the Change Management Process, the overall Cost Management Initiatives, and the Change Control Review Board (CCRB), and you will develop statistical measures of success for the CCRB. You will own the end-to-end Incident Management and Problem Management Processes, build the policies and procedures to respond to incidents and match the business needs, and partner with various groups.

Goals for your first three months:

30 days:

  • Collaborate with the Engineering and DevOps teams to start to understand the environments and staffing requirements for operating a 24/7 team to respond to incidents
  • Build the overall Incident and Event Management Policies and Process
  • Work with various stakeholders in the organization to build requirements and identify gaps in documents and runbooks
  • Start to hire a team in SF, BA, or ORD (the team doesn’t need to be 24/7 to start)

60 days:

  • Establish and exercise the incident response plan for operational issues
  • Build metrics around SLAs, MTTx, and other core KPIs for the team, and start to own the statistical reporting and data management functions for Incident (SLAs, Mean Time to X calculations), Change (Change Induced Incident Minutes, etc.), and Problem Management (Actions, Completion %)
  • Work with engineering teams to make sure that we have full coverage of operational issues across all services
  • Start to build end-to-end knowledge and instrumentation of the system so we can identify when we have issues

90 days:

  • Establish the cadence of the team and have the foundational set of policies and procedures in place
  • Have buy-in from all engineering management and leadership for the direction of the team
  • Have the team off the ground and working incidents, the RCA process, and change management

The ideal candidate will have:

  • Senior leadership experience with incident, change, and problem management in a software engineering organization with dozens of stakeholders and conflicting priorities, and the ability to build a team from the ground up
  • PMO, PGM, Jira, and Agile experience
  • Experience and ability to build and present SLA and other technical data to executive management 
  • Certifications involving disaster, security, incident, and problem management (GIAC, SANS, ITIL, CERT, FEMA, etc.); these are helpful but not required

About Our Benefits:

  • Equity and generous Employee Stock Purchase Program
  • Unlimited vacation
  • Gym discounts and weekly onsite yoga classes
  • Catered lunches three times a week and a fully stocked kitchen
  • Annual MeetUp, our company-wide offsite to learn, grow, and connect
  • Frequent office activities and offsites, like Muleys at the Ballpark, Waffle Wednesdays, and family nights
  • Regular opportunities to give back to the community together
  • Comprehensive medical, dental, and vision insurance for you and your family
  • 401(k) and pre-tax health insurance, dependent care, and commuter benefits (FSA)

About Us

Our mission is to help organizations change and innovate faster by making it easy to connect the world’s applications, data, and devices. MuleSoft’s Anypoint Platform is at the heart of the applications and services you use every day, like Spotify and Salesforce, and is used by organizations ranging from Global 500 corporations to emerging companies in approximately 60 countries. Hiring the best and brightest people is our number one priority: our people define our culture and our future. We’re committed to providing an equal opportunity workplace where everyone is supported and inspired to fulfill our mission and build a successful company together. One of the fastest growing software companies, now with 18 global offices, we’ve been named the #1 Top Workplace in the Bay Area and a best place to work five years in a row, but we’ve barely scratched the surface. We need fiercely determined people to help us take on this challenge. Join the team!

To all recruitment agencies: MuleSoft does not accept unsolicited agency resumes. Please do not forward resumes to MuleSoft employees or to any other company location. MuleSoft is not responsible for any fees related to unsolicited resumes and will not pay fees to any third-party agency or company that does not have a signed agreement with the company.

MuleSoft provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state and local laws. MuleSoft complies with applicable state and local laws governing non-discrimination in employment in every location in which the company has facilities. This policy applies to all terms and conditions of employment, including, but not limited to, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training. MuleSoft expressly prohibits any form of unlawful employee harassment based on race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status. Improper interference with the ability of MuleSoft employees to perform their expected job duties is absolutely not tolerated.

