Ideagen

Senior Site Reliability Engineer

Posted Date: 3 months ago (6/26/2024 11:50 AM)
Job ID: 2024-1752
# of Openings: 1
Category: IT Infrastructure
Role type: Permanent
Working: In Office
Name: India - Hyderabad

About Us

Ideagen is the invisible force behind many things we rely on every day - from keeping airplanes soaring in the sky, to ensuring the food on our tables is safe, to helping doctors and nurses care for the sick.

So, when you think of Ideagen, think of it as the silent teammate that's always working behind the scenes to help those people who make our lives safer and better.

Every day, millions of people are kept safe using Ideagen software. We have offices all over the world, including America, Australia, Malaysia and India, with people doing lots of different and exciting jobs.

Ideagen believes that by recruiting diverse and talented individuals, we create an inclusive community for all. We are committed to empowering all colleagues to maximise their potential and express their unique characteristics, experience, and knowledge to achieve their ambitions.

As the Monitoring & Observability Lead, you will oversee and manage the infrastructure monitoring team to ensure the optimal performance and reliability of our SaaS infrastructure across a multi-cloud environment.

You will design, implement, and maintain our infrastructure and application monitoring solutions across platforms such as New Relic, Datadog, and CloudWatch. You will analyse system performance data and work collaboratively with various teams to prevent and resolve issues.

Your role is critical in ensuring that our systems are available, reliable, and scalable to meet the needs of the business.

Responsibilities

  • Develop and implement a comprehensive infrastructure and application monitoring strategy.
  • Lead, mentor, and manage a team of monitoring specialists.
  • Analyse trends and patterns in system performance data to proactively address potential issues.
  • Collaborate across development, infrastructure, and application teams to optimise system performance and reliability.
  • Generate regular reports on infrastructure performance, incidents, and trends for senior management.
  • Maintain comprehensive documentation of monitoring configurations, procedures, and policies.
  • Ensure compliance with industry standards and best practices.
  • Identify opportunities to enhance monitoring capabilities and implement innovative solutions.
  • Collaborate with cross-functional teams to ensure observability requirements are met and integrated into the development lifecycle.

Skills and Experience

We don’t expect you to be an expert in everything, but given our technology stack, experience of some of the following is essential:

  • Experience in production 24/7 high-availability SaaS environments based on AWS, Azure and/or GCP.
  • Deep knowledge of infrastructure and application monitoring and alerting tools such as New Relic, Datadog, CloudWatch, and OpsGenie.
  • Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
  • Experience of working alongside development functions delivering software within an agile development environment.
  • Strong scripting skills in languages such as Python, Bash, and/or PowerShell.
  • Proven ability to grasp new technical concepts quickly.

Desirable:

  • Strong understanding of Software Development Lifecycles.
  • Understanding of infrastructure built to compliance standards such as ISO 27001, Cyber Essentials and FedRAMP, and of general regulatory compliance management.
  • Exposure to ITIL concepts and adoption.

Behaviour

  • Ambitious - Drive, Planning & Execution
  • Adventurous - Flexibility & Resilience and Savvy Thinking
  • Community - Collaboration & Communication
