Job Description
Join Avensys Consulting as a Site Reliability Engineer (SRE) specializing in Monitoring and Event Management in beautiful Bali! We are seeking a skilled professional to ensure the reliability, performance, and scalability of our systems while proactively managing incidents and optimizing infrastructure.
In this role, you will work closely with cross-functional teams to design, implement, and maintain robust monitoring solutions, automate operational tasks, and drive continuous improvement in system resilience. If you are passionate about DevOps, observability, and solving complex technical challenges, this is the perfect opportunity for you!
Apply now by submitting your updated CV in Microsoft Word format to [email protected].
Responsibilities
- Design, implement, and maintain scalable monitoring and alerting systems to ensure high availability and performance.
- Develop and optimize event management processes to minimize incident resolution time.
- Automate operational tasks using scripting and infrastructure-as-code (IaC) tools.
- Collaborate with development and operations teams to improve system reliability and efficiency.
- Conduct post-incident reviews and implement preventive measures to reduce future outages.
- Monitor system health and performance metrics, and carry out capacity planning, to proactively address potential issues.
- Enhance observability through logging, tracing, and metrics collection.
- Stay current with industry best practices and emerging technologies in SRE and DevOps.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Proven experience as a Site Reliability Engineer or DevOps Engineer, or in a similar role.
- Strong knowledge of monitoring tools (e.g., Prometheus, Grafana, Nagios, Datadog).
- Experience with scripting languages (Python, Bash, Go) and automation frameworks.
- Familiarity with cloud platforms (AWS, GCP, Azure) and containerization (Docker, Kubernetes).
- Understanding of CI/CD pipelines and infrastructure-as-code (Terraform, Ansible).
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration abilities.