Reliability Engineer

Metro Systems Romania

At METRO SYSTEMS Romania, Reliability Engineers are responsible for the pulse of the products they oversee. They ensure optimal reliability, performance and stability for multiple products running on microservices by bringing engineering expertise to everything they do: from performance and reliability evaluation to building tools, providing best-practice consultancy and supporting development.


Technical skills for an awesome Reliability Engineer:
  • Hands-on ability to code in one or more of the following: Java, JavaScript, Python, Go or Scala
  • Knowledge of microservices architecture, with a focus on developing and running microservices on a cloud technology stack, including but not limited to Kubernetes and Docker
  • Good knowledge of Linux internals and scripting
  • Performance monitoring and tuning guru, familiar with tools such as Prometheus and Grafana, the ELK stack, Nagios/OMD, Datadog or similar
  • Knowledge of web security and networking, with a good understanding of protocols and services such as TCP/IP, HTTP/S, SSL/TLS, DNS, LDAP and load balancing
  • Understanding of NoSQL databases (with a focus on Cassandra)

You're a great fit for us if you:
  • Intuitively focus on shortening the time between having an idea and making it happen;
  • Have experience analyzing and troubleshooting large-scale distributed systems;
  • Are always eager to learn new technologies and apply them in your day-to-day work;
  • Are a clear communicator;
  • Are open to ideas and new challenges;
  • Have experience with automation tools;
  • Have a systematic problem solving approach, coupled with a strong sense of ownership and drive;
  • Are able to turn ideas into practice;
  • Have a good sense of humor;
  • Are always thinking, 'What happens if this fails?';
  • Have a good command of English.


In practice, the job of a Reliability Engineer typically involves playing several roles:
  • Investigator: the Reliability Engineer continuously looks for reliability leaks in the systems they oversee, investigates them and finds their root cause. To achieve this, the engineer applies monitoring and alerting techniques and receives alerts on an on-call basis. The end goal of the investigator role is to ensure that recurring problems have their cause identified and do not happen again.
  • Developer: the Reliability Engineer has two developer roles. First, the engineer writes tools to automate maintenance and monitoring tasks. Second, taking into account the architectural principles of microservices, the engineer builds reliability into the system itself by writing tools that make the systems self-healing. When needed, this role includes coding the reliability solution together with the development team.
  • Consultant: the Reliability Engineer shares the expertise they have gained with the development teams. Best practices and anti-patterns are shared across the development teams, and architectural principles are optimized with reliability and stability in mind.
  • Analyst: the Reliability Engineer analyses the metrics provided by the products and reports on them. They own the reliability metrics and make them transparent through the tools they develop.



  • Java/JavaScript/Python/Go/Scala - knowledge
  • Docker/Kubernetes - knowledge
  • Linux - knowledge
  • Prometheus/Grafana/ELK stack/Nagios/OMD/Datadog - familiarity
  • TCP/IP, HTTP/S, SSL/TLS, DNS, LDAP, load balancing - understanding
  • NoSQL (Cassandra) - understanding

Metro Systems Romania

Performance, innovation and service in IT: this is what we offer around the world for METRO GROUP. With a team of 800 IT professionals working for more than 30 countries, METRO SYSTEMS Romania innovates daily and is keen on leading METRO GROUP's world of retailing into the future.

Office location

4C, Sos. Pipera Tunari


To be discussed during the first HR interview.

Recruitment steps

  • Step 1

    Online technical testing

  • Step 2

    Interview (Talk with an HR consultant and a Team Leader)

  • Step 3

