WHO WE ARE
Opswerks specializes in making data center and cloud operation teams thrive. Our philosophy is about making technology "werk" for our customers by tailoring solutions to their exact needs.
We believe that products, services, tools and processes should serve the needs of people, NOT the other way around.
Our global team of service architects, infrastructure admins and software engineers has built and operated some of the world's largest, most scalable environments over the past two decades.
From 24/7 monitoring to infrastructure design to application development, our team of talented, creative and dynamic people has it all covered.
Join us at Opswerks. Together, we can make technology werk for the people.
We are currently looking for passionate Platform Systems Engineers who will apply their expertise and resourcefulness to identifying, troubleshooting, and reporting platform problems to developers and stakeholders, so that applications run on a stable and reliable service.
Identify, troubleshoot, resolve and escalate incidents quickly and effectively.
Be responsible for the operational monitoring of the platform health.
Take ownership of problems reported by the platform's end users.
Work with application developers to sustain application performance, and implement fixes, changes, and improvements as necessary.
Use application logs to debug reported issues and provide analysis for improvement or resolution.
Develop tools, operational enhancements and automated solutions.
Perform root cause analysis. Identify and resolve underlying problem patterns, while driving the development of automated, self-healing solutions.
Participate in outage conference calls.
Write clear and consumable documentation of the environment and operational procedures.
Be a member of a 24x7 shift rotation.
Strong sense of ownership, customer service support, and integrity, demonstrated through excellent written and verbal communication.
Ability to work through complex engineering obstacles using debugging and problem-solving skills.
Solid grasp of Linux network and security stack.
Fluency in scripting (e.g., Bash, Python, Regex, Ruby on Rails).
Experience with, or a good understanding of, containerization (Docker) and container scheduling platforms running in distributed systems (e.g., Apache Mesos, Mesosphere, Kubernetes, OpenStack).
Strong debugging and troubleshooting skills that span applications, systems and networking (TCP/IP).
Experience using and administering monitoring solutions such as Grafana, Sensu, Nagios, and Zabbix.
Skill in using analytics and observability applications such as Splunk, Elasticsearch, and Prometheus to find issues across different applications and platforms.
Bachelor’s degree or higher education.
Familiarity with distributed systems concepts, including the CAP theorem, microservices, and the Twelve-Factor App, is a plus.
Understanding of Linux kernel space, memory, processes, threads, static and shared libraries, interprocess communication, and signals.
Experience handling large numbers of diverse systems with configuration management systems like Puppet, Chef, Ansible or Salt.
Experience with CI/CD solutions such as Jenkins, Bamboo, Spinnaker, Artifactory, and Git is a plus.
Experience working with APIs and serialization formats such as JSON and YAML is a plus.