What we're looking for
We are looking for a skilled professional with expertise in observability tools and blockchain systems. Your primary responsibilities will be configuring and managing monitoring solutions, ensuring system reliability, and creating alerts that help identify and mitigate issues across our complex infrastructure.
Experience with Cosmos SDK-based chains, containerization, and Linux system administration is highly valued. You thrive in a fast-paced environment, have strong self-management skills, and can collaborate effectively within a distributed team.
Responsibilities
Observability
- Design, implement, and manage observability solutions using Prometheus, Grafana, and Alertmanager.
- Create meaningful dashboards, metrics, and alerts to monitor blockchain infrastructure.
- Optimize and maintain the monitoring systems themselves so that they stay available and report accurate data.
Blockchain Infrastructure
- Ensure high availability and performance of blockchain systems.
- Monitor Cosmos SDK-based blockchain systems and troubleshoot issues as they arise.
- Collaborate with blockchain developers and engineers to enhance system reliability.
System Administration & Networking
- Maintain and troubleshoot Linux-based systems (systemd, networking, containers).
- Manage containerized environments using Podman or similar tools.
- Diagnose and resolve complex system and networking issues.
Collaboration & Process Improvement
- Document observability processes, tools, and workflows.
- Provide insights and feedback to improve system architecture and operational efficiency.