THE BIG PICTURE
Sysco LABS is the captive innovation center for Sysco Corporation (NYSE:SYY), a Fortune 100 company and the world’s largest foodservice provider with 72,000+ associates, 330+ distribution centers and over 725,000 customers in 90 countries. In fiscal 2023, which ended July 1, 2023, Sysco generated over $76 billion in sales.
Sysco LABS powers Sysco’s farm-to-fork operations and our technology is present in the sourcing of food products, merchandising, storage and warehouse operations, order placement and pricing algorithms, the delivery of food and supplies to Sysco’s global network, the in-restaurant dining experience of the end-customer and much more.
Our technology ecosystem spans 600+ applications, monitoring and incident management across 10,000+ servers, multi-cloud and multi-platform event streaming and microservices architecture, and enterprise-grade systems that power a catalog of over 1.4 million products, 330+ distribution centers, a fleet of 14,000 IoT-enabled delivery trucks, and more.
Everything we do at Sysco LABS supports Sysco’s Purpose of ‘Connecting the world to share food and care for one another’, and our technology directly impacts millions of food consumers in a trillion-dollar, global industry.
THE OPPORTUNITY
As a Lead Observability Engineer, you will be responsible for designing, implementing, and maintaining observability solutions for complex systems and applications across Sysco. You will work closely with development, operations, monitoring and ITSM teams to ensure that systems and applications are monitored, logged, and traced effectively, allowing for efficient troubleshooting, debugging, and performance analysis. You will also be responsible for defining observability standards and best practices, driving the adoption of observability technologies, and continuously improving the observability posture of the organization. You will be a key stakeholder in Sysco’s Major Incident Management process, supporting technology teams in troubleshooting and resolving complex production issues faster using Datadog.
WHAT YOU WILL BE DOING
- Defining, documenting, and enforcing observability standards and best practices across the organization for the three primary pillars of observability (Logs, Traces and Metrics) and researching and developing new solutions for other pillars of observability (like RUM, Synthetic Monitoring, Network monitoring and profiling)
- Collaborating with development and SRE teams to identify and address areas of improvement in the observability stack while ensuring that observability is integrated into the software development lifecycle
- Designing and developing standard dashboards for critical metrics for various Sysco applications and services using the observability data
- Continuously monitoring and analyzing system and application performance data, identifying trends and anomalies, and making recommendations for improvement
- Monitoring and maintaining Sysco’s observability tool stack (Datadog and ServiceNow ITOM), ensuring the tools are up to date, healthy, secure, and compliant
- Taking responsibility for Datadog platform usage and capacity planning to ensure technology teams can consume all the platform features effectively
- Staying up to date with the latest trends and advancements in observability technologies and best practices, evaluating the viability of such in Sysco’s context and providing thought leadership
- Collaborating with technology teams to troubleshoot and resolve complex production issues related to system performance and reliability using observability tools, and actively engaging in Sysco’s Major Incident Management process for incident response and resolution
- Providing training, mentorship, and guidance to other team members on observability concepts, tools, and practices
- Enabling integrations and building custom integrations where necessary to onboard new data sources and metrics to the Datadog platform
- Driving process optimization via automation to ensure the L1 monitoring team can monitor the applications effectively and efficiently
REQUIREMENTS
- Proven experience as an Observability Engineer or in a similar role in Software Engineering or SRE, with a strong understanding of observability concepts, methodologies, and tools
- Expert-level experience in monitoring and logging technologies, both open-source and closed-source. Previous working experience with Datadog or Dynatrace will be an added advantage
- Strong programming and scripting skills (e.g., Python, Go, JavaScript) for automation and customization of observability solutions
- A deep understanding of system and application architectures, distributed systems, microservices, and cloud computing
- Experience with DevOps practices and tools (e.g., Git, Jenkins, Docker, Kubernetes) for continuous integration, deployment, and delivery
- Excellent analytical, problem-solving, and troubleshooting skills
- Strong communication and collaboration skills to work effectively with cross-functional teams
- The ability to work in a fast-paced, dynamic environment and adapt to changing requirements and priorities
- A working knowledge of networking, including fundamental knowledge of the TCP/IP stack, application protocols (DHCP/DNS/HTTPS) and networking concepts (HSRP/NAT/VPN/VLANs/802.1x/Wireless/Clustering/High Availability/Load Balancing)
- Strong skills in troubleshooting incidents in production environments
WHAT AWAITS YOU AT SYSCO LABS
- An attractive base compensation package comfortably above market rates
- Performance rewards and recognition
- Comprehensive healthcare coverage
- A health and wellness allowance
- A team outing allowance
- Paid birthday leave
- A learning and development allowance
- Overseas travel opportunities and exposure to client environments
- A progressive culture that promotes work-life harmony and personal development
Sysco LABS is an Equal Opportunity Employer.