Staff Network Operations Engineer


Crusoe is building the World’s Favorite AI-first Cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. Crusoe is redefining AI cloud infrastructure, with a mission to align the future of computing with the future of the climate. Our AI platform is recognized as the "gold standard" for reliability and performance. Our data centers are optimized for AI workloads and are powered by clean, renewable energy.
Be part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.
About the Role
The Crusoe Cloud Network Engineering team seeks an ambitious, experienced team player to join our Network Operations team. This team is responsible for designing, building, and operating the global edge, backbone, and data center network for High-Performance Compute (HPC) Clusters with GPUs. The ideal candidate will be highly motivated, self-directed, and passionate about working on cutting-edge environmental technologies. Excellent analytical and communication skills and strong teamwork are essential.
As a Staff Network Operations Engineer, you will be part of the Network Engineering team, overseeing the operations of the global Crusoe Cloud Network. Your responsibilities include ensuring network uptime by monitoring the network, resolving outages, and participating in a 24/7 on-call rotation. This role offers valuable experience in managing edge, backbone, and HPC-based data center networking at a large scale.
A Day in the Life:
Manage and optimize Crusoe Energy Cloud's global network, including edge, backbone, data center, and public cloud connectivity.
Collaborate with Network Engineering and cross-functional teams such as Software Infrastructure and Product teams to drive network innovation and evolution.
Lead operational excellence initiatives by developing monitoring, alerting, and self-healing systems to ensure high network availability.
Perform advanced troubleshooting and root cause analysis for incidents, guiding post-mortem reviews and improvements.
Mentor network engineers and establish best practices for incident response, documentation, and operational readiness.
Participate in a 24/7 on-call support rotation for the Crusoe Network.
You Will Thrive In This Role If:
You have 10+ years of experience building and operating networks at scale in a production environment.
You possess in-depth knowledge of network protocols such as TCP/IP, QoS, BGP, OSPF/IS-IS, EVPN, and VXLAN, and of MPLS-related technologies like RSVP-TE and LDP.
You understand network monitoring protocols and tools like SNMP, IPFIX, sFlow/NetFlow, and telemetry.
You have experience with tools such as Kentik, Arbor, ThousandEyes, Catchpoint, and Packet Design.
You are familiar with data center network architectures like fat-tree and Clos, as well as BGP-TE and edge peering.
You have hands-on experience with network devices from Mellanox, Cisco, Arista, Juniper, and other vendors.
You are familiar with mainstream switch/router chipsets like Broadcom and Barefoot.
Knowledge of RDMA, InfiniBand, and RoCE is a plus.
You have in-depth knowledge of public cloud connectivity options (AWS, GCP, Azure, Alibaba Cloud, OCI).
You understand IPv6 and IPv4-IPv6 coexistence technologies.
Programming, scripting, or automation experience with Python, Ansible, Puppet, Chef, or similar languages and tools is a plus.
You are self-motivated with good communication and writing skills.
You are a team player willing to participate in the global on-call rotation.
You hold a Bachelor's degree in Computer Science, Information Science, Engineering, Mathematics, or have equivalent work experience (3+ years).
Benefits:
Hybrid work schedule
Industry-competitive pay
Restricted Stock Units in a fast-growing, well-funded tech company
Health insurance options including HDHP and PPO, plus vision and dental, for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance, short-term and long-term disability
Teladoc access
Pet-friendly offices
401(k) with 100% match up to 4%
Generous paid time off and holidays
Cell phone reimbursement
Tuition reimbursement
Subscription to the Calm app
Company-paid commuter benefit of $200 per pay period
Compensation Range:
Salary between $195,000 and $230,000, including Restricted Stock Units. Final compensation depends on education, experience, skills, and internal equity considerations.
Crusoe is an Equal Opportunity Employer, committed to diversity and inclusion in the workplace.
Location:
San Francisco, CA, United States
Salary:
$200,000 - $250,000
Category:
Engineering