No more applications are being accepted for this job
- Server Component Validation Validation of NICs, Flash drives, PSUs, Storage backpanes, etc. Work with Vendors and OEMs on RCA, writing FA reports.
- Firmware Validation Validation of new versions of BIOS, and other firmware types Customization of test scenarios for new components.
- System debug and RCA Investigation and troubleshooting of hardware failures in prod Work with Vendors and OEMs on RCA, writing FA reports.
- Problem escalation to vendor/manufacturer and resolution Reporting discovered new types of failures to Vendors and OEMs Work on reproduction scenarios in prod/test environment. Communication with vendor/manufacturer. Example activities:
- Making changes to automation infrastructure to monitor and support fleet health at scale.
- Conducting investigations and creating tooling to detect and diagnose hardware health issues.
- Working with partner teams on identification, validation, and implementation of remediations across the software and hardware stack.
- Publishing updates on resolutions and communicating findings internally.
- Troubleshooting, diagnosing, and root causing system failures while isolating components and failure scenarios, working with internal and external stakeholders.
- Employing data visualization techniques to highlight issues and implementing systemic solutions to hardware health issues. Required Skills
- hands-on experience of debugging and resolving hardware and systems issues
- facility with Python or equivalent development / scripting language
- troubleshooting and analytical skills
- strong experience working with Linux systems
- experience with server GPU solutions experience with HPE server (x) solutions