Database Systems Site Reliability Engineer
Apple Inc
Bengaluru, India
Job posting number: #7289220 (Ref:apl-200575263)
Posted: October 25, 2024
Job Description
Summary
The people here at Apple don’t just build products — they build the kind of wonder that’s revolutionized entire industries. It’s the diversity of those people and their ideas that inspires the innovation that runs through everything we do, from amazing technology to industry-leading environmental efforts. Join Apple, and help us leave the world better than we found it.
Apple’s Services Engineering organization (ASE) is seeking experienced database systems engineers to join our Redis SRE team. Engineers in ASE Redis SRE power some of Apple's most critical internet services. You will join a team of experts working at the cutting edge of modern database deployment architectures and distributed systems. The team's work is deployed at massive scale across our data centers worldwide. It also has significant impact, forming the platform upon which iCloud and many other internet services at Apple are built. In ASE, your work will benefit hundreds of millions of users and is critical to the success of some of the most visible current and future Apple features.
Description
The ASE Redis SRE team develops applications and tooling that are safe, reliable, scalable, and fast. This work requires an innovative spirit and an extraordinary degree of care and rigor in engineering. Team members contribute to all major components of Redis deployment infrastructure, including maintenance automation, the backup service application, monitoring and alerting tooling and dashboards, and deployment architecture, with a focus on stability, performance, and scaling.
Success in this role requires expertise in several of the following:
- Understanding of core SRE concepts: monitoring, alerting, and incident management.
- Understanding of database concepts (consistency models, isolation levels, crash and recovery semantics).
- Performance engineering (design concepts, profile-guided optimization).
- Service management across bare-metal, virtualized (EC2), and containerized (K8s) platforms.
- Fundamentals of system-level hardware and networking components (storage devices and controllers, network interfaces, CPU and memory layout in server-class systems).
- Operating systems concepts (process scheduling, disk and network I/O, performance).
- Datacenter architecture (networking topologies, host placement strategies, and failure modes), design of multi-datacenter systems, failure domains, and wide-area networking.
- Prior experience with the development or maintenance of distributed databases/storage systems is recommended.
Minimum Qualifications
- 8+ years of demonstrated expertise in developing database systems, storage engines, distributed systems, or performance engineering.
- Proficient in modern Java and optionally Python / Go.
- Experience with EC2, EBS, and Terraform.
- Understanding of operating systems concepts (process scheduling, disk and network I/O, performance).
- Experience in developing critical internet services and platform infrastructure.
- Experience running Tier 1 services for 24/7 support.
Preferred Qualifications
- BS or MS in Computer Science or a related field, or equivalent work experience.
- Service management across bare-metal, virtualized (EC2), and containerized (K8s) platforms.
- Familiarity with micro-services architecture and container orchestration with Kubernetes.
- Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and automation.
- Strong sense of ownership, with a desire to communicate and collaborate with other engineers and teams.