At T-Mobile, we invest in YOU! Our Total Rewards Package ensures that employees get the same big love we give our customers. All team members receive a competitive base salary and compensation package - this is Total Rewards. Employees enjoy multiple wealth-building opportunities through our annual stock grant, employee stock purchase plan, 401(k), and access to free, year-round money coaches. That's how we're UNSTOPPABLE for our employees!
Job Overview
Implement observability tools, dashboards, and SLO frameworks for LLM-based services and inference pipelines.
Monitor and improve the health, latency, and throughput of AI infrastructure in multi-cloud (primarily Azure) and hybrid environments.
Manage incident detection, response, and root cause analysis (RCA) for production issues affecting AI services.
Support cost attribution and token usage observability using tools like Weave, Splunk, OpenSearch, and Grafana.
Operationalize and support AI services such as ChatGPT, Infobot, Glean, and AI Gateway in partnership with platform and architecture teams.
Automate deployment, monitoring, and rollback processes using CI/CD and IaC pipelines (e.g., Terraform, Azure DevOps).
Contribute to incident playbooks, runbooks, and knowledge base documentation for GenAI systems.
Partner with engineering, product, and compliance teams to enforce policy and governance on LLM usage and API integrations.
Contribute to ongoing AI projects like Weave auto-eval, GC Academy, or OVA SRE support, as needed.
Support the GitLab deployment pipeline and the software SDLC deployment flow, including new feature definition, development, testing, and deployment.
Education and Work Experience:
Bachelor's Degree in Computer Science, Engineering, or related field (Preferred)
Master's/Advanced Degree in Computer Science, Engineering, or related field (Preferred)
4-7 years working in operations or DevOps environments (Required)
4-7 years solving customer-related issues and managing customer relationships (Required)
4-7 years developing software solutions using Python or similar programming languages (Required)
Experience supporting or integrating with APIs from LLM providers (e.g., OpenAI, Azure OpenAI, Glean).
Strong understanding of service reliability concepts: monitoring, alerting, SLOs, RCA, chaos testing.
Familiarity with container orchestration (Kubernetes), CI/CD systems, and IaC tools (Terraform, ARM, etc.).
Hands-on experience working in Azure (preferred), AWS, or GCP.
Experience with retrieval-augmented generation (RAG) in AI applications.
Experience with GitLab deployment pipelines and the software SDLC deployment flow, including new feature definition, development, testing, and deployment.
Exposure to GenAI/LLMOps tools like LiteLLM, Weights & Biases, LLMJ scoring, or Prompt Engineering telemetry.
Experience supporting secure API gateways and centralized LLM access platforms (e.g., Solo Gloo Gateway).
Familiarity with AI governance and compliance practices in enterprise settings.
Experience with real-time systems, auto-evaluation platforms, or AI assistant integration (e.g., Infobot, Coach Assist).
Programming: Proficiency in programming and scripting languages such as Python and Bash. (Required)
Automation: Ability to automate processes and reduce manual effort. (Required)
Incident Management: Understanding of incident response management and operational support. (Required)
Experience with designing and maintaining CI/CD pipelines. (Required)
Ability to learn new skills and technologies quickly and adapt to changing circumstances. (Required)
Understanding of system reliability and resilience principles. (Required)
Ability to drive innovation and improve software development and deployment processes. (Preferred)
Experience with cloud native platforms. (Preferred)
The pay range above is the general base pay range for a successful candidate in the role. The successful candidate's actual pay will be based on various factors, such as work location, qualifications, and experience, so the actual starting pay will vary within this range.
At T-Mobile, employees in regular, non-temporary roles are eligible for an annual bonus or periodic sales incentive or bonus, based on their role. Most Corporate employees are eligible for a year-end bonus based on company and/or individual performance and which is set at a percentage of the employee's eligible earnings in the prior year. Certain positions in Customer Care are eligible for monthly bonuses based on individual and/or team performance. To find the pay range for this role based on hiring location, click here.
At T-Mobile, our benefits exemplify the spirit of One Team, Together! A big part of how we care for one another is working to ensure our benefits evolve to meet the needs of our team members. Full and part-time employees have access to the same benefits when eligible. We cover all of the bases, offering medical, dental and vision insurance, a flexible spending account, 401(k), employee stock grants, employee stock purchase plan, paid time off and up to 12 paid holidays - which total about 4 weeks for new full-time employees and about 2.5 weeks for new part-time employees annually - paid parental and family leave, family building benefits, back-up care, enhanced family support, childcare subsidy, tuition assistance, college coaching, short- and long-term disability, voluntary AD&D coverage, voluntary accident coverage, voluntary life insurance, voluntary disability insurance, and voluntary long-term care insurance. We don't stop there - eligible employees can also receive mobile service & home internet discounts, pet insurance, and access to commuter and transit programs! To learn about T-Mobile's amazing benefits, check out .
Never stop growing!
As part of the T-Mobile team, you know the Un-carrier doesn't have a corporate ladder - it's more like a jungle gym of possibilities! We love helping our employees grow in their careers, because it's that shared drive to aim high that drives our business and our culture forward. By applying for this career opportunity, you're living our values while investing in your career growth - and we applaud it. You're unstoppable!