Site Reliability Engineer Job at TalentOla, Austin, TX

  • TalentOla
  • Austin, TX

Job Description

Job Title: Site Reliability Engineer

Location: Austin, TX (Onsite)

Job Summary

Seasoned Site Reliability Engineer (SRE) with 8+ years of experience supporting complex, large-scale distributed systems. Highly skilled in managing production failures, conducting root cause analysis, and driving effective remediation. Strong communicator with expertise in alerting, monitoring, and release management, complemented by automation proficiency and the ability to learn quickly.

This role involves providing 24/7 support as part of the SRE team, ensuring the reliability and performance of mission-critical Java, .NET, and Batch applications deployed across GCP, PCF, and on-premise environments.

Technical Skills:

  • Expertise in large-scale production systems and technologies, such as load balancing, monitoring, distributed systems, microservices, and configuration management.
  • Solid hands-on experience troubleshooting and fixing application failures, application performance degradation, code issues, cloud platform issues, batch failures, infrastructure failures, database failures, and network failures.
  • Hands-on experience in performing Production deployments using CI/CD and exposure to deployment strategies.
  • Experience troubleshooting Linux/Unix systems.
  • Monitor application, service, and batch availability.
  • Act quickly on application alerts (performance, availability) and batch job failures.
  • Perform the required analysis (Code/Log) and escalate to the Engineering team as required.
  • Initiate and drive the Techlines during outages, major incidents, and batch abends, and ensure service restoration as quickly as possible.
  • Effectively handle Incident, Problem, Release, and Change management.
  • Own and deliver the user stories assigned as part of the sprint.
  • The user stories range from application code debugging, issue analysis, code fixes, knowledge base creation, and documentation of SOPs to production deployments, pre- and post-patching/maintenance activities, and service requests.
  • Build monitoring solutions using APM tools like Splunk, AppDynamics, ThousandEyes, ITRS, AppMetrics, Moogsoft, Kafka, etc.
  • Automate day-to-day operational tasks.
  • Take part in Exit reviews to ensure best practices are followed so that the right code is deployed to Production systems.
  • Provide feedback/recommend improvements to the system which would enable highly stable systems.
  • Strong understanding of networking concepts (TCP/IP, SSL/TLS, IPSec, VPN, etc.), firewalls, and load balancers.
  • Experience scripting in Shell, PowerShell, or Python.
  • Strong experience working with cloud-based infrastructure (PCF, GCP, AWS, Azure, or others).

Job Tags

Full time
