Senior MLOps Engineer
SecurityScorecard is the global leader in cybersecurity ratings, with over 12 million companies continuously rated, operating in 64 countries. Founded in 2013 by security and risk experts Dr. Alex Yampolskiy and Sam Kassoumeh and funded by world-class investors, SecurityScorecard’s patented rating technology is used by over 25,000 organizations for self-monitoring, third-party risk management, board reporting, and cyber insurance underwriting, making all organizations more resilient by allowing them to easily find and fix cybersecurity risks across their digital footprint.
Headquartered in New York City, our culture has been recognized by Inc Magazine as a "Best Workplace," by Crain’s NY as one of the "Best Places to Work in NYC," and as one of the 10 hottest SaaS startups in New York for two years in a row. Most recently, SecurityScorecard was named to Fast Company’s annual list of the World’s Most Innovative Companies for 2023 and to the Achievers 50 Most Engaged Workplaces in 2023, an award recognizing “forward-thinking employers for their unwavering commitment to employee engagement.” SecurityScorecard is proud to be funded by world-class investors including Silver Lake Waterman, Moody’s, Sequoia Capital, GV and Riverwood Capital.
About the Role:
We are seeking an experienced Senior MLOps Engineer to join our Data Science team. In this role, you will collaborate with a cross-functional team of ML engineers, data engineers, and data science researchers to design, build, deploy, and operate production pipelines and microservice systems with a focus on MLOps best practices. You will build and manage infrastructure including feature stores, data mesh, and our AI platform, creating automation for training, delivering, and updating our machine learning models. If you're a problem solver, effective communicator, and enthusiastic about driving advancements in AI and ML in the security space, we want you on our team.
What You'll Do:
- Own and lead the creation, operation and maintenance of critical infrastructure projects and automation for the data science team to empower data science research and ML model delivery.
- Train and mentor team members in applying best practices in operations and security.
- Provide code reviews and feedback on GitHub pull requests.
- Identify and implement opportunities for technical and process improvement.
- Apply best practices such as immutable containers, Infrastructure as Code, stateless applications, and software observability.
- Tune large-scale distributed system performance to achieve SLA metrics such as stability, uptime, scalability, and low latency while keeping costs under control.
- Continuously improve CI/CD processes to automate builds and deployments.
- Collaborate with scientists and engineers to understand KPIs and configure observability, monitoring, and alerting to support operations.
- Set up Terraform / Kubernetes and associated tooling to support data pipelines, feature stores, data mesh, and delivery of machine learning models.
- Diagnose and correct networking issues, or communicate problems clearly enough that centralized IT teams can resolve them.
- Decompose system layer abstractions to investigate and determine root causes and resolve complex distributed system performance problems.
What We Need You To Have:
- 4-5+ years of experience in MLOps / DevOps in the cloud (AWS, GCP, or Azure).
- Experience with Apache Spark and big data streaming infrastructure (data lakes, Snowflake, Databricks, S3).
- Production environment experience with Amazon Web Services (AWS) or equivalent.
- Experience supporting data stores such as RDBMS (Postgres), KVS (Cassandra / ScyllaDB), and queues / streaming (Kafka).
- Skilled with Terraform, Git, Python, bash / shell scripting, and Docker containers.
- Experienced with CI/CD processes (Jenkins, Ansible) and automated configuration tools (Terraform, Ansible, etc.).
- Experience setting up container orchestration (AWS ECS, Kubernetes / K8s).
- Skilled with dashboard creation and monitoring with tools such as Prometheus and DataDog.
- Capable of planning out future infrastructure and projecting timelines.
- Ability to work with our highly collaborative team.
- Strong written and verbal communication skills.
- Willingness to teach and mentor others.
- You have a bachelor's degree or higher in computer science, STEM, or a related field.
- You’ve implemented data mesh and feature stores.
- Strong understanding of networking concepts, including OSI layers, firewalls, DNS, split-horizon DNS, VPN, routing, BGP, etc.
- Skilled with tools such as Airflow, Argo, Kubeflow, MLflow, and vector databases.
Specific to each country, we offer a competitive salary, stock options, health benefits, unlimited PTO, parental leave, tuition reimbursement, and much more!
SecurityScorecard is committed to Equal Employment Opportunity and embraces diversity. We believe that our team is strengthened through hiring and retaining employees with diverse backgrounds, skill sets, ideas, and perspectives. We make hiring decisions based on merit and do not discriminate based on race, color, religion, national origin, sex or gender (including pregnancy), gender identity or expression (including transgender status), sexual orientation, age, marital, veteran, or disability status, or any other protected category in accordance with applicable law.
We also consider qualified applicants regardless of criminal histories, in accordance with applicable law. We are committed to providing reasonable accommodations for qualified individuals with disabilities in our job application procedures. If you need assistance or accommodation due to a disability, please contact email@example.com.
SecurityScorecard does not accept unsolicited resumes from employment agencies. Please note that we do not provide immigration sponsorship for this position.