Senior Data Scientist - Machine Learning Data Operations
San Francisco, California

ABOUT THE JOB

Company Intro:
TurbineOne is the frontline perception company. We deliver decision advantage, better situational awareness, and stronger force protection. Our customers love how we automate the right portions of the military intelligence cycle while keeping them in the loop. The company is a small, fast-moving, and high-performance startup that is backed by the best DefenseTech venture capitalists with over $50M in funding. We recently raised our Series B round and are valued at $300M. TurbineOne is actively deployed by every branch of the Department of Defense to solve their most critical missions.

Job Title: Data Scientist
Reporting to the Machine Learning team lead
Geographically flexible for home-office

Primary Responsibilities:
Ingesting, organizing, and maintaining large-scale training datasets from open-source resources and contract-specific artifacts
Creating and managing data cataloging systems to ensure datasets are findable, accessible, and ready for ML training pipelines
Designing and implementing data labeling workflows, including managing external labeling vendors and quality assurance processes
Building and maintaining YOLO-style manifests and annotation formats for custom computer vision datasets
Performing data cleaning, validation, and augmentation to ensure high-quality training data
Conducting exploratory data analysis and generating insights about dataset characteristics, biases, and coverage gaps
Supporting the ML research team with statistical analysis, experiment design, and model evaluation
Developing data pipelines and automation tools for continuous data ingestion and processing
Collaborating with ML engineers to optimize data loading and preprocessing for training efficiency

On a Typical Day You Would:
Process incoming datasets from various sources, performing quality checks and organizing them into our data management system
Create or review annotation schemas and coordinate with labeling teams to ensure consistent, high-quality labels
Write Python scripts to clean, transform, and validate datasets for specific ML training requirements
Analyze dataset statistics and create visualizations to identify potential issues or opportunities for improvement
Collaborate with the ML research lead to design experiments and evaluate model performance across different data splits
Document dataset characteristics, versioning, and lineage to maintain reproducibility and compliance

Desired Experience:
High standard of ethics, grit, integrity and moral character
5+ years of experience in data science, analytics, or related field with focus on ML data preparation
Strong foundation in probability, statistics, and experimental design
Bachelor's degree in Statistics, Mathematics, Computer Science, or related quantitative field (Master's preferred)
Proficiency with Python data stack: Pandas, NumPy, Jupyter Notebooks, and data visualization libraries
Experience with ML frameworks (PyTorch, Scikit-learn) and familiarity with training workflows
Hands-on experience with computer vision datasets and annotation formats (COCO, YOLO, Pascal VOC)
Experience managing data labeling projects and working with annotation tools (Label Studio, CVAT, or similar)
Familiarity with open-source ML models and experience applying them to real-world problems
Strong SQL skills and experience with data warehousing concepts
Experience with version control (Git) and collaborative development practices
Excellent communication skills for coordinating with technical and non-technical stakeholders
Meticulous attention to detail and strong organizational skills for managing complex datasets
Willingness to embrace the Startup Culture of moving fast, being insatiably curious, celebrating often, embracing uncertainty, and having a personal desire to improve other people's lives

Nice to Have:
Experience with defense or security-related datasets
Knowledge of edge computing constraints and data optimization techniques
Experience with distributed data processing frameworks (Spark, Dask)
Familiarity with MLOps practices and tools
Background in specific domains relevant to perception systems (satellite imagery, sensor fusion, etc.)

Startup Culture Expectations:
We're a small, fully remote team and everything is our responsibility
Our team thrives on autonomy, trust and solid communication
Everyone on the Team needs to be very comfortable with constant change, moving fast, sharing failures, embracing grit, and building things themselves
Must be eligible to obtain a clearance with the U.S. government

Clearance Eligibility:
This position requires eligibility to obtain and maintain a U.S. security clearance.
Location:
San Francisco, CA, United States
Salary:
$250,000 +
Category:
IT & Technology