Data Manager / Data Architect / Data Engineering Lead
Overview
Role: Senior Data Manager / Data Architect / Data Engineer Lead
Location: Hybrid in San Francisco, CA (2/3 on site)
Duration: Long Term
Employer: ADDSOURCE, on behalf of Dice
This position is part of the Image Curation and Data Products team, which transforms biomedical imaging data from clinical trials and real-world data sources into high-quality, FAIR (Findable, Accessible, Interoperable, Reusable) imaging data sets for imaging data scientists and algorithm development.
Key Responsibilities
Imaging Data Pipeline Delivery: design, implement, and maintain automated pipelines for onboarding, verifying, transforming, and curating biomedical imaging data from clinical trials and real-world data sources, for therapeutic areas including Oncology, Neurology, and Ophthalmology, across all image file formats.
Data Quality and Integrity: develop and implement solutions to detect and correct anomalies to achieve the highest data quality; ensure de-identification, PHI/PII controls, and image-specific QC checks at scale per industry standards like DICOM and internal specifications.
Data Analysis and Integration: integrate ML/AI-assisted tools into pipelines for inline image analysis, classification, and segmentation to extract and enrich metadata for various analyses and performance optimization.
Image Data Management: build and maintain large-scale catalogs of curated imaging data sets that follow FAIR principles and make imaging data assets easy to discover and access.
Compliance and Controls: ensure applicable compliance and privacy controls are followed, including GxP and Computer System Validation (CSV) requirements.
Collaboration: work closely with Image Scientists, Data Scientists, ClinOps, and biomarker research teams to support data needs for primary and secondary endpoint analyses.
External collaboration: coordinate with external partners such as CROs to ensure imaging data conforms to established agreements and quality standards.
Leadership: lead delivery teams to ensure timely delivery of product backlog and features; participate in agile ceremonies and guide the team through planning and execution.
Ideal Qualifications
Experience with medical imaging data and platforms (PACS, VNAs, etc.), imaging modalities (CT, PET, MRI, ophthalmic imaging such as OCT and CFP), and file formats such as NIfTI.
Strong understanding of the DICOM standard, including metadata parsing, tags, and multi-frame images.
Experience with clinical information data standards (SDTM, ADaM).
Data integration across diverse sources, including imaging data with tabular clinical data.
De-identification methodologies and PHI/PII privacy controls.
Knowledge of GxP and Computer System Validation (CSV) frameworks.
Proficiency in Python and related imaging libraries and tools (pandas, pydicom, SimpleITK, dicom-numpy, dcm2niix).
Hands-on ETL/ELT experience with large medical imaging datasets.
Experience with orchestration and data processing tools (Apache Airflow, Spark, Talend, or similar).
SQL and NoSQL proficiency; experience with image metadata stores (PostgreSQL, MongoDB, etc.).
Experience with AWS data services (RDS, Athena, Glue, EC2, Lambda, S3) and familiarity with EKS, Docker and HPC.
Experience with data analysis and reporting tools (e.g., TIBCO, Tableau, Amazon QuickSight).
Strong Git/GitLab skills and experience with DevOps tools (e.g., Jenkins, Terraform).
Experience with ML workflows for computer vision tasks (segmentation, classification, etc.).
Nice to have: NLP and GenAI implementation experience.
Ability to work with cross-functional global teams in a dynamic Agile environment; leadership and mentoring of agile team members.
10+ years of experience with data platforms, analysis and insights.
Job Function & Seniority
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Software Development
Note: This description is based on the original posting and includes the essential responsibilities and qualifications for the role. EEO statements and required disclosures remain part of the hiring process.
Location: San Francisco, CA, United States
Salary: $250,000+
Job Type: Full-time
Category: IT & Technology