Data Engineer (Hybrid/Remote)
Overview
Johnson Lambert is a leading provider of audit, tax, and advisory services with a specialized focus on the insurance, nonprofit, and employee benefit plan sectors. For 35+ years, we've built a reputation for deep industry knowledge, exceptional client service, and a culture grounded in agility, respect, and trust. We're passionate about serving our clients, growing our firm, and developing our people.
We're building a modern data foundation to unlock new efficiencies, enhance service delivery, and lay the groundwork for a future where data quality, context, and availability are paramount. You'll join our Business Automation team as our first data-focused hire and collaborate with colleagues in process automation, analytics, and domain expertise.
Our mandate: design and implement our next-generation data foundation on AWS, applying modern data lake/lakehouse patterns and open approaches to data layout, governance, and reliability, while staying flexible to evaluate the best tools over time. Note: our current needs are batch-first; we are not building near-real-time pipelines today.
What You'll Do
Own the modern data foundation on AWS: design secure, scalable, and cost-aware lake/lakehouse patterns using open, interoperable formats and a layered architecture (raw → standardized → curated/analytics-ready); a minimal infrastructure sketch follows this list.
Build dependable batch pipelines: implement ingestion, transformation, validation, and orchestration to move data from source systems to governed, analytics-ready datasets with clear SLAs/SLOs.
Translate messy files into trusted data: create robust, repeatable processes to extract and normalize data from Excel (multi-sheet, merged cells, header variations, hidden rows, crosstab layouts) and PDF documents (including OCR and table extraction), mapping to standardized schemas; see the Excel sketch after this list.
Integrate key SaaS sources: ingest data via APIs/exports from business apps such as Salesforce, Slack, and Tableau, and keep them in sync on reliable schedules; a paginated-ingestion sketch follows this list.
Structure data for AI/ML accessibility: prepare datasets for analytics, ML, and LLM workloads (semantic/feature layers, curated text corpora, and vector indexes/databases for retrieval), with appropriate metadata and access controls.
Model for the business: implement pragmatic dimensional/lakehouse models aligned to how our audit, tax, and advisory teams work across insurance, nonprofit, and employee benefit plan domains.
Raise data quality & trust: embed tests, data contracts, schema checks, and observability; maintain lineage, documentation, and data dictionaries usable by non-engineers; a minimal contract-check sketch follows this list.
Harden security & governance: apply AWS identity, access controls, encryption, classification/tagging, and rightsized governance appropriate for client-serving environments, protecting client data used in AI/ML contexts.
Automate and templatize: use infrastructure-as-code and CI/CD to make environments reproducible; publish templates/patterns teammates can reuse without deep data engineering expertise.
Enable and mentor: partner with analysts/automation engineers; run reviews, workshops, and coaching to uplift the team and promote data self-service where practical.
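The sketches below are illustrative only and do not prescribe the team's eventual stack. First, a minimal AWS CDK (Python) sketch of the layered layout and infrastructure-as-code approach described above, assuming one encrypted, versioned S3 bucket per layer; the bucket names and stack name are hypothetical, and a real foundation might instead center on transactional table formats (e.g., Iceberg or Delta) and Lake Formation permissions.

```python
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class LakeFoundationStack(Stack):
    """One encrypted, versioned bucket per lake layer (hypothetical names)."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        for layer in ("raw", "standardized", "curated"):
            s3.Bucket(
                self,
                f"{layer.capitalize()}Bucket",
                bucket_name=f"jl-lake-{layer}",  # hypothetical naming scheme
                encryption=s3.BucketEncryption.S3_MANAGED,
                versioned=True,
                block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
                removal_policy=RemovalPolicy.RETAIN,  # never delete client data on teardown
            )

app = App()
LakeFoundationStack(app, "LakeFoundation")
app.synth()
```

Because the stack is code, the same three-layer layout can be reproduced per environment through CI/CD rather than hand-built consoles.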
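Next, a hedged pandas sketch of the Excel normalization work: it assumes a hypothetical multi-sheet workbook (loss_register.xlsx) whose sheets are crosstabs with a variable number of title rows above the real header row, anchored by an "Account" label. Real workbooks would need additional handling for merged cells and hidden rows.

```python
import pandas as pd

def find_header_row(raw: pd.DataFrame, anchor: str = "Account") -> int:
    """Scan the first column for the anchor label that marks the header row."""
    for i, value in enumerate(raw.iloc[:, 0]):
        if str(value).strip() == anchor:
            return i
    raise ValueError(f"No header row containing {anchor!r} found")

def normalize_sheet(path: str, sheet: str) -> pd.DataFrame:
    # Read with no assumed header, since the number of title rows varies by sheet.
    raw = pd.read_excel(path, sheet_name=sheet, header=None)
    header_row = find_header_row(raw)
    df = pd.read_excel(path, sheet_name=sheet, header=header_row)
    df = df.dropna(how="all")  # drop fully blank rows
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]
    # Unpivot the crosstab: one (account, month, amount) row per cell.
    return df.melt(id_vars=["account"], var_name="month", value_name="amount")

frames = [
    normalize_sheet("loss_register.xlsx", sheet)
    for sheet in pd.ExcelFile("loss_register.xlsx").sheet_names
]
standardized = pd.concat(frames, ignore_index=True)
```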
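A sketch of scheduled SaaS ingestion into the raw layer, assuming a hypothetical paginated REST endpoint with a modified-since watermark; real integrations would use each vendor's official API (Salesforce, Slack, Tableau) and fetch credentials from a secrets manager rather than hard-coding a token.

```python
import datetime as dt
import json

import boto3
import requests

# Hypothetical endpoint and token; placeholders only.
BASE_URL = "https://api.example-saas.com/v1/records"
TOKEN = "..."  # in practice, retrieve from AWS Secrets Manager

def fetch_since(watermark: str) -> list[dict]:
    """Pull all records modified since the last successful run, page by page."""
    records, page = [], 1
    while True:
        resp = requests.get(
            BASE_URL,
            headers={"Authorization": f"Bearer {TOKEN}"},
            params={"modified_since": watermark, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json().get("results", [])
        if not batch:
            return records
        records.extend(batch)
        page += 1

def land_to_raw(records: list[dict], bucket: str) -> None:
    """Write the batch to the raw layer as date-partitioned newline-delimited JSON."""
    key = f"raw/example_saas/records/dt={dt.date.today():%Y-%m-%d}/batch.jsonl"
    body = "\n".join(json.dumps(r) for r in records)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body.encode())
```

Run under an orchestrator on a fixed schedule, with the watermark persisted between runs, this pattern keeps each feed in sync without near-real-time machinery.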
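Finally, a minimal hand-rolled data-contract check of the kind the quality bullet describes; the column rules model a hypothetical claims dataset, and in practice established tools (e.g., Great Expectations, pandera, or dbt tests) might serve instead.

```python
import pandas as pd

# Hypothetical contract for a curated claims table.
CONTRACT = {
    "claim_id":    {"dtype": "object",         "nullable": False, "unique": True},
    "paid_amount": {"dtype": "float64",        "nullable": False, "unique": False},
    "loss_date":   {"dtype": "datetime64[ns]", "nullable": True,  "unique": False},
}

def validate(df: pd.DataFrame) -> list[str]:
    """Return a list of contract violations; an empty list means the batch passes."""
    errors = []
    for column, rules in CONTRACT.items():
        if column not in df.columns:
            errors.append(f"missing column: {column}")
            continue
        if str(df[column].dtype) != rules["dtype"]:
            errors.append(f"{column}: expected {rules['dtype']}, got {df[column].dtype}")
        if not rules["nullable"] and df[column].isna().any():
            errors.append(f"{column}: contains nulls")
        if rules["unique"] and df[column].duplicated().any():
            errors.append(f"{column}: contains duplicates")
    return errors
```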
Required Qualifications
5–7 years in progressively complex data engineering/data architecture roles.
Strong experience building on AWS (storage, compute/serverless, identity, orchestration, monitoring) and operating secure, production data workloads.
Proven success designing and implementing modern lake/lakehouse architectures using open, interoperable approaches (transactional tables, partitioning, governance, performance optimization).
Expert data wrangling in Python and SQL for structured and semi-structured data (CSV, JSON, Excel) with practical experience in PDF extraction (OCR, layout detection, table parsing).
Hands-on experience building and deploying data infrastructure using infrastructure-as-code (e.g., Terraform, AWS CDK), CI/CD practices, and modern data testing/observability tooling.
Practical experience implementing data governance solutions for cataloging, lineage, and documentation suitable for sensitive, client-service environments.
Experience with ETL/ELT tools (e.g., Airflow, Spark) and data platforms such as Databricks or Snowflake; open approaches and thoughtful tool selection are prioritized.
Ability to ingest from SaaS apps (e.g., Salesforce, Slack, Tableau) via APIs/exports and normalize these feeds into curated datasets.
Comfortable as a player-coach and first-of-its-kind hire: setting standards, making build-vs-buy decisions, and delivering under ambiguity.
Excellent communication skills to translate business requirements into clear technical plans, and vice versa.
Bachelor's in Computer Science or related field preferred.
AWS certifications a plus.
Nice to Have
Familiarity with insurance/nonprofit/EBP data (e.g., policy, claims, loss registers; donor/grant; plan/participant).
Big data technologies (e.g., Hadoop, Kafka) applied in a batch-focused environment.
Experience with LLM-assisted extraction or classification for document normalization with governance/guardrails.
How You'll Succeed (Outcomes & Measures)
First 90 days: Stand up or harden a secure AWS baseline and initial lake/lakehouse layout with CI/CD; deliver a production batch pipeline converting one high-value Excel/PDF process into a standardized, validated dataset with documentation and lineage.
By 6 months: Operationalize 2–3 priority SaaS integrations feeding curated layers on a dependable schedule; reduce manual prep for target stakeholders by 30–50% through standardized schemas and self-service access.
By 12 months: Publish reusable ingestion and document-processing templates; establish data quality SLAs/SLOs adopted by multiple teams; demonstrate improvements in reliability, freshness, and adoption across analytics use cases.
How We Work
Our culture prizes agility, respect, and trust. We iterate in short cycles, document what we build, and keep stakeholders close. We choose modern, open, and maintainable solutions and believe governance should enable delivery, not hinder it.
Equity note: Research suggests that women and Black, Indigenous, and other persons of color are less likely than men or White job seekers to apply for positions unless they are confident they meet 100% of the qualifications. We strongly encourage interested individuals to apply, and allow us to evaluate the knowledge, skills, and abilities you demonstrate, using an internal equity lens.
Johnson Lambert prides itself on the hands-on approach and relationships we build with future employees, employees, and clients. A member of our HR team personally reviews all applications submitted.
Employment Type: Full Time
Salary: $120,000 - $150,000 Annual
Bonus/Commission: Yes
Seniority level: Mid-Senior level
Job function: Quality Assurance, Information Technology, and Analyst
Industries: IT Services and IT Consulting
- Location: United States
- Job Type: Full-Time
- Category: Computer and Mathematical Occupations