
Atlas Invest | Senior Data-Focused Backend Developer

SD Solutions
26 days ago
Full-time
Remote
Serbia, Romania, and Poland
Real Estate Development

On behalf of Atlas Invest, SD Solutions is looking for a talented, research-oriented Senior Data Engineer / Data-Focused Backend Developer who can take a feature idea from concept through research, data validation, modeling approach, and full implementation. You will play a key role in designing, developing, and maintaining our core services, with a focus on performance, reliability, and scalability.

SD Solutions is a staffing company operating globally. Contact us to get more details about the benefits we offer.

As a Data-Focused Backend Developer, you will own the full arc from idea to impact. "End-to-end" here isn't just a buzzword; it means you translate abstract problems into testable hypotheses. It means the same person who reads a paper on hybrid document classification prototypes it in a notebook, evaluates it with DSPy metrics, wires it into a LangGraph node, and deploys it into our production Python/TypeScript monorepo.

You will bridge the gap between abstract research and concrete engineering. You won’t stop at a "notebook win" or just build isolated models; you will build the pipelines, FastAPI services, and TypeScript integrations that serve them to the real world, ensuring reliability and measurable business value. We are looking for a "rockstar" who can seamlessly navigate the boundary between high-level AI orchestration and low-level system reliability.

Your First 90 Days

Month 1: Codebase Mastery & First Shipped Wins

  • Get fully onboarded by successfully running the monorepo locally and tracing a 'live' data request through our core AI and data services within your first few days.
  • Ship your first pipeline improvement to production (e.g., an extraction fix or a schema normalization) by the end of Week 1.
  • Reproduce a notebook experiment, publish a short gap analysis, and transition your first DSPy or LangGraph prototype into a tested FastAPI service.

Month 2: Pipeline Ownership & The Research Flywheel

  • Take end-to-end ownership of a complex pipeline component (like due diligence intelligence or multi-source data fusion).
  • Deliver a new evaluation harness tied to a live pipeline, and immediately use it to measure and drive a real-world performance increase.
  • Productionize a research-driven upgrade (like a new DSPy optimizer strategy) with clear before/after metrics.

Month 3: Architecture & Scale

  • Lead the architecture of a next-generation research initiative (e.g., advanced GraphRAG or a new autonomous diligence agent) from abstract idea to production deployment.
  • Define and accelerate a repeatable “research-to-release” playbook for your domain, setting the standard for how we bridge AI research and production engineering.

What You Will Own

  • AI Extraction Pipelines: Design and ship improvements to the OCR → Classify → Extract pipeline (using PaddleOCR, LangGraph, DSPy) to reduce extraction error and latency for complex document types like T12 financials, rent rolls, and appraisals.
  • Scale Data Normalization: Expand our property data aggregation layer. You will pull data from various top-tier real estate and demographic APIs, optimizing schema normalizations and conflict resolution to unify external datasets with our internal systems.
  • Strengthen Automated Risk Engines: Improve the underlying engine to generate smarter, cleaner, and higher-quality risk assessments.
  • Optimize Property Intelligence Pipelines: Enhance automated data enrichment to deliver instantaneous, actionable insights on asset-specific attributes and external risk factors.
  • External Provider Resilience: Expand and maintain our TypeScript-based provider ecosystem, ensuring reliability against third-party outages via robust caching, retries, and observability.
  • Drive the Research Flywheel: Conduct systematic gap analyses using custom evaluation suites (accuracy/precision-recall) on current modules. You will identify the next 2-3 bottlenecks, feed them back into the engineering loop, and implement academic approaches (e.g., SOTA advanced chunking, multi-step RLM reasoning) to continuously boost precision and recall.
  • Orchestrate Agentic Workflows: Use LangGraph to build complex, fault-tolerant state machines that connect our document classification, OCR, and schema extraction modules.

What hard skills do we need?

Note: We don't expect you to have every single skill listed below; that's nearly impossible. We value equivalent skills and a proven ability to learn fast, especially when it comes to specific technologies like DSPy or Neo4j Cypher.

  • Languages: Python 3.12+ (FastAPI/Pydantic), TypeScript (strict mode/Zod), SQL/Cypher, and the newest programming language: English.
  • AI/ML/LLM Systems: Prompts/DSPy optimization, LangGraph orchestration, vector retrieval (Weaviate, Elastic, or alternatives), prompt/eval loops, and multi-model integrations (OpenAI, Gemini, vLLM).
  • Data & Graphs: Neo4j modeling, schema design, multi-source data fusion, and ORMs (SQLAlchemy, Prisma, or Drizzle are an advantage).
  • Document Intelligence: Working with pre-implemented OCR pipelines, document parsing, and classification under noisy, real-world inputs/files/tables.
  • Production Engineering: Monorepo tooling, Docker/Docker Compose, message queues (RabbitMQ or others), and observability (tracing, structured logging).
  • Experimentation: Comfortable in Jupyter Notebooks for rapid prototyping, benchmark/evaluation harnesses, reproducible experiments, and A/B metric tracking.

Core Responsibilities:

  • Identify and onboard new data sources
  • Perform data comparisons & validation
  • Assess data quality and usability
  • Define the modeling approach
  • Implement and productionize solutions
  • Work independently with minimal structure

Team X @ Atlas: Mission & Culture

Atlas Invest’s Team X is building the intelligence layer for real estate. We ingest, normalize, and reason over the messiest data in one of the world's largest asset classes – property records scattered across multiple external providers, complex ownership networks buried in public filings, and financial details locked inside massive, unstructured rent rolls and appraisals.

Team X is a diverse, high-performing squad of engineers and researchers within Atlas. We value ownership, velocity, and craftsmanship. We ship a polyglot monorepo and treat the boundary between research and production as a feature, not friction. You will join a culture where people are trusted to run with ambiguity, publish Jupyter experiments on Monday, and deploy those results to production by Friday.

About the company:

Atlas Invest is transforming the bridge loan landscape, seamlessly connecting investors with real estate developers using advanced big data analytics for a personalized investment experience.

By applying for this position, you agree to the terms outlined in our Privacy Policy. Please take a moment to review our Privacy Policy https://sd-solutions.breezy.hr/privacy-notice, and make sure you understand its contents. If you have any questions or concerns regarding our Privacy Policy, please feel free to contact us.