Job Description
Data Science Professional
Req ID:  58239
Posting Start Date:  06/05/2026
Job Function:  Software Engineering
Division:  Digital
Job Location:  IND-Bengaluru-RMZ Ecoworld
Advertised Salary:  Competitive


About the role

You will build and own scoped AI service features within the Cognium platform. Working within the architecture set by the Lead AI Engineer, you will take feature specifications and deliver production-quality implementations: a chunking strategy module, a guardrail model integration, an embedding pipeline stage, a RAGAS metric computation job. You are expected to work independently within scope: take ownership, write tests, benchmark your work, and ship to the Lead's quality bar.
You have built ML or AI features in production before. You know that a model that scores well in a notebook evaluation is not done — it needs to be packaged, served, monitored, and maintained. You are comfortable with the full lifecycle from experiment to production deployment.

What you’ll be doing

RAG Pipeline Implementation
•    Implement document ingestion pipeline stages: PDF parser (PyMuPDF — table extraction, heading detection), DOCX parser (python-docx — structure preservation), HTML cleaner (BeautifulSoup — boilerplate removal), metadata extractor (doc_id, page_number, section_heading, source_url)
•    Implement chunking strategies and expose as configurable per-KB parameter: fixed-size with configurable overlap, semantic (sentence boundary detection using spaCy sentencizer), recursive (hierarchical structure-aware splitting), custom (user-defined regex split pattern)
•    Build the embedding generation pipeline: sentence-transformers (BGE-large, E5-mistral) batch inference, async pipeline with queue-based worker pool, embedding dimension validation (768–3072), storage to pgvector with HNSW index maintenance
•    Implement the hybrid retrieval pipeline: pgvector cosine similarity query (top-k dense), Elasticsearch BM25 query (top-k sparse), Reciprocal Rank Fusion score normalisation and merge, result deduplication by chunk_id
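As a sketch of the fusion step above: Reciprocal Rank Fusion merges the dense (pgvector) and sparse (BM25) top-k lists by summing 1/(k + rank) per chunk, which also deduplicates results that appear in both lists by chunk_id. The function name and the k=60 default are illustrative, not Cognium's actual implementation.

```python
def rrf_merge(dense_hits, sparse_hits, k=60, top_n=5):
    """Merge two ranked lists of chunk_ids with Reciprocal Rank Fusion.

    A chunk's RRF score is the sum of 1 / (k + rank) over every list it
    appears in (rank is 1-based), so chunks retrieved by both dense and
    sparse search are boosted, and duplicates collapse into one entry.
    """
    scores = {}
    for hits in (dense_hits, sparse_hits):
        for rank, chunk_id in enumerate(hits, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

In production the two hit lists would come from the pgvector cosine query and the Elasticsearch BM25 query respectively; here they are plain Python lists for illustration.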

Guardrail ML Model Integration
•    Integrate Presidio + custom spaCy NER into the input PII detection pipeline: load custom NER model trained on enterprise entity types, configure recogniser registry, implement redaction with entity-type-specific masking (e.g. EMPLOYEE_ID → [ID_REDACTED])
•    Integrate DistilBERT prompt injection classifier: ONNX runtime inference for low-latency serving, threshold configuration (0.85 block / 0.50–0.85 flag), batch inference for high-throughput scenarios, model update workflow without pod restart
•    Integrate Detoxify output toxicity model: multilabel classification (toxicity, severe_toxicity, obscene, threat, insult, identity_attack), per-label threshold configuration, structured result payload to audit log
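The block/flag thresholding for the prompt injection classifier can be sketched as a small pure function; the function name is hypothetical, but the 0.85 block and 0.50–0.85 flag bands are taken from the bullet above.

```python
def injection_verdict(score, block_at=0.85, flag_at=0.50):
    """Map a prompt-injection classifier probability to an action.

    score >= block_at           -> "block" (reject the request)
    flag_at <= score < block_at -> "flag"  (allow, but mark for review)
    otherwise                   -> "allow"
    """
    if score >= block_at:
        return "block"
    if score >= flag_at:
        return "flag"
    return "allow"
```

Keeping the thresholds as parameters rather than constants matches the "threshold configuration" requirement: they can be loaded from per-tenant config without redeploying the model.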

Memory and Embeddings
•    Implement episodic memory read/write: encode user interaction into embedding (sentence-transformers), store to pgvector with user_id + agent_id + timestamp metadata, similarity search for memory recall (top-3 most relevant past interactions), TTL-based pruning job
•    Implement organisational memory entity extraction: spaCy NER pipeline for entity identification (people, projects, products, policies) from agent conversations, entity deduplication, Neo4j node/relationship upsert, Cypher query interface for graph-augmented retrieval

Evaluation and Experimentation
•    Implement RAGAS metric computation jobs: faithfulness, answer_relevancy, context_precision, context_recall — using RAGAS library against sampled invocations from ClickHouse, persist scores to eval_results table with agent_id + version + timestamp
•    Build golden dataset management pipeline: curate high-quality invocations (RAGAS faithfulness ≥ 0.85) from ClickHouse into golden_dataset table, versioned golden sets per agent, diff comparison between golden set versions
•    Run offline experiments: A/B compare chunking strategies, embedding models, retrieval k values, re-ranker models — track metrics (MRR@k, faithfulness, retrieval latency) in MLflow or ClickHouse experiment tracking, report findings to Lead with recommendation
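Of the experiment metrics listed, MRR@k is simple enough to sketch directly: the mean over queries of the reciprocal rank of the first relevant chunk within the top k. Function and argument names here are illustrative.

```python
def mrr_at_k(queries, k=5):
    """queries: list of (ranked_chunk_ids, relevant_chunk_id) pairs.

    MRR@k averages 1/rank of the first relevant hit within the top k
    results per query, scoring 0 when the relevant chunk is missed.
    """
    total = 0.0
    for ranked, relevant in queries:
        for rank, chunk_id in enumerate(ranked[:k], start=1):
            if chunk_id == relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(queries)
```

Logging this per (chunking strategy, embedding model, k) combination to MLflow or ClickHouse gives the comparison table an A/B experiment report needs.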

Essential Skills / Experience

Systems Architecture
•    Distributed systems design - CAP theorem trade-offs, eventual vs strong consistency selection per use case (CockroachDB for Cedar policies, Redis for budget counters), partition tolerance in multi-node ML serving, failure isolation between pipeline stages
•    Multi-tenant SaaS architecture - per-tenant data isolation in vector stores (pgvector namespace by workspace_id), tenant-scoped RAGAS baselines, Restricted data routing enforcement (Cedar hard-deny to cloud LLM), resource quota enforcement via Kubernetes LimitRange
•    Event-driven architecture - Kafka event sourcing for audit trail, NATS JetStream for real-time policy invalidation (<5s), async embedding ingestion pipeline via Kafka consumer group, evaluation result streaming to ClickHouse
•    API gateway patterns 
•    Microservices communication 

LLM Orchestration and Agentic Patterns
•    Deep knowledge of LLM orchestration patterns - prompt assembly (system prompt + persona + memory context + RAG context + conversation history + user input), context window management, token budget allocation across pipeline stages
•    Agentic reasoning loops - ReAct (Reason + Act) pattern: thought generation → action selection → tool execution → observation integration → next thought, max_iterations guard, loop termination conditions
•    Multi-agent coordination 
•    Prompt engineering
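The ReAct loop with its max_iterations guard can be sketched as a small control structure. Here `llm_step` and the tool registry are stand-ins (any LLM call that returns either an action or a final answer would slot in); this is a skeleton of the pattern, not Cognium's orchestrator.

```python
def react_loop(question, llm_step, tools, max_iterations=5):
    """Minimal ReAct (Reason + Act) skeleton.

    llm_step(question, history) returns either
      ("act", tool_name, tool_input)  -- the model chose an action, or
      ("finish", answer)              -- the model produced a final answer.
    Each tool call's observation is appended to history so the next
    thought can integrate it. The loop is bounded by max_iterations.
    """
    history = []
    for _ in range(max_iterations):
        step = llm_step(question, history)
        if step[0] == "finish":
            return step[1]
        _, tool_name, tool_input = step
        observation = tools[tool_name](tool_input)
        history.append((tool_name, tool_input, observation))
    return None  # termination guard: iteration budget exhausted
```

Returning a sentinel when the budget is exhausted (rather than looping forever) is the "loop termination conditions" requirement in miniature; production code would surface this as a structured error.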

Python and ML Stack
•    Python 3.11+ - async/await with asyncio, type annotations (Pydantic v2 for service contracts), dataclasses, context managers, generator patterns for streaming, profiling with cProfile and memory-profiler
•    sentence-transformers - model loading and inference, batch encoding, semantic similarity computation, cross-encoder re-ranking, model fine-tuning with MultipleNegativesRankingLoss, evaluation with InformationRetrievalEvaluator
•    Embedding models - BGE-large-en-v1.5, E5-mistral-7b-instruct, domain-specific fine-tuning on enterprise document corpus, embedding dimension trade-offs (768 vs 1024 vs 3072), matryoshka representation learning for adaptive truncation
•    spaCy - custom NER model training (annotated corpus → spacy train), entity recogniser pipeline integration with Presidio, sentencizer for semantic chunking, linguistic feature extraction (dependency parsing for entity relationship extraction into Neo4j)
•    RAGAS - faithfulness, answer_relevancy, context_precision, context_recall metric computation, custom metric definition, dataset creation from production logs, async evaluation for throughput
•    PyTorch / HuggingFace Transformers — model loading (AutoModelForSequenceClassification, AutoTokenizer), ONNX export for production serving, LoRA / QLoRA fine-tuning with PEFT library, bitsandbytes for quantisation (4-bit, 8-bit)
•    vLLM 
•    FastAPI 

Retrieval and Search
•    Chunking strategies - fixed-size (configurable size + overlap), semantic (spaCy sentence boundaries + semantic similarity threshold), recursive (hierarchical document structure: headers → paragraphs → sentences), late chunking (embed full document then chunk embeddings), chunk size impact on retrieval quality vs latency trade-off
•    Vector databases - pgvector: HNSW index creation (m=16, ef_construction=64), cosine vs L2 distance selection, ivfflat for approximate search at scale, multi-tenant partitioning by workspace_id. Weaviate (Phase 2): multi-tenancy classes, HNSW+BQ compression, batch import, hybrid search
•    Elasticsearch BM25 - index mapping for chunk content and metadata, custom analyser for enterprise terminology, BM25 parameter tuning (k1, b), multi-field search with boosting, semantic sparse retrieval (ELSER) as BM25 enhancement
•    Hybrid retrieval - Reciprocal Rank Fusion (RRF)
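The fixed-size strategy from the chunking bullet is the simplest to show concretely: consecutive chunks share an overlap so content straddling a boundary is retrievable from either side. A minimal character-based sketch (production code would typically count tokens, not characters):

```python
def fixed_size_chunks(text, size=200, overlap=50):
    """Split text into fixed-size chunks with a configurable overlap.

    Each chunk shares `overlap` characters with its predecessor, so a
    sentence that straddles a chunk boundary still appears whole in at
    least one chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Larger chunks raise the odds of capturing full context but dilute the embedding and slow retrieval, which is the quality-vs-latency trade-off the bullet refers to.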

 

Desirable Skills / Experience

Evaluation and Safety
•    LLM evaluation frameworks - RAGAS, DeepEval, EleutherAI LM Evaluation Harness — know when to use each, their assumptions, and their blind spots
•    LLM-as-Judge design - calibration against human labels, positional and verbosity bias mitigation, structured output enforcement, confidence scoring, multi-turn evaluation for conversational agents
•    Adversarial robustness - prompt injection taxonomy (direct, indirect, multi-turn), jailbreak pattern library, red-teaming methodology, automated attack generation, bypass rate measurement and SLO definition
•    Safety benchmarks - ToxiGen, HateXplain, BBQ (demographic bias), TruthfulQA (factual accuracy), PrivacyGLUE — know what they measure and what they miss for enterprise agentic use cases
•    Human annotation - rubric design, inter-annotator agreement (Cohen's Kappa, Fleiss' Kappa), calibration sessions, annotation quality control, disagreement resolution
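Cohen's Kappa, mentioned in the annotation bullet, corrects raw agreement for chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e the agreement expected from each annotator's marginal label frequencies. A minimal two-annotator sketch:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items.

    p_o: fraction of items where the annotators agree.
    p_e: chance agreement implied by each annotator's label frequencies.
    """
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b.get(label, 0)
              for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

Kappa of 1.0 is perfect agreement, 0 is chance-level; Fleiss' Kappa generalises the same idea to more than two annotators.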

Data and Infrastructure
•    ClickHouse - MergeTree engine for eval_results time series, window functions for score trending, materialized views for per-agent baselines, ReplacingMergeTree for deduplication of re-evaluated invocations
•    Kafka - producer for audit events and evaluation triggers, consumer group for async embedding pipeline, Kafka Streams for real-time RAGAS score aggregation, exactly-once semantics for evaluation result persistence
•    Redis 
•    PostgreSQL / CockroachDB 
•    Neo4j 
•    Docker / Kubernetes 

Observability and Tooling
•    Dynatrace 
•    OpenTelemetry 
•    MLflow or Weights & Biases 
•    Git / GitLab CI 

•    LangChain / LlamaIndex — know their patterns even if not using them directly, understand where Cognium's custom RAG implementation diverges and why
•    Pinecone / Qdrant — alternative vector database experience for comparison benchmarking and Phase 2 Weaviate migration planning
•    Cohere Rerank API — commercial re-ranker benchmarking against ms-marco-MiniLM to validate self-hosted choice
•    OpenAI Evals framework — understand its design principles for building comparable internal eval harness
•    Presidio custom recogniser development — training pipeline for new entity types, rule-based recogniser authoring, regex + NLP hybrid recognition
•    NVIDIA DCGM — GPU metrics collection (gpu_utilisation, memory_bandwidth, compute_utilisation), KEDA custom metric integration, MIG partition health monitoring
•    Structured generation / constrained decoding — outlines, guidance, llama.cpp grammar sampling — for enforcing JSON output from vLLM without post-processing

Our Package

BT Group is the UK’s leading communications group and the holding company behind some of the country’s most recognised brands – including BT, EE, Openreach and Plusnet. Our purpose is as simple as it is ambitious: we connect for good. Our customers include consumers, small, medium and large businesses, public sector organisations and other communications providers.

BT Group’s role is about setting direction, unlocking value and creating the conditions for our brands and businesses to thrive.

Having come through the most capital-intensive phase of our fibre investment, our focus now is on what comes next – simplifying how we operate, using technology and AI to work smarter, and organising ourselves to serve customers better and grow sustainably. Group teams shape strategy, policy, brand, capital allocation and transformation, helping the whole organisation perform at its best.

We have a singular culture that unites all our people: we are customer-first challengers, who are committed, clear and connected. These behaviours unite us as one team to deliver for our colleagues, our customers, our stakeholders and the country. Joining BT Group means working at the heart of a business that matters to the UK, with the opportunity to shape decisions, influence outcomes and help set the future course of one of the country’s most important companies.