Data engineer (level 5)
There are 2 training providers who offer this course. Check if a training provider can deliver this training in the apprentice's work location.
Information about Data engineer (level 5)
Build systems that collect, manage, and convert data into usable information for data scientists, data analysts and business intelligence analysts to interpret.
- Knowledge, skills and behaviours
Knowledge
- Processes to monitor and optimise the availability, management and performance of data products.
- Methodologies for moving data from one system to another for storage and further handling.
- Data normalisation principles and the advantages they achieve in databases for data protection and for reducing redundancy and inconsistent dependencies.
- Frameworks for data quality, covering dimensions such as accuracy, completeness, consistency, timeliness, and accessibility.
- The inherent risks of data, such as incomplete data and the ethics of data sources, and how to ensure data quality.
- Software development principles for data products, including debugging, version control, and testing.
- Principles of sustainable data products and organisational responsibilities for environmental social governance.
- Deployment approaches for new data pipelines and automated processes.
- How to build a data product that complies with regulatory requirements.
- Concepts of data governance, including regulatory requirements, data privacy, security, and quality control. Legislation and its application to the safe use of data.
- Data and information security standards, ethical practices, policies and procedures relevant to data management activities such as data lineage and metadata management.
- How to cost and build a system whilst ensuring that organisational strategies for sustainable, net zero technologies are considered.
- The financial, strategic and compliance implications of local, remote or distributed solutions with regard to security, scalability and cost.
- The uses of on-demand Cloud computing platform(s) in a public or private environment such as Amazon AWS, Google Cloud, Hadoop, IBM Cloud, Salesforce and Microsoft Azure.
- Data warehousing principles, including techniques such as star schemas, data lakes, and data marts.
- Principles of data, including open and public data, administrative data, and research data, including the value of external data sources that can be used to enrich internal data. Examples of how businesses use direct data acquisition to support or augment business operations.
- Approaches to data integration and how combining disparate data sources delivers value to an organisation.
- How to use streaming, batching and on-demand services to move data from one location to another.
- Differences between structured, semi-structured, and unstructured data.
- Types and uses of data engineering tools and applications in own organisation.
- Policies and strategies to ensure business continuity for operations, particularly in relation to data provision.
- Technology and service management best practice including configuration, change and incident management.
- How to undertake analysis and root cause investigation.
- Processes for evaluating prototypes and taking them to implementation within a production environment.
- The lifecycle of implementing data solutions in a business, from scoping, through prototyping, development, production, and continuous improvement.
- Data development frameworks and approved organisational architectures.
- The principles of descriptive, predictive and prescriptive analytics.
- Continuous improvement, including how to capture good practice and lessons learned.
- Strategies for keeping up to date with new ways of working and technological developments in data science, data engineering and AI.
- The methods and techniques used to communicate messages to meet the needs of the audience.
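As an illustration of the data quality dimensions named in the list above, the following is a minimal sketch of how completeness and timeliness might be measured. The sample records, field names, reference date and 30-day threshold are all assumptions made for the example, not part of the apprenticeship standard.

```python
from datetime import date

# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 1, 10)},
    {"id": 2, "email": None, "updated": date(2024, 1, 12)},
    {"id": 3, "email": "c@example.com", "updated": date(2023, 6, 1)},
    {"id": 4, "email": "d@example.com", "updated": date(2024, 1, 11)},
]


def completeness(rows, field):
    """Share of rows where the field is populated."""
    if not rows:
        return 0.0
    return sum(r.get(field) is not None for r in rows) / len(rows)


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    if not rows:
        return 0.0
    fresh = sum((as_of - r[field]).days <= max_age_days for r in rows)
    return fresh / len(rows)


# The reference date and 30-day threshold are assumed values.
email_completeness = completeness(records, "email")
freshness = timeliness(records, "updated", date(2024, 1, 15), 30)
print(f"completeness={email_completeness:.2f} timeliness={freshness:.2f}")
```

In practice an organisation would choose its own thresholds and would typically track such measures over time rather than compute them once.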
Skills
- Collate, evaluate and refine user requirements to design the data product.
- Collate, evaluate and refine business requirements including cost, resourcing, and accessibility to design the data product.
- Design a data product to serve multiple needs and with scalability, efficiency, and security in mind.
- Automate data pipelines such as batch, real-time, on demand and other processes using programming languages and data integration platforms with graphical user interfaces.
- Produce and maintain technical documentation that explains the data product and meets organisational, technical and non-technical user requirements, retaining critical information.
- Systematically clean, validate, and describe data at all stages of extract, transform, load (ETL).
- Work with different types of data stores, such as SQL, NoSQL, and distributed file systems.
- Identify and troubleshoot issues with data processing pipelines.
- Query and manipulate data using tools and programming such as SQL and Python. Manage database access, and implement automated validation checks.
- Communicate downtime and issues with database access to stakeholders to mitigate the operational impact of unforeseen issues.
- Evaluate opportunities to extract value from existing data products through further development, considering costs, environmental impact and potential operational benefits.
- Maintain a working knowledge of data use cases within organisations.
- Use data systems securely to meet requirements and in line with organisational procedures and legislation.
- Identify new tools and technologies and recommend potential opportunities for use in own department or organisation.
- Optimise data ingestion processes by making use of appropriate data ingestion frameworks such as batch, streaming and on-demand.
- Develop algorithms and processes to extract structured data from unstructured sources.
- Apply and advocate for software development best practice when working with other data professionals throughout the business. Contribute to standards and ways of working that support software development principles.
- Develop simple forecasts and monitoring tools to anticipate or respond immediately to outages and incidents.
- Identify and escalate risks with suggested mitigation/resolutions as appropriate.
- Investigate and respond to incidents, identifying the root cause and resolution with internal and external stakeholders.
- Identify and remediate technical debt, assess for updates and obsolescence as part of continuous improvement.
- Develop and maintain collaborative relationships, using adaptive business methodology, with stakeholders such as business users, data scientists, data analysts and business intelligence teams.
- Present, communicate, and disseminate messages about the data product, tailoring the message and medium to the needs of the audience.
- Evaluate the strengths and weaknesses of prototype data products and how these integrate within an organisation’s overarching data infrastructure.
- Assess and identify gaps in existing tools and technologies in respect of implementing changes required.
- Identify data quality metrics and track them to ensure the quality, accuracy and reliability of the data product.
- Select and apply sustainable solutions to contribute to net zero and environmental strategies across the various stages of product and service delivery.
- Carry out horizon scanning to identify new technologies that offer increased performance of data products.
- Implement personal strategies to keep up to date with new technology and ways of working.
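Several of the skills above (cleaning and validating data during ETL, working with an SQL data store, and implementing automated validation checks) can be illustrated together. The following is a minimal sketch only: the input rows, the `orders` table and the validation rules are hypothetical, and an in-memory SQLite database stands in for whatever data store an organisation actually uses.

```python
import sqlite3

# Hypothetical raw input, as it might arrive from a source system.
raw_rows = [
    {"order_id": "1001", "amount": "19.99", "country": "gb"},
    {"order_id": "1002", "amount": "-5.00", "country": "GB"},
    {"order_id": "", "amount": "12.50", "country": "FR"},
    {"order_id": "1004", "amount": "7.25", "country": " fr "},
]


def transform(row):
    """Cast types and normalise the country code."""
    return {
        "order_id": int(row["order_id"]) if row["order_id"] else None,
        "amount": float(row["amount"]),
        "country": row["country"].strip().upper(),
    }


def is_valid(row):
    """Rules assumed for this sketch: an ID is present and the amount is not negative."""
    return row["order_id"] is not None and row["amount"] >= 0


def run_pipeline(rows, conn):
    """Transform, validate and load rows; return the rejected raw rows for review."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    rejected = []
    for raw in rows:
        clean = transform(raw)
        if is_valid(clean):
            conn.execute(
                "INSERT INTO orders VALUES (:order_id, :amount, :country)", clean
            )
        else:
            rejected.append(raw)
    conn.commit()
    return rejected


conn = sqlite3.connect(":memory:")
rejected = run_pipeline(raw_rows, conn)
totals = conn.execute(
    "SELECT country, ROUND(SUM(amount), 2) FROM orders "
    "GROUP BY country ORDER BY country"
).fetchall()
print(totals, len(rejected))
```

Returning the rejected rows, rather than silently dropping them, keeps a record that can be investigated or reported to stakeholders.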
Behaviours
- Acts proactively and takes accountability, adapting positively to changing work priorities and ensuring deadlines are met.
- Works collaboratively with stakeholders and colleagues, developing strong working relationships to achieve common goals. Supports an inclusive culture and treats technical and non-technical colleagues and stakeholders with respect.
- Maintains a quality focus that promotes continuous improvement, applying peer review techniques, innovation and creativity to the data system development process to improve processes and address business challenges.
- Takes personal responsibility towards net zero and prioritises environmental sustainability outcomes in how they carry out the duties of their role.
- Uses initiative and innovation to problem-solve and troubleshoot, providing creative solutions.
- Keeps abreast of developments in emerging, contemporary and advanced technologies to optimise sustainable data products and services.
- Apprenticeship category (sector)
- Digital
- Qualification level
- 5. Equal to higher national diploma (HND)
- Course duration
- 24 months
- Maximum funding
- £19,000. Maximum government funding for apprenticeship training and assessment costs.
- Job titles include
- Data engineer
View more information about Data engineer (level 5) from the Institute for Apprenticeships and Technical Education.