Data Science and Statistics
Department: Faculty of Fundamental Sciences
Program code: 6211AX009
Field of study: Mathematical Sciences
Qualification: Master of Mathematical Sciences
Duration: 2
Fun fact
In today’s digital world, vast amounts of data are continuously generated by official statistics institutions, government agencies, medical organisations, financial institutions, transport companies, social networks, gaming platforms, and countless online services. The growing number of smart devices further amplifies this data flow, producing massive and complex datasets that require advanced analytical expertise.
Extracting meaningful and actionable insights from such diverse data sources has become one of the major challenges of modern business and research. This task requires professionals capable of applying advanced statistical analysis, data modelling, and forecasting methods using modern technologies and analytical tools.
About
The Data Science and Statistics programme responds to this demand by preparing highly skilled data analysts who can navigate the entire data analysis process—from data acquisition and preparation to model development and interpretation of results. It is open to students with backgrounds in statistics, mathematics, or computer science who wish to specialise in data-driven decision-making and predictive analytics.
The aim of this programme is to train specialists capable of performing complex data analyses and developing innovative analytical solutions for large and unstructured datasets. Unlike traditional statistics, modern data science addresses unique challenges related to the size, structure, and heterogeneity of data. This requires the integration of mathematical statistics with computer science, programming, and information technologies.
Students acquire the skills needed to program in R and Python, work with databases, and apply statistical modeling and forecasting techniques to solve real-world problems. The programme combines theoretical knowledge with practical applications, ensuring that graduates are well-prepared to analyse large-scale data, build predictive models, and assess model performance and quality.
Graduates will possess comprehensive knowledge of mathematical statistics, data science methodologies, and analytical software systems, enabling them to contribute effectively to both research and applied data projects.
Possible research areas include:
• Data analysis and modelling: works related to regression models, time series, missing data imputation, and indicator modelling
• Artificial intelligence and machine learning: research focusing on neural networks, classification methods, text and image recognition
• Economic and business applications: theses analysing real estate, transport, or service markets, price indices, and economic growth indicators
• Digital environment and online data sources: research examining advertising and tracking detection, online user behaviour analysis, and evaluation of data from virtual learning environments.
What will I be able to do?
Upon successful completion of the programme, graduates will be able to:
• Select appropriate mathematical models, parameter estimation methods, and quality assessment techniques for data analysis
• Prepare data for analysis, program in R and Python, and organise statistical research results for interpretation
• Develop mathematical and statistical models for both small- and large-scale data, estimate parameters, and evaluate model suitability
• Collaborate effectively in interdisciplinary and international research and professional teams.
What are my career opportunities?
Graduates of the Data Science and Statistics programme can pursue careers as:
• Data Analysts, Business Systems Analysts, or Risk Assessment Specialists
• Project Managers in business and government institutions in Lithuania and abroad
• Researchers or doctoral candidates in mathematics and related fields of the physical sciences.
Study subjects
1 Semester
-
FMSAM25155 9 credits
Analysis and Forecasting of Official Statistics Indicators (with course work)
Module aim
To provide students with knowledge of official statistics and of methods for the analysis and modelling of statistical indicators, and to develop their ability to apply statistical methods to multivariate data and time series, assess indicator quality, perform forecasts, and critically interpret the obtained results.
Module description
Indicators of official statistics can be analysed both in spatial cross section and over time, so the applied statistical methods are divided into methods for the analysis of independent statistical data and methods of time series analysis. Students are introduced to linear and nonlinear modelling of indicators. The course also covers the pre-adjustment of economic indicator time series and the estimation of their unobserved components. Theoretical knowledge is put into practice in exercises with the statistical software R and other tools.
-
FMSAM25144 6 credits
Data Visualization and Communication
Module aim
To introduce the principles of Bayesian inference in statistics and to apply this knowledge in practice.
Module description
In contrast to classical statistics, Bayesian statistics treats the data as fixed and the parameters of its distribution as random. Inference about the parameters and other unknown values is updated using the information obtained from the data. Statistical models are constructed using Bayes' theorem. To this end, prior distributions, the data likelihood, posterior distributions, and their approximations are studied. Bayesian methods are applied to the models of classical statistics. The advantages and disadvantages of classical and Bayesian methods are compared, and students become acquainted with the estimation of hierarchical models. The suitability of statistical models is tested using information criteria. R software is used for the practical application of Bayesian methods.
-
FMSAM24170 6 credits
Functional Programming with R
Module aim
The purpose of this course is to introduce students to functional programming using the R programming language, to improve their programming skills with R, and to prepare them for the use of the R language in statistical data analysis.
Module description
Statistical data analysis has several main stages: data collection and transformation, visualization, and statistical modeling. These stages depend on the capabilities of the software and, of course, on the user's skill with that software.
Software used in data analysis can be divided into several groups. A separate group consists of specialized programming languages.
R is a functional programming language specifically designed for statistical data analysis and data visualization. Using additional libraries, the number of which is constantly increasing, the basic capabilities of the language can be easily extended to use the latest statistical modeling, visualization and other data analysis techniques. According to the TIOBE index, R is among the top 20 programming languages and its position in the ranking has been increasing in recent years. This course is designed to consolidate basic programming skills with R, with a strong focus on the functional programming style. In addition, R is used in other data science specialization courses, so this course will prepare students for further studies.
FMSAM17106 3 credits
Master's Research Work 1
Module aim
To formulate the topic of the final work. To examine the necessary scientific literature.
Module description
The aim of the final work is to learn to solve mathematical problems in various applied fields, to expand the theoretical knowledge gained during the studies, and to deepen scientific experience.
-
FMSAM24167 6 credits
Data Science Seminar
Module aim
The goal of this course is to introduce techniques used in data scraping, data preparation and visualization.
Module description
There is a massive amount of data published in different forms on the web, but the majority of online data exists as unstructured web content. Data scraping is the process of extracting information from websites into structured data. This is a practice-oriented course that focuses on real-world data preparation, transformation and visualization. Topics include data scraping, an overview of data file formats (JSON, HDF5), data manipulation tools, and map visualization. The course is designed to serve as a bridge between introductory statistics and practical work with real, large-scale databases.
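The scraping step described above, turning web content into structured data, can be sketched as follows. This is only a minimal illustration using Python's standard library; the HTML fragment, the field names `name` and `value`, and the helper names `TableScraper` and `scrape` are hypothetical and are not taken from the course materials.

```python
from html.parser import HTMLParser
import json

# Hypothetical page fragment (made-up values); a real scraper would download it first.
SAMPLE_HTML = """
<table>
  <tr><td>Region A</td><td>120</td></tr>
  <tr><td>Region B</td><td>85</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Keep text only when it sits inside a table cell.
        if self._in_cell and self.rows:
            self.rows[-1].append(data.strip())


def scrape(html):
    """Return the two-column table as a list of structured records."""
    parser = TableScraper()
    parser.feed(html)
    return [{"name": name, "value": int(value)} for name, value in parser.rows]


records = scrape(SAMPLE_HTML)
print(json.dumps(records))  # structured output, ready to be saved as JSON
```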
-
FMSAM22168 6 credits
Advanced Matrix Algebra
Module aim
To provide students with the methods of matrix algebra necessary to understand and to apply multivariate statistical methods and statistical modeling.
Module description
Matrix algebra is widely used in statistical computation and manipulations, especially in high-dimensional data analysis. Knowledge of matrix algebra is, therefore, essential to learning, understanding, or using areas of statistics based on matrix data. The advantage of this course over other matrix algebra courses is that the contents are selected to address the needs of statisticians, and the exercises and examples are geared towards applications in statistics. The course covers the topics of matrix algebra necessary to understand multivariate statistical methods, including block matrices and their operations, linear dependence and independence, linear spaces and subspaces, linear span, basis and dimension of a space, the null space of a matrix, homogeneous linear systems, rank and trace of a matrix, the LU, spectral, and singular value (SVD) decompositions of a matrix, and the generalized inverse of a matrix.
-
FMSAM25101 6 credits
Advanced Sampling and Estimation Methods in Official Statistics
Module aim
To introduce students to advanced statistical methods used in official statistics for data collection and population parameter estimation, covering various data sources, sampling strategies, nonresponse and outlier treatment, small area estimation, and modern data sources such as administrative data and big data.
Module description
This course covers a range of statistical methods applied in the practice of official statistics. Students will explore both traditional and modern data sources (censuses, surveys, administrative registers, big data), fundamental concepts of sampling theory, and probabilistic as well as non-probabilistic sampling. Topics include complex sampling techniques, such as stratified and unequal probability sampling, and the use of ratio, regression, and calibrated estimators. The course also addresses practical challenges such as nonresponse, data editing, imputation, and validation. A strong focus is placed on small area estimation, indirect and model-based estimators. Additionally, the course examines the integration of independent samples and the combination of probability and non-probability data, as well as issues of confidentiality and dissemination of statistical results.
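Two of the estimators named above can be illustrated with a short sketch. It assumes simple random sampling within each stratum and uses made-up numbers; the function names `stratified_total` and `ratio_total` are hypothetical and not part of any course software.

```python
from statistics import mean


def stratified_total(strata):
    """Estimate a population total from samples drawn within strata.

    `strata` is a list of (N_h, sample_values) pairs, where N_h is the
    known size of stratum h. Assumes simple random sampling per stratum.
    """
    return sum(n_h * mean(sample) for n_h, sample in strata)


def ratio_total(y_sample, x_sample, x_population_total):
    """Ratio estimator: scale the known auxiliary total by the sample ratio y/x."""
    return x_population_total * sum(y_sample) / sum(x_sample)


# Made-up example: two strata of sizes 100 and 50.
example_strata = [(100, [2, 4]), (50, [10, 14])]
print(stratified_total(example_strata))   # 100 * 3 + 50 * 12 = 900

# Made-up example: auxiliary variable x with known population total 100.
print(ratio_total([4, 6], [2, 3], 100))   # 100 * 10 / 5 = 200.0
```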
2 Semester
-
FMSAM24274 9 credits
Modern Databases: From Relational to Distributed Systems (with course work)
Module aim
To give students knowledge of relational and distributed databases, so that they can create a database that meets their needs and record, update, and retrieve data from any type of database.
Module description
The course is designed to introduce students to relational and distributed databases and their different types. Examples show how to create a database and how to insert, edit, or retrieve data in each case. The advantages and disadvantages of each type are presented.
-
FMSAM25274 6 credits
Bayesian Methods
Module aim
To introduce the principles of Bayesian inference in statistics and to apply this knowledge in practice.
Module description
In contrast to classical statistics, Bayesian statistics treats the data as fixed and the parameters of its distribution as random. Inference about the parameters and other unknown values is updated using the information obtained from the data. Statistical models are constructed using Bayes' theorem. To this end, prior distributions, the data likelihood, posterior distributions, and their approximations are studied. Bayesian methods are applied to the models of classical statistics. The advantages and disadvantages of classical and Bayesian methods are compared, and students become acquainted with the estimation of hierarchical models. The suitability of statistical models is tested using information criteria. R software is used for the practical application of Bayesian methods.
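The move from prior to posterior described above can be illustrated with the simplest conjugate case. The sketch below is written in Python for illustration only (the course itself uses R) and assumes a Beta prior on a success probability with binomial data; the function names and the observed counts are hypothetical.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    With a Beta(alpha, beta) prior and a binomial likelihood, Bayes' theorem
    gives a Beta(alpha + successes, beta + failures) posterior.
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (1.0, 1.0)  # uniform prior on the success probability
posterior = beta_binomial_update(*prior, successes=7, failures=3)  # made-up data

print(posterior)              # (8.0, 4.0)
print(beta_mean(*prior))      # prior mean 0.5
print(beta_mean(*posterior))  # posterior mean 8/12, pulled towards the observed 7/10
```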
-
FMSAM22272 6 credits
Optimization Problems in Statistics
Module aim
The goal of this course is to give students a foundation in the common computational techniques and algorithms used in the implementation of statistical methods.
Module description
Computational methods play a significant role in modern statistical data analysis. This course is an introduction to modern, computationally intensive methods in statistics. Topics include numerical optimization, least squares and maximum likelihood methods, dimensionality reduction, simulation and Monte Carlo methods, including Markov chain Monte Carlo (MCMC), and other modern topics.
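As a small illustration of the simulation idea mentioned above, the sketch below uses plain Monte Carlo (not MCMC) to estimate π from uniformly distributed random points. The function name `estimate_pi`, the sample size, and the seed are arbitrary choices made for this example.

```python
import random


def estimate_pi(n_samples, seed=0):
    """Monte Carlo estimate of pi.

    Draws points uniformly in the unit square and counts the share that
    falls inside the quarter circle of radius 1, whose area is pi / 4.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


print(estimate_pi(100_000))
```

The error of such an estimate shrinks roughly in proportion to one over the square root of the number of samples, which is why these methods are computationally intensive.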
-
FMSAM25222 6 credits
Advanced Machine Learning
Module aim
To provide students with deep theoretical and practical knowledge of advanced machine learning methods, develop their ability to apply complex algorithms in real-world data analysis, independently evaluate model suitability and interpretability, and prepare them for further work in the fields of data science and artificial intelligence.
Module description
This course is designed to deepen students’ knowledge of advanced machine learning methods not covered in undergraduate studies. It focuses on more complex supervised and unsupervised learning algorithms, including regularization techniques, ensemble models (bagging, boosting), parametric and non-parametric classifiers, dimensionality reduction methods, and outlier detection techniques. The course also introduces the principles of Automated Machine Learning (AutoML) and emphasizes the importance of model interpretability in modern data science. Special attention is given to applying these methods to real-world data and interpreting the results to support data-driven decision-making.
-
FMSAM17207 3 credits
Master's Research Work 2
Module aim
To collect and analyze information on the topic of the student’s Master’s thesis by applying scientific research methods.
Module description
Information retrieval: identifying the information needed to address the problem, searching for and reviewing relevant literature sources, and analyzing, evaluating, and recording information. Data gathering: data gathering techniques, statistical data processing, and other data analysis methods.
3 Semester
-
FMSAM24366 9 credits
Natural Language Processing and Analysis (with course work)
Module aim
To provide students with some knowledge and experience applicable to real-life problems which involve the analysis of textual data.
Module description
This course aims to equip students with tools appropriate for the analysis of large collections of unstructured textual data. It gives an overview of both the data structures applicable to this task and the main algorithms for analyzing the content and structure of written communication.
-
FMMMM22301 6 credits
Big Data Processing Technologies
Module aim
The goal is to introduce the basic numerical methods and to learn how to apply these methods for solution of specific problems.
Module description
In this course students learn the concepts of computer arithmetic and stability of numerical algorithms, numerical methods for solution of nonlinear equations and systems of equations, direct and iterative methods for solution of linear systems of equations, interpolation and approximation, numerical methods for solution of eigenvalue and eigenfunction problems, optimization methods, and numerical integration methods.
Students must attend at least 60% of the scheduled practical sessions and 50% of the lectures.
-
FMSAM16308 3 credits
Master's Research Work 3
Module aim
Experimental planning, execution and initial analysis of results.
Module description
In this module, students plan and perform the experiment that was prepared in the module “Final Thesis 2” and analyze its results. The experiment must confirm or refute the assumptions made in the module “Final Thesis 2”.
Students should become acquainted with methods of experiment planning and execution, compare the results achieved with similar research done by other researchers, and evaluate the efficiency of the results.
-
FMSAM25373 6 credits
Random Graphs
Module aim
To acquaint students with theoretical and practical aspects of random graph analysis. Upon successful completion of this course, students are expected to be able to apply their knowledge of random graphs to the analysis of real data.
Module description
The theory of random graphs is a branch of mathematics that is widely applicable in the real-life analysis of computer, social, economic, biological, epidemiological, and other networks. The advent of the computer age has led to an increasing interest in understanding the structure and development of real-world networks. The theory of random graphs provides a framework for this understanding. During the lectures, students are introduced to the main elements of random graph theory, its models, and their applications. Theoretical knowledge is consolidated through practical tasks.
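One of the most basic random graph models, the Erdős–Rényi model G(n, p), gives a feel for the kind of object studied here. The sketch below is an illustrative assumption rather than course material; the function names `gnp_random_graph` and `mean_degree` and the chosen parameters are hypothetical.

```python
import random
from itertools import combinations


def gnp_random_graph(n, p, seed=None):
    """Sample an Erdős–Rényi G(n, p) graph.

    Each of the n * (n - 1) / 2 possible edges between vertices 0..n-1
    is included independently with probability p.
    """
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def mean_degree(n, edges):
    """Average vertex degree; every edge contributes to two vertices."""
    return 2 * len(edges) / n


edges = gnp_random_graph(200, 0.05, seed=42)
# The expected mean degree is (n - 1) * p = 9.95; a sample fluctuates around it.
print(mean_degree(200, edges))
```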
-
FMSAM25301 6 credits
Internship Official Statistics
Module aim
To provide students with hands-on experience in the field of official statistics, enabling them to understand the processes of data collection, processing, analysis, and dissemination in statistical institutions, and to develop practical skills relevant to working with official statistical data.
Module description
This module is intended for students who wish to deepen their knowledge and gain practical skills in the area of official statistics. The internship takes place in institutions or organizations responsible for producing official statistics (e.g., national statistical offices, central banks, ministries, or international organizations). Students become familiar with data sources, survey design, sampling methods, quality assurance procedures, statistical analysis techniques, and the interpretation and presentation of results. Upon completion of the module, students will be able to apply their theoretical knowledge in real-world statistical practice and evaluate the role of official statistics in evidence-based decision-making.
-
FMSAM25356 6 credits
Risk Theory
Module aim
After successful completion of this course, participants are expected to know the main theories describing the behaviour of individuals (agents) in risky situations, the main risk measures, and the criteria for selection under risk. Participants are also expected to be able to apply this theoretical knowledge in practical situations.
Module description
This course presents the main theories describing the behaviour of individuals (agents) in risky situations and their financial decisions under risk, as well as the main risk measures and criteria for selection under risk. It shows how theoretical models can be applied in real situations, such as decisions about insurance and/or participation in lotteries (investments).
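A lottery decision of the kind mentioned above can be sketched with expected utility theory, one standard way of modelling choice under risk. The example assumes a risk-averse agent with square-root utility and a made-up lottery; the function names are hypothetical and the choice of utility function is purely illustrative.

```python
import math


def expected_value(lottery):
    """Expected payoff of a lottery given as (probability, payoff) pairs."""
    return sum(p * x for p, x in lottery)


def expected_utility(lottery, utility):
    """Expected utility of a lottery under a given utility function."""
    return sum(p * utility(x) for p, x in lottery)


def certainty_equivalent_sqrt(lottery):
    """Sure amount with the same utility as the lottery, for u(x) = sqrt(x)."""
    return expected_utility(lottery, math.sqrt) ** 2


# Made-up lottery: win 100 or nothing, each with probability 0.5.
lottery = [(0.5, 0.0), (0.5, 100.0)]

ev = expected_value(lottery)             # 50.0
ce = certainty_equivalent_sqrt(lottery)  # 25.0
print(ev, ce, ev - ce)                   # the risk premium is 25.0
```

A risk-averse agent values the lottery below its expected payoff; the difference, the risk premium, is the most such an agent would give up to replace the gamble with its expected value for certain.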
4 Semester
-
FMSAM16409 30 credits
Master Graduation Thesis
Module aim
To develop the ability to formulate the scientific conclusions of the research work, to prepare and publish the main results, to prepare the final master’s thesis, and to present it to the qualification commission.
Module description
Scientific and practical conclusions are drawn, publications are prepared, and the completed work is presented to the qualification commission.
Statistics
| Metric | Value |
|---|---|
| Enrolled students | 5 |
| Enrolled to FT | 4 |
| Min FT grade | 8.88 |