Data Science and Statistics
- Department: Faculty of Fundamental Sciences
- Program code: 6211AX009
- Field of study: Mathematical Sciences
- Qualification: Master of Mathematical Sciences
- Duration: 2
Fun fact
In today’s digital world, vast amounts of data are continuously generated by official statistics institutions, government agencies, medical organisations, financial institutions, transport companies, social networks, gaming platforms, and countless online services. The growing number of smart devices further amplifies this data flow, producing massive and complex datasets that require advanced analytical expertise.
Extracting meaningful and actionable insights from such diverse data sources has become one of the major challenges of modern business and research. This task requires professionals capable of applying advanced statistical analysis, data modelling, and forecasting methods using modern technologies and analytical tools.
About
The Data Science and Statistics programme responds to this demand by preparing highly skilled data analysts who can navigate the entire data analysis process—from data acquisition and preparation to model development and interpretation of results. It is open to students with backgrounds in statistics, mathematics, or computer science who wish to specialise in data-driven decision-making and predictive analytics.
The aim of this programme is to train specialists capable of performing complex data analyses and developing innovative analytical solutions for large and unstructured datasets. Unlike traditional statistics, modern data science addresses unique challenges related to the size, structure, and heterogeneity of data. This requires the integration of mathematical statistics with computer science, programming, and information technologies.
Students acquire the skills needed to program in R and Python, work with databases, and apply statistical modeling and forecasting techniques to solve real-world problems. The programme combines theoretical knowledge with practical applications, ensuring that graduates are well-prepared to analyse large-scale data, build predictive models, and assess model performance and quality.
Graduates will possess comprehensive knowledge of mathematical statistics, data science methodologies, and analytical software systems, enabling them to contribute effectively to both research and applied data projects.
Possible research areas include:
- Data analysis and modelling – works related to regression models, time series, missing data imputation, and indicator modelling
- Artificial intelligence and machine learning – research focusing on neural networks, classification methods, text and image recognition
- Economic and business applications – theses analysing real estate, transport, or service markets, price indices, and economic growth indicators
- Digital environment and online data sources – research examining advertising and tracking detection, user behaviour analysis online, and evaluation of data from virtual learning environments.
What will I be able to do?
Upon successful completion of the programme, graduates will be able to:
• Select appropriate mathematical models, parameter estimation methods, and quality assessment techniques for data analysis
• Prepare data for analysis, program in R and Python, and organise statistical research results for interpretation
• Develop mathematical and statistical models for both small- and large-scale data, estimate parameters, and evaluate model suitability
• Collaborate effectively in interdisciplinary and international research and professional teams.
What are my career opportunities?
Graduates of the Data Science and Statistics programme can pursue careers as:
• Data Analysts, Business Systems Analysts, or Risk Assessment Specialists
• Project Managers in business and government institutions in Lithuania and abroad
• Researchers or doctoral candidates in mathematics and related fields of the physical sciences.
Study subjects
1 Semester
obligatory
-
FMSAM17158 9 credits
Operations Research (with course work)
Module aim
To provide knowledge of the main models of operations research. To familiarize students with the possibilities for solving and analysing these problems using the optimization procedures of SAS/OR.
Module description
Linear, mixed-integer and parametric programming are introduced in the course. Transportation models, the assignment problem and the scheduling problem are discussed. Nonlinear, stochastic and dynamic programming are presented. Multiple criteria optimization, project management and game theory are reviewed. The process of model building and the possibilities for solving and analysing all these problems using the optimization procedures of SAS/OR are investigated.
-
FMSAM17159 6 credits
Software Systems for Data Analysis
Module aim
To teach students to use modern statistical software: the statistical analysis system SAS and the open-source program R.
Module description
Students are trained in two programs: SAS and R. The statistical analysis system SAS is used under licence by large enterprises. Data sets are transformed and various statistical analyses are performed in SAS, and the results are presented in graphs, tables and data sets.
R is a language and environment for statistical computing and graphics. R can be considered as a different implementation of S. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible.
R is available as Free Software. Its popularity is growing very quickly among statisticians all over the world.
-
VVEIM17389 6 credits
Economics
Module aim
To provide knowledge of the economic processes of enterprises and the business environment, and to develop the skills needed to make and implement economic decisions in professional practice.
Module description
The course covers economic theory: needs, resources, production, and labor and material resources in the economy. It analyses the market, the balance of supply and demand, elasticity, a company’s costs and profits, differentiation and competition models. It also analyses investment, business environment factors, pricing strategies and methods, macroeconomic indicators and GNP calculation methods. The course further analyses fiscal and monetary policy measures, the labor market, the assessment of unemployment and inflation, international economic relations, business risk and instruments for reducing it, and business development.
-
FMSAM17161 6 credits
Selected Chapters of Probability Theory
Module aim
To present the methods of characteristic functions and cumulants, which are often used in probability theory and mathematical statistics.
Module description
The course presents two main methods of characteristic functions and cumulants that are used in the asymptotic analysis of the distributions of various statistics: general limit theorems for sums of independent summands, estimates of the remainder term of the approximation by normal law, asymptotic expansions, limit theorems for large deviations, and exponential inequalities.
-
FMSAM17106 3 credits
Master's Research Work 1
Module aim
To formulate the topic of the final work. To examine the necessary scientific literature.
Module description
The aim of the final work is to learn to solve mathematical problems in various applied fields, to expand the theoretical knowledge gained during the studies, and to deepen scientific experience.
-
FMSAM25155 9 credits
Analysis and Forecasting of Official Statistics Indicators (with course work)
Module aim
To provide students with knowledge of official statistics and of methods for the analysis and modelling of statistical indicators, and to develop their ability to apply statistical methods to multivariate data and time series, assess indicator quality, perform forecasts, and critically interpret the obtained results.
Module description
Indicators of official statistics can be analysed across spatial cross-sections and over time; accordingly, the applied statistical methods are divided into methods for the statistical analysis of independent data and methods of time series analysis. Students are introduced to linear and nonlinear modelling of indicators. The course also covers the pre-adjustment of economic indicator time series and the estimation of unobserved components. Theoretical knowledge is applied in practical exercises using the statistical software R and other tools.
-
FMSAM25144 6 credits
Data Visualization and Communication
Module aim
To introduce the principles of Bayesian inference in statistics and to apply this knowledge in practice.
Module description
In contrast to classical statistics, Bayesian statistics treats the data as fixed and the parameters of its distribution as random. Inference about the parameters and other unknown quantities is updated using information obtained from the data. Statistical models are constructed using Bayes' theorem; for this purpose, prior distributions, the data likelihood, posterior distributions and their approximations are studied. Bayesian methods are applied to the models of classical statistics. The advantages and disadvantages of classical and Bayesian methods are compared, and students are introduced to the estimation of hierarchical models. The suitability of statistical models is tested using information criteria. R software is used for the practical application of Bayesian methods.
-
FMSAM24170 6 credits
Functional Programming with R
Module aim
To introduce students to functional programming using the R programming language, to improve their programming skills with R, and to prepare them for the use of the R language in statistical data analysis.
Module description
Statistical data analysis has several main stages: data collection and transformation, visualization, and statistical modeling. These stages depend on the capabilities of the software and, of course, on the skill of using that software.
Software used in data analysis can be divided into several groups. A separate group consists of specialized programming languages.
R is a functional programming language specifically designed for statistical data analysis and data visualization. Using additional libraries, the number of which is constantly increasing, the basic capabilities of the language can be easily extended to use the latest statistical modeling, visualization and other data analysis techniques. According to the TIOBE index, R is among the top 20 programming languages and its position in the ranking has been increasing in recent years. This course is designed to consolidate basic programming skills with R, with a strong focus on the functional programming style. In addition, R is used in other data science specialization courses, so this course will prepare students for further studies.
-
FMSAM24167 6 credits
Data Science Seminar
Module aim
The goal of this course is to introduce techniques used in data scraping, data preparation and visualization.
Module description
There is a massive amount of data published in different forms on the web, but the majority of online data exists as unstructured web content. Data scraping is the process of extracting information from websites into structured data. This is a practice-oriented course that focuses on real-world data preparation, transformation and visualization. Topics include data scraping, an overview of data file formats (JSON, HDF5), data manipulation tools, and map visualization. The course is designed to serve as a bridge between introductory statistics and practical work with real, large-scale databases.
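As an illustration of what "extracting information from websites into structured data" can look like, here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the `TableParser` class, the inline `HTML` snippet and the city names are all hypothetical, and a real scraping task would fetch the page over the network and handle far messier markup.

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being read, or None
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page fragment standing in for downloaded web content.
HTML = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Alpha</td><td>1200</td></tr>
  <tr><td>Beta</td><td>850</td></tr>
</table>
"""

parser = TableParser()
parser.feed(HTML)
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]  # one dict per data row
print(json.dumps(records))  # structured JSON, one of the formats named above
```

The first table row is treated as the header, so each later row becomes a record keyed by column name.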
-
FMSAM22168 6 credits
Advanced Matrix Algebra
Module aim
To provide students with the methods of matrix algebra necessary to understand and to apply multivariate statistical methods and statistical modeling.
Module description
Matrix algebra is widely used in statistical computation and manipulations, especially in high-dimensional data analysis. Knowledge of matrix algebra is, therefore, essential to learning, understanding, or using areas of statistics based on matrix data. The advantage of this course over other matrix algebra courses is that the contents are selected to address the needs of statisticians, and the exercises and examples are geared towards applications in statistics. The course covers the topics of matrix algebra necessary to understand multivariate statistical methods, including block matrices and their operations, linear dependence and independence of matrices, linear spaces and subspaces, linear span, basis and dimension of a space, the null space of a matrix, homogeneous linear systems, rank of a matrix, trace of a matrix, LU, spectral and singular value (SVD) decompositions of a matrix, and the generalized inverse of a matrix.
-
FMSAM25101 6 credits
Advanced Sampling and Estimation Methods in Official Statistics
Module aim
To introduce students to advanced statistical methods used in official statistics for data collection and population parameter estimation, covering various data sources, sampling strategies, nonresponse and outlier treatment, small area estimation, and modern data sources such as administrative data and big data.
Module description
This course covers a range of statistical methods applied in the practice of official statistics. Students will explore both traditional and modern data sources (censuses, surveys, administrative registers, big data), fundamental concepts of sampling theory, and probabilistic as well as non-probabilistic sampling. Topics include complex sampling techniques, such as stratified and unequal probability sampling, and the use of ratio, regression, and calibrated estimators. The course also addresses practical challenges such as nonresponse, data editing, imputation, and validation. A strong focus is placed on small area estimation, indirect and model-based estimators. Additionally, the course examines the integration of independent samples and the combination of probability and non-probability data, as well as issues of confidentiality and dissemination of statistical results.
2 Semester
obligatory
-
FMSAM17254 9 credits
Statistical Analysis by Sampling Methods (with course work)
Module aim
To provide knowledge of how to estimate results from sample surveys under complex real-world conditions.
Module description
Estimators of finite population parameters for arbitrary sampling designs are studied. Resampling methods are applied to estimate the variances of the parameter estimators. Estimators in the case of nonresponse are investigated. Computer software for estimation under complex sampling designs is presented.
-
FMSAM17253 6 credits
Mathematics of Insurance
Module aim
After successful completion of this course, participants are expected to know the main actuarial models used in life insurance, general insurance and/or pension funds, and to be able to apply these models to determine premiums and reserves.
Module description
The main actuarial models used in life and general insurance are introduced in this course. The models are then used to define insurance premiums and/or calculate the reserves of a life insurance office.
-
FMSAM17255 6 credits
Statistical Modeling and Analysis of Structures
Module aim
To familiarize students with widely used models of econometric analysis, the algorithms for their identification and the peculiarities of their practical application.
Module description
The purpose of econometrics and the main types of econometric models are discussed. The problems of fitting a linear regression model are outlined: multicollinearity, selection of informative factors, application of dummy variables, and outliers. The use of instrumental variables in identifying linear regression models is discussed. The trend function, seasonal index, models of univariate time series analysis and the vector autoregression model are introduced. The concepts of structural VAR models, cointegration and the vector error correction model are presented. Forecasting and modelling of macroeconomic processes are discussed.
-
FMSAM17207 3 credits
Master's Research Work 2
Module aim
To collect and analyze information on the topic of the student’s Master’s Thesis by applying scientific research methods.
Module description
Information retrieval: identifying the information needed to address the problem, searching for and reviewing relevant literature sources, and analysing, evaluating and recording information. Data gathering: data gathering techniques, statistical data processing, and other data analysis methods.
-
FMSAM24274 9 credits
Modern Databases: From Relational to Distributed Systems (with course work)
Module aim
To provide knowledge of relational and distributed databases, so that students are able to create a database that meets their needs and to record, update and retrieve data from any type of database.
Module description
The course introduces students to relational and distributed databases and their different types. Examples show how to create a database and how to insert, edit or retrieve data in each case. The advantages and disadvantages of each type are presented.
-
FMSAM25274 6 credits
Bayesian Methods
Module aim
To introduce the principles of Bayesian inference in statistics and to apply this knowledge in practice.
Module description
In contrast to classical statistics, Bayesian statistics treats the data as fixed and the parameters of its distribution as random. Inference about the parameters and other unknown quantities is updated using information obtained from the data. Statistical models are constructed using Bayes' theorem; for this purpose, prior distributions, the data likelihood, posterior distributions and their approximations are studied. Bayesian methods are applied to the models of classical statistics. The advantages and disadvantages of classical and Bayesian methods are compared, and students are introduced to the estimation of hierarchical models. The suitability of statistical models is tested using information criteria. R software is used for the practical application of Bayesian methods.
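The prior-to-posterior update described above can be sketched with the simplest conjugate case: a Beta prior on a success probability combined with binomial data. This is an illustrative example only, written in Python rather than the R used in the module; the function names and the data (7 successes, 3 failures) are made up for the sketch.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    With a Beta(alpha, beta) prior and a binomial likelihood, the
    posterior is Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical example: uniform prior Beta(1, 1), then 7 successes and 3 failures.
prior = (1, 1)
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print(posterior)                                  # (8, 4)
print(beta_mean(*prior), beta_mean(*posterior))   # 1/2 2/3
```

The data are held fixed while the belief about the parameter moves from a prior mean of 1/2 to a posterior mean of 2/3. Non-conjugate models generally need the approximation methods the description mentions.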
-
FMSAM22272 6 credits
Optimization Problems in Statistics
Module aim
To give students a foundation in the common computational techniques and algorithms used in the implementation of statistical methods.
Module description
Computational methods play a significant role in modern statistical data analysis. This course is an introduction to modern, computationally intensive methods in statistics. Topics include numerical optimization, least squares and maximum likelihood methods, dimensionality reduction, simulation and Monte Carlo methods, including Markov chain Monte Carlo (MCMC), and other modern topics.
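To show how numerical optimization and maximum likelihood fit together, the following sketch applies Newton's method to the log-likelihood of an exponential model. This model is chosen because the answer is known in closed form (the rate estimate is n divided by the sum of the observations), so the numerical result can be checked. The function name, the starting value and the data are assumptions made for this example and do not come from the module.

```python
def exponential_mle_newton(xs, tol=1e-12, max_iter=50):
    """Maximum likelihood estimate of an exponential rate via Newton's method.

    Log-likelihood: l(lam) = n*log(lam) - lam*sum(xs)
    Score:          l'(lam) = n/lam - sum(xs)
    Hessian:        l''(lam) = -n/lam**2
    """
    n, total = len(xs), sum(xs)
    lam = 1.0 / total  # starting value inside (0, 2n/total), where Newton converges
    for _ in range(max_iter):
        score = n / lam - total
        hessian = -n / lam ** 2
        step = score / hessian
        lam -= step                 # Newton update: lam - l'(lam) / l''(lam)
        if abs(step) < tol:
            break
    return lam


# Hypothetical waiting times; their sum is 5.0, so the closed-form MLE is 1.0.
waits = [0.5, 1.2, 0.3, 2.0, 1.0]
lam_hat = exponential_mle_newton(waits)
closed_form = len(waits) / sum(waits)
print(abs(lam_hat - closed_form) < 1e-9)  # True
```

For models without a closed-form estimator the same iteration applies, but the score and Hessian have to be derived or approximated for that model, and the starting value matters more.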
-
FMSAM25222 6 credits
Advanced Machine Learning
Module aim
To provide students with deep theoretical and practical knowledge of advanced machine learning methods, develop their ability to apply complex algorithms in real-world data analysis, independently evaluate model suitability and interpretability, and prepare them for further work in the fields of data science and artificial intelligence.
Module description
This course is designed to deepen students’ knowledge of advanced machine learning methods not covered in undergraduate studies. It focuses on more complex supervised and unsupervised learning algorithms, including regularization techniques, ensemble models (bagging, boosting), parametric and non-parametric classifiers, dimensionality reduction methods, and outlier detection techniques. The course also introduces the principles of Automated Machine Learning (AutoML) and emphasizes the importance of model interpretability in modern data science. Special attention is given to applying these methods to real-world data and interpreting the results to support data-driven decision-making.
one of the following
-
FMSAM17265 6 credits
Queuing Theory
Module aim
To present some basic notions of probability theory and stochastic processes which are necessary for analysis of queueing systems. To introduce queueing systems and methods of their analysis.
Module description
After successful completion of this course, participants are expected to understand the main concepts of queueing theory, its probabilistic background and its main modelling features. The main probability distributions and the fundamentals of Poisson, Markov, and birth and death stochastic processes are presented.
-
FMSAM17256 6 credits
Risk Theory
Module aim
After successful completion of this course, participants are expected to know the main theories describing the behaviour of individuals (agents) in risky situations, the main risk measures and the criteria for selection under risk. Participants are also expected to be able to apply theoretical knowledge in practical situations.
Module description
The main theories describing the behaviour of individuals (agents) in risky situations, their financial decisions under risk, the main risk measures and the criteria for selection under risk are presented in this course. It is shown how a theoretical model may be applied in real situations when making decisions about insurance and/or participation in a lottery (investment).
3 Semester
obligatory
-
FMSAM16352 9 credits
Data Analysis Methods (with course work)
Module aim
To familiarize students with modern methods of mathematical statistics and related data analysis algorithms, as well as the peculiarities of their practical application.
Module description
The basic concepts of stochastic modelling are presented. Popular methods of statistical identification are reviewed. In particular, the method of maximum likelihood and the generalized least squares method are described, emphasizing their advantages and disadvantages. Principles of nonparametric estimation are formulated, and the combination of nonparametric and parametric methods in data analysis is discussed. An introduction to robust statistical analysis is presented. The bootstrap and cross-validation methods are described, along with their application to selecting an appropriate model from several alternatives and to evaluating identification accuracy. The main stages and problems of multivariate data analysis are outlined. The generalized linear regression model and the methodology for estimating its parameters are presented. The basic methods of discriminant and cluster analysis are introduced.
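As a small illustration of the bootstrap idea mentioned above, the sketch below estimates the standard error of a sample median by resampling the data with replacement. It is a generic textbook-style example, not the module's own material: the function name, the number of resamples, the fixed seed and the data values are all assumptions.

```python
import random
import statistics


def bootstrap_se(sample, statistic, n_resamples=2000, seed=0):
    """Bootstrap estimate of the standard error of `statistic`.

    Draws `n_resamples` samples of the same size from `sample` with
    replacement, evaluates the statistic on each, and returns the
    standard deviation of those values.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


# Made-up measurements used only to exercise the function.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
se_median = bootstrap_se(data, statistics.median)
print(f"bootstrap SE of the median: {se_median:.3f}")
```

The median is used here because it has no simple standard-error formula, which is the kind of case where resampling is convenient. Any function of a sample can be passed as `statistic`.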
-
FMSAM16362 6 credits
Analysis and Forecasting of Economic Indicators
Module aim
To introduce the theoretical and practical aspects of the specification of econometric models, to review the econometric models developed and applied in Lithuania and other countries, and to evaluate the forecasting possibilities of econometric models.
Module description
Economic indicators can be analysed across spatial cross-sections and over time; accordingly, the applied statistical methods are divided into methods for the statistical analysis of independent data and methods of time series analysis. Students are introduced to linear and nonlinear modelling of economic indicators. The course also covers the pre-adjustment of economic indicator time series and the estimation of unobserved components. Theoretical knowledge is applied in practical exercises using the statistical software R, Demetra+ and others.
-
FMSAM16357 6 credits
Mathematical Models of Financial Markets
Module aim
To present the main methods of mathematical modelling of financial data, dynamic asset pricing and investment risk analysis.
Module description
In the course, linear and nonlinear stochastic models of financial data are considered. Using the concepts of absence of arbitrage, optimal agent behaviour and market equilibrium, the pricing models of stocks, bonds and derivative securities are studied. In addition, the problems of optimal portfolio and consumption choice are investigated.
-
FMSAM16308 3 credits
Master's Research Work 3
Module aim
Experimental planning, execution and initial analysis of results.
Module description
In this module, students plan and perform the experiment set out in the module “Final Thesis 2” and analyze its results. The experiment must confirm or refute the assumptions made in the module “Final Thesis 2”.
Students should become acquainted with methods of experiment planning and execution, compare the achieved results with similar research done by other researchers, and evaluate the results’ efficiency.
-
FMSAM24366 9 credits
Natural Language Processing and Analysis (with course work)
Module aim
To provide students with some knowledge and experience applicable to real-life problems which involve the analysis of textual data.
Module description
This course aims to equip students with tools appropriate for the analysis of large collections of unstructured textual data. It gives an overview of both the data structures applicable to this task and the main algorithms for analyzing the content and structure of written communication.
-
FMSAM24374 6 credits
Bayesian Methods
Module aim
To introduce the principles of Bayesian inference in statistics and to apply this knowledge in practice.
Module description
In contrast to classical statistics, Bayesian statistics treats the data as fixed and the parameters of its distribution as random. Inference about the parameters and other unknown quantities is updated using information obtained from the data. Statistical models are constructed using Bayes' theorem; for this purpose, prior distributions, the data likelihood, posterior distributions and their approximations are studied. Bayesian methods are applied to the models of classical statistics. The advantages and disadvantages of classical and Bayesian methods are compared, and students are introduced to the estimation of hierarchical models. The suitability of statistical models is tested using information criteria. R software is used for the practical application of Bayesian methods.
-
FMMMM22301 6 credits
Big Data Processing Technologies
Module aim
The goal is to introduce the basic numerical methods and to learn how to apply these methods for solution of specific problems.
Module description
In this course students learn the concepts of computer arithmetic and stability of numerical algorithms, numerical methods for solution of nonlinear equations and systems of equations, direct and iterative methods for solution of linear systems of equations, interpolation and approximation, numerical methods for solution of eigenvalue and eigenfunction problems, optimization methods, and numerical integration methods.
Students must attend at least 60% of the scheduled practical sessions and 50% of the lectures.
4 Semester
obligatory
-
FMSAM16409 30 credits
Master Graduation Thesis
Module aim
To develop the ability to formulate the scientific conclusions of the research work, to prepare and publish the main results, to prepare the final master’s thesis and to present it to the qualification commission.
Module description
Scientific and practical conclusions are drawn, publications are prepared, and the completed work is presented to the qualification commission.
Statistics
| Metric | Value |
|---|---|
| Enrolled students | 5 |
| Enrolled to FT | 4 |
| Min FT grade | 8.88 |