Biography
Bartosz Ziółko, PhD, is a professor at AGH and an individual investor. He co-founded and served as CEO of Techmo, a technology company providing speech recognition and speech generation solutions.
He studied Electronics and Telecommunications at AGH and earned a PhD in Computer Science at the University of York, obtaining his habilitation in 2017. He has authored over 100 scientific papers and the book "Speech Processing", and holds two patents granted by the USPTO and one by the EPO. He was a visiting scholar at Hokkaido University in Japan and a participant in the TOP 500 Innovators program at Stanford University.
Research Interests
His research interests include:
- Financial Data Analysis
- LLM-based reasoning across heterogeneous data sources
- Automatic speech recognition (ASR)
- Natural language processing (NLP)
- Prediagnosis of diseases based on speech
He has participated in over 10 national and European research projects. The company he built created a speech recognition system that has processed over 100 million conversations.
Courses
- Text Algorithms
- Evolutionary Algorithms
- Multidimensional Data Analysis
- Databases in Data Mining
- Data Exploration
Open Thesis Topics
PhD Proposal
Multimodal Reasoning with LLMs for Financial Data
The primary objective is to investigate the mechanisms by which information from unstructured alternative data sources is transmitted to financial instrument prices. The project aims to verify the hypothesis that analyzing heterogeneous signals (textual, behavioral, and operational) enables earlier identification of changes in a company's business paradigm than traditional models do.
The research is designed as a multi-layered framework focused on the integration of high-dimensional, heterogeneous data streams into a unified predictive environment. The core of the project is the development of a multimodal architecture capable of synchronizing classic fundamental financial indicators with unconventional alternative data, including NLP-processed conference transcripts, social media sentiment, and OSINT signals. A central focus is the phenomenon of management narrative coherence: by leveraging LLMs, the project will quantify the semantic alignment between strategic corporate statements and objective, real-world proxies. This involves the simultaneous processing of unstructured text, card transaction statistics, and spatio-temporal features derived from phone geolocation patterns and computer vision analysis of site-specific imagery.
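To make the coherence measurement concrete, the sketch below scores how well each strategic statement is supported by at least one real-world proxy signal. It is a minimal illustration only: the bag-of-words embed() function is a stand-in for an LLM-based sentence encoder, and the statements and proxies are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use an
    LLM-based sentence encoder (this stand-in is purely illustrative)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def narrative_coherence(statements: list[str], proxies: list[str]) -> float:
    """Average, over statements, of the best-matching proxy signal:
    high values mean the narrative is backed by observable evidence."""
    scores = [max(cosine(embed(s), embed(p)) for p in proxies)
              for s in statements]
    return sum(scores) / len(scores)

# Hypothetical example: a board claims expansion; proxies partly confirm it.
statements = ["we are rapidly expanding retail store traffic in Europe"]
proxies = [
    "card transaction volume in European retail stores rose 12 percent",
    "satellite imagery shows unchanged parking lot occupancy",
]
print(f"coherence = {narrative_coherence(statements, proxies):.2f}")
```

Taking the best match per statement, rather than the mean, reflects the idea that a claim needs only one strong supporting proxy to count as coherent.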
The research will further explore a paradigm shift from traditional frequentist simulations to a Generative World Model approach, evaluating the efficacy of an LLM-as-a-model in place of, or in hybrid conjunction with, classical Monte Carlo methods. While Monte Carlo techniques provide a robust mathematical baseline for uncertainty propagation, they often fail to capture the semantic complexity of non-linear market shocks. This project investigates the use of LLMs to perform Semantic Scenario Synthesis, in which the model acts as a probabilistic reasoner capable of generating counterfactual states based on latent environmental variables.
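A minimal sketch of such a hybrid follows. Classical Monte Carlo paths are tilted by counterfactual scenarios sampled from llm_scenario(), a stub standing in for a real LLM call; the shock probabilities and multipliers are invented purely for illustration.

```python
import random
import statistics

def gbm_path(s0, mu, sigma, steps=252):
    """One classical Monte Carlo path: Euler-discretized
    geometric Brownian motion over one year of trading days."""
    s = s0
    for _ in range(steps):
        s *= 1 + mu / steps + sigma * random.gauss(0, 1) / steps ** 0.5
    return s

def llm_scenario():
    """Stub standing in for an LLM call that synthesizes counterfactual
    scenarios as (probability, return multiplier, label). Hypothetical."""
    return [(0.05, 0.60, "supply-chain shock"), (0.95, 1.00, "baseline")]

def hybrid_terminal_price(s0=100.0, mu=0.07, sigma=0.25, n=10_000):
    """Hybrid estimate: each Monte Carlo draw is scaled by a
    semantically generated scenario sampled from the LLM proposals."""
    scenarios = llm_scenario()
    outcomes = []
    for _ in range(n):
        p, acc, mult = random.random(), 0.0, 1.0
        for prob, m, _label in scenarios:
            acc += prob
            if p <= acc:
                mult = m
                break
        outcomes.append(gbm_path(s0, mu, sigma) * mult)
    return statistics.mean(outcomes)

print(f"hybrid expected terminal price: {hybrid_terminal_price():.2f}")
```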
Master's Thesis
Application of large language models (LLMs) to synthesize and interpret heterogeneous market data for personalized investment decision support
Goal: Design, implementation, and evaluation of an advanced investment decision support system using LLMs and a Retrieval-Augmented Generation (RAG) architecture.
Research Problem: Addressing the "information overload" problem faced by investors. The system synthesizes distributed and often contradictory data (stock data, macroeconomic indicators, reports) to generate personalized analytical briefs tailored to a specific portfolio strategy, grounding its answers in factual data.
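The sketch below illustrates the RAG pattern under simplifying assumptions: naive lexical retrieval stands in for a vector index, the assembled prompt is printed rather than sent to an LLM, and the corpus, query, and portfolio description are hypothetical.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Naive lexical retrieval: rank documents by word overlap with the
    query. A production RAG system would use an embedding index instead."""
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, portfolio: str, documents: list[str]) -> str:
    """Assemble a grounded prompt: retrieved evidence plus the investor's
    portfolio strategy, so the generated brief can cite factual context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        f"Portfolio strategy: {portfolio}\n"
        f"Evidence:\n{context}\n"
        f"Question: {query}\n"
        "Write a short analytical brief grounded ONLY in the evidence."
    )

# Hypothetical corpus and query, for illustration only.
docs = [
    "Q3 report: revenue up 8 percent, guidance unchanged.",
    "Central bank raised rates by 25 basis points.",
    "Competitor announced a recall of its flagship product.",
]
print(build_prompt("should I increase exposure to this stock",
                   "dividend growth, low turnover", docs))
```

Grounding the prompt in retrieved documents is what lets the system keep its briefs tied to factual data rather than the model's parametric memory.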
Engineering Thesis (BSc)
A system for automatic collection and normalization of economic data from distributed sources
Scope: Technical challenges related to the ETL (Extract, Transform, Load) process in the context of financial data. A minimal end-to-end sketch of the pipeline is given after the module list below.
- Acquisition Module (Extract): Implementation of API clients and web scraping mechanisms, error handling, and task scheduling.
- Processing Module (Transform): Algorithms for data cleaning (handling missing values, anomalies) and normalization (unifying formats).
- Storage Module (Load): Design of an efficient database (SQL or Time-Series, e.g., InfluxDB).
- Access Module (API/UI): REST API or a basic UI for filtering, aggregation, and data export (CSV/JSON).
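As referenced above, the following sketch wires the three core stages together under stated assumptions: the fetch inside extract() is a stub standing in for a real API client or scraper (only the retry pattern matters), SQLite stands in for the target SQL or time-series database, and all data are illustrative.

```python
import sqlite3
import time

def extract(ticker: str, retries: int = 3) -> list[dict]:
    """Extract: fetch raw quotes for a ticker. The return value is a stub
    standing in for an API call or scrape; the retry loop with exponential
    backoff shows the error handling the module would need."""
    for attempt in range(retries):
        try:
            # Stand-in for e.g. an HTTP request; may raise on failure.
            return [{"date": "2024-01-02", "close": "101,5"},
                    {"date": "2024-01-03", "close": None}]
        except OSError:
            time.sleep(2 ** attempt)  # back off before retrying
    raise RuntimeError(f"extraction failed for {ticker}")

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: unify decimal formats and drop rows with missing
    values (a real pipeline might impute or flag them instead)."""
    clean = []
    for r in rows:
        if r["close"] is None:
            continue  # missing value: skipped in this sketch
        close = float(str(r["close"]).replace(",", "."))  # "101,5" -> 101.5
        clean.append((r["date"], close))
    return clean

def load(rows: list[tuple], db_path: str = ":memory:") -> None:
    """Load: store normalized rows; in-memory SQLite stands in for the
    SQL or time-series database chosen in the thesis."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS quotes(date TEXT, close REAL)")
    con.executemany("INSERT INTO quotes VALUES (?, ?)", rows)
    con.commit()
    count = con.execute("SELECT COUNT(*) FROM quotes").fetchone()[0]
    print(f"{count} rows loaded")
    con.close()

load(transform(extract("XYZ")))
```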