Biography
Bartosz Ziółko, PhD, is a professor at AGH and an individual investor. He co-founded and served as CEO of Techmo, a technology company providing speech recognition and speech generation solutions.
He studied Electronics and Telecommunications at AGH and earned a PhD in Computer Science at the University of York, obtaining his habilitation in 2017. He has authored over 100 scientific papers and the book "Speech Processing", and holds two patents granted by the USPTO and one by the EPO. He was a visiting scholar at Hokkaido University in Japan and a participant in the TOP 500 Innovators program at Stanford University.
Research Interests
His research interests include:
- Financial Data Analysis
- LLM-based reasoning across heterogeneous data sources
- Automatic speech recognition (ASR)
- Natural language processing (NLP)
- Prediagnosis of diseases based on speech
He has participated in over 10 national and European research projects. The company he built created a speech recognition system that has processed over 100 million conversations.
Courses
- Text Algorithms
- Evolutionary Algorithms
- Multidimensional Data Analysis
- Databases in Data Mining
- Data Exploration
Open Thesis Topics
PhD Proposal
Multimodal Reasoning with LLMs for Financial Data
The primary objective is to investigate the mechanisms by which information from unstructured alternative data sources is transmitted to financial instrument prices. The project aims to verify the hypothesis that analyzing heterogeneous signals (textual, behavioral, and operational) enables earlier identification of changes in a company's business paradigm than traditional models do.
The research is designed as a multi-layered framework focused on the integration of high-dimensional, heterogeneous data streams into a unified predictive environment. The core of the project is the development of a multimodal architecture capable of synchronizing classic fundamental financial indicators with unconventional alternative data, including NLP-processed conference transcripts, social media sentiment, and OSINT signals. A central focus is the phenomenon of management narrative coherence: by leveraging LLMs, the project will quantify the semantic alignment between strategic corporate statements and objective, real-world proxies. This involves the simultaneous processing of unstructured text, card transaction statistics, and spatio-temporal features derived from phone geolocation patterns and computer vision analysis of site-specific imagery.
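To make the coherence measurement concrete, the sketch below scores how well each strategic statement is supported by at least one real-world proxy signal. It is a minimal illustration only: the bag-of-words embed() function is a stand-in for an LLM-based sentence encoder, and the statements and proxies are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use an
    LLM-based sentence encoder (this stand-in is purely illustrative)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def narrative_coherence(statements: list[str], proxies: list[str]) -> float:
    """Average, over statements, of the best-matching proxy signal:
    high values mean the narrative is backed by observable evidence."""
    scores = [max(cosine(embed(s), embed(p)) for p in proxies)
              for s in statements]
    return sum(scores) / len(scores)

# Hypothetical example: a board claims expansion; proxies partly confirm it.
statements = ["we are rapidly expanding retail store traffic in Europe"]
proxies = [
    "card transaction volume in European retail stores rose 12 percent",
    "satellite imagery shows unchanged parking lot occupancy",
]
print(f"coherence = {narrative_coherence(statements, proxies):.2f}")
```

Taking the best match per statement, rather than the mean, reflects the idea that a claim needs only one strong supporting proxy to count as coherent.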
The research will further explore a paradigm shift from traditional frequentist simulations to a Generative World Model approach, evaluating the efficacy of an LLM-as-a-model in place of, or in hybrid conjunction with, classical Monte Carlo methods. While Monte Carlo techniques provide a robust mathematical baseline for uncertainty propagation, they often fail to capture the semantic complexity of non-linear market shocks. This project investigates the use of LLMs to perform Semantic Scenario Synthesis, in which the model acts as a probabilistic reasoner capable of generating counterfactual states based on latent environmental variables.
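A minimal sketch of such a hybrid follows. Classical Monte Carlo paths are tilted by counterfactual scenarios sampled from llm_scenario(), a stub standing in for a real LLM call; the shock probabilities and multipliers are invented purely for illustration.

```python
import random
import statistics

def gbm_path(s0, mu, sigma, steps=252):
    """One classical Monte Carlo path: Euler-discretized
    geometric Brownian motion over one year of trading days."""
    s = s0
    for _ in range(steps):
        s *= 1 + mu / steps + sigma * random.gauss(0, 1) / steps ** 0.5
    return s

def llm_scenario():
    """Stub standing in for an LLM call that synthesizes counterfactual
    scenarios as (probability, return multiplier, label). Hypothetical."""
    return [(0.05, 0.60, "supply-chain shock"), (0.95, 1.00, "baseline")]

def hybrid_terminal_price(s0=100.0, mu=0.07, sigma=0.25, n=10_000):
    """Hybrid estimate: each Monte Carlo draw is scaled by a
    semantically generated scenario sampled from the LLM proposals."""
    scenarios = llm_scenario()
    outcomes = []
    for _ in range(n):
        p, acc, mult = random.random(), 0.0, 1.0
        for prob, m, _label in scenarios:
            acc += prob
            if p <= acc:
                mult = m
                break
        outcomes.append(gbm_path(s0, mu, sigma) * mult)
    return statistics.mean(outcomes)

print(f"hybrid expected terminal price: {hybrid_terminal_price():.2f}")
```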
Master's Thesis
Application of large language models (LLMs) to synthesize and interpret heterogeneous market data for personalized investment decision support
Goal: Design, implementation, and evaluation of an advanced investment decision support system using LLMs and a Retrieval-Augmented Generation (RAG) architecture.
Research Problem: Addressing the "information overload" problem faced by investors. The system synthesizes distributed and often contradictory data (stock data, macroeconomic indicators, reports) to generate personalized analytical briefs tailored to a specific portfolio strategy, grounding its answers in factual data.
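The sketch below illustrates the RAG pattern under simplifying assumptions: naive lexical retrieval stands in for a vector index, the assembled prompt is printed rather than sent to an LLM, and the corpus, query, and portfolio description are hypothetical.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Naive lexical retrieval: rank documents by word overlap with the
    query. A production RAG system would use an embedding index instead."""
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, portfolio: str, documents: list[str]) -> str:
    """Assemble a grounded prompt: retrieved evidence plus the investor's
    portfolio strategy, so the generated brief can cite factual context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        f"Portfolio strategy: {portfolio}\n"
        f"Evidence:\n{context}\n"
        f"Question: {query}\n"
        "Write a short analytical brief grounded ONLY in the evidence."
    )

# Hypothetical corpus and query, for illustration only.
docs = [
    "Q3 report: revenue up 8 percent, guidance unchanged.",
    "Central bank raised rates by 25 basis points.",
    "Competitor announced a recall of its flagship product.",
]
print(build_prompt("should I increase exposure to this stock",
                   "dividend growth, low turnover", docs))
```

Grounding the prompt in retrieved documents is what lets the system keep its briefs tied to factual data rather than the model's parametric memory.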
Engineering Thesis (BSc)
A system for automatic collection and normalization of economic data from distributed sources
Scope: Technical challenges related to the ETL (Extract, Transform, Load) process in the context of financial data. A minimal end-to-end sketch of the pipeline is given after the module list below.
- Acquisition Module (Extract): Implementation of API clients and web scraping mechanisms, error handling, and task scheduling.
- Processing Module (Transform): Algorithms for data cleaning (handling missing values, anomalies) and normalization (unifying formats).
- Storage Module (Load): Design of an efficient database (SQL or Time-Series, e.g., InfluxDB).
- Access Module (API/UI): REST API or a basic UI for filtering, aggregation, and data export (CSV/JSON).
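As referenced above, the following sketch wires the three core stages together under stated assumptions: the fetch inside extract() is a stub standing in for a real API client or scraper (only the retry pattern matters), SQLite stands in for the target SQL or time-series database, and all data are illustrative.

```python
import sqlite3
import time

def extract(ticker: str, retries: int = 3) -> list[dict]:
    """Extract: fetch raw quotes for a ticker. The return value is a stub
    standing in for an API call or scrape; the retry loop with exponential
    backoff shows the error handling the module would need."""
    for attempt in range(retries):
        try:
            # Stand-in for e.g. an HTTP request; may raise on failure.
            return [{"date": "2024-01-02", "close": "101,5"},
                    {"date": "2024-01-03", "close": None}]
        except OSError:
            time.sleep(2 ** attempt)  # back off before retrying
    raise RuntimeError(f"extraction failed for {ticker}")

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: unify decimal formats and drop rows with missing
    values (a real pipeline might impute or flag them instead)."""
    clean = []
    for r in rows:
        if r["close"] is None:
            continue  # missing value: skipped in this sketch
        close = float(str(r["close"]).replace(",", "."))  # "101,5" -> 101.5
        clean.append((r["date"], close))
    return clean

def load(rows: list[tuple], db_path: str = ":memory:") -> None:
    """Load: store normalized rows; in-memory SQLite stands in for the
    SQL or time-series database chosen in the thesis."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS quotes(date TEXT, close REAL)")
    con.executemany("INSERT INTO quotes VALUES (?, ?)", rows)
    con.commit()
    count = con.execute("SELECT COUNT(*) FROM quotes").fetchone()[0]
    print(f"{count} rows loaded")
    con.close()

load(transform(extract("XYZ")))
```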