# Concrete topics for PhD research

### Ruzanna Chitchyan

**Topics on Software Engineering for Sustainability:** These topics relate to understanding the implications of sustainability needs on the systems engineering, with main focus on requirements engineering, system modelling, and design. Software is in the centre of many human activities. Thus, it is essential to review and modernise the way software is developed and used to ensure that both its' direct and indirect effects support the relevant aspects of sustainabilnirity. The direct effects of software use are in the way it enables immediate resource consumption/preservation and facilitates the in-situ organisational/social processes. For instance, a programme for party organisation may help minimise food waste by carefully checking how many people will be present and what type/portion of food they will use. The indirect effects, on the other hand, are its long-term effects (such as changes in societal lifestyle (e.g., due to use of social media), or changes in patterns of resource use triggered by it. There is a large set of topics in this area, and my research interests are not limited to the examples below.

Some references:

- M. Mahaux, P. Heymans, and G. Saval, “Discovering Sustainability Requirements: An Experience Report,” in Requirements Engineering: Foundation for Software Quality, 2011, pp. 19–33. URL: http://info.fundp.ac.be/~mma/wordpress/wp-content/uploads/2012/02/paperExpReport.pdf
- J. Cabot, S. Easterbrook, J. Horkoff, L. Lessard, S. Liaskos, and J. N. Mazón,“Integrating Sustainability in Decision-Making Processes: A Modelling Strategy,” in 31st International Conference on Software Engineering-Companion Volume, 2009. ICSE-Companion 2009, 2009, pp. 207–210. URL: http://www.cs.utoronto.ca/~jenhork/Papers/Sustainability_ICSE_NIER.pdf

- Engineering Sustainability Requirements: at this time it is not clear what sustainability requirements actually are. Sustainability is a wide and complex concept, it is often defined as comprising social, economic, and environmental aspects, each of which may be perceived quite differently in a given application area or discipline. The first challenge in tacking the sustainability problem in software design is in understanding and mapping out this complex concept into a format useable by requirements engineers (e.g., a soft-goal decomposition catalogue). This, obviously, will have to be validated via case studies and supported by a tool.
- Measuring Sustainability Impact of Software: this project will develop techniques and tools for quantifying the above mentioned direct and indirect sustainability effects delivered by a particular choices in software design. For instance, working from home may reduce resource consumption (e.g., eliminating travel-related fuel consumption, and office space use), but adversely affect social interactions of a person (e.g., by eliminating opportunities for direct contact with colleagues and ability to build real relationships). How can such effects be measured? What kind of tools will be needed to support such quantification?

**Advanced Software Modularity:** These topics are related to new ways of modularising and composing software systems, with specific focus on requirements and architecture design areas, such as:

- aspect-oriented requirements engineering;
- requirements and architecture design techniques for software product line engineering;
- requirements engineering for service-oriented systems.

Some References:

- "Semantics-Based Composition for Aspect-Oriented Requirements Engineering", R. Chitchyan, A. Rashid, P. Rayson, R. Waters, Proceedings of the 6th International Conference on Aspect-Oriented Software Development (AOSD 2007). URL: http://www.comp.lancs.ac.uk/computing/aop/papers/AOSD_2007.pdf
- "A Framework for Constructing Semantically Composable Feature Models from Natural Language Requirements", N. Weston, R. Chitchyan, A. Rashid. Proceedings of the 13th International Software Product Line Conference (SPLC 2009).

**Creative Engineering of Systems for Future:** This is a follow on from work on the All-in One project. The All-in-One project sets out to deliver a scientific and technological data collection and aggregation support for envisioning future (100 years from now) utilities delivery infrastructure. With this work in progress, there is a need to design and develop methodologies and tools for analysis of the available scientific and technological data for (predictive) computing of the future technological landscape for a given utility area (e.g., water supply, or ICT). Since utilities can be perceived as a type of service, this research can be pursued with respect to other service-like application areas as well (not only utilities).

Reference:

- “Energy and Utilities Infrastructure: Can All be in One?”, F. Camci, B. Ulanicki , J. Boxall, R. Chitchyan , L. Varga. Conference on Energy and People: Futures, Complexity and Challenges, 2011. URL: http://www.ukerc.ac.uk/support/tiki-download_file.php?fileId=1938&display

Ruzanna Chitchyan's homepage and contact details

### Roy Crole

**Programming language semantics** research makes use of techniques from mathematical logic, category theory and type theory to develop the foundations of programming languages with a long term goal of producing new and better programming languages and environments. I am interested in axiomatic, operational, categorical and denotational semantics. One creates mathematical models and methods which inform language design and assist with the development of logical systems for specifying and reasoning about program properties. More specifically this research:

- aims to increase our general understanding of the theory and fundamentals of programming and computation;
- aims to underpin and clarify our understanding of existing programming languages and systems, and can be used as a basis for analysis, specification, verification, and testing; and
- can lead to the development of fundamentally new languages and systems.

**Mathematical Models of Variable Binding**: In the last decade we have advanced our understanding of the formal notion of variable binding considerably. Presheaf categories have been shown to provide models of standard languages with variable binding, such as (variants of) the lambda calculus, as initial algebras. This is a considerable advancement over older models in which expressions with binders could only be realised as alpha-equivalence classes of elements of initial algebras. The important gain here is the initial algebra gives rise to a clear principle of structural induction. My own work has developed particular kinds of presheaf category models, and a PhD thesis could be based upon the further development of these models and the associated reasoning principles.**Computer Systems for Reasoning About Binding:**The Hybrid system was developed in Isabelle HOL. It is a metalogic which can be used to reason about languages with binding. One translates the object language into Hybrid, and then uses Hybrid's reasoning principles (such as induction and coinduction) to prove properties of the language. There are a variety of research avenues to explore, ranging from applications of Hybrid which could involve the testing of Hybrid and its comparison with other systems of reasoning with binders, to the further development of the Hybrid system itself.**The Nominal Lambda Calculus**The theory of nominal sets has been developed over the past decade as a very general model of variable binding. There are also various kinds of nominal category that provide generalisations of nominal sets. Recent work has been to develop a nominal lambda calculus and a categorical semantics. There are various ways in which these ideas can be further developed, leading to a PhD thesis.

Roy Crole's homepage and contact details

### Fer-Jan de Vries

Rewriting and Lambda Calculus We are interested to extend the theory of rewriting with infinite reductions and infinite terms. This leads to new insights when one tries to generalise results and concepts of finite rewriting to the infinite setting. See http://www.cs.le.ac.uk/people/fdv1/ for papers that I've written with colleagues. Areas that are waiting to be explored are:

**Models for infinitary lambda calculus:**the study of infinite lambda calculus has revealed an abundance of different syntactic models. Is it possible to construct models that induce the same equality relations as these syntactic models with other means, for instance via domain equations? The converse question is of interest too: given a known semantics of the lambda calculus, what is the corresponding infinite lambda calculus whose term model induces the same equality relation?**Infinite lambda calculi without renaming:**traditional finite lambda calculi follow a variable convention. Various solutions have been proposed to circumvent this problem: for instance De Bruijn indices or the Adbmal calculus of Hendriks and van Oostrom. Do these solutions generalise?**Weakly orthogonal rewriting**is confluent in finite rewriting. Does this generalise to infinite rewriting?**Undefined terms:**for term rewriting and for lambda calculus there is a notion of meaningless set. Meaningless terms need to be identified in order to regain confluence in the infinite orthogonal setting. Is there a notion of meaningless set for other rewriting formats, like Klop's combinatory reduction systems?**Decreasing diagrams**is a powerful technique to prove confluence in finite rewriting. Can it be generalised to prove confluence in the infinite setting?#

Fer-Jan de Vries's homepage and contact details

### Thomas Erlebach

**Algorithmic Aspects of Comunication Networks.** Numerous challenging optimization problems arise in the design and operation of communication networks such as optical networks, cellular wireless networks, mobile wireless ad-hoc networks, and GRID computing. Examples of such problems include network design, resource provisioning, resource allocation, scheduling, virtual backbone construction, and call admission control. Students working on such topics typically identify suitable models and problem formulations, design and analyze algorithms (approximation algorithms, on-line algorithms, distributed algorithms, algorithmic game theory), and implement simulations of algorithms for the purpose of experimental evaluation and comparisons. The focus can be more on the theoretical analysis (proofs of worst-case performance guarantees, competitive analysis) or on the experimental and algorithm engineering side, but typically the research includes aspects of both.

This is a broad topic that can accommodate a number of specific PhD research projects. Note, however, that I already have a number of PhD students and, for the moment, will accept additional students only if they are exceptionally well qualified.

List of skills required:

- Ability to understand the design and analysis of algorithms.
- Ability to understand and apply basic concepts of discrete mathematics (graphs, combinatorics, proofs).
- Ability to write about algorithms and proofs in a precise and rigorous way.
- Creativity in prolem-solving.
- Ability to write programs in a suitable programming language such as Java or C++ (only if the PhD project includes a simulation or implementation component).

List of relevant optional modules that can be attended at University of Leicester:

- CO7200 Algorithms for Bioinformatics
- CO7212 Game Theory in Computer Science
- CO7213 Networking and Distributed Computing

Some suggested reading:

- Introduction to Online Algorithms:

Susanne Albers: Competitive Online Algorithms. BRICS Lecture Series, LS-96-2, 1996. Available at: http://www.brics.dk/LS/96/2/ - Survey of Approximation Algorithms for Disjoint Paths Problems:

Thomas Erlebach: Approximation Algorithms for Edge-Disjoint Paths and Unsplittable Flow. In E. Bampis et al. (Eds.): Efficient Approximation and Online Algorithms, LNCS 3484, Springer, 2006, pp. 97-134. Available at: http://www.cs.le.ac.uk/people/te17/papers/edpsurvey.pdf - Survey of Algorithmic Models of Ad-hoc/Sensor Networks:

S. Schmid and R. Wattenhofer: Algorithmic Models for Sensor Networks. In Proceedings of the 14th International Workshop on Parallel and Distributed Real-Time Systems (WPDRTS), April 2006. Available at: http://www.dcg.ethz.ch/members/pascal/refs/wpdrts06cam.pdf - Example of an algorithmic result for virtual backbone construction in ad-hoc networks:

Christoph Ambuehl, Thomas Erlebach, Matus Mihalak, Marc Nunkesser: Constant-factor approximation for minimum-weight (connected) dominating sets in unit disk graphs. In Proceedings of the 9th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX 2006), LNCS 4110, Springer, 2006, pp. 3-14. Extended version available at: http://www.cs.le.ac.uk/people/te17/papers/CS-06-008.pdf

Thomas Erlebach's homepage and contact details

### Stanley Fung

Design and analysis of algorithms, particularly in the areas of

- Online algorithms
- Combinatorial optimization
- Computational biology and bioinformatics

Skills:

Background in discrete mathematics and algorithms as provided in most CS undergraduate degrees

Suggested MSc modules:

Analysis and Design of Algorithms. provides the above skills. C++ Programming and Advanced Algorithm Design and Game Theory in Computer Science may also be useful.

Papers:

- S. Albers, Online algorithms: a survey, Mathematical Programming 97 (1-2), p. 3-26, July 2003. http://www.informatik.uni-freiburg.de/~salbers/ismp03.ps.gz
- A. Borodin and R. El-Yaniv, Online Computation and Competitive Analysis, Cambridge University Press, 1998.

Stanley Fung's homepage and contact details

### Reiko Heckel

**Modelling and analysis of social networks by graph transformation**

A network can be seen as a graph, and its evolution over time modelled by graph transformation rules. See http://www.cs.le.ac.uk/people/rh122/papers/2006/Hec06Nutshell.pdf for an introduction to graph transformation and google 'social network analysis' to get an idea of the problem.

Graph transformation systems (GTS), see http://www.cs.le.ac.uk/people/rh122/papers/2006/Hec06Nutshell.pdf, are more expressive than nets, and indeed every net can be seen as a simple GTS. Lifting ideas and concepts from the Petri net world, we would like to establish GTS as a technique for modelling workflows / business processes.

Good knowledge and skills in at least two out of the following three areas is required.

- OO development (programming and UML).
- basic theoretical computer science (automata and formal languages, propositional and first order logic, basic set theory).
- classical maths (specifically analysis and probability theory).

**Test management for web applications based on probabilistic models**

With limited resources (computing, personnel and time) it is not possible to exhaustively test every feature of a web-application on every browser or device. A key challenge of any software engineering project is to decide how to deploy your testing resources. The goal is to detect faults in the application, but we must decide whether to focus on bugs that have the biggest impact on a user, those that will affect most users, those that are most easily fixed by developers, etc.

That means to make decisions as to which features of the application to focus on and which browsers or devices to use. As we receive the results of testing we learn about which choices reveal the most defects and so adapt our testing strategy to focus more tightly on specific areas. However, as applications evolve and bugs are removed, the areas in which we should focus also change. If we think about the probability that a defect is exposed in a certain feature when viewed in a certain browser using a probability distribution, it should be possible to model these distributions and use them to guide our decision making on how to focus our testing effort.

On the more technical side it would be interesting to develop tools that allow us to maintain, visualise and use these model throughout a software project. Automated tested adds another dimension of complexity to this because we can use it to test certain functionality across browsers and devices, but this itself requires resources to develop and maintain.

**The project will be co-supervised with Dr Richard Craggs, Release Manager at Jadu.**

Reiko Heckel's homepage and contact details

### Michael Hoffmann

**1.** **Algorithm with uncertainty:** The area of update algorithms for problems with uncertainty is a relative recent developed topic. For most classical algorithms it is assumed that the input is precise. In this model each input might be given in a rough (uncertain) form. The precise value for each input can be obtained for a certain cost. The algorithm has to decide for which input data the the extra costs needs to be paid to solve the underlying problem. Algorithm with uncertainty are rated on cost effective they are.

Prerequisites:

- Programming in C++, Java, or similar computer language
- Knowledge in time and space complexity
- Knowledge in statistical analysis

Related publication:

- Thomas Erlebach, Michael Hoffmann, Danny Krizanc, Matu's Mihala'k, Rajeev Raman: Computing Minimum Spanning Trees with Uncertainty. STACS 2008: 277-288T.

**2. Online Algorithms:** Quota from wikipedia: (http://en.wikipedia.org/wiki/Online_algorithm) "In computer science, an online algorithm is one that can process its input piece-by-piece, without having the entire input available from the start. In contrast, an offline algorithm is given the whole problem data from the beginning and is required to output an answer which solves the problem at hand. (For example, selection sort requires that the entire list be given before it can sort it.)

As an example of the problem consider the problem of finding a shortest path in a finite connected graph when the graph is unknown and the algorithm receives the node neighbors only when it "enters" the node. It is clear that this problem can not be solved optimally without a simple exhaustive search. Thus, new performance measures have to be introduced, such as competitive analysis, which compares the performance of an online algorithm with that of a hypothetical offline algorithm that knows the entire input in advance."

In this general topic some of my recent research focused on network discovery. Where a structure of the network is not known and by probing certain nodes the whole network will be discovered.

Prerequisites:

- Programming in C++, Java, or similar computer language
- Knowledge in time and space complexity
- Knowledge in statistical analysis

Related publication:

- Zuzana Beerliova, Felix Eberhard, Thomas Erlebach, Alexander Hall, Michael Hoffmann, Matu's Mihala'k, L. Shankar Ram: Network Discovery and Verification. IEEE Journal on Selected Areas in Communications 24(12): 2168-2181 (2006)

Michael Hoffmann's homepage and contact details

**Leandro Minku**

I am interested in the areas of computational intelligence for software engineering and machine learning for non-stationary environments.

**Computational Intelligence for Software Engineering**

Due to the increasing size and complexity of software systems, software engineering problems such as software testing, software project scheduling and software maintenance have become more difficult to handle. In order to reduce the high cost of performing software engineering tasks and increase software reliability, computational intelligence approaches can be used as decision-support tools. The following two areas of research have been showing promising results in this context:

- Search-Based Software Engineering (SBSE)

SBSE is concerned with (at least partly) automating software engineering problems. This is done by reformulating software engineering problems as optimisation problems, and then using optimisation algorithms to solve them. For example, software project scheduling can be viewed as the optimisation problem of allocating employees to tasks in a software so as to minimise the cost and completion time of this project. Software test case generation can be seen as the optimisation problem of generating test cases so as to maximise code coverage and minimise testing effort. Software requirements selection can be seen as the optimisation problem of finding a set of requirements that maximises stakeholders' satisfaction while remaining within budget. Several other software engineering tasks can also be viewed as optimisation problems, such as improving existing software code and architectures to better deal with non-functional requirements.

- Machine Learning for Software Engineering

This area of research formulates software engineering problems as machine learning problems, and then makes use of existing software data repositories to create machine learning models to solve them. For example, software effort estimation is the problem of estimating the effort required to develop software projects based on metrics describing this project, such as team expertise, development type, memory requirements, etc. Models for predicting the effort of future projects can be created based on data describing completed software projects. Another example is software defect prediction, which consists in predicting whether a certain module (e.g., a class, package or file) of a software program is likely to contain defects. Defect prediction models can be created based on data describing previous versions of software programs. Other software engineering tasks that can be tackled by machine learning include prediction of software energy consumption, identification of security bugs, etc.

Selected references:

MINKU, L. L.; SUDHOLT, D.; YAO, X. . "Improved Evolutionary Algorithm Design for the Project Scheduling Problem Based on Runtime Analysis", IEEE Transactions on Software Engineering, 40(1):83-102, 2014, doi: 10.1109/TSE.2013.52.

MINKU, L. L.; YAO, X. . "Software Effort Estimation as a Multi-objective Learning Problem", ACM Transactions on Software Engineering and Methodology, 22(4):35.1-32, 2013, doi: 10.1145/2522920.2522928.

MINKU, L. L.; YAO, X.; "How to Make Best Use of Cross-company Data in Software Effort Estimation?", Proceedings of the 36th International Conference on Software Engineering (ICSE'14), p. 446-456, 2014, doi: 10.1145/2568225.2568228.

**Machine Learning for Non-stationary Environments**** **

** **

Most machine learning algorithms operate in off-line mode. They first learn how to perform a particular task and then are used to perform this task. No task can be performed during the learning phase and, after the learning phase is completed, the system cannot further improve or change. However, many real-world problems do not allow us to see all data in advance and are not intrinsically static. For example, consider an information filtering system which predicts the user’s reading preferences. It is not possible to collect all users’ information beforehand for learning. New users can join in and old users may leave at some point. Users can also change their preferences with time and a system which learnt how to predict them in the past will fail if it cannot update according to the new ones. Other examples include systems for computer security, spam detection, price prediction and web pages classification.

Different from offline learning algorithms, online learning algorithms can be used for learning a particular task and making predictions at the same time. They are updated whenever a new training example is available. There are also algorithms for chunk-based learning. Rather than being updated whenever each new training example arrives, chunk-based algorithms are updated whenever a whole new chunk of data is received. Both these algorithms can incorporate strategies to deal with non-stationary environments. Research in this area includes several topics, such as how to deal with different types of non-stationary conditions, how to deal with class imbalance data, how to deal with missing data, how to deal with large scale data sets, how to deal with very small data sets, etc.

Selected references:

WANG, S.; MINKU, L. L.; YAO, X. . "Resampling-Based Ensemble Methods for Online Class Imbalance Learning", IEEE Transactions on Knowledge and Data Engineering 25(5):1356-1368, 2014, doi: 10.1109/TKDE.2014.2345380.

MINKU, L. L.; YAO, X. . "DDD: A New Ensemble Approach For Dealing With Concept Drift.", IEEE Transactions on Knowledge and Data Engineering, 24(4):619-633, 2012, doi: 10.1109/TKDE.2011.58.

MINKU, L. L.; WHITE, A. P.; YAO, X. . "The Impact of Diversity on On-line Ensemble Learning in the Presence of Concept Drift.", IEEE Transactions on Knowledge and Data Engineering 22(5):730-742, 2010, doi: 10.1109/TKDE.2009.156.

**Nir Piterman**

If you are interested in questions that have to do with two-player games, automata, algorithms related to them, or other things that appear on my research interests, there's something to talk about.

**p-Automata and related games:** These are new automata motivated by probabilistic CTL and alternating tree automata that accept Markov chains. Their study involve issues about automata (conversion between different transition modes of automata (deterministic, nondeterministic, alternating)), algorithmic questions about automata, the introduction of new games (obligation games) and algorithmic questions about such games.

You can have a look on the paper introducing them or the paper on obligation games .

**Synthesis of controllers from temporal specifications:** Synthesis (in my language) is the automatic conversion of a specification in linear temporal logic to an input-enabled program all of whose executions satisfy the specification.

This is a question that intrigued researchers in verification for many years, getting much attention, but only recently getting actual practical progress.

The techniques involved in doing this include the study of transition systems, two player games, and construction of strategies. This is also related to automata and temporal logic.

The approach that I have been advocating calls for restricting the type of specifications to a "reasonable" subset that can be handled in practice.

I have recently considered how to apply this to certain abstract ways of representing the world (here).

This is also related to questions about games with partial information, their solutions, and strategies controlling them (See survey, not mine).

**Dynamic Reactive Modules:** This is an approach for modeling transition systems that support allocation of new processing power (and its deallocation) based on variable sharing (rather than message passing). I find this very exciting as this could be related to ad-hoc networks and such things and hasn't been considered in the past.

See the paper introducing them.

Nir Piterman's homepage and contact details

### Rajeev Raman

I am interested in processing, querying and mining very large data. Specific areas include:

**Efficient Data Structures**Niklaus Wirth wrote a famous book in 1976 called Algorithms + Data Structures = Programs. What is meant by this is that algorithms manipulate data, and in order to write a good program, one should identify the operations that one performs on the data and choose an appropriate data structure. However, we are living in a world of BIG data, and classical data structures (such as trees or lists) can be incredibly inefficient.- One problem is that normal data structures take up very large amounts of RAM, compared to the data they store. For example, after reading a 1GB XML document into a DOM tree, the memory usage goes up to 15GB RAM. This increase in memory makes it impossible to process BIG data efficiently.
*Succinct and compressed*data structures solve this problem in novel and clever ways. For example, did you know that you can navigate in a binary tree, in which each node is represented using at most two*bits?*. I have many potential PhD topics in succinct data structures. See an example PhD Thesis I have supervised in this area. There are many practical applications; see e.g. the SiXML project, the Bowtie package. - I am also interested in novel algorithms for large-scale data processing, using say MapReduce or similar frameworks, but my interests are narrowly focussed. You will need to provide a general application area (what kind of data are you interested in processing? why is it challenging to process this kind of data using MapReduce?).
**Data Mining**I am interested in data mining. Data mining is of course an application that involves processing large amounts of data. Succinct data structures can help to process large amounts of data. In addition, I am working on two projects:- Mining Probabilistic Data. Much of data to be mined is automatically harvested and collected, and is therefore
*uncertain*due to sensor errors, inconsistent data in databases or the web etc. Mining such data is a challenge. See a recent PhD Thesis I supervised on this topic. - Tom Gransden, a student supervised jointly with Neil Walkinshaw, is investigating the use of Machine Learning algorithms to detect proof tactics used in Interactive Theorem Provers.

- Mining Probabilistic Data. Much of data to be mined is automatically harvested and collected, and is therefore

Selected References (my PhD students' names in bold)

**Muhammad Muzammal**, Rajeev Raman. Mining sequential patterns from probabilistic databases.*Knowledge and Information Systems*, 2014. 10.1007/s10115-014-0766-7, 2014.**Thomas Gransden**, Neil Walkinshaw, Rajeev Raman. Mining State-Based Models from Proof Corpora. CICM 2014: 282-297. Published version on arXiV.**Stelios Joannou**, Rajeev Raman. Dynamizing Succinct Tree Representations. Experimental Algorithms Lecture Notes in Computer Science Volume 7276, 2012, pp 224-235. Final submitted version.**Richard Geary**, Venkatesh Raman, Rajeev Raman. Succinct ordinal trees with level-ancestor queries.*ACM Transactions on Algorithms*, 2006, 2 (4), pp.510-534. DOI: 10.1145/1198513.1198516. Final submitted version.

Rajeev's homepage and contact details.

### Stephan Reiff-Marganiec

**1. Dynamic Adaptation of Workflows:** Workflows capture Business Processes, but business is dynamic so their workflows need to adapt. StPowla is an approach to process modelling for service-oriented systems. It has three ingredients: workflows to express core processes, services to perform activities and policies to express variability. Workflows are expressed using a graphical notation. Policies can make short-lived changes to a workflow instance, i.e. they last for the duration of the workflow instance and usually will be made during the execution of the instance, rather than applied to the overall workflow model. There are two types of policies: refinement and reconfiguration. Refinement policies specify criteria for the selection of the service that is ultimately chosen and invoked to execute the task at hand, while reconfiguration policies change the structure of the workflow instance by e.g. deleting or adding tasks. This area also covers the study of policy conflict.

Selected References:

- S Gorton, C Montangero, S Reiff-Marganiec and L Semini. StPowla: SOA, Policies and Workflows. WESOA 2007.
- L Bocchi, S Gorton and S Reiff-Marganiec. Engineering Service Oriented Applications: From StPowla Processes to SRML Models. FACS (Formal Aspects of Computing).

**2. Service Selection:** One area of service-oriented computing is concerned with identifying the right service for a user. This is a very active and interesting area and I am specifically interested in how can the best service for a specific user context be selected and how can the overall selection services for a business process be optimized. In the former area we have developed an extension of LSP (Logic Scoring for Preferences) that allows for automatic ranking of single services. An extension of this method for workflows is under consideration, however, for workflows we are also working on an Immune based algorithm for finding the global best.

Selected References:

- J. Xu and S Reiff-Marganiec. Towards Heuristic Web Services Composition Using Immune Algorithm. ICWS 2008.
- HQ Yu and S Reiff-Marganiec. A Method for Automated Web Service Selection. WSCA 2008.

**3. Virtual Organisations:** The idea of virtual organisations captured the fact that many smaller organisations can work together in dynamic ways to achieve larger goals that none of them could achieve on their own. We are studying how to model VOs at higher levels of abstraction, but in ways that will ultimately allow to reason about their behaviour. A future aspect of this work would be the consideration of a run-time platform to enable the operation of VOs.

Reference:

- L. Bocchi, J.L. Fiadeiro, N. Rajper and S. Reiff-Marganiec. Structure and Behaviour of Virtual Organisation Breeding Environments. FAVO 2009.

Stephan Reiff-Marganiec's homepage and contact details

### Tom Ridge

I'm interested in ways to make our I.T. systems more dependable and reliable. This could include:

- developing new programming language features
- identifying and implementing key I.T. infrastructure, e.g. persistent message queues
- using mathematics to prove that e.g. implementations of algorithms and systems are correct
- developing tools to make the above tasks easier e.g. theorem provers such as HOL4, and specification tools such as Ott and Lem

Some concrete current projects I would like PhD students to work on are:

**Verified filesystem** Filesystems are incredibly important, particularly when applications must provide some form of persistence for their data (eg a database must ensure that data survives a complete system failure). In order to develop applications that use the filesystem, one has to know what guarantees are being provided by the filesystem. Unfortunately current filesystems do not provide many guarantees beyond POSIX "atomic move" (if you move a file, and the system crashes, then either the file moved or it did not). The aim of this project is to develop a filesystem which guarantees certain key properties that application developers depend on. The filesystem will be developed in a functional language such as Haskell, OCaml or SML. Verification will use a theorem prover such as Isabelle, HOL, or Coq. The aim is to develop a reliable filesystem that can be used in real-world systems such as Linux.

**Filesystem implementation** The project is to develop a C implementation of a filesystem I have implemented in OCaml. The OCaml version is verified correct. Ideally the C implementation would have some direct (formal, mathematical) connection to the OCaml version, so that the C version would also be correct (there are several ways to make this more precise).

**Verified computational complexity** Lots of effort has been dedicated to reasoning formally about program correctness. We would now like to prove that implemented code has time and space complexity that it is claimed to have. A simple approach would be to define the atomic steps of a language, and count the number of steps a piece of code takes. However, informal reasoning tends to be at a much higher level than this. The aim of this project is to start developing the formal verification of computational complexity by looking at simple functional implementations e.g. of mergesort etc. Some recent work by Bob Harper et al. is relevant, here.

**Reasoning above operational semantics** One way to prove programs correct is to reason directly about their operational behaviour, as defined by the operational semantics for the language. There are many things that need to be done here, such as developing good operational semantics to support high-level reasoning; developing tools to make symbolic evaluation quicker; developing reasoning techniques for operational reasoning; porting existing techniques to the operational setting etc

**Access control matrices** Access control matrices (related to access control lists) are used to specify who has access to what resources. For example, the following states that user tom has access to all files, whereas jen has access only to File A.

|File A|File B|File C tom| Y | Y | Y jen| Y | N | N

Access control matrices are easy to implement. Unfortunately they are very difficult to manage, because high-level access control rules (e.g. only users with the role of administrator should be able to access all files) are not explicitly captured. Much existing research has been directed at this problem, but it should be possible to do something new and interesting by expressing the high-level rules in higher-order logic, then using automated provers to check that the properties are enforced.

**OCaml to C compilation** The project is to define how to translate OCaml (more likely, a simple functional programming language such as MiniML) to C. This would provide a general way to achieve the results of the "filesystem implementation" project above. The L4.verified project, to develop a verified operating system, did something similar, and it should be possible to use this existing work.

**OCaml to Scheme compilation** The project is to define how to translate OCaml (more likely, a simple functional programming language such as MiniML) to Scheme, similar to the OCaml to C compilation project above. Scheme is available pretty much everywhere, but implementations of ML-like languages are not so readily available. When new technologies such as Java (or .NET) come along, you want to be able to use all the new features and technology, combined with everything you love about ML. Scheme is usually supported very quickly e.g. from within Java via JScheme or Kawa, but it is still difficult to use ML-like languages with Java. This project aims to develop an ML to Scheme translator, allowing ML programmers to take advantage of existing Scheme/Java technology to run their ML code on the Java platform.

Tom Ridge's homepage and contact details

### Rick Thomas

**1. Automatic groups and semigroups:** There have been some intriguing interactions in recent years between group theory and theoretical computer science. One area which has proved to be very fruitful in providing interesting and useful results is that of automatic groups where computation in the group can (essentially) be performed via an associated set of finite automata. For example, it is known that any automatic group must be finitely presented and the word problem of an automatic group can be solved in quadratic time. For an overview of automatic groups, see [1].

Much of the research at Leicester is involved in extending this theory to semigroups. We find that some of the results in automatic groups do generalize to semigroups (such as the solution of the word problem in quadratic time) whereas others do not (for example, there exist automatic semigroups that are not finitely presented). See [2] for an introduction to automatic semigroups. As an example of the work done in this area, see [3] where it was shown that, for semigroups, the notions of automatic and biautomatic do not coincide (this is still an open problem for groups).

References:

- D. B. A. Epstein, J. W. Cannon, D. F. Holt, S. Levy, M. S. Paterson and W. Thurston, Word Processing in Groups (Jones and Barlett, 1992).
- C. M. Campbell, E. F. Robertson, N. Ruskuc and R. M. Thomas, Automatic semigroups, Theoretical Computer Science 250 (2001), 365-391.
- M. Hoffmann and R. M. Thomas, Biautomatic semigroups, in: M. Liskiewicz and R. Reischuk (editors), 15th International Symposium on Fundamentals of Computation Theory (FCT) 2005 (Lecture Notes in Computer Science 3623, Springer-Verlag, 2005), 56-67.

**2. Hyperbolic groups and monoids:** The notion of hyperbolic groups has played a fundamental role in computational group theory. There were several equivalent definitions of the notion of hyperbolicity in groups (see [1] for example) but none of these generalized to monoids. This changed with Gilman's elegant characterization of hyperbolic groups; Duncan and Gilman then suggested that this formulation could be taken as the definition of a hyperbolic monoid.

Their definition is entirely natural but we do not have efficient algorithms for dealing with hyperbolic monoids. It is known that the word problem for hyperbolic groups can be solved in linear time but the best known algorithm for the word problem in a hyperbolic monoid is exponential [2]. Other questions are still open as far as hyperbolic monoids are concerned, even as regards decidability. However, research at Leicester has shown how the definition can be changed in such a way that we still have equivalence in the class of groups but we do have efficient algorithms for hyperbolic monoids; see [3].

References:

- J. M Alonso, T. Brady, D. Cooper, V. Ferlini, M. Lustig, M. Michalik, M. Shapiro and H. Short, Notes on word hyperbolic groups, in: E. Ghys, A. Haefliger and A. Verjovsky (editors), Group Theory from a Geometric Viewpoint (World Scientific Publishing, 1991), 3-63.
- M. Hoffmann, D. Kuske, F. Otto and R. M. Thomas, Some relatives of automatic and hyperbolic groups, in: G. M. S. Gomes, J-E. Pin and P. V. Silva (editors), Semigroups, Algorithms, Automata and Languages (World Scientific, 2002), 379-406.
- M. Hoffmann and R. M. Thomas, Notions of hyperbolicity in monoids, in: E. Csuhaj-Varju and Z. Esik (editors), 16th International Symposium on Fundamentals of Computation Theory (FCT) 2007 (Lecture Notes in Computer Science 4639, Springer-Verlag, 2007), 341-352.

**4. Word problems of groups and semigroups.** The word problem for a finitely generated group is one of the must fundamental notions in group theory. There are many intriguing connections with theoretical computer science, and one important such area is that of investigating what implications the structure of the word problem as a formal language has on the algebraic structure of the corresponding group; see [2] for example. We can also ask which groups have word problem solvable within some resource bound, which leads to intriguing connections with computational complexity; see [1] for example.

As well as groups, it is possible to consider these issues in the wider context of finitely generated semigroups; several questions that have been completely resolved in the group case remain open in this wider setting. Notwithstanding this, some advances have been made, particularly in the case of the one-counter languages (see [3]).

References:

- S. R. Lakin and R. M. Thomas, Context-sensitive decision problems in groups, in: C. S. Calude, E. Calude and M. J. Dinneen (editors), Developments in Language Theory: 8th International Conference, DLT 2004, (Lecture Notes in Computer Science 3340, Springer-Verlag, 2004), 296-307.
- D. F. Holt, C. E. Roever, S. E. Rees and R. M. Thomas, Groups with a context-free co-word problem, Journal of the London Mathematical Society 71 (2005), 643-657.
- D. F. Holt, M. D. Owens and R. M. Thomas, Groups and semigroups with a one-counter word problem, Journal of the Australian Mathematical Society 85 (2008), 197-209.

**4. FA-presentable structures.** When one considers the notion of a computable structure. one would normally use some general model of computation such as a Turing machine. However, there have been various ideas put forward to restrict the model of computation used; whilst the range of possible structures decreases, the computation can become more efficient and certain properties of the structures may become decidable.

One interesting approach was introduced by Khoussainov and Nerode [1] who considered structures whose domain and relations can be checked by finite automata (as opposed to Turing machines); such a structure is said to be FA-presentable. One area of intense interest here has been the classification of which structures are FA-presentable, and there are not many instances where this has been achieved. Amongst the few examples of this to date has been the classification of FA-presentable finitely generated groups in [2] and the generalization to FA-presentable finitely generated cancellative semigroups in [3].

References:

- B. Khoussainov and A. Nerode, Automatic presentations of structures, in: D. Leivant (editor), Logic and Computational Complexity (Lecture Notes in Comp. Science 960, Springer-Verlag, 1995), 367-392.
- G. P. Oliver and R. M. Thomas, Automatic presentations for finitely generated groups, in: V. Diekert and B. Durand (editors), 22nd Annual Symposium on Theoretical Aspects of Computer Science, STACS 05 (Lecture Notes in Computer Science 3404, Springer-Verlag, 2005), 693-704.
- A. J. Cain, G. P. Oliver, N. Ruskuc and R. M. Thomas, Automatic presentations for semigroups, Information and Computation 207 (2009), 1156-1168.

Rick Thomas's homepage and contact details

### Emilio Tuosto

**Abstractions for mobile distributed systems:** The widespread use of distributed and concurrent systems urges to use formal models to rigorously describe and analyse computations where remotely executing processes interact through amongcommunication infrastructures that can dynamically change. I'm interested in studying models for several aspects of modern distributed computations from theoretical as well as applied points of view. My favourite models are based on **process algebras**, **automata**, and **hypergraphs**.

**1. Process Algebras:** Process algebras are particularly suitable for formally describing and reasoning on distributed communicating processes that operates in a dynamically changing environment. Indeed, many different theories (e.g., behavioural type systems) have been shown adequate to tackle modelling and verification problems rising in distributed computing (e.g., distributed coordination, behavioural types, model- or type-checking).

I have been working on calculi featuring event-notification primitives and, more recently, I am interested in applying process algebras to study **choreography-based, contracts, and behavioural types** of distributed applications.

Required skills: Good mathematical and logical skills. If interested in implementation aspects, good programming skills (either functional or object oriented programming).

References:

- Robin Milner, Joachim Parrow, David Walker. A Calculus of Mobile Processes Part I and Part II. Information and Computation 100:1-77 (1992).
- Emilio Tuosto: Contract-Oriented Services. WS-FM 2012: 16-29
- Laura Bocchi, Hernán C. Melgratti, Emilio Tuosto: Resolving Non-determinism in Choreographies. ESOP 2014: 493-512
- Massimo Bartoletti, Alceste Scalas, Emilio Tuosto, Roberto Zunino: Honesty by Typing. FMOODS/FORTE 2013: 305-320
- Julien Lange, Emilio Tuosto: Synthesising Choreographies from Local Session Types. CONCUR 2012: 225-239
- Roberto Bruni, Ivan Lanese, Hernan Melgratti, Emilio Tuosto. Multiparty sessions in SOC. In D. Lea and G. Zavattaro editors. Coordination Models and Languages, volume 5052 of Lecture Notes in Computer Science, Springer-Verlag, May 2008.
- GianLuigi Ferrari, Daniele Strollo, Roberto Guanciale, and Emilio Tuosto. Coordination Via Types in an Event-Based Framework. In J. Derrick and J. Vain, editors, 27th IFIP WG 6.1 International Conference on Formal Methods for Networked and Distributed Systems Special focus on service oriented computing and architectures, volume 4574 of Lecture Notes in Computer Science. Springer-Verlag, June 2007.

**2. Automata for mobile and distributed applications:** Automata are foundamental tools in theoretical computer science as well as in more applied aspects of our discipline. In my research I use automata models to study choregraphy-based design theories as well as nominal automata models. The latter enrich traditional automata to capture computational phenomena arising in mobile applications. Not only automata can be used to check behavioural equivalences of systems, but they are also suitable to design and verify e.g., service compatibility.

Required skills: Good mathematical and logical skills. Notions of automata theory.

References:

- Robin Milner, Joachim Parrow, David Walker. A Calculus of Mobile Processes - Part I and Part II. Information and Computation 100:1-77 (1992).
- Alexander Kurz, Tomoyuki Suzuki, Emilio Tuosto: On Nominal Regular Languages with Binders. FoSSaCS 2012: 255-269
- Alexander Kurz, Tomoyuki Suzuki, Emilio Tuosto: A Characterisation of Languages on Infinite Alphabets with Nominal Regular Expressions. IFIP TCS 2012: 193-208
- J. Lange, E. Tuosto, N. Yoshida: From Communicating Machines to Graphical Choreographies. POPL 2015
- G. Ferrari, U. Montanari, E. Tuosto. Coalgebraic Minimisation of HD-Automata for the pi-calculus Using Polymorphic Types. TCS 331:325-365 (2005).
- V. Ciancia, G. Ferrari, M. Pistore, E. Tuosto. History Dependent Automata for Service Compatibility. In P. Degano, R. De Nicola, and J. Meseguer editors. Concurrency, Graphs and Models, volume 5065 of Lecture Notes in Computer Science, Springer-Verlag, July 2008.

**3. Graphs model:** Graph and graph transformation can be envisaged as a generalisation of terms and term rewriting. In last years, graphs have been widely used to model concurrent and distributed systems and nowadays many tools rely on such theories. I'm interested in the use of a recent techniques for rewriting hypergraphs, a class of graphs where edges can connect more than 2 nodes. An interesting line of research is the implementation of the rewriting techniques defined for hypergraphs.

Required skills: Good programming skills (preferably functional programming). Some mathematical and logical skills.

References

- Kyriakos Poyias, Emilio Tuosto: On Recovering from Run-time Misbehaviour in ADR. ICE 2013: 68-84
- Kyriakos Poyias, Emilio Tuosto: Enforcing Architectural Styles in Presence of Unexpected Distributed Reconfigurations. ICE 2012: 67-82
- Gianluigi Ferrari and Dan Hirsch and Ivan Lanese and Ugo Montanari and Emilio Tuosto. Synchronised Hyperedge Replacement as a Model for Service Oriented Computing. F. de Boer, M. M. Bonsangue, S. Graf and W. P. de Roever editors. Formal Methods for Components and Objects: 4th International Symposium, FMCO, volume 4111 of Lecture Notes in Computer Science, Springer-Verlag, November 2005.

Emilio Tuosto's homepage and contact details

### Irek Ulidowski

**Modelling of reversibility of complex systems:** Most simple bio-chemical reactions are in principle reversible, so one might think that such reactions typically fluctuate, going forwards and backwards between a number of states. However, nature is synonymous with change or progress. In nature, reversibility is used in a number of important mechanisms such as, for example, signal passing, recycling of molecular resources and stimulating reactions by catalisys. This project aims at studying the role that reversibility plays in major families of bio-chemical reactions and, more generally, in the behaviour of complex systems, including artificial systems such as distributed or quantum systems, and robotics, with the view to model them formally at an appropriate level of abstraction. It is fairly well known how to model the so-called *causal reversibility*: we can use process calculi, event structures or modal logics (see publications below). There are, however, more general forms of reversibility, the so-called *out-of-causal-order reversibility*, that are very crutial in nature as well as in artificial system. But we do not know how to model them appropriately yet. We are currently experimenting with a variety of formal models, including process calculi where processes can change their structure.

- Iain Phillips, Irek Ulidowski and Shoji Yuen, A reversible process calculus and the modelling of the ERK signalling pathway. Proceedings, Reversible Computation 2012, LNCS, volume 7581, pp. 218-232.
- Iain Phillips and Irek Ulidowski, Reversibility and asymmetric conflict in event structures. Proceedings of CONCUR 2013, LNCS, volume 8052, pp. 303-318. Springer, 2013. doi:10.1007/978-3-642-40184-8_22
- Iain Phillips and Irek Ulidowski, An Event Identifier Logic.
*Mathematical Structures in Computer Science*. doi:10.1017/S0960129513000510 - Irek Ulidowski, Iain Phillips and Shoji Yuen, Concurrency and reversibility. Proceedings, Reversible Computation 2014, LNCS 8507, pp. 1-14. Springer 2014.

**Reversible asynchronous cellular automata:** The aim of this project is to study what reversibility means for different forms of cellular automata, concentrating on asynchronous cellular automata. Examples of problems that can be studied are the universality questions for families of asynchronous cellular automata, both forwards-only and reversible, where we vary how cells are updated, global versus local reversibility, and the time as well as the direction reversibility of asynchronous cellular automata. An integral part of the project will be a development of a software tool for simulation of reversible asynchronous cellular automata.

- Daniel Morrison and Irek Ulidowski, Direction-reversible self-timed cellular automata for Delay-Insensitive circuits. Proceedings, ACRI 2014, LNCS, pp. 367-377. Springer 2014.
- Daniel Morrison and Irek Ulidowski, Arbitration and reversibility of parallel Delay-Insensitive modules. Proceedings, Reversible Computation 2014, LNCS 8507, pp. 67-81. Springer 2014.

**Reversible debugging of multi-threaded programs:** In the last decade a number of commercial debugging software tools came out that claim to make a full use of reversibility when debugging programs. The most famous example is UndoBD by Undo Software Ltd based in Cambridge, UK. When it comes to dealing with multi-threaded programs or programs that share memory, these state-of-the-art debuggers use a technique called serialisation, which records a run of a multi-threaded program as a "sequence" of instructions. This removes the information about causality, concurrency and mutual exclusion between instructions in different threads. So, in some sense, reversing a run of a multi-threaded program is reduced to backtracking in a serialised run. In general, it is not clear what is the logically and operationally correct form of reversibility here. The situation is complicated further by various optimisation tasks performed during execution, which effectively change the program order. This project aims to explore reversibility of multi-threaded programs by taking the formal methods approach. We aim to develop a robust method for reversing simple multi-threaded programs using a process calculus or event structure-like models. The understanding of reversibility of parallel programs for the purpose of debugging must be reached in the setting of the so-called weak memory models, such as, for example the weak memory model of Java. Also, another issue that reversible debugger developers are concerned with, and which we aim to solve in this project, is multi-threaded recording.