View all GSoC/JSoC Projects

This page is designed to improve discoverability of projects. You can, for example, search this page for specific keywords and find all of the relevant projects.

MLJ.jl Projects – Summer of Code

MLJ is a machine learning framework for Julia aiming to provide a convenient way to use and combine a multitude of tools and models available in the Julia ML/Stats ecosystem.

List of projects

MLJ is released under the MIT license and sponsored by the Alan Turing Institute.


Categorical variable encoding

Extend the categorical variable encoding of MLJ.

Difficulty. Moderate. Duration. 350 hours

MLJ provides basic one-hot encoding of categorical variables but no sophisticated encoding techniques. One-hot encoding is rather limited, in particular when a categorical variable has a very large number of classes. Many other techniques exist, and this project aims to make some of these available to the MLJ user.
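
To illustrate why one-hot encoding scales poorly with the number of classes, here is a minimal, language-agnostic sketch in Python (not MLJ code). It compares one-hot encoding, which adds one column per class, with two single-column alternatives, frequency encoding and target (mean) encoding. The function names and toy data are hypothetical and chosen only for this example.

```python
from collections import Counter


def one_hot(values):
    """Return (levels, rows): one 0/1 column per distinct level."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for v in values:
        row = [0] * len(levels)
        row[index[v]] = 1
        rows.append(row)
    return levels, rows


def frequency_encode(values):
    """Replace each level by its relative frequency: always one column."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]


def target_encode(values, targets):
    """Replace each level by the mean target observed for that level."""
    counts = Counter(values)
    sums = {}
    for v, t in zip(values, targets):
        sums[v] = sums.get(v, 0.0) + t
    return [sums[v] / counts[v] for v in values]


# Toy data (made up for illustration).
colors = ["red", "green", "red", "blue", "red", "green"]
prices = [10.0, 20.0, 14.0, 30.0, 12.0, 22.0]

levels, rows = one_hot(colors)          # width grows with the number of levels
freq = frequency_encode(colors)         # width stays at one column
target = target_encode(colors, prices)  # width stays at one column
```

In practice, target encoding is usually combined with smoothing or cross-fitting to limit target leakage; that is omitted here for brevity.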

Mentors. Anthony Blaom (best contact: direct message on Julia slack)

Prerequisites

Julia language fluency is essential.
Git-workflow familiarity is strongly preferred.
Experience with machine learning and data science workflows.
Familiarity with MLJ's API a plus.

Your contribution

In this project you will survey popular existing methods for encoding categorical variables. In collaboration with the mentor, you will make a plan for integrating some of these techniques into MLJ. You will begin work on the plan, initially focusing on simple methods, providing MLJ interfaces to existing Julia packages, or new implementations where needed. If the project advances well, you will implement more advanced techniques, such as entity embedding via MLJFlux.jl (MLJ's neural network interface).

References

Existing encoding in MLJ: OneHotEncoder; ContinuousEncoder; UnivariateContinuousTimeEncoder
StatsModels.jl encoders
MLJ feature request
Guo and Berkhahn (2016), "Entity Embeddings of Categorical Variables", https://arxiv.org/abs/1604.06737
MLJFlux.jl

Machine Learning in Predictive Survival Analysis

Implement survival analysis models for use in the MLJ machine learning platform.

Difficulty. Moderate - hard. Duration. 350 hours

Description

Survival/time-to-event analysis is an important field of statistics concerned with understanding the distribution of events over time. Survival analysis presents a unique challenge as we are also interested in events that do not take place, which we refer to as 'censoring'. Survival analysis methods are important in many real-world settings, such as health care (disease prognosis), finance and economics (risk of default), commercial ventures (customer churn), engineering (component lifetime), and many more. This project aims to implement models for performing survival analysis with the MLJ machine learning framework.
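
As a concrete illustration of how censoring enters an estimate, the following minimal Python sketch implements the classical Kaplan-Meier estimator for right-censored data. It is not taken from SurvivalAnalysis.jl or MLJ; the function name and the toy data are assumptions made for this example.

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier survival curve for right-censored data.

    times[i] is the follow-up time of subject i; observed[i] is True if
    the event happened at times[i] and False if the subject was censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group all subjects that share the same time t.
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            removed += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without changing the curve.
        at_risk -= removed
    return curve


# Toy data: two of the five subjects are censored.
times = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(times, observed)
```

Censored subjects still count in the risk set up to their censoring time, which is why they cannot simply be dropped from the data.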

mlr3proba is currently the most complete survival analysis interface. Let's get SurvivalAnalysis.jl to the same standard, learning from mistakes along the way.

Mentors. Sebastian Vollmer, Anthony Blaom

Prerequisites

Julia language fluency is essential.
Git-workflow familiarity is strongly preferred.
Some experience with survival analysis.
Familiarity with MLJ's API a plus.
A passing familiarity with machine learning goals and workflow is preferred.

Your contribution

You will work towards creating a survival analysis package with a range of metrics, capable of making distribution predictions for classical and ML models. You will bake in competing risks early, as well as prediction transformations, and include both left and interval censoring. You will code up basic models (Cox PH and AFT), as well as one ML model as a proof of concept (a decision tree is probably simplest, or Coxnet).

Specifically, you will:

Familiarize yourself with the training and evaluation of machine learning models in MLJ.
For SurvivalAnalysis.jl, implement the MLJ model interface.
Consider explainability of survival analysis through SurvSHAP(t).
Develop a proof of concept for newer advanced survival analysis models not currently implemented in Julia.

References

Mateusz Krzyziński et al., SurvSHAP(t): Time-Dependent Explanations of Machine Learning Survival Models, Knowledge-Based Systems 262 (February 2023): 110234
Kvamme, H., Borgan, Ø., & Scheel, I. (2019). Time-to-event prediction with neural networks and Cox regression. Journal of Machine Learning Research, 20(129), 1–30.
Lee, C., Zame, W. R., Yoon, J., & van der Schaar, M. (2018). Deephit: A deep learning approach to survival analysis with competing risks. In Thirty-Second AAAI Conference on Artificial Intelligence.
Katzman, J. L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., & Kluger, Y. (2018). DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(1), 24.
Gensheimer, M. F., & Narasimhan, B. (2019). A scalable discrete-time survival model for neural networks. PeerJ, 7, e6257. https://peerj.com/articles/6257/
SurvivalAnalysis.jl

Deeper Bayesian Integration

Bayesian methods and probabilistic supervised learning provide uncertainty quantification. This project aims to increase integration so that Bayesian and non-Bayesian methods can be combined using Turing.

Difficulty. Difficult. Duration. 350 hours.

Description

As an initial step, reproduce SOSSMLJ in Turing. The bulk of the project is to implement methods that combine multiple predictive distributions.

Your contributions

Interface between Turing and MLJ
Comparisons of ensembling and stacking of predictive distributions
Reproducible benchmarks across various settings

References

Bayesian Stacking SKpro

Difficulty: Medium to Hard

Mentors: Hong Ge, Sebastian Vollmer

Tracking and sharing MLJ workflows using MLflow

Help data scientists using MLJ track and share their machine learning experiments using MLflow. The emphasis in this phase of the project is to:

support asynchronous workflows, as appear in parallelized model tuning
support live logging while training iterative models, such as neural networks

Difficulty. Moderate. Duration. 350 hours.

Description

MLflow is an open source platform for the machine learning life cycle. It allows the data scientist to upload experiment metadata and outputs to the platform for reproducing and sharing purposes. MLJ already allows users to report basic model performance evaluation to an MLflow service and this project seeks to greatly enhance this integration.

Prerequisites

Julia language fluency essential
Understanding of asynchronous programming principles
Git-workflow familiarity strongly preferred.
General familiarity with data science workflows

Your contribution

You will familiarize yourself with MLJ, MLflow and MLflowClient.jl client APIs.
You will familiarize yourself with the MLJFlow.jl package providing MLJ <–> MLflow integration
Implement changes needed to allow correct asynchronous logging of model performance evaluations
Extend logging to (parallelized) model tuning (MLJ's TunedModel wrapper)
Extend logging to controlled training of iterative models (MLJ's IteratedModel wrapper)

References

Mentors. Anthony Blaom

Speed demons only need apply

Diagnose and exploit opportunities for speeding up common MLJ workflows.

Difficulty. Moderate. Duration. 350 hours.

Description

In addition to investigating a number of known performance bottlenecks, you will have some free rein in this project to identify opportunities to speed up common MLJ workflows and to make better use of memory resources.

Prerequisites

Julia language fluency essential.
Experience with multi-threading and multi-processor computing essential, preferably in Julia.
Git-workflow familiarity strongly preferred.
Familiarity with machine learning goals and workflow preferred

Your contribution

In this project you will:

familiarize yourself with the training, evaluation and tuning of machine learning models in MLJ
benchmark and profile common workflows to identify opportunities for further code optimizations, with a focus on the most popular models
work to address problems identified
roll out new data front-end for iterative models, to avoid unnecessary copying of data
experiment with adding multi-processor parallelism to the current learning networks scheduler
implement some of these optimizations

References

MLJ Roadmap. See, in particular, the "Scalability" section.
Taking performance more seriously GitHub issue
Data front end for MLJ models.

Mentors. Anthony Blaom, Okon Samuel.

BayesianOptimization

Bayesian optimization is a global optimization strategy for (potentially noisy) functions with unknown derivatives. With well-chosen priors, it can find optima with fewer function evaluations than alternatives, making it well suited for the optimization of costly objective functions.

Well-known examples include hyper-parameter tuning of machine learning models (see e.g. Taking the Human Out of the Loop: A Review of Bayesian Optimization). The Julia package BayesianOptimization.jl currently supports only basic Bayesian optimization methods. There are multiple directions to improve the package, including (but not limited to):

Hybrid Bayesian Optimization (duration: 175h, expected difficulty: medium) with discrete and continuous variables. Implement e.g. HyBO see also here.
Scalable Bayesian Optimization (duration: 175h, expected difficulty: medium): implement e.g. TuRBO or SCBO.
Better Defaults (duration: 175h, expected difficulty: easy): write an extensive test suite and implement better defaults; draw inspiration from e.g. dragonfly.

Recommended Skills: Familiarity with Bayesian inference, non-linear optimization, writing Julia code and reading Python code.

Expected Outcome: Well-tested and well-documented new features.

Mentor: Johanni Brea

Compiler Projects – Summer of Code

There are a number of compiler projects that are currently being worked on. Please contact Jameson Nash for additional details and let us know what specifically interests you about this area of contribution. That way, we can tailor your project to better suit your interests and skillset.

LLVM AliasAnalysis (175-350 hours)

The Julia language utilizes LLVM as a backend for code generation, so the quality of code generation is very important for performance. This means that there are plenty of opportunities for those with knowledge of or interest in LLVM to contribute via working on Julia's code generation process. We have recently encountered issues with memcpy information only accepting a single aliasing metadata argument, rather than separate information for the source and destination. There are other, similar gaps in the descriptive or optimization steps for the aliasing information that we produce or that LLVM's passes consume.

Expected Outcomes: Improve upon the alias information "LLVM level" of Julia codegen.
Skills: C/C++ programming
Difficulty: Hard

Macro hygiene re-implementation, to eliminate incorrect predictions inherent in current approach (350 hours)

This may be a good project for someone who wants to learn Lisp/Scheme! Our current algorithm runs in multiple passes, which means the scope we compute for a variable in an earlier pass sometimes differs from the scope we actually assign to each value later. See https://github.com/JuliaLang/julia/labels/macros, and particularly issues such as https://github.com/JuliaLang/julia/issues/20241, https://github.com/JuliaLang/julia/issues/53667, https://github.com/JuliaLang/julia/issues/53673 and https://github.com/JuliaLang/julia/issues/34164.

Expected Outcomes: Ideally, re-implementation of hygienic macros. Realistically, resolving some or any of the macros issues.
Skills: Lisp/Scheme/Racket experience desired but not necessarily required.
Difficulty: Medium

Better debug information output for variables (175 hours)

We have part of the infrastructure in place for representing DWARF information for our variables, but only from limited places. We could do much better since there are numerous opportunities for improvement!

Expected Outcomes: Ability to see more variable, argument, and object details in gdb
Recommended Skills: Most of these projects involve algorithms work, requiring a willingness and interest in seeing how to integrate with a large system.
Difficulty: Medium
Mentors: Jameson Nash, Gabriel Baraldi

Improving test coverage (175 hours)

Code coverage reports very good coverage of all of the Julia Stdlib packages, but it's not complete. Additionally, the coverage tools themselves (--track-coverage and https://github.com/JuliaCI/Coverage.jl) could be further enhanced, such as to give better accuracy of statement coverage, or more precision. A successful project may combine a bit of both building code and finding faults in others' code.

Another related side-project might be to explore adding type information to the coverage reports.

Recommended Skills: An eye for detail, a thrill for filing code issues, and the skill of breaking things.
Contact: Jameson Nash

Multi-threading Improvement Projects (175 hours each)

Work is ongoing to improve threaded code and its correctness. A few ideas to get you started on how to join this effort, in brief, include:

Measure and optimize the performance of the scheduler partr algorithm, and add the ability to dynamically scale it by workload size. Or replace it with a work-stealing implementation in Julia.
Automatic insertion, and subsequent optimization, of GC safe-points/regions, particularly around loops. Similarly for ccall, implement the ability to define a particular ccall as being a safe-region.
Solve various thread-safety and data-race bugs in the runtime. (e.g. https://github.com/JuliaLang/julia/issues/49778 and https://github.com/JuliaLang/julia/pull/42810)

To discuss any of these, join the regularly scheduled multithreading call; see the #multithreading BoF invite on the Julia Language Public Events calendar.

Recommended Skills: Varies by project, but generally some multi-threading and C experience is needed
Contact: Jameson Nash

Automation of testing / performance benchmarking (350 hours)

The Nanosoldier.jl project (and related https://github.com/JuliaCI/BaseBenchmarks.jl) tests for performance impacts of some changes. However, there remain many areas that are not covered (such as compile time), while other areas are over-covered (greatly increasing the duration of the test for no benefit) and some tests may not be configured appropriately for statistical power. Furthermore, the current reports are very primitive and can only do a basic pair-wise comparison, while graphs and other interactive tooling would be more valuable. Thus, there would be many great projects for a summer contributor to tackle here!

Expected Outcomes: Improvement of Julia's automated testing/benchmarking framework.
Skills: Interest in and/or experience with CI systems.
Difficulty: Medium

Contact: Jameson Nash, Tim Besard

Tensor network contraction order optimization and visualization

OMEinsum.jl is a pure Julia package for tensor network computation, which has been used in various projects, including

GenericTensorNetworks.jl for solving combinatorial optimization problems,
YaoToEinsum.jl for simulating large-scale quantum circuits, and
TensorInference.jl for Bayesian inference.

Unlike other tensor contraction packages such as ITensors.jl and TensorOperations.jl, it is designed for large scale tensor networks with arbitrary topology. The key feature of OMEinsum.jl is that it can automatically optimize the contraction order of a tensor network. Related features are implemented in OMEinsumContractionOrders.jl.
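
To show why contraction order matters, the following minimal Python sketch treats the simplest tensor network, a chain of matrices, and compares the cost of contracting left to right with the optimal order found by dynamic programming. This is the textbook matrix-chain algorithm, not the optimizer used in OMEinsum.jl or OMEinsumContractionOrders.jl; the function names and dimensions are assumptions for illustration. General networks with arbitrary topology need more sophisticated optimizers.

```python
from functools import lru_cache


def chain_cost_left_to_right(dims):
    """Scalar multiplications when contracting a chain strictly left to right.

    Matrix i in the chain has shape dims[i] x dims[i + 1].
    """
    cost = 0
    for k in range(1, len(dims) - 1):
        cost += dims[0] * dims[k] * dims[k + 1]
    return cost


def best_chain_order(dims):
    """Return (minimal cost, parenthesization) for the same chain."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Best way to contract tensors i..j (inclusive) into one matrix.
        if i == j:
            return 0, f"T{i}"
        best = None
        for k in range(i, j):
            left_cost, left = solve(i, k)
            right_cost, right = solve(k + 1, j)
            cost = left_cost + right_cost + dims[i] * dims[k + 1] * dims[j + 1]
            if best is None or cost < best[0]:
                best = (cost, f"({left} {right})")
        return best

    return solve(0, len(dims) - 2)


# Three matrices of shapes 50x5, 5x100 and 100x10 (made-up sizes).
dims = (50, 5, 100, 10)
naive_cost = chain_cost_left_to_right(dims)
best_cost, best_order = best_chain_order(dims)
```

With these sizes the naive order costs ten times as many scalar multiplications as the optimal one, and the gap typically grows with the size of the network.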

We are looking for a student to work on the following tasks:

Implement a better contraction order optimizer based on Tamaki's algorithm.
Implement a hyper-graph visualization tool based on arXiv:2308.05043
Port the contraction order optimizers to TensorOperations.jl

Recommended skills: familiarity with tensor networks, graph theory and high performance computing.

Expected results:

new features added to the package OMEinsumContractionOrders.jl along with tests and relevant documentation.
a new package about hyper-graph visualization, and relevant feature added to OMEinsum.jl.
a pull request to TensorOperations.jl for better contraction order optimization.

Mentors: Jin-Guo Liu, Jutho Haegeman and Lukas Devos

Project difficulty: Medium to Hard

Project length: 350 hrs

Contact: feel free to ask questions via email or the Julia slack (user name: JinGuo Liu).

Documentation tooling

Documenter.jl

The Julia manual and the documentation for a large chunk of the ecosystem is generated using Documenter.jl – essentially a static site generator that integrates with Julia and its docsystem. There are tons of opportunities for improvements for anyone interested in working on the interface of Julia, documentation and various front-end technologies (web, LaTeX).

Ferrite.jl - Finite Element Toolbox - Summer of Code

Ferrite.jl is a Julia package providing the basic building blocks to develop finite element simulations of partial differential equations. The package provides extensive examples to start from and is designed as a compromise between simplicity and generality, trying to map finite element concepts 1:1 to low-level code. Ferrite is actively used in teaching the finite element method to students at several universities across different countries (e.g. Ruhr-University Bochum and Chalmers University of Technology). Further infrastructure is provided in the form of different mesh parsers and a Julia-based visualizer called FerriteViz.jl.

Below we provide a few potential project ideas in Ferrite.jl. However, interested students should feel free to explore ideas they are interested in. Please contact any of the mentors listed below, or join the #ferrite-fem channel on the Julia slack to discuss. Projects in finite element visualization are also possible with FerriteViz.jl.

Fluid-Structure Interaction Example

Difficulty: Easy-Medium (depending on your specific background)

Project size: 150-300 hours

Problem: Ferrite.jl is designed with the possibility to define partial differential equations on subdomains. This makes it well-suited for interface-coupled multi-physics problems, as for example fluid-structure interaction problems. However, we currently do not have an example showing this capability in our documentation. We also do not provide all necessary utilities for interface-coupled problems.

Minimum goal: The minimal goal of this project is to create a functional and documented linear fluid-structure interaction example coupling linear elasticity with a Stokes flow in a simple setup. The code should come with proper test coverage.

Extended goal: With this minimally functional example it is possible to extend the project into different directions, e.g. optimized solvers or nonlinear fluid-structure interaction.

Recommended skills:

Basic knowledge of the finite element method
Basic knowledge about solids or fluids
The ability (or eagerness to learn) to write fast code

Mentors: Dennis Ogiermann and Fredrik Ekre

Investigation of Performant Assembly Strategies

Difficulty: Medium

Project size: 250-350 hours

Problem: Ferrite.jl has an outstanding performance in single-threaded finite element simulations due to elaborate elimination of redundant workloads. However, we recently identified that the way the single-threaded assembly works makes parallel assembly memory bound, rendering the implementation for "cheap" assembly loops not scalable on a wide range of systems. This problem will also translate to high-order schemes, where the single-threaded strategy as is prevents certain common optimization strategies (e.g. sum factorization).

Minimum goal: The first step towards better parallel assembly performance is the investigation of different assembly strategies. Local and global matrix-free schemes are a possibility to explore here. The code has to be properly benchmarked and tested to identify different performance problems.

Extended goal: With this minimally functional example it is possible to extend the project into different directions, e.g. optimized matrix-free solvers or GPU assembly.

Recommended skills:

Basic knowledge of the finite element method
Basic knowledge about benchmarking
The ability (or eagerness to learn) to write fast code

Mentors: Maximilian Köhler and Dennis Ogiermann

Graph Neural Networks - Summer of Code

Graph Neural Networks (GNN) are deep learning models well adapted to data that takes the form of graphs with feature vectors associated to nodes and edges. GNNs are a growing area of research and find many applications in complex networks analysis, relational reasoning, combinatorial optimization, molecule generation, and many other fields.

GraphNeuralNetworks.jl is a pure Julia package for GNNs equipped with many features. It implements common graph convolutional layers, with CUDA support and graph batching for fast parallel operations. There are a number of ways by which the package could be improved.

Training on very large graphs

Graphs containing several million nodes are too large for GPU memory. Mini-batch training can instead be performed on subgraphs, as in the GraphSAGE algorithm.
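
The idea of mini-batch training on subgraphs can be sketched as follows. This minimal Python example performs GraphSAGE-style neighbor sampling: starting from a batch of seed nodes, it keeps at most a fixed number of sampled neighbors per node and per layer, so the subgraph size is bounded regardless of the size of the full graph. It is not GraphNeuralNetworks.jl code; the function name, the `fanout` parameter and the synthetic graph are assumptions for illustration.

```python
import random


def sample_subgraph(adjacency, seeds, fanout, num_layers, rng):
    """Collect the seeds plus at most `fanout` sampled neighbors per node per layer.

    adjacency maps each node to a list of its neighbors.
    Returns (nodes, edges), where each edge is (source, destination).
    """
    nodes = set(seeds)
    frontier = list(seeds)
    edges = set()
    for _ in range(num_layers):
        next_frontier = []
        for node in frontier:
            neighbors = adjacency.get(node, [])
            if len(neighbors) <= fanout:
                chosen = neighbors
            else:
                chosen = rng.sample(neighbors, fanout)
            for nb in chosen:
                edges.add((nb, node))  # messages flow from neighbor to node
                if nb not in nodes:
                    nodes.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return nodes, edges


# Synthetic graph: 1000 nodes on a ring, each linked to its 10 nearest nodes.
N = 1000
adjacency = {i: [(i + d) % N for d in range(-5, 6) if d != 0] for i in range(N)}

rng = random.Random(0)
seeds = [0, 250, 500, 750]
nodes, edges = sample_subgraph(adjacency, seeds, fanout=3, num_layers=2, rng=rng)
```

With 4 seeds, a fanout of 3 and 2 layers, the sampled subgraph has at most 4 * (1 + 3 + 9) = 52 nodes, however large the full graph is.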

Duration: 350h.

Expected difficulty: hard.

Expected outcome: The necessary algorithmic components to scale GNN training to very large graphs.

Adding graph convolutional layers

While we implement a good variety of graph convolutional layers, there is still a vast zoo of layers yet to be implemented. Preprocessing tools, pooling operators, and other GNN-related functionalities can be considered as well.

Duration: 175h.

Expected difficulty: easy to medium.

Expected outcome: Enrich the package with a variety of new layers and operators.

Adding models and examples

As part of the documentation and for bootstrapping new projects, we want to add fully worked out examples and applications of graph neural networks. We can start with entry-level tutorials and progressively introduce the reader to more advanced features.

Duration: 175h.

Expected difficulty: medium.

Expected outcome: A few pedagogical and more advanced examples of graph neural networks applications.

Adding graph datasets

Provide Julia friendly wrappers for common graph datasets in MLDatasets.jl. Create convenient interfaces for the Julia ML and data ecosystem.

Duration: 175h.

Expected difficulty: easy.

Expected outcome: A large collection of graph datasets easily available to the Julia ecosystem.

Implement layers for heterogeneous graphs

In some complex networks, the relations expressed by edges can be of different types. We currently support this with the GNNHeteroGraph type but none of the current graph convolutional layers support heterogeneous graphs as inputs. With this project we will implement a few layers for heterographs.

Duration: 175h.

Expected difficulty: medium.

Expected outcome: The implementation of a new graph type for heterogeneous networks and corresponding graph convolutional layers.

Improving performance using sparse linear algebra

Many graph convolutional layers can be expressed as non-materializing algebraic operations involving the adjacency matrix instead of the slower and more memory-consuming gather/scatter mechanism. We aim to extend these fused implementations as far as possible, in a GPU-friendly way.
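
The difference between the two approaches can be sketched in plain Python. The first function below materializes one message per edge (gather) and then accumulates messages at their destinations (scatter). The second computes the same neighbor sum as a sparse product of the adjacency matrix, stored in CSR form, with the feature matrix, without any per-edge buffer. Neither function is taken from GraphNeuralNetworks.jl; names and data are assumptions for illustration.

```python
def aggregate_gather_scatter(edges, features, num_nodes):
    """Sum neighbor features by materializing one message per edge."""
    dim = len(features[0])
    messages = [features[src] for src, _ in edges]  # gather: O(edges) buffer
    out = [[0.0] * dim for _ in range(num_nodes)]
    for (_, dst), msg in zip(edges, messages):      # scatter-add
        for d in range(dim):
            out[dst][d] += msg[d]
    return out


def aggregate_sparse_matmul(edges, features, num_nodes):
    """Same sum written as A @ X, with A in CSR form (row = destination)."""
    dim = len(features[0])
    # Build the CSR row pointer from the in-degree of each node.
    row_ptr = [0] * (num_nodes + 1)
    for _, dst in edges:
        row_ptr[dst + 1] += 1
    for i in range(num_nodes):
        row_ptr[i + 1] += row_ptr[i]
    # Fill the column indices (source nodes) row by row.
    col_idx = [0] * len(edges)
    cursor = row_ptr[:num_nodes]
    for src, dst in edges:
        col_idx[cursor[dst]] = src
        cursor[dst] += 1
    # Sparse matrix times dense matrix; no per-edge message buffer.
    out = [[0.0] * dim for _ in range(num_nodes)]
    for i in range(num_nodes):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            src = col_idx[k]
            for d in range(dim):
                out[i][d] += features[src][d]
    return out


# Toy graph with 3 nodes; each edge is (source, destination).
edges = [(0, 1), (2, 1), (1, 0), (1, 2), (0, 2)]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

via_scatter = aggregate_gather_scatter(edges, features, 3)
via_matmul = aggregate_sparse_matmul(edges, features, 3)
```

Both functions return the same result; on real hardware the second form can be handed to optimized sparse linear algebra kernels.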

Duration: 350h.

Expected difficulty: hard.

Expected outcome: A noticeable performance increase for many graph convolutional operations.

Support for AMDGPU and Apple Silicon

We currently support scatter/gather operations only on CPU and CUDA hardware. We aim to extend this to AMDGPU and Apple Silicon, leveraging KernelAbstractions.jl, AMDGPU.jl and Metal.jl.

Duration: 175h.

Expected difficulty: medium.

Expected outcome: Graph convolution speedup for AMD GPU and Apple hardware, performance roughly on par with CUDA.

Implement layers for Temporal Graphs

A temporal graph is a graph whose topology changes over time. We currently support this with the TemporalSnapshotsGNNGraph type, but none of the current graph convolution and pooling layers support temporal graphs as input. Currently, there are a few convolutional layers that take as input the special case of static graphs with temporal features. In this project, we will implement new layers that take temporal graphs as input, and we will create tutorials demonstrating how to use them.

Duration: 350h.

Expected difficulty: medium.

Expected outcome: Implementation of new convolutional layers for temporal graphs and example tutorials.

Recommended skills

Familiarity with graph neural networks and Flux.jl.

Mentors

Carlo Lucibello (author of GraphNeuralNetworks.jl). Feel free to contact us on the Julia Slack Workspace or by opening an issue in the GitHub repo.

GUI projects – Summer of Code

QML and Makie integration

The QML.jl package provides Julia bindings for Qt QML on Windows, OS X and Linux. In the current state, basic GUI functionality exists, and rough integration with Makie.jl is available, allowing overlaying QML GUI elements over Makie visualizations.

Expected results

Split off the QML code for Makie into a separate package. This will allow specifying proper package compatibility between QML and Makie, without making Makie a mandatory dependency for QML (currently we use Requires.jl for that)
Improve the integration. Currently, connections between Makie and QML need to be set up mostly manually. We need to implement some commonly used functionality, such as the registration of clicks in a viewport with proper coordinate conversion and navigation of 3D viewports.

Recommended Skills: Familiarity with both Julia and the Qt framework, some basic C++ skills, affinity with 3D graphics and OpenGL.

Duration: 175h, expected difficulty: medium

Mentors: Bart Janssens and Simon Danisch

Web apps in Makie and JSServe

Makie.jl is a visualization ecosystem for the Julia programming language, with a focus on interactivity and performance. JSServe.jl is the core infrastructure library that makes Makie's web-based backend possible.

At the moment, all the necessary ingredients exist for designing web-based User Interfaces (UI) in Makie, but the process itself is quite low-level and time-consuming. The aim of this project is to streamline that process.

Expected results

Implement novel UI components and refine existing ones.
Introduce data structures suitable for representing complex UIs.
Add simpler syntaxes for common scenarios, akin to Interact's @manipulate macro.
Improve documentation and tutorials.
Streamline the deployment process.

Bonus tasks. If time allows, one of the following directions could be pursued.

Make Makie web-based plots more suitable for general web apps (move more computation to the client side, improve interactivity and responsiveness).
Generalize the UI infrastructure to native widgets, which are already implemented in Makie but with a different interface.

Desired skills. Familiarity with HTML, JavaScript, and CSS, as well as reactive programming. Experience with the Julia visualization and UI ecosystem.

Duration. 350h.

Difficulty. Medium.

Mentors. Pietro Vertechi and Simon Danisch.

High Performance and Parallel Computing Projects – Summer of Code

Julia is emerging as a serious tool for technical computing and is ideally suited for the ever-growing needs of big data analytics. This set of proposed projects addresses specific areas for improvement in analytics algorithms and distributed data management.

Scheduling Algorithms for Dagger

Difficulty: Medium (175h)

Dagger.jl is a native Julia framework and scheduler for distributed execution of Julia code and general purpose data parallelism, using dynamic, runtime-generated task graphs which are flexible enough to describe multiple classes of parallel algorithms. This project proposes to implement different scheduling algorithms for Dagger to optimize scheduling of certain classes of distributed algorithms, such as mapreduce and merge sort, and properly utilizing heterogeneous compute resources. Contributors will be expected to find published distributed scheduling algorithms and implement them on top of the Dagger framework, benchmarking scheduling performance on a variety of micro-benchmarks and real problems.
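
As a rough illustration of what a scheduling algorithm over a task graph on heterogeneous resources can look like, here is a small greedy list-scheduling sketch in Python. It is not Dagger's scheduler and uses none of its API. The function name `greedy_schedule`, the task names, costs, and worker speeds are all hypothetical, and the heuristic (assign each task, in topological order, to the worker that would finish it earliest) is just one simple choice among many published algorithms.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def greedy_schedule(deps, cost, speed):
    """Greedy earliest-finish-time list scheduling.

    deps:  task -> set of prerequisite tasks
    cost:  task -> abstract work units
    speed: worker -> work units processed per second
    Returns (assignment, makespan).
    """
    worker_free = {w: 0.0 for w in speed}  # time at which each worker becomes idle
    finish = {}                            # task -> finish time
    assignment = {}                        # task -> chosen worker
    for task in TopologicalSorter(deps).static_order():
        # A task may start only after all of its prerequisites have finished.
        ready = max((finish[d] for d in deps.get(task, ())), default=0.0)
        best = min(speed, key=lambda w: max(worker_free[w], ready) + cost[task] / speed[w])
        start = max(worker_free[best], ready)
        finish[task] = start + cost[task] / speed[best]
        worker_free[best] = finish[task]
        assignment[task] = best
    return assignment, max(finish.values())

# Hypothetical diamond-shaped task graph and two workers of different speed.
deps = {"load": set(), "a": {"load"}, "b": {"load"}, "merge": {"a", "b"}}
cost = {"load": 2, "a": 4, "b": 4, "merge": 2}
speed = {"cpu": 1.0, "gpu": 2.0}
assignment, makespan = greedy_schedule(deps, cost, speed)
print(assignment, makespan)
```

A sketch like this ignores data-transfer costs between workers, which a real distributed scheduler would have to model.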

Mentors: Julian Samaroo, Krystian Guliński

Distributed Training

Difficulty: Hard (350h)

Add a distributed training API for Flux models built on top of Dagger.jl. More detailed milestones include building Dagger.jl abstractions for UCX.jl, then building tools to map Flux models into data parallel Dagger DAGs. The final result should demonstrate a Flux model training with multiple devices in parallel via the Dagger.jl APIs. A stretch goal will include mapping operations with a model to a DAG to facilitate model parallelism as well.

There are projects now that host the building blocks: DaggerFlux.jl and Distributed Data Parallel Training, which can serve as jumping-off points.

Skills: Familiarity with UCX, representing execution models as DAGs, Flux.jl, CUDA.jl and data/model parallelism in machine learning

Mentors: Julian Samaroo and Dhairya Gandhi

Distributed Arrays over Dagger

Difficulty: Medium (175h)

Array programming is possibly the most powerful abstraction in Julia, yet our distributed arrays support leaves much to be desired. This project's goal is to implement a new distributed array type on top of the Dagger.jl framework, which will allow this new array type to be easily distributed, multithreaded, and support GPU execution. Contributors will be expected to implement a variety of operations, such as mapreduce, sorting, slicing, and linear algebra, on top of their distributed array implementation. Final results will include extensive scaling benchmarks on a range of configurations, as well as an extensive test suite for supported operations.
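
To illustrate the kind of operation such an array type would need to support, the sketch below shows a chunked mapreduce in Python using only the standard library. It is an illustration under stated assumptions, not a Dagger.jl design: the names `chunked` and `chunked_mapreduce` are hypothetical, chunks live in one process and are processed by a thread pool, and the reduction operator is assumed to be associative so that per-chunk partial results can be combined in order.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def chunked(data, nchunks):
    """Split a non-empty list into at most nchunks contiguous pieces."""
    if not data:
        raise ValueError("data must be non-empty")
    size = -(-len(data) // nchunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def chunked_mapreduce(f, op, data, nchunks=4):
    """Apply f elementwise and combine with op, one partial result per chunk."""
    def local(chunk):
        return reduce(op, map(f, chunk))
    with ThreadPoolExecutor() as pool:
        # pool.map preserves chunk order, so op only needs to be associative.
        partials = list(pool.map(local, chunked(data, nchunks)))
    return reduce(op, partials)

total = chunked_mapreduce(lambda v: v * v, operator.add, list(range(10)), nchunks=3)
print(total)  # sum of squares of 0..9
```

In a distributed setting each chunk would sit on a different worker or device, and only the small partial results would need to be communicated.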

Mentors: Julian Samaroo, Evelyne Ringoot

JuliaImages Projects – Summer of Code

JuliaImages (see the documentation) is a framework in Julia for multidimensional arrays, image processing, and computer vision (CV). It has an active development community and offers many features that unify CV and biomedical 3D/4D image processing, support big data, and promote interactive exploration.

Often the best ideas are the ones that candidate SoC contributors come up with on their own. We are happy to discuss such ideas and help you refine your proposal. Below are some potential project ideas that might help spur some thoughts. In general, anything that is missing in JuliaImages and is worth three months of development can be considered a potential GSoC idea. See the bottom of this page for information about mentors.

Benchmarking against other frameworks

Difficulty: Medium (175h) (High priority)

JuliaImages provides high-quality implementations of many algorithms; however, as yet there is no set of benchmarks that compare our code against that of other image-processing frameworks. Developing such benchmarks would allow us to advertise our strengths and/or identify opportunities for further improvement. See also the OpenCV project below.

Benchmarks for several performance-sensitive packages (e.g., ImageFiltering, ImageTransformations, ImageMorphology, ImageContrastAdjustment, ImageEdgeDetection, ImageFeatures, and/or ImageSegmentation) against frameworks like Scikit-image and OpenCV, and optionally others like ITK, ImageMagick, and Matlab/Octave. See also the image benchmarks repository.

This task splits into at least two pieces:

developing frameworks for collecting the data, and
visualizing the results.

One should also be aware of the fact that differences in implementation (which may include differences in quality) may complicate the interpretation of some benchmarks.
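
As a hedged sketch of the data-collection piece, the Python snippet below times a function several times and records the minimum and median. Python is used only because frameworks such as Scikit-image and OpenCV are commonly driven from it; the harness name `benchmark`, the toy workload `box_blur_rows`, and the result layout are hypothetical, and the workload is a stand-in rather than a call into any real image-processing library.

```python
import statistics
import time

def benchmark(fn, *args, repeats=5):
    """Run fn(*args) several times and summarize wall-clock timings in seconds."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return {"min_s": min(samples), "median_s": statistics.median(samples), "repeats": repeats}

def box_blur_rows(image):
    """Toy 3-wide horizontal box filter on a list-of-lists image (edges replicated)."""
    out = []
    for row in image:
        padded = [row[0]] + row + [row[-1]]
        out.append([(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(row))])
    return out

# Synthetic 64x64 test image; a real benchmark would use shared, fixed inputs.
image = [[float((r * c) % 7) for c in range(64)] for r in range(64)]
results = {"box_blur_rows": benchmark(box_blur_rows, image)}
print(results)
```

Storing results keyed by operation name makes it straightforward to add the same operation from another framework later and plot the two side by side, while keeping in mind that border handling and numeric precision may differ between implementations.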

Skills: Experience with JuliaImages is required. Some familiarity with other image processing frameworks is preferred.

Mentors: Tim Holy

Where to go for discussion and to find mentors

Interested contributors are encouraged to open a discussion in Images.jl to introduce themselves and discuss the detailed project ideas. To increase the chance of getting useful feedback, please provide detailed plans and ideas (don't just copy the contents here).

Language interoperability – Summer of Code

C++

CxxWrap STL

The CxxWrap.jl package provides a way to load compiled C++ code into Julia. It exposes a small fraction of the C++ standard library to Julia, but many more functions and containers (e.g. std::map) still need to be exposed. The objective of this project is to improve C++ standard library coverage.

Expected outcome

Add missing STL container types (easy)
Add support for STL algorithms (intermediate)
Investigate improvement of compile times and selection of included types (advanced)

Recommended Skills: Familiarity with both Julia and C++

Duration: 175h, expected difficulty: hard

Mentor: Bart Janssens

Rust

Take a look at the hyper.rs project, listed on the "Pluto" page, about wrapping a Rust HTTP server in a Julia package.

Constraint Programming in Julia

JuliaConstraints is an organization supporting packages for Constraint Programming in Julia. Although it is independent of JuMP.jl, it aims for tight integration with it over time. For a detailed overview of basic Constraint Programming in Julia, please have a look at our video from JuliaCon 2021, Put some constraints into your life with JuliaCon(straints).

General goal of JuliaConstraints

Often, problem-solving involves taking two actions: model and solve. Typically, there is a trade-off between ease of modeling and efficiency of solving. Therefore, one is often required to be a specialist to model and solve an optimization problem efficiently. We investigate the theoretical fundamentals and the implementation of tools to automate and build optimization frameworks. A general user should be able to focus on modeling practical problems, regardless of the software or hardware available. Furthermore, we aim to encourage technical users to use our tools to improve their solving efficiency.

Mentor: Jean-Francois Baffier (azzaare@github)

Constraint Programming-Based Design for Kumi Kumi Slope

This project is at the forefront of developing a level-design tool for the Kumi Kumi Slope game, leveraging the Julia programming language's capabilities. It prioritizes creating an interactive graphical user interface (GUI) for users to actively engage in design optimization. While (GL)Makie.jl is a strong candidate for this GUI, the project remains open to other innovative solutions, such as a Genie.jl-based interface, to accommodate diverse development preferences. Key to this initiative is handling arbitrary domains, generating pools of solutions for multi-objective optimization, and providing visual outputs of game designs. This venture into Constraint Programming (CP) sets the groundwork for efficiency in design while embracing user-defined aesthetic goals and marks a pioneering step towards human-machine collaboration in architectural design.

Core Objectives

Multi-Objective Optimization and Arbitrary Domains (100-150 hours)
- Framework for Multi-Objective Optimization: Establish a robust framework to tackle multiple objectives simultaneously, ranging from efficiency and compactness to playability and aesthetics.
- Support for Arbitrary Domains: Craft a method to define and manipulate arbitrary domains within the CP model, enabling a wide array of component types and design constraints.
Pools of Solutions and Interactive GUI Development (150-200 hours)
- Generation of Solution Pools: Design algorithms to create diverse pools of feasible solutions, catering to different optimization criteria and user preferences, fostering a nuanced approach to human-machine collaboration in design.
- Interactive GUI Development: Embark on the development of an interactive GUI, with (GL)Makie.jl or alternative tools, to facilitate the visualization and manipulation of Kumi Kumi Slope designs, empowering users to explore, select, and refine designs in a user-centric environment.
Visual Output and User Interaction (100-150 hours)
- Visual Representation of Designs: Guarantee that all potential solutions are visually represented within the GUI, enhancing user ability to evaluate and contrast different designs.
- Feedback Mechanism for Design Refinement: Integrate a feedback loop in the GUI, allowing user interactions to refine the solution pool, aligning it closer to user preferences and exemplifying the project's commitment to human-machine collaborative design.
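
One common building block for generating solution pools under multiple objectives is a Pareto filter, which keeps only designs that no other design beats on every objective. The Python sketch below shows that idea under stated assumptions: both objectives are minimized, and the design tuples (read them as, say, footprint and piece count) are made-up numbers rather than output of any JuliaConstraints solver.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep the solutions that no other solution dominates (minimization)."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]

# Hypothetical candidate designs scored on two objectives to minimize.
designs = [(3, 9), (5, 5), (6, 6), (9, 2), (9, 9)]
front = pareto_front(designs)
print(front)
```

The remaining designs represent different trade-offs, which is the kind of pool a user could then browse and refine in the GUI.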

This proposal aims to deliver a significant and impactful tool by the end of the GSoC period. It encourages candidates to dive deep into areas of particular interest, providing flexibility in project focus. By emphasizing realistic targets like the development of an interactive GUI and foundational work on handling arbitrary domains and multi-objective optimization, this project sets a precedent for future advancements in game design and opens the door to broader applications requiring sophisticated design and optimization tools.

Dynamical systems, complex systems & nonlinear dynamics – Summer of Code

Agents.jl

Difficulty: Medium to Hard.

Length: 175 to 350 hours depending on the project.

Agents.jl is a pure Julia framework for agent-based modeling (ABM). It has an extensive list of features, excellent performance and is easy to learn, use, and extend. Comparisons with other popular frameworks written in Python or Java (NetLogo, MASON, Mesa) show that Agents.jl outperforms all of them in computational speed, list of features and usability.

In this project, contributors will be paired with lead developers of Agents.jl to improve Agents.jl with more features, better performance, and overall higher polish. We are open to discussing a project description and outline with potential candidates!

Possible features to implement are:

GPU and/or HPC support in Agents.jl by integrating existing ABM packages (Vahana.jl or CellBasedModels.jl) into the Agents.jl API.
A new type of space representing a planet, which can be used in climate policy or human evolution modelling, and a new interface for an overarching ABM composed of several smaller ABMs.

Pre-requisite: Having already contributed to a Julia package either in JuliaDynamics or of sufficient relevance to JuliaDynamics.

Recommended Skills: Familiarity with agent-based modelling, Agents.jl and Julia's type system. Background in complex systems, sociology, or nonlinear dynamics is not required but would be advantageous.

Expected Results: Well-documented, well-tested useful new features for Agents.jl.

Mentors: George Datseris.

DynamicalSystems.jl

Difficulty: Easy to Medium to Hard, depending on the project.

Length: 175 to 350 hours, depending on the project.

DynamicalSystems.jl is an award-winning Julia software library for dynamical systems, nonlinear dynamics, deterministic chaos, and nonlinear time series analysis. It has an impressive list of features, but one can never have enough. In this project, contributors will be able to enrich DynamicalSystems.jl with new algorithms and enrich their knowledge of nonlinear dynamics and computer-assisted exploration of complex systems.

We do not outline possible projects here, and instead we invite interested candidates to reach out to one of the developers of DynamicalSystems.jl or its subpackages to devise a project outline. We strongly welcome candidates that already have potential project ideas in mind. To get ideas of possible projects we recommend having a look at the list of the open issues in the sub-packages of DynamicalSystems.jl.

Pre-requisite: Having already contributed to a Julia package either in JuliaDynamics or of sufficient relevance to JuliaDynamics.

Recommended Skills: Familiarity with nonlinear dynamics and/or differential equations and/or data analysis and the Julia language.

Expected Results: Well-documented, well-tested new algorithms for DynamicalSystems.jl.

Mentors: George Datseris

JuliaGenAI Projects

JuliaGenAI is an organization focused on advancing Generative AI research and looking for its applications within the Julia programming language ecosystem. Our community comprises AI researchers, developers, and enthusiasts passionate about pushing the boundaries of Generative AI using Julia's high-performance capabilities. We strive to create innovative tools and solutions that leverage the unique strengths of Julia in handling complex AI challenges.

There is a high overlap with other organizations; you might also be interested in:

Projects with MLJ.jl - For more traditional machine learning projects
Projects in Reinforcement Learning - For projects around AlphaZero.jl
Projects with FluxML - For projects around Flux.jl, the backbone of Julia's deep learning ecosystem

Large Language Model Projects

Project 1: Enhancing llama2.jl with GPU Support

Project Overview: Llama2.jl is a Julia native port for Llama architectures, originally based on llama2.c. This project aims to enhance Llama2.jl by implementing GPU support through KernelAbstractions.jl, significantly improving its performance.

Mentor: Cameron Pfiffer

Project Difficulty: Hard

Estimated Duration: 350 hours

Ideal Candidate Profile:

Proficiency in Julia programming
Understanding of GPU computing
Experience with KernelAbstractions.jl

Project Goals and Deliverables:

Implementation of GPU support in llama2.jl
Comprehensive documentation and examples demonstrating the performance improvements
Contribution to llama2.jl's existing codebase and documentation

Project 2: Llama.jl - Low-level C interface

Project Overview: Llama.jl is a Julia interface for llama.cpp, which powers many open-source tools today. It currently leverages only the high-level binaries. This project focuses on generating a low-level C interface to llama.cpp, enabling native access to internal model states, which would open up research opportunities and attractive applications (e.g., constraint generation, novel sampling algorithms, etc.).

Mentor: Cameron Pfiffer

Project Difficulty: Hard

Estimated Duration: 175 hours

Ideal Candidate Profile:

Proficiency in Julia and C programming

Project Goals and Deliverables:

Auto-generated C interface for tokenization and sampling functionality
Access to internal model states in llama.cpp during token generation
Ability to generate text from a given model state

Project 3: Supercharging the Knowledge Base of AIHelpMe.jl

Project Overview:

Julia stands out as a high-performance language that's essential yet underrepresented in GenAI training datasets. AIHelpMe.jl is our ambitious initiative to bridge this gap by enhancing Large Language Models' (LLMs) understanding of Julia by providing this knowledge via In-Context Learning (RAG, prompting). This project focuses on expanding the embedded knowledge base with up-to-date, context-rich Julia information and optimizing the Q&A pipeline to deliver precise, relevant answers. By injecting targeted Julia code snippets and documentation into queries, AIHelpMe.jl aims to significantly improve the accuracy and utility of generative AI for Julia developers worldwide.

Mentor: Jan Siml / @svilup on JuliaLang Slack / Jan Siml on Julia Zulip

Project Difficulty: Medium

Estimated Duration: 175 hours

Who Should Apply:

Individuals with a solid grasp of the Julia programming language who are eager to deepen their involvement in the Julia and AI communities.
Applicants should have a foundational understanding of Retrieval-Augmented Generation (RAG) optimization techniques and a passion for improving AI technologies.

Project Goals and Deliverables:

Knowledge Base Expansion: Grow the AIHelpMe.jl knowledge base to include comprehensive, up-to-date resources from critical Julia ecosystems such as the Julia documentation site, DataFrames, Makie, Plots/StatsPlots, the Tidier-verse, SciML, and more. See the GitHub Issue for more details. This expansion is crucial for enriching the context and accuracy of AI-generated responses related to Julia programming.
Performance Tuning: Achieve at least a 10% improvement in accuracy and relevance on a golden Q&A dataset, refining the AIHelpMe.jl Q&A pipeline for enhanced performance.

Project 4: Enhancing Julia's AI Ecosystem with ColBERT v2 for Efficient Document Retrieval

Project Overview:

Dive into the forefront of generative AI and information retrieval by bringing ColBERT v2, a cutting-edge document retrieval and re-ranking framework, into the Julia programming world. This initiative aims not only to translate ColBERT v2 to operate natively in Julia but to seamlessly integrate it with AIHelpMe.jl (and other downstream libraries). This integration promises to revolutionize the way users interact with AI by offering locally-hosted, more cost-efficient and highly performant document search capabilities. By enabling this sophisticated technology to run locally, we reduce dependency on large-scale commercial platforms, ensuring privacy and control over data, while maintaining minimal memory overheads.

Mentor: Jan Siml / @svilup on JuliaLang Slack / Jan Siml on Julia Zulip

Project Difficulty: Hard

Estimated Duration: 350 hours

Ideal Candidate Profile:

Solid understanding of transformer architectures, with proficiency in Flux.jl or Transformers.jl.
Experience in semantic document retrieval (or with Retrieval-Augmented Generation applications), and a keen interest in pushing the boundaries of AI technology.
A commitment to open-source development and a passion for contributing to an evolving ecosystem of Julia-based AI tools.

Project Goals and Expected Outcomes:

Native Julia Translation of ColBERT v2: Successfully adapt ColBERT v2 to run within the Julia ecosystem. The focus is only on the indexing and retrieval functionality of ColBERT v2, e.g., the Retrieval and Indexing snippets you see in the Example Usage Section. For guidance, refer to the existing Indexing and Retrieval examples.
Integration with AIHelpMe.jl: Seamlessly integrate the package as one of the embedding and retrieval backends of AIHelpMe.jl (defined in PromptingTools.jl).
Package Registration and Documentation: Register the fully functional package within the Julia ecosystem, accompanied by comprehensive documentation and usage examples to foster adoption and contribution from the community.
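
For orientation, the scoring rule at the heart of ColBERT-style retrieval ("late interaction") can be sketched in a few lines: each query token embedding is matched to its most similar document token embedding, and those maxima are summed. The Python below is a toy illustration only. The two-dimensional vectors are hand-written stand-ins for model-produced token embeddings, the function names are hypothetical, and the indexing and compression machinery of ColBERT v2 is not covered.

```python
def dot(u, v):
    """Dot product; equals cosine similarity when both vectors are unit-length."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token, take its best-matching doc token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

def rank(query_vecs, docs):
    """Return document names sorted from highest to lowest score."""
    scores = {name: maxsim_score(query_vecs, vecs) for name, vecs in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy unit-length token embeddings (assumed, not produced by a real encoder).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query tokens
    "doc_b": [[1.0, 0.0], [1.0, 0.0]],  # covers only the first
}
print(rank(query, docs))
```

Because scoring works on per-token embeddings rather than one vector per document, the index is larger than for single-vector retrieval, which is one reason the memory overhead mentioned above matters.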

Project 5: Enhancing PromptingTools.jl with Advanced Schema Support and Functionality

Project Overview:

PromptingTools.jl is a key tool in the Julia GenAI ecosystem. This project is a concerted effort to broaden the utility and applicability of PromptingTools.jl by incorporating a wider array of prompt templates and schemas, thereby catering to a diverse set of LLM backends. The initiative directly corresponds to Issue #67, Issue #68 and Issue #69 on PromptingTools.jl's GitHub. By enhancing the library's functionality to support structured extraction with the Ollama backend and introducing more standardized prompt schemas, we aim to make PromptingTools.jl an even more powerful and indispensable resource for developers engaging with open-source Large Language Models (LLMs).

Mentor: Jan Siml / @svilup on JuliaLang Slack / Jan Siml on Julia Zulip

Project Difficulty: Medium

Estimated Duration: 175 hours

Ideal Candidate Profile:

Proficient in Julia with a commitment to the advancement of the Julia AI ecosystem.
Experience with open-source LLMs, such as llama.cpp and vLLM, and familiarity with prompt engineering concepts.

Project Goals and Deliverables:

Schema Integration and Functionality Enhancement: Implement and integrate a variety of common prompt schemas (see details in Issue #67). Develop methods for the render and aigenerate functions that enable easy use and rendering of these templates, complete with comprehensive documentation, examples, and tests.
Structured Extraction Support: Add aiextract support for the Ollama backend, as currently, this functionality is not supported. This involves creating methods and templates that facilitate structured data extraction, thereby broadening the use cases and efficiency of interacting with AI models through PromptingTools.jl. See details in Issue #68. Extending the functionality to other backends is a plus.
Support for Common Backends: Extend the functionality of PromptingTools.jl to support common backends such as HuggingFace Transformers and vLLM, ensuring that the library is compatible with a wide range of LLMs. We need to create an example for each backend to demonstrate the functionality. See details in Issue #69.

Project 6: Expanding the Julia Large Language Model Leaderboard

Project Overview:

As a pivotal resource for the Julia community, the Julia LLM Leaderboard benchmarks open-source models for Julia code generation. This enhancement project seeks to extend the leaderboard by incorporating additional test cases and expanding benchmarks into Julia-specific applications beyond code generation, such as evaluating Retrieval-Augmented Generation (RAG) applications with a golden Q&A dataset and many others. This initiative, addressing several GitHub issues, aims to improve the leaderboard's utility and accuracy, making it an even more indispensable tool for the community. Participants will have the chance to deepen their knowledge of Generative AI while contributing to a project that enhances how the Julia community selects the most effective AI models for their needs.

Mentor: Jan Siml / @svilup on JuliaLang Slack / Jan Siml on Julia Zulip

Project Difficulty: Easy/Medium

Estimated Duration: 175 hours

Ideal Candidate Profile:

Strong proficiency in Julia and an active participant in the Julia community.
Basic knowledge or interest in Generative AI, with a keenness to learn more through practical application.
A passion for contributing to open-source projects and a desire to help the Julia community identify the most effective AI models for their needs.

Project Goals and Deliverables:

Test Case Expansion: Develop and integrate a diverse range of test cases to assess the capabilities of LLMs in Julia code generation more comprehensively, enhancing the leaderboard's robustness and reliability. See the details here.
Benchmark Extension: Extend the leaderboard's benchmarking capabilities to include evaluations of RAG applications (question-answering systems), focusing on their knowledge of the Julia programming language, and other Julia tasks like "help me speed up this code", "what is a more idiomatic way to write this", etc. There is a slight overlap with Project 3, however, the focus here is to pinpoint promising locally-hosted models with strong capabilities all around. See the details here.
Documentation and Outreach: Document findings and best practices in a series of blog posts to share insights, highlight top-performing models, and guide the community on leveraging Generative AI effectively within the Julia ecosystem.

Project 7: Counterfactuals for LLMs (Model Explainability and Generative AI)

Project Overview: This project aims to extend the functionality of CounterfactualExplanations.jl to Large Language Models (LLMs). As a backbone for this, support for computing feature attributions for LLMs will also need to be implemented. The project will contribute to both Taija and JuliaGenAI.

Mentor: Jan Siml (JuliaGenAI) and Patrick Altmeyer (Taija)

Project Difficulty: Medium

Estimated Duration: 175 hours

Ideal Candidate Profile:

Experience with Julia and multiple dispatch is an advantage, but not crucial
Good knowledge of machine learning and statistics
Good understanding of Large Language Models (LLMs)
Ideally previous experience with Transformers.jl

Project Goals and Deliverables:

Carefully think about architecture choices: how can we fit support for LLMs into the existing code base of CounterfactualExplanations.jl?
Implement current state-of-the-art approaches such as MiCE and CORE
Comprehensively test and document your work

How to Contact Us

We'd love to hear your ideas and discuss potential projects with you.

Probably the easiest way is to join our JuliaLang Slack and join the #generative-ai channel. You can also reach out to us on Julia Zulip or post a GitHub Issue on our website JuliaGenAI.

JuliaHealth Projects

JuliaHealth is an organization dedicated to improving healthcare by promoting open-source technologies and data standards. Our community is made up of researchers, data scientists, software developers, and healthcare professionals who are passionate about using technology to improve patient outcomes and promote data-driven decision-making. We believe that by working together and sharing our knowledge and expertise, we can create powerful tools and solutions that have the potential to transform healthcare.

Observational Health Subecosystem Projects

Project 1: Developing Tooling for Observational Health Research in Julia

Description: The OMOP Common Data Model (OMOP CDM) is a widely used data standard that allows researchers to analyze large, heterogeneous healthcare datasets in a consistent and efficient manner. JuliaHealth has several packages that can interact with databases that adhere to the OMOP CDM (such as OMOPCDMCohortCreator.jl or OMOPCDMDatabaseConnector.jl). For this project, we are looking for students interested in further developing the tooling in Julia to interact with OMOP CDM databases.

Mentor: Jacob Zelko (aka TheCedarPrince) [email: jacobszelko@gmail.com]
Difficulty: Medium
Duration: 350 hours
Suggested Skills and Background:
- Experience with Julia
- Familiarity with some of the following Julia packages would be a strong asset:
  - FunSQL.jl
  - DataFrames.jl
  - Distributed.jl
  - OMOPCDMCohortCreator.jl
  - OMOPCDMDatabaseConnector.jl
  - OMOPCommonDataModel.jl
- Comfort with the OMOP Common Data Model (or a willingness to learn!)
Potential Outcomes:

Some potential project outcomes could be:

Expanding OMOPCDMCohortCreator.jl to enable users to add constraints to potential patient populations they want to create such as conditional date ranges for a given drug or disease diagnosis.
Support parallelization of OMOPCDMCohortCreator.jl based queries when developing a patient population.
Develop and explore novel ways for how population filters within OMOPCDMCohortCreator.jl can be composed together for rapid analysis.

Whatever functionality is developed for tools within JuliaHealth, students will also be expected to contribute to the existing package documentation to highlight how new features can be used. Although not required, students who would like to submit lightning talks, posters, etc. to JuliaCon in the future about their work will be supported in this endeavor!

Please contact the mentor for this project if you are interested and want to discuss what else could be pursued in the course of this project.

Project 2: Developing Patient Level Prediction Tooling within Julia

Description: Patient level prediction (PLP) is an important area of research in healthcare that involves using patient data to predict outcomes such as disease progression, response to treatment, and hospital readmissions. JuliaHealth is interested in developing tooling for PLP that utilizes historical patient data, such as patient medical claims or electronic health records, that follow the OMOP Common Data Model (OMOP CDM), a widely used data standard that allows researchers to analyze large, heterogeneous healthcare datasets in a consistent and efficient manner. For this project, we are looking for students interested in developing PLP tooling within Julia.

Mentor: Sebastian Vollmer [email: sjvollmer@gmail.com], Jacob Zelko (aka TheCedarPrince) [email: jacobszelko@gmail.com]
Difficulty: Hard
Duration: 350 hours
Suggested Skills and Background:
- Experience with Julia
- Exposure to machine learning concepts and ideas
- Familiarity with some of the following Julia packages would be a strong asset:
  - DataFrames.jl
  - OMOPCDMCohortCreator.jl
  - MLJ.jl
  - ModelingToolkit.jl
- Comfort with the OMOP Common Data Model (or a willingness to learn)
Outcomes:

This project will be very experimental and exploratory in nature. To constrain the expectations for this project, here is a possible approach students will follow while working on this project:

Review existing literature on approaches to PLP
Familiarize oneself with tools for machine learning and prediction within the Julia ecosystem
Determine PLP research question to drive package development
Develop PLP package utilizing JuliaHealth tools to work with an OMOP CDM database
Test and validate PLP package for investigating the research question
Document findings and draft JuliaCon talk

Whatever functionality is developed for tools within JuliaHealth, students will also be expected to contribute to the existing package documentation to highlight how new features can be used. For this project, the proposal is expected to include drafting and giving a talk at JuliaCon. Furthermore, although not required, publishing in the JuliaCon Proceedings will be both encouraged and supported by project mentors.

Additionally, depending on the success of the package, there is a potential to run experiments on actual patient data to generate actual patient population insights based on a chosen research question. This could possibly turn into a separate research paper, conference submission, or poster submission. Whatever may occur in this situation will be supported by project mentors.

Medical Imaging Subecosystem Projects

MedPipe3D.jl, together with MedEye3D.jl, MedEval3D.jl, and MedImage.jl (currently in development), form a set of libraries created to provide essential tools for 3D medical imaging to the Julia language ecosystem.

MedImage is a package for the standardization of loading medical imaging data, and for its basic processing that takes into consideration its spatial metadata. MedEye3D is a package that supports the display of medical imaging data. MedEval3D implements some highly performant algorithms for calculating the metrics needed to assess the performance of 3D segmentation models. MedPipe3D was created as a package that improves integration between other parts of the small ecosystem (MedEye3D, MedEval3D, and MedImage).

Project 3: Adding functionalities to medical imaging visualizations

Description: MedEye3D is a package that supports the display of medical imaging data. It includes multiple functionalities specific to this use case like automatic windowing to display soft tissues, lungs, and other tissues. The display takes into account voxel spacing, support of overlaying display for multimodal imaging, and more. All with high performance powered by OpenGL and Rocket.jl. Still, a lot of further improvements are possible and are described in the Potential Outcomes section.

Mentor: Jakub Mitura [email: jakub.mitura14@gmail.com]
Difficulty: Hard
Duration: 350 hours
Suggested Skills and Background:
- Experience with Julia
- Basic familiarity with computer graphics, preferably OpenGL (example link)
- Some experience with 3D volumetric data with spatial metadata (or a willingness to learn!); see, for example, link
Potential Outcomes:

Although MedEye3D already supports displaying medical images, some functionalities are still missing that would be useful for implementing more advanced algorithms, like supervoxel segmentation or image registration (both of which are crucial for solving many important problems in medical imaging). To that end, this project's goal is to implement the following:

Developing support for multiple image viewing with indicators for image registration like display of the borders, and display lines connecting points.
Automatic correct windowing for MRI and PET.
Support of display for supervoxels (sv). Show borders of sv; indicate whether the gradient of the image is in agreement with sv borders.
Improve start time.
Simplify basic usage by providing high-level functions.

Success criteria and time needed: How the success of functionality described above is defined and the approximate time required for each.

The user can load 2 different images, and they would display concurrently one next to the other. During scrolling the same area of the body should be displayed (for well-registered sample images) based on the supplied metadata. While moving the mouse cursor on one image the position of the cursor in the same physical spot on the other image should be displayed (physical location calculated from spatial metadata). 120h
Given the most common PET and MRI modalities (random FDG PET/CT, and T2, T1, FLAIR, ADC, DWI on MRI), the user will see an image similar to what is automatically displayed in 3DSlicer - 10h
Given an integer mask where each unique integer value encodes a single supervoxel, and an underlying 3D medical image, the user will have the option to overlay the original image with the borders of the supervoxels, where adjacent borders will have different colors, or to show those borders on the background of the image convolved with an edge filter, for example a Sobel filter - 180h
Any measurable decrease in the start time of the viewer - 20h
The user will be able to display images just by supplying MedImage objects from the MedImage.jl library to a single display function - 20h

Project 4: Adding dataset-wide functions and integrations of augmentations

Description: MedPipe3D was created as a package that improves integration between other parts of the small ecosystem (MedEye3D, MedEval3D, and MedImage). Currently, it needs to be expanded and adapted so it can be a basis for a fully functional medical imaging pipeline. It requires utilities for preprocessing specific to medical imaging, like uniformization of spacing, orientation, cropping, or padding. It needs to support k-fold cross-validation and simple ensembling. Another necessary part of the segmentation pipeline is the augmentations, which should be easier to use and should provide test-time augmentation for uncertainty quantification. The last thing in the pipeline that is also important for practitioners is postprocessing, and the most popular postprocessing step is finding and keeping only the largest connected component.

Mentor: Jakub Mitura [email: jakub.mitura14@gmail.com]
Difficulty: Medium
Duration: 350 hours
Suggested Skills and Background:
- Experience with Julia
- Familiarity with some of the following Julia packages would be a strong asset:
  - MedEye3D.jl
  - MedEval3D.jl

Potential Outcomes:

Integrate augmentations like rotations, rescaling, gamma, etc.
Enable invertible augmentations and support test time augmentations.
Add patch-based data loading with probabilistic oversampling.
Calculate median and mean spacing and enable applying resampling to the median or mean spacing of the dataset.
Add basic post-processing like the largest connected component analysis.
Set all hyperparameters (of augmentation; size of a patch, threshold for getting binary mask from probabilities) in a struct or dictionary to enable hyperparameter tuning.
Enable automated display of the algorithm output in the validation epoch, including saving such outputs to persistent storage.
Support k-fold cross-validation.

This set of changes, although time-consuming to implement, should not pose a significant issue to anybody with experience with the Julia programming language. However, implementing them will be a huge step in making the Julia language a good alternative to Python for developing end-to-end medical imaging segmentation algorithms.
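
As an illustration of the post-processing item in the list above, here is a minimal sketch of keeping only the n largest connected components of a binary mask. It is written in plain Python for brevity (the project itself targets Julia), works on a 2D mask, and uses 4-connectivity. The function name `keep_largest_components` and the connectivity choice are assumptions made for this example and are not part of any MedPipe3D API. A 3D version would extend the neighbour offsets to six (or 26) directions.

```python
from collections import deque


def keep_largest_components(mask, n=1):
    """Return a copy of a 2D binary mask keeping only the n largest
    4-connected foreground components (hypothetical helper)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []

    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)

    # Largest components first; ties keep scan order (stable sort).
    components.sort(key=len, reverse=True)
    out = [[0] * cols for _ in range(rows)]
    for pixels in components[:n]:
        for y, x in pixels:
            out[y][x] = 1
    return out


mask = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
]
largest = keep_largest_components(mask, n=1)
```

In this toy mask there are three components (sizes 3, 1, and 5), so `n=1` keeps only the five-pixel component in the lower right.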

Success criteria and time needed: How the success of functionality described above is defined and the approximate time required for each.

Given the configuration struct supplied by the user the supplied augmentations will be executed with some defined probability after loading the image: Brightness transform, Contrast augmentation transform, Gamma Transform, Gaussian noise transform, Rician noise transform, Mirror transform, Scale transform, Gaussian blur transform, Simulate low-resolution transform, Elastic deformation transform -100h.
Enable some transformation to be executed on the model input, then inverse this transform on the model output; execute model inference n times when n is supplied by the user and return mean and standard deviation of segmentation masks produced by the model as the output -60h.
Given the size of the 3D patch supplied by the user, the algorithm will, after data loading, crop or pad the supplied image to meet the set size criterion. During cropping, the part of the image where the label is present should be selected more frequently than areas without it: the probability that an area with some label indicated on the segmentation mask will be chosen will equal p (0-1), where p is supplied by the user -40h.
Given the list of paths to medical images, it will load them, calculate the mean or median spacing (option selected by the user), and return it. Then, during pipeline execution, all images should be resampled to a user-supplied spacing and user-supplied orientation - 40h.
Given a model output and a threshold that will be used for each channel of the output to binarize it, the user will have an option to retrieve only the n largest components from the binarized algorithm output - 20h.
Probabilities and hyperparameters of all augmentations, thresholds for binarization of output channels, the chosen spacing for preprocessing, and the number and settings of test-time augmentations should be available in a hyperparameter struct that is an additional argument of the pipeline function and that can be used for hyperparameter tuning -30h.
During the validation epoch, the images can be saved into persistent storage, and a single random image can be loaded together with the output mask into MedEye3D for visualization during training -30h.
The user can either set val_percentage, which will lead to the division of the dataset into training and validation folds, or supply k, which will lead to k-fold cross-validation. In the latter option, the mean, threshold, and standard deviation of the ensemble will be returned as the final output of the model -30h.

For each point, the mentor will also supply the person responsible for implementation with examples of the required functionality in Python, or will point to the Julia libraries already implementing it (that just need to be integrated).

Project 5: Highly-efficient MRI Simulations with Multi-Vendor GPU Support

Description: KomaMRI.jl is a Julia package designed for highly-efficient Magnetic Resonance Imaging (MRI) simulations, serving both educational and research purposes. Simulations can help to grasp hard-to-understand MRI concepts, like pulse sequences, signal generation and acquisition. Moreover, they can guide the design of novel pulse sequences, and generate synthetic data for training machine learning models.

Currently, our simulator performs GPU-accelerated computations using CUDA arrays. We are now advancing to implement a new simulation method (BlochKernel<:SimulationMethod) based on GPU kernel programming using KernelAbstractions.jl. This enhancement will not only boost computation speeds but also broaden accessibility with KernelAbstractions.jl's multi-vendor GPU support. This could enable the use of MRI simulations in iterative algorithms to solve inverse problems. We are seeking enthusiastic people interested in developing this functionality.

Mentors: Carlos Castillo [email: cncastillo@uc.cl], Boris Oróstica [email: beorostica@uc.cl], Pablo Irarrazaval [email: pim@uc.cl]
Difficulty: Hard
Duration: 350 hours (2 months, 8 hours per day)
Suggested Skills and Background:
- Experience with Julia
- Exposure to MRI concepts and ideas
- High-level knowledge of GPU programming
- Familiarity with some of the following Julia packages would be desired:
  - KernelAbstractions.jl
  - CUDA.jl
  - Adapt.jl
  - Functors.jl
Outcomes:

We expect the following outcomes by the end of this program:

Extended and/or improved GPU-accelerated simulations, having generated a new simulation method BlochKernel with multi-vendor GPU support.
Developed documentation explaining the new simulation method, including showcasing some use-case examples.
Implemented automatic pipelines on Buildkite for testing the simulations across multiple GPU architectures.
Reported performance improvements between BlochKernel and Bloch.

Please contact the mentors of this project if you are interested and want to discuss other aspects that could be pursued during the course of this project.

Music data analysis - Summer of Code

JuliaMusic is an organization providing packages and functionalities that allow analyzing the properties of music performances.

MIDIfication of music from wave files

Difficulty: Medium.

Length: 350 hours.

It is easy to analyze timing and intensity fluctuations in music that is in the form of MIDI data. This format is already digitalized, and packages such as MIDI.jl and MusicManipulations.jl allow for seamless data processing. But arguably the most interesting kind of music to analyze is live music. Live music performances are recorded in wave formats. Some algorithms exist that can detect the "onsets" of music hits, but they are typically focused only on the timing information and hence forfeit detecting, e.g., the intensity of the played note. Plus, there are very few code implementations online for this problem, almost all of which are old and unmaintained. We would like to implement an algorithm in MusicProcessing.jl that, given a recording of a single instrument, can "MIDIfy" it, which means to digitalize it into the MIDI format.

Recommended Skills: Background in music, familiarity with digital signal processing.

Expected results: A well-tested, well-documented function midify in MusicProcessing.jl.

Mentors: George Datseris.

JuliaReach - Summer of Code

JuliaReach is the Julia ecosystem for reachability computations of dynamical systems. Application domains of set-based reachability include formal verification, controller synthesis and estimation under uncertain model parameters or inputs. For further context reach us on the JuliaReach zulip stream. You may also refer to the review article Set Propagation Techniques for Reachability Analysis.

Efficient symbolic-numeric set computations

Difficulty: Medium.

Description. LazySets is the core library of JuliaReach. It provides ways to symbolically compute with geometric sets, with a focus on lazy set representations and efficient high-dimensional processing. The library has been described in the article LazySets.jl: Scalable Symbolic-Numeric Set Computations.

The main interest in this project is to implement algorithms that leverage the structure of the sets. Typical examples include polytopes and zonotopes (convex), polynomial zonotopes and Taylor models (non-convex) to name a few.
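
As a small illustration of what leveraging set structure can mean, consider a zonotope with center c and generators g_1, ..., g_p. Its support function in direction d is <c, d> plus the sum of |<g_i, d>| over all generators, so it can be evaluated in time linear in the number of generators without ever enumerating vertices. The sketch below is plain Python for illustration only; the function names are made up for this example and are not the LazySets API.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def zonotope_support(center, generators, direction):
    """Support function of Z = {c + sum_i xi_i * g_i : -1 <= xi_i <= 1}."""
    return dot(center, direction) + sum(
        abs(dot(g, direction)) for g in generators
    )


def zonotope_support_vector(center, generators, direction):
    """A point of Z that attains the support function value."""
    point = list(center)
    for g in generators:
        # Push each generator fully in the direction that increases <d, x>.
        sign = 1.0 if dot(g, direction) >= 0 else -1.0
        point = [p + sign * gi for p, gi in zip(point, g)]
    return point


center = [1.0, 0.0]
generators = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
rho = zonotope_support(center, generators, [1.0, 0.0])
sv = zonotope_support_vector(center, generators, [1.0, 0.0])
```

A zonotope with p generators in n dimensions can have a number of vertices that grows combinatorially in p, whereas the evaluation above costs only O(pn) operations.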

Expected Results. The goal is to implement certain efficient state-of-the-art algorithms from the literature. The code is to be documented, tested, and evaluated in benchmarks. Specific tasks may include (to be driven by the interests of the candidate): efficient vertex enumeration of zonotopes; operations on polynomial zonotopes; operations on zonotope bundles; efficient disjointness checks between different set types; complex zonotopes.

Expected Length. 175 hours.

Recommended Skills. Familiarity with Julia and Git/GitHub is mandatory. Familiarity with LazySets is recommended. Basic knowledge of geometric terminology is appreciated but not required.

Mentors: Marcelo Forets, Christian Schilling.

Reachability with sparse polynomial zonotopes

Difficulty: Medium.

Description. Sparse polynomial zonotopes are a new non-convex set representation that is well suited for reachability analysis of nonlinear dynamical systems. This project is a continuation of GSoC'2022 - Reachability with sparse polynomial zonotopes, which implemented the basics in LazySets.

Expected Results. It is expected to add efficient Julia implementations of a reachability algorithm for dynamical systems in ReachabilityAnalysis which leverages polynomial zonotopes. A successful project should:

Replicate the results from the article [Reachability Analysis for Linear Systems with Uncertain Parameters using Polynomial Zonotopes](https://dl.acm.org/doi/abs/10.1145/3575870.3587130).

The code shall be documented, tested, and evaluated extensively in benchmarks.

For ambitious candidates it is possible to draw connections with neural-network control systems as implemented in ClosedLoopReachability.jl.

Expected Length. 175 hours.

Recommended Skills. Familiarity with Julia and Git/GitHub is mandatory. Familiarity with the mentioned Julia packages is appreciated but not required. The project does not require theoretical contributions, but it requires reading research literature, hence a certain level of academic experience is recommended.

Literature and related packages. This video explains the concept of polynomial zonotopes (slides here). The relevant theory is described in this research article. There exists a Matlab implementation in CORA (the implementation of polynomial zonotopes can be found in this folder).

Mentors: Marcelo Forets, Christian Schilling.

Improving the hybrid systems reachability API

Difficulty: Medium.

Description. ReachabilityAnalysis is a Julia library for set propagation of dynamical systems. One of the main aims is to handle systems with mixed discrete-continuous behaviors (known as hybrid systems in the literature). This project will focus on enhancing the capabilities of the library and overall improvement of the ecosystem for users.

Expected Results. Specific tasks may include: problem-specific heuristics for hybrid systems; API for time-varying input sets; flowpipe underapproximations. The code is to be documented, tested, and evaluated in benchmarks. Integration with ModelingToolkit.jl can also be considered if there is interest.

Expected Length. 175 hours.

Recommended Skills. Familiarity with Julia and Git/GitHub is mandatory. Familiarity with LazySets and ReachabilityAnalysis is welcome but not required.

Mentors: Marcelo Forets, Christian Schilling.

JuliaStats Projects – Summer of Code

JuliaStats is an organization dedicated to providing high-quality packages for statistics in Julia.

Panel data analysis

Implement panel analysis models and estimators in Julia.

Difficulty. Moderate. Duration. 350 hours

Description

Panel data is an important kind of statistical data that deals with observations of multiple units across time. Common examples of panel data include economic statistics (where it is common to observe figures for several countries over time). This combination of longitudinal and cross-sectional data can be powerful for extracting causal structure from data.

Mentors. Nils Gudat, José Bayoán Santiago Calderón, Carlos Parada

Prerequisites

Must be fluent in at least one language for statistical computing, and willing to learn Julia before the start of projects.
Knowledge of basic statistical inference covering topics such as maximum likelihood estimation, confidence intervals, and hypothesis testing. (Must know before applying.)
Basic familiarity with time series statistics (e.g. ARIMA models, autocorrelations) or panel data. (Can be learned after applying.)

Your contribution

Participants will:

Learn and build on past approaches and packages for panel data analysis, such as those in Econometrics.jl and SynthControl.jl.
Generalize TreatmentPanels.jl into an abstract interface for dealing with and manipulating panel data.
Integrate existing estimators provided by packages such as Econometrics.jl into a single package for panel data estimation.

References

A Primer for Panel Data Analysis
Econometric Analysis of Cross Section and Panel Data by Jeffrey Wooldridge

Distributions.jl Expansion

Distributions.jl is a package providing basic probability distributions and associated functions.

Difficulty. Easy-Medium. Duration. 175-350 hours

Prerequisites

Must be fluent in Julia.
A college-level introduction to probability covering topics such as probability density functions, moments and cumulants, and multivariate distributions.

Your contribution

Possible improvements to Distributions.jl include:

New distribution families, such as elliptical distributions or distributions of order statistics.
Additional parametrizations and keyword constructors for current distributions.
Extended support for distributions of transformed variables.
Replace RMath RNGs.

HypothesisTesting.jl Expansion

HypothesisTesting.jl is a package that implements a range of hypothesis tests.

Difficulty. Medium. Duration. 350 hours

Mentors. Sourish Das, Mousum Dutta

Prerequisites

Must be fluent in Julia.
A college-level introduction to probability covering topics such as probability density functions, moments and cumulants, and multivariate distributions.

Your contribution

Improvements to HypothesisTesting.jl include:

Develop Breusch-Pagan test against heteroskedasticity
Develop Harvey-Collier Test for linearity
Develop Bartels Rank Test for randomness
Develop an exact dynamic programming solution to Wilcoxon–Mann–Whitney (WMW) test
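
To give a flavour of the last item, the sketch below computes the exact null distribution of the Mann-Whitney U statistic with the classic counting recurrence for the tie-free case. This is an illustration in plain Python, not necessarily the algorithm of the paper listed in the references (which may treat ties and efficiency differently); the function names are made up for this example.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(u, m, n):
    """Number of ways to interleave samples of sizes m and n (no ties)
    such that the Mann-Whitney U statistic equals u."""
    if u < 0 or u > m * n:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # Condition on whether the largest observation belongs to the
    # first sample (contributing n to U) or to the second sample.
    return count_u(u - n, m - 1, n) + count_u(u, m, n - 1)


def exact_lower_p_value(u_observed, m, n):
    """P(U <= u_observed) under the null hypothesis, assuming no ties."""
    total = comb(m + n, m)
    return sum(count_u(u, m, n) for u in range(u_observed + 1)) / total


distribution = [count_u(u, 2, 2) for u in range(5)]
p = exact_lower_p_value(0, 3, 3)
```

Here U counts the pairs in which an observation from the first sample exceeds one from the second, and memoization keeps the number of distinct subproblems polynomial in the sample sizes.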

References

bptest in R
randtests in R
Alexander Marx, et al. (2016) “Exact Dynamic Programing Solution of the Wilcoxon–Mann–Whitney Test” Genomics Proteomics Bioinformatics, 14, 55-61

CRRao.jl

Implement consistent APIs for statistical modeling in Julia.

Difficulty. Medium. Duration. 350 hours

Description

Currently, the Julia statistics ecosystem is quite fragmented. There is value in having a consistent API for a wide variety of statistical models. The CRRao.jl package offers this design.

Mentors. Sourish Das, Ayush Patnaik

Prerequisites

Must be fluent in Julia.
Basic statistical inference covering topics such as maximum likelihood estimation, confidence intervals, and hypothesis testing.

Your contribution

Participants will:

Help create, test, and document standard statistical APIs for Julia.
Integrate MixedModels.jl

JuliaStats Improvements

General improvements to JuliaStats packages, depending on the interests of participants.

Difficulty. Easy-Hard. Duration. 175-350 hours.

Description

JuliaStats provides many of the most popular packages in Julia, including:

StatsBase.jl for basic statistics (e.g. weights, sample statistics, moments).
MixedModels.jl for random and mixed-effects linear models.
GLM.jl for generalized linear models.

All of these packages are critically important to the Julia statistics community, and all could be improved.

Mentors. Mousum Dutta, Ayush Patnaik, Carlos Parada

Prerequisites

Must be fluent in at least one language for statistical computing, and willing to learn Julia before the start of projects.
Knowledge of basic statistical inference covering topics such as maximum likelihood estimation, confidence intervals, and hypothesis testing.

Your contribution

Participants will:

Make JuliaStats better! This can include additional estimators, new features, performance improvements, or anything else you're interested in.
StatsBase.jl improvements could include support for cumulants, L-moments, or additional estimators.
Improved nonparametric density estimators, e.g. those in R's Locfit.

Survey.jl

This package is used to study complex survey data. Examples of real-world surveys include official government surveys in areas like economics, health and agriculture; financial and commercial surveys. Social and behavioural scientists like political scientists, sociologists, psychologists, biologists and macroeconomists also analyse surveys in academic and theoretical settings. The prevalence of "big" survey datasets has exploded with the ease of administering surveys online. The project aims to use performance enhancements of Julia to create a fast package for modern "large" surveys.

Difficulty. Easy-Hard. Duration. 175-350 hours

Mentors. Ayush Patnaik

Prerequisites

Experience with at least one language for statistical computing (Julia, R, Python, SAS, Stata, etc.), and willingness to learn Julia before the start of projects.
Knowledge of basic statistical and probability concepts, preferably covered in academic course(s).
(Bonus) Any prior experience or coursework with survey analysis, using any software or tool.

Your contribution

The project can be tailored around the background and interests of participants and depending on ability, several standalone mini-projects can be created. Participants can potentially work on:

Generalised variance estimation methods using Taylor linearisation
Post-stratification, raking or calibration, GREG estimation and related methods.
Connect Survey.jl with FreqTable.jl for contingency table analysis, or to survival analysis, or a machine learning library.
Improve support for multistage and Probability Proportional to Size (PPS) sampling with or without replacement.
Association tests (with contingency tables), Rao-Scott, likelihood ratio tests for GLMs, Cox models, loglinear models.
Handling missing data, imputation like mitools.

References

Survey.jl - see some issues, past PRs and milestone ideas
Julia discourse post asking for community suggestions here
JuliaCon Statistics Symposium clip for Survey
Model Assisted Survey Sampling - Sarndal, Swensson, Wretman (1992)
Complex Surveys: a guide to analysis using R
Survey analysis in R for high-level topics that can be implemented for Julia

Stochastic differential equations and continuous time signal processing – Summer of Code

Smoothing non-linear continuous time systems

The contributor implements a state-of-the-art smoother for continuous-time systems with additive Gaussian noise. The system's dynamics can be described as an ordinary differential equation with locally additive Gaussian random fluctuations, in other words a stochastic ordinary differential equation.

Given a series of measurements observed over time, containing statistical noise and other inaccuracies, the task is to produce an estimate of the unknown trajectory of the system that led to the observations.

Linear continuous-time systems are smoothed with the fixed-lag Kalman-Bucy smoother (related to the Kalman–Bucy filter). It relies on coupled ODEs describing how mean and covariance of the conditional distribution of the latent system state evolve over time. A versatile implementation in Julia is missing.
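
For orientation, the coupled equations of the Kalman–Bucy filter for a linear system can be written as below. The symbols A, H, Q, and R (drift matrix, observation matrix, and process and observation noise intensities) are notation assumed here for illustration and do not come from the project description. These are the filter equations only; a smoother builds on them.

```latex
\begin{aligned}
dx_t &= A x_t\,dt + dw_t, & \mathbb{E}[dw_t\,dw_t^\top] &= Q\,dt,\\
dy_t &= H x_t\,dt + dv_t, & \mathbb{E}[dv_t\,dv_t^\top] &= R\,dt,\\
d\hat{x}_t &= A \hat{x}_t\,dt + K_t\,(dy_t - H \hat{x}_t\,dt), & K_t &= P_t H^\top R^{-1},\\
\frac{dP_t}{dt} &= A P_t + P_t A^\top + Q - P_t H^\top R^{-1} H P_t.
\end{aligned}
```

The first two lines define the latent state and the observation process. The third propagates the conditional mean, and the last is the Riccati equation for the conditional covariance.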

Expected Results: Build an efficient implementation of non-linear smoothing of continuous stochastic dynamical systems.

Recommended Skills: Gaussian random variables, Bayes' formula, Stochastic Differential Equations

Mentors: Moritz Schauer

Rating: Hard, 350 hours

Machine Learning Projects - Summer of Code

Note: FluxML participates as a NumFOCUS sub-organization. Head to the FluxML GSoC page for their idea list.

Reinforcement Learning Environments

Time: 175h

Develop a series of reinforcement learning environments, in the spirit of the OpenAI Gym. Although we have wrappers for the gym available, it is hard to install (due to the Python dependency) and, since it's written in Python and C code, we can't do more interesting things with it (such as differentiate through the environments).

Expected outcome

A pure-Julia version of selected environments that supports a similar API and visualisation options would be valuable to anyone doing RL with Flux.

Mentors: Dhairya Gandhi.

AlphaZero.jl

The philosophy of the AlphaZero.jl project is to provide an implementation of AlphaZero that is simple enough to be widely accessible for contributors and researchers, while also being sufficiently powerful and fast to enable meaningful experiments on limited computing resources (our latest release is consistently between one and two orders of magnitude faster than competing Python implementations).

Here are a few project ideas that build on AlphaZero.jl. Please contact us for additional details and let us know about your experience and interests so that we can build a project that best suits your profile.

[Easy (175h)] Integrate AlphaZero.jl with the OpenSpiel game library and benchmark it on a series of simple board games.
[Medium (175h)] Use AlphaZero.jl to train a chess agent. In order to save computing resources and allow faster bootstrapping, you may train an initial policy using supervised learning.
[Hard (350h)] Build on AlphaZero.jl to implement the MuZero algorithm.
[Hard (350h)] Explore applications of AlphaZero beyond board games (e.g. theorem proving, chip design, chemical synthesis...).

Expected Outcomes

In all these projects, the goal is not only to showcase the current Julia ecosystem and test its limits, but also to push it forward through concrete contributions that other people can build on. Such contributions include:

Improvements to existing Julia packages (e.g. AlphaZero, ReinforcementLearning, CommonRLInterface, Dagger, Distributed, CUDA...) through code, documentation or benchmarks.
A well-documented and replicable artifact to be added to AlphaZero.Examples, ReinforcementLearningZoo or released in its own package.
A blog post that details your experience, discusses the challenges you went through and identifies promising areas for future work.

Mentors: Jonathan Laurent

Molecular Simulation - Summer of Code

Much of science can be explained by the movement and interaction of molecules. Molecular dynamics (MD) is a computational technique used to explore these phenomena, from noble gases to biological macromolecules. Molly.jl is a pure Julia package for MD, and for the simulation of physical systems more broadly. The package is currently under development with a focus on proteins and differentiable molecular simulation. There are a number of ways that the package could be improved:

Machine learning potentials (duration: 175h, expected difficulty: easy to medium): in the last few years machine learning potentials have been improved significantly. Models such as ANI, ACE, NequIP and Allegro can be added to Molly.
Better GPU performance (duration: 175h, expected difficulty: medium): custom GPU kernels can be written to significantly speed up molecular simulation and make the performance of Molly comparable to mature software.
Constraint algorithms (duration: 175h, expected difficulty: medium): many simulations keep fast degrees of freedom such as bond lengths and bond angles fixed using approaches such as SHAKE, RATTLE and SETTLE. A fast implementation of these algorithms would be a valuable contribution.
Electrostatic summation (duration: 175h, expected difficulty: medium to hard): methods such as particle-mesh Ewald (PME) are in wide use for molecular simulation. Developing fast, flexible implementations and exploring compatibility with GPU acceleration and automatic differentiation would be an important contribution.
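As an illustration of the constraint algorithms item above, the following is a minimal sketch of the SHAKE iteration for a single bond, written in Python for illustration only. The function name `shake_bond`, its argument layout and the default tolerances are assumptions and not part of Molly; a real implementation would handle many coupled constraints and run on the package's own data structures.

```python
import math


def shake_bond(r1, r2, r1_old, r2_old, d, m1=1.0, m2=1.0, tol=1e-10, max_iter=100):
    """Move two atoms so that their distance equals d again (single-bond SHAKE).

    r1, r2         positions after an unconstrained integration step
    r1_old, r2_old positions before the step (these satisfy the constraint)
    """
    r1, r2 = list(r1), list(r2)
    # Corrections are applied along the bond vector from before the step.
    ref = [a - b for a, b in zip(r1_old, r2_old)]
    for _ in range(max_iter):
        s = [a - b for a, b in zip(r1, r2)]
        diff = sum(c * c for c in s) - d * d  # violation of |r1 - r2|^2 = d^2
        if abs(diff) < tol:
            return r1, r2
        dot = sum(a * b for a, b in zip(s, ref))
        g = diff / (2.0 * (1.0 / m1 + 1.0 / m2) * dot)
        # Mass-weighted displacements keep the centre of mass fixed.
        r1 = [a - g / m1 * c for a, c in zip(r1, ref)]
        r2 = [b + g / m2 * c for b, c in zip(r2, ref)]
    raise RuntimeError("SHAKE did not converge")


new_r1, new_r2 = shake_bond((-0.05, 0.02, 0.0), (1.1, -0.03, 0.0),
                            (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), d=1.0)
bond_length = math.dist(new_r1, new_r2)
```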

Recommended skills: familiarity with computational chemistry, structural bioinformatics or simulating physical systems.

Expected results: new features added to the package along with tests and relevant documentation.

Mentor: Joe Greener

Contact: feel free to ask questions via email or #juliamolsim on the Julia Slack.

Numerical Projects – Summer of Code

Numerical Linear Algebra

Matrix functions

Matrix functions map matrices onto other matrices, and can often be interpreted as generalizations of ordinary functions like sine and exponential, which map numbers to numbers. Once considered a niche province of numerical algorithms, matrix functions now appear routinely in applications to cryptography, aircraft design, nonlinear dynamics, and finance.

This project proposes to implement state of the art algorithms that extend the currently available matrix functions in Julia, as outlined in issue #5840. In addition to matrix generalizations of standard functions such as real matrix powers, surds and logarithms, contributors will be challenged to design generic interfaces for lifting general scalar-valued functions to their matrix analogues for the efficient computation of arbitrary (well-behaved) matrix functions and their derivatives.
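As a small illustration of lifting a scalar function to matrices, the sketch below evaluates the matrix exponential from the power series of exp, combined with scaling and squaring. It is plain Python for illustration only and is not one of the state-of-the-art algorithms the project targets; the helper names and the fixed `terms` and `squarings` defaults are assumptions.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expm(a, terms=20, squarings=8):
    """Matrix exponential: exp(A) = (exp(A / 2^s))^(2^s), inner exp by Taylor series."""
    n = len(a)
    scale = 2.0 ** squarings
    scaled = [[x / scale for x in row] for row in a]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        # term now holds scaled^k / k!
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result
```

For a diagonal matrix the result reduces to the scalar exponential of each diagonal entry, which is the sense in which matrix functions generalize ordinary ones.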

Recommended Skills: A strong understanding of calculus and numerical analysis.

Expected Results: New and faster methods for evaluating matrix functions.

Mentors: Jiahao Chen, Steven Johnson.

Difficulty: Hard

Better Bignums Integration

Julia currently supports big integers and rationals through the GMP library. However, GMP currently doesn't integrate well with a garbage collector.

This project therefore involves exploring ways to improve BigInt, possibly including:

Modifying GMP to support high-performance garbage-collection
Reimplementation of aspects of BigInt in Julia
Lazy graph style APIs which can rewrite terms or apply optimisations

This experimentation could be carried out as a package with a new implementation, or as patches over the existing implementation in Base.

Expected Results: An implementation of BigInt in Julia with increased performance over the current one.

Required Skills: Familiarity with extended precision numerics or performance considerations. Familiarity with either Julia or GMP.

Mentors: Jameson Nash

Difficulty: Hard

Special functions

As a technical computing language, Julia provides a huge number of special functions, both in Base as well as packages such as StatsFuns.jl. At the moment, many of these are implemented in external libraries such as Rmath and openspecfun. This project would involve implementing these functions in native Julia (possibly utilising the work in SpecialFunctions.jl), seeking out opportunities for possible improvements along the way, such as supporting Float32 and BigFloat, exploiting fused multiply-add operations, and improving errors and boundary cases.

Recommended Skills: A strong understanding of calculus.

Expected Results: New and faster methods for evaluating properties of special functions.

Mentors: Steven Johnson, Oscar Smith. Ask on Discourse or on Slack.

A Julia-native CCSA optimization algorithm

The CCSA algorithm by Svanberg (2001) is a nonlinear programming algorithm widely used in topology optimization and for other large-scale optimization problems: it is a robust algorithm that can handle arbitrary nonlinear inequality constraints and huge numbers of degrees of freedom. Moreover, the relative simplicity of the algorithm makes it possible to easily incorporate sparsity in the Jacobian matrix (for handling huge numbers of constraints), approximate-Hessian preconditioners, and special-case optimizations for affine terms in the objective or constraints. However, it is currently only available in Julia via the NLopt.jl interface to an external C implementation, which greatly limits its flexibility.

Recommended Skills: Experience with nonlinear optimization algorithms and understanding of Lagrange duality, familiarity with sparse matrices and other Julia data structures.

Expected Results: A package implementing a native-Julia CCSA algorithm.

Mentors: Steven Johnson.

Event-chain Monte Carlo methods – Summer of Code

Massive parallel factorized bouncy particle sampler

At JuliaCon 2021 a new Monte Carlo method (usable, for example, as a sampling algorithm for the posterior in Bayesian inference) was introduced [1]. The method exploits the factorization structure of the target to sample a single continuous-time Markov chain targeting a joint distribution in parallel. In contrast to parallel Gibbs sampling, at no time is a subset of coordinates kept fixed. In Gibbs sampling, keeping a subset fixed is the main device for achieving massive parallelism: given a separating set of coordinates, the conditional posterior factorizes into independent subproblems. In the presented method, a particle representing a parameter vector sampled from the posterior never ceases to move, and only the decisions about changes in the direction of movement happen in parallel on subsets of coordinates.
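To illustrate the "never ceases to move" behaviour, here is a minimal one-dimensional sketch of the Zig-Zag process from reference [2] targeting a standard normal distribution, written in Python for illustration only. It is not the factorized, parallel sampler of [1]; the function name and the choice of target are assumptions. The particle moves with velocity +1 or -1 and flips direction at random event times whose rate is max(0, velocity * position).

```python
import math
import random


def zigzag_gaussian(n_events, seed=0):
    """Simulate a 1D Zig-Zag process for N(0, 1).

    Returns the time averages of x and x^2 along the piecewise-linear path.
    """
    rng = random.Random(seed)
    x, theta = 0.0, 1.0
    total_time = 0.0
    int_x = 0.0   # integral of x(t) dt
    int_x2 = 0.0  # integral of x(t)^2 dt
    for _ in range(n_events):
        e = -math.log(1.0 - rng.random())  # Exp(1) variate; 1 - u avoids log(0)
        a = theta * x
        # The event rate along the path is max(0, a + s); solve for the time tau
        # at which the integrated rate reaches e.
        if a >= 0:
            tau = -a + math.sqrt(a * a + 2.0 * e)
        else:
            tau = -a + math.sqrt(2.0 * e)
        x_new = x + theta * tau
        # Exact integrals over the linear segment from x to x_new.
        int_x += (x_new ** 2 - x ** 2) / (2.0 * theta)
        int_x2 += (x_new ** 3 - x ** 3) / (3.0 * theta)
        total_time += tau
        x, theta = x_new, -theta  # flip the direction, keep moving
    return int_x / total_time, int_x2 / total_time


mean_estimate, second_moment_estimate = zigzag_gaussian(20000)
```

For a standard normal target the two time averages should be close to 0 and 1 respectively. In the factorized sampler, each coordinate carries its own event clock, which is what allows direction changes to be decided in parallel.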

Two implementations that make use of Julia's multithreading capabilities are already available. Starting from these, the contributor implements a version of the algorithm using GPU computing techniques, as the method is well suited to these approaches.

Expected Results: Implement massive parallel factorized bouncy particle sampler [1,2] using GPU computing.

Recommended Skills: GPU computing, Markov processes, Bayesian inference.

Mentors: Moritz Schauer

Rating: Hard, 350 hours

[1] Moritz Schauer: ZigZagBoomerang.jl - parallel inference and variable selection. JuliaCon 2021 contribution [https://pretalx.com/juliacon2021/talk/LUVWJZ/], Youtube: [https://www.youtube.com/watch?v=wJAjP_I1BnQ], 2021.

[2] Joris Bierkens, Paul Fearnhead, Gareth Roberts: The Zig-Zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data. The Annals of Statistics, 2019, 47. Vol., Nr. 3, pp. 1288-1320. [https://arxiv.org/abs/1607.03188].

Pluto.jl projects

Unfortunately we won't have time to mentor this year. Check back next year!

Pythia – Summer of Code

Machine Learning Time Series Regression

Pythia is a package for scalable machine learning time series forecasting and nowcasting in Julia.

The project mentors are Andrii Babii and Sebastian Vollmer.

Machine learning for nowcasting and forecasting

This project involves developing scalable machine learning time series regressions for nowcasting and forecasting. Nowcasting in economics is the prediction of the present, the very near future, and the very recent past state of an economic indicator. The term is a contraction of "now" and "forecasting" and originates in meteorology.

The objective of this project is to introduce scalable regression-based nowcasting and forecasting methodologies that have recently demonstrated empirical success in data-rich environments. Examples of existing popular packages for regression-based nowcasting on other platforms include the "MIDAS Matlab Toolbox", as well as the 'midasr' and 'midasml' packages in R. The starting point for this project is porting the 'midasml' package from R to Julia. Currently Pythia has the sparse-group LASSO regression functionality for forecasting.

The following functions are of interest: in-sample and out-of-sample forecasts/nowcasts, regularized MIDAS with Legendre polynomials, visualization of nowcasts, AIC/BIC and time series cross-validation tuning, forecast evaluation, pooled and fixed effects panel data regressions for forecasting and nowcasting, HAC-based inference for sparse-group LASSO, high-dimensional Granger causality tests. Other widely used existing functions from R/Python/Matlab are also of interest.

Recommended skills: Graduate-level knowledge of time series analysis, machine learning, and optimization is helpful.

Expected output: The contributor is expected to produce code, documentation, visualization, and real-data examples.

References: Contact project mentors for references.

Time series forecasting at scale

Modern business applications often involve forecasting hundreds of thousands of time series. Producing such a large number of reliable, high-quality forecasts is computationally challenging, which limits the scope of methods that can be used in practice; see, e.g., the 'forecast', 'fable', or 'prophet' packages in R. Julia currently lacks scalable time series forecasting functionality, and this project aims to develop automated, data-driven, and scalable time series forecasting methods.

The following functionality is of interest: forecasting intermittent demand (Croston, adjusted Croston, INARMA), scalable seasonal ARIMA with covariates, loss-based forecasting (gradient boosting), unsupervised time series clustering, forecast combinations, unit root tests (ADF, KPSS). Other widely used existing functions from R/Python/Matlab are also of interest.
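As an example of the intermittent-demand methods listed above, here is a minimal sketch of Croston's method in Python, for illustration only. The function name `croston_forecast`, the default `alpha` and the initialisation from the first non-zero demand are assumptions; published variants differ in how they initialise and adjust the estimate.

```python
def croston_forecast(demand, alpha=0.1):
    """Croston's method: forecast the demand rate per period.

    Non-zero demand sizes and the intervals between them are smoothed
    separately with simple exponential smoothing; the forecast is their ratio.
    """
    size = None        # smoothed non-zero demand size
    interval = None    # smoothed number of periods between non-zero demands
    periods_since = 1  # periods elapsed since the last non-zero demand
    for d in demand:
        if d > 0:
            if size is None:
                # Initialise from the first observed non-zero demand.
                size, interval = float(d), float(periods_since)
            else:
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
    if size is None:
        return 0.0  # no demand observed at all
    return size / interval
```

For a series with a demand of 5 in every third period, the forecast is 5/3 units per period, which is the long-run average a planner would expect.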

Recommended skills: Graduate-level knowledge of time series analysis is helpful.

Expected output: The contributor is expected to produce code, documentation, visualization, and real-data examples.

References: Contact project mentors for references.

Tools for simulation of Quantum Clifford Circuits

Clifford circuits are a class of quantum circuits that can be simulated efficiently on a classical computer. As such, they do not provide the computational advantage expected of universal quantum computers. Nonetheless, they are extremely important, as they underpin most techniques for quantum error correction and quantum networking. Software that efficiently simulates such circuits, at the scale of thousands or more qubits, is essential to the design of quantum hardware. The QuantumClifford.jl Julia project enables such simulations.

GPU accelerated simulator of Clifford Circuits.

Simulation of Clifford circuits involves significant amounts of linear algebra with boolean matrices. This enables the use of many standard computation accelerators like GPUs, as long as these accelerators support bit-wise operations. The main complication is that the elements of the matrices under consideration are usually packed in order to increase performance and lower memory usage, i.e. a vector of 64 elements would be stored as a single 64-bit integer instead of as an array of 64 bools. A Summer of Code project could consist of implementing the aforementioned linear algebra operations in GPU kernels, and then seamlessly integrating them in the rest of the QuantumClifford library. At a minimum that would include Pauli-Pauli products and certain small Clifford operators, but could extend to general stabilizer tableau multiplication and even tableau diagonalization. Some of these features are already implemented, but significant polish, further improvements, and implementation of missing features are needed.
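The bit-packing idea can be sketched as follows, in Python for illustration only. Each Pauli string is stored as two integers (one bit per qubit for the X part and one for the Z part), so products and commutation checks reduce to XOR, AND and a population count. The function names are hypothetical and do not reflect the QuantumClifford.jl API; Python integers are unbounded, whereas a GPU kernel would process fixed 64-bit words.

```python
def pack_pauli(s):
    """Pack a Pauli string such as "XZIY" into two integers (x bits, z bits)."""
    x = z = 0
    for i, c in enumerate(s):
        if c not in "IXYZ":
            raise ValueError("unknown Pauli letter: " + c)
        if c in "XY":
            x |= 1 << i
        if c in "ZY":
            z |= 1 << i
    return x, z


def popcount(n):
    """Number of set bits."""
    return bin(n).count("1")


def commute(p, q):
    """Two Paulis commute when their symplectic inner product is even."""
    (x1, z1), (x2, z2) = p, q
    return popcount((x1 & z2) ^ (z1 & x2)) % 2 == 0


def multiply_ignoring_phase(p, q):
    """Pauli-Pauli product up to a phase: a bitwise XOR of both parts."""
    return p[0] ^ q[0], p[1] ^ q[1]
```

Tracking the phase of the product needs a few more bitwise operations per word, which is omitted here.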

Recommended skills: Basic knowledge of the stabilizer formalism used for simulating Clifford circuits. Familiarity with performance profiling tools in Julia and Julia's GPU stack, including KernelAbstractions and Tullio.

Mentors: Stefan Krastanov <stefan@krastanov.org> and QuantumClifford.jl team members

Expected duration: 175 hours (but applicants can scope it to a longer project by including work on GPU-accelerated Gaussian elimination used in the canonicalization routines)

Difficulty: Medium if the applicant is familiar with Julia, even without understanding of Quantum Information Science (but applicants can scope it to "hard" by including the aforementioned additional topics)

A Zoo of Quantum Error Correcting codes and/or decoders

Quantum error correcting codes are typically represented in a form similar to the parity check matrix of a classical code. This form is referred to as a stabilizer tableau. This project would involve creating a comprehensive library of frequently used quantum error correcting codes and/or implementing syndrome-decoding algorithms for such codes. The library already includes some simple codes and interfaces to a few decoders; adding another small code or providing a small documentation pull request could be a good way to prove competence when applying for this project. The project can be extended to a much longer one if work on decoders is included. A large part of this project would involve literature surveys. Some suggestions for codes to include: color codes, higher-dimensional topological codes, hypergraph product codes, twists in codes, newer LDPC codes, honeycomb codes, Floquet codes. Some suggestions for decoders to work on: iterative, small-set flip, ordered statistical decoding, belief propagation, neural belief propagation.

Recommended skills: Knowledge of the stabilizer formalism used for simulating Clifford circuits. Familiarity with tools like Python's ldpc, pymatching, and stim can help. Consider checking out the PyQDecoders.jl Julia wrapper package as well.

Mentors: Stefan Krastanov <stefan@krastanov.org> and QuantumClifford.jl team members

Expected duration: 175 hours (but applicants can scope it as longer, depending on the list of functionality they plan to implement)

Difficulty: Medium. Easy with some basic knowledge of quantum error correction

Left/Right multiplications with small gates.

Applying an n-qubit Clifford gate to an n-qubit state (tableau) is an operation similar to matrix multiplication, requiring O(n^3) steps. However, applying a single-qubit or two-qubit gate to an n-qubit tableau is much faster, as it needs to address only one or two columns of the tableau. This project would focus on extending the left-multiplication special cases already started in symbolic_cliffords.jl and creating additional right-multiplication special cases (for which the Stim library is a good reference).
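The "only one column" observation can be sketched as follows, in Python for illustration only. Rows are stored as unpacked bit lists for readability, and the function names are hypothetical rather than taken from symbolic_cliffords.jl. The update rules are the standard ones for the Hadamard gate (X to Z, Z to X, Y to -Y) and the phase gate (X to Y, Y to -X, Z unchanged).

```python
def parse_row(s):
    """Turn a signed Pauli string such as "+XZ" into [x_bits, z_bits, sign_bit]."""
    sign = 1 if s[0] == "-" else 0
    body = s.lstrip("+-")
    x = [1 if c in "XY" else 0 for c in body]
    z = [1 if c in "ZY" else 0 for c in body]
    return [x, z, sign]


def format_row(row):
    x, z, sign = row
    letters = {(0, 0): "I", (1, 0): "X", (0, 1): "Z", (1, 1): "Y"}
    return ("-" if sign else "+") + "".join(letters[(a, b)] for a, b in zip(x, z))


def apply_hadamard(tableau, q):
    """Apply H to qubit q in place: only column q of every row is touched."""
    for row in tableau:
        x, z = row[0], row[1]
        row[2] ^= x[q] & z[q]    # Y picks up a minus sign
        x[q], z[q] = z[q], x[q]  # X and Z are exchanged


def apply_phase(tableau, q):
    """Apply the phase gate S to qubit q in place."""
    for row in tableau:
        x, z = row[0], row[1]
        row[2] ^= x[q] & z[q]    # Y maps to -X
        z[q] ^= x[q]             # X maps to Y, Z is unchanged
```

Each gate costs one pass over the rows, independent of the number of qubits per row, in contrast to the O(n^3) cost of a general tableau multiplication.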

Recommended skills: Knowledge of the stabilizer formalism used for simulating Clifford circuits. Familiarity with performance profiling tools in Julia. Understanding of C/C++ if you plan to use the Stim library as a reference.

Mentors: Stefan Krastanov <stefan@krastanov.org> and QuantumClifford.jl team members

Expected duration: 175 hours (but applicants can scope it as longer if they plan for other significant optimization and API design work)

Difficulty: Easy

Generation of Fault Tolerant ECC Circuits, Flag Qubit Circuits and more

The QuantumClifford library already has some support for generating different types of circuits related to error correction (mostly in terms of syndrome measurement circuits like Shor's) and for evaluating the quality of error correcting codes and decoders. Significant improvement can be made by implementing more modern compilation schemes, especially ones relying on flag qubits.

Recommended skills: Knowledge of the variety of flag qubit methods. Some useful references could be a, b, c, and this video lecture.

Mentors: Stefan Krastanov <stefan@krastanov.org> and QuantumClifford.jl team members

Expected duration: 175 hours (but applicants can scope it as longer if they plan more extensive work)

Difficulty: Hard

Measurement-Based Quantum Computing (MBQC) compiler

The MBQC model of quantum computation has a lot of overlap with the study of stabilizer states. This project would be about the creation of an MBQC compiler, and potentially a simulator, in Julia. For example, given an arbitrary graph state and a circuit, how should the circuit be compiled in an MBQC model?

Recommended skills: Knowledge of the MBQC model of quantum computation. This paper and the related python library can be a useful reference. Consider also this reference.

Mentors: Stefan Krastanov <stefan@krastanov.org> and QuantumClifford.jl team members

Expected duration: 175 hours (but applicants can scope it as longer if they plan more extensive work)

Difficulty: Hard

Implementing a Graph State Simulator

The graph state formalism is a way to work more efficiently with stabilizer states that have a sparse tableau. This project would involve creating the necessary gate simulation algorithms and conversion tools between the graph formalism and the stabilizer formalism (some of which are already available in the library).
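The conversion from a graph to stabilizer generators can be sketched as follows, in Python for illustration only. For a graph state, the generator attached to vertex i is X on qubit i and Z on each neighbour of i. The function name is hypothetical and is not the library's API.

```python
def graph_state_stabilizers(n, edges):
    """Return the n stabilizer generators of the graph state as Pauli strings."""
    neighbours = [set() for _ in range(n)]
    for a, b in edges:
        if a == b:
            raise ValueError("self-loops are not allowed in a graph state")
        neighbours[a].add(b)
        neighbours[b].add(a)
    rows = []
    for i in range(n):
        # X on vertex i, Z on its neighbours, identity elsewhere.
        rows.append("".join(
            "X" if j == i else "Z" if j in neighbours[i] else "I"
            for j in range(n)))
    return rows
```

For a three-vertex line graph this yields XZI, ZXZ and IZX. The number of non-identity entries per row equals one plus the vertex degree, which is why sparse graphs give sparse tableaux.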

Recommended skills: Understanding of the graph formalism. This paper can be a useful reference.

Mentors: Stefan Krastanov <stefan@krastanov.org> and QuantumClifford.jl team members

Expected duration: 175 hours (but applicants can scope it as longer if they plan more extensive work)

Difficulty: Medium

Simulation of Slightly Non-Clifford Circuits and States

There are various techniques used to augment Clifford circuit simulators to model circuits that are only "mostly" Clifford. Particularly famous are the Clifford+T gate simulators. This project is about implementing such extensions.

Recommended skills: In-depth understanding of the Stabilizer formalism, and understanding of some of the extensions to that method. We have some initial implementations. This IBM paper can also be a useful reference for other methods.

Mentors: Stefan Krastanov <stefan@krastanov.org> and QuantumClifford.jl team members

Expected duration: 175 hours (but applicants can scope it as longer if they plan more extensive work)

Difficulty: Hard

Magic State Modeling - Distillation, Injection, Etc

Magic states are important non-stabilizer states that can be used for inducing non-Clifford gates in otherwise Clifford circuits. They are crucial for the creation of error-corrected universal circuits. This project would involve contributing tools for the analysis of such states and for the evaluation of distillation circuits and ECC circuits involving such states.

Recommended skills: In-depth understanding of the theory of magic states and their use in fault tolerance.

Mentors: Stefan Krastanov <stefan@krastanov.org> and QuantumClifford.jl team members

Expected duration: 175 hours (but applicants can scope it as longer if they plan more extensive work)

Difficulty: Hard

Quantum Optics and State Vector Modeling Tools

The most common way to represent and model quantum states is the state vector formalism (underlying Schroedinger's and Heisenberg's equations as well as many other master equations). The QuantumOptics.jl Julia project enables such simulations, utilizing much of the uniquely powerful DiffEq infrastructure in Julia.

GPU accelerated operators and ODE solvers

Much of the internal representation of quantum states in QuantumOptics.jl relies on standard dense arrays. Thanks to the multiple-dispatch nature of Julia, many of these objects can already work well with GPU arrays. This project would involve a thorough investigation and validation of the current interfaces to make sure they work well with GPU arrays. In particular, attention will have to be paid to the "lazy" operators, as special kernels might need to be implemented for them.

Recommended skills: Familiarity with performance profiling tools in Julia and Julia's GPU stack, potentially including KernelAbstractions.

Mentors: Stefan Krastanov <stefan@krastanov.org> and QuantumOptics.jl team members

Expected duration: 175 hours (but applicants can scope it as longer if they plan more extensive work)

Difficulty: Medium

Autodifferentiation

Autodifferentiation is the capability of automatically generating efficient code to evaluate the numerical derivative of a given Julia function. Similarly to the GPU case above, much of this functionality already "magically" works, but there is no detailed test suite for it and no validation has been done. This project would involve implementing, validating, and testing the use of Julia autodiff tools in QuantumOptics.jl. ForwardDiff, Enzyme, Zygote, Diffractor, and AbstractDifferentiation are all tools that should have some level of validation and support, both in ODE solving and in simple operator applications.
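For readers new to the topic, forward-mode differentiation (the approach taken by ForwardDiff) can be sketched with dual numbers, shown here in Python for illustration only. The `Dual` class and the `derivative` and `dual_sin` helpers are hypothetical and cover only real-valued addition, multiplication and sine; they say nothing about the complex-number subtleties this project would need to validate.

```python
import math


class Dual:
    """A value together with its derivative with respect to the input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(other):
        """Treat plain numbers as constants (derivative zero)."""
        return other if isinstance(other, Dual) else Dual(other, 0.0)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def dual_sin(d):
    """Sine with the chain rule applied to the derivative part."""
    d = Dual.lift(d)
    return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)


def derivative(f, x):
    """Evaluate f at a dual number seeded with derivative 1."""
    return f(Dual(x, 1.0)).deriv
```

A test suite of the kind described above would compare such derivatives against finite differences or analytic results for each supported operation.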

Recommended skills: Familiarity with the Julia autodiff stack and the SciML sensitivity analysis tooling. Familiarity with the difficulties of differentiating functions of complex numbers (in general and specifically in Julia). Understanding of the AbstractDifferentiation.jl package.

Mentors: Stefan Krastanov <stefan@krastanov.org> and QuantumOptics.jl team members

Expected duration: 175 hours (but applicants can scope it as longer if they plan more extensive work)

Difficulty: Easy-to-Medium

Closer Integration with the SciML Ecosystem

SciML is the umbrella organization for much of the base numerical software development in the Julia ecosystem. We already use many of their capabilities, but it would be beneficial to more closely match the interfaces they expect. This project would be heavily on the software engineering side. Formal and informal interfaces we want to support include: better support for DiffEq problem types (currently we wrap DiffEq problems in our own infrastructure and it is difficult to reuse them in SciML); better support for broadcast operations over state objects (so that we can treat them closer to normal arrays and we can simply provide an initial state to a DiffEq solver without having to wrap/unwrap the data); relying more heavily on SciMLOperators which have significant overlap with our lazy operators.

Recommended skills: Familiarity with the SciML stack.

Mentors: Stefan Krastanov <stefan@krastanov.org> and QuantumOptics.jl team members

Expected duration: 175 hours (but applicants can scope it as longer if they plan more extensive work)

Difficulty: Easy

Symbolic computation project ideas

Efficient Tensor Differentiation

Implement the D* algorithm for tensor expressions.

Recommended Skills: High school/freshman calculus and basic graph theory (optional)

Expected Results: A working implementation of the D* algorithm that is capable of performing efficient differentiations on tensor expressions.

Mentors: Yingbo Ma

Duration: 350 hours

Symbolic root finding

Symbolics.jl has robust ways to convert symbolic expressions into multivariate polynomials. There is now a robust Groebner basis implementation in Groebner.jl. Finding roots and varieties of sets of polynomials would be extremely useful in many applications. This project would involve implementing various techniques for solving polynomial systems and, where possible, other non-linear equation systems. A good proposal should try to enumerate a number of techniques that are worth implementing, for example:

Analytical solutions for polynomial systems of degree <= 4
Use of HomotopyContinuations.jl for testing for solvability and finding numerical solutions
Newton-Raphson methods
Using Groebner basis computations to find varieties

The API for these features should be extremely user-friendly:

A single roots function should take the sets of equations and result in the right type of roots as output (either varieties or numerical answers)
It should automatically find the fastest strategy to solve the set of equations and apply it.
It should fail with descriptive error messages when equations are not solvable, or degenerate in some way.
This should allow implementing symbolic eigenvalue computation when eigs is called.
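The "single entry point that picks a strategy" idea can be sketched for the univariate case as follows, in Python for illustration only. This `roots` function is hypothetical: it uses closed-form solutions for degree 1 and 2, falls back to Newton-Raphson from a user-supplied starting point for higher degrees (returning one root), and raises a descriptive error for degenerate input. The real project would cover polynomial systems, degree 3 and 4 formulas and Groebner basis methods.

```python
import cmath


def roots(coeffs, x0=1.0, tol=1e-12, max_iter=100):
    """Solve a univariate polynomial given by coefficients, highest degree first."""
    c = list(coeffs)
    while c and c[0] == 0:
        c.pop(0)  # drop leading zeros so the true degree is visible
    if len(c) < 2:
        raise ValueError("degenerate equation: no unknown left to solve for")
    if len(c) == 2:
        # Linear: a*x + b = 0
        return [-c[1] / c[0]]
    if len(c) == 3:
        # Quadratic formula; cmath.sqrt also handles negative discriminants.
        a, b, k = c
        disc = cmath.sqrt(b * b - 4 * a * k)
        return [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    # Higher degree: Newton-Raphson, finds a single root near x0.
    x = x0
    for _ in range(max_iter):
        p, dp = 0.0, 0.0
        for coef in c:
            # Horner's scheme evaluates the polynomial and its derivative together.
            dp = dp * x + p
            p = p * x + coef
        if dp == 0:
            raise ValueError("Newton step undefined: derivative vanished")
        step = p / dp
        x -= step
        if abs(step) < tol:
            return [x]
    raise ValueError("Newton iteration did not converge")
```

For example, `roots([1, -3, 2])` uses the quadratic formula, while `roots([1, 0, -2, -5], x0=2.0)` falls through to the Newton iteration.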

Mentors: Shashi Gowda, Alexander Demin

Duration: 350 hours

Symbolic Integration in Symbolics.jl

Implement the heuristic approach to symbolic integration, then hook into a repository of rules such as RUMI. Consider also the potential of symbolic-numeric integration techniques (https://github.com/SciML/SymbolicNumericIntegration.jl).

Recommended Skills: High school/Freshman Calculus

Expected Results: A working implementation of symbolic integration in the Symbolics.jl library, along with documentation and tutorials demonstrating its use in scientific disciplines.

Mentors: Shashi Gowda, Yingbo Ma

Duration: 350 hours

XLA-style optimization from symbolic tracing

Julia functions that take arrays and output arrays or scalars can be traced using Symbolics.jl variables to produce a trace of operations. This output can be optimized to use fused operations or call highly specific NNlib functions. In this project you will trace through Flux.jl neural-network functions and apply optimizations on the resultant symbolic expressions. This can be mostly implemented as rule-based rewriting (see https://github.com/JuliaSymbolics/Symbolics.jl/pull/514).
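The trace-then-rewrite workflow can be sketched in miniature as follows, in Python for illustration only. Calling an ordinary function on symbolic variables records an expression tree through operator overloading, and a single rewrite rule then fuses a multiply followed by an add. The `Sym` class, the `var` helper and the `muladd` node name are hypothetical; they are not the Symbolics.jl or NNlib API.

```python
class Sym:
    """Symbolic node: a variable (op is None) or an operation applied to arguments."""

    def __init__(self, op, *args, name=None):
        self.op, self.args, self.name = op, args, name

    def __add__(self, other):
        return Sym("+", self, other)

    def __mul__(self, other):
        return Sym("*", self, other)

    def __repr__(self):
        if self.op is None:
            return self.name
        return self.op + "(" + ", ".join(map(repr, self.args)) + ")"


def var(name):
    return Sym(None, name=name)


def fuse_muladd(expr):
    """Rewrite rule applied bottom-up: a*b + c becomes muladd(a, b, c)."""
    if expr.op is None:
        return expr
    args = [fuse_muladd(a) for a in expr.args]
    if expr.op == "+":
        left, right = args
        if left.op == "*":
            return Sym("muladd", left.args[0], left.args[1], right)
        if right.op == "*":
            return Sym("muladd", right.args[0], right.args[1], left)
    return Sym(expr.op, *args)


def layer(w, x, b):
    """An ordinary function; calling it on Sym values records its operations."""
    return w * x + b


trace = layer(var("w"), var("x"), var("b"))
fused = fuse_muladd(trace)
```

A production version would work on array-valued expressions and carry many such rules, with a cost model deciding which ones to apply.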

Recommended Skills: Knowledge of space and time complexities of array operations, experience in optimizing array code.

Mentors: Shashi Gowda

Duration: 175 hours

Automatically improving floating point accuracy (Herbie)

Herbie documents a way to optimize floating point functions so as to reduce instruction count while reorganizing operations such that floating point inaccuracies do not get magnified. It would be a great addition to have this written in Julia and have it work on Symbolics.jl expressions. An ideal implementation would use the e-graph facilities of Metatheory.jl to implement this.
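
As a hand-written illustration of the kind of rewrite involved, consider sqrt(x + 1) − sqrt(x) for large x. The Python sketch below is not Herbie's algorithm and does not use Metatheory.jl; the function names are made up for this example. It compares the naive form, which subtracts two nearly equal numbers, against the algebraically equivalent form 1 / (sqrt(x + 1) + sqrt(x)), using high-precision decimal arithmetic as the reference.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # enough digits to serve as a reference value


def naive(x):
    # Subtracting two nearly equal square roots cancels most significant digits.
    return math.sqrt(x + 1) - math.sqrt(x)


def rewritten(x):
    # Same mathematical value, but no subtraction of nearby numbers.
    return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))


def reference(x):
    d = Decimal(x)
    return float((d + 1).sqrt() - d.sqrt())


def relative_error(approx, exact):
    return abs(approx - exact) / abs(exact)


x = 1e15
naive_error = relative_error(naive(x), reference(x))
rewritten_error = relative_error(rewritten(x), reference(x))
```

At `x = 1e15` the naive form is off by several percent, while the rewritten form agrees with the reference to roughly machine precision. An automated tool would have to discover such rewrites by searching over equivalent expressions, which is where e-graphs come in.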

Mentors: Shashi Gowda, Alessandro Cheli

Duration: 350 hours

Tabular Data – Summer of Code

Parquet.jl enhancements

Difficulty: Medium

Duration: 175 hours

Apache Parquet is a binary data format for tabular data. It has features for compression and memory-mapping of datasets on disk. A decent implementation of Parquet in Julia is likely to be highly performant. It will be useful as a standard format for distributing tabular data in a binary format. There exists a Parquet.jl package that has a Parquet reader and a writer. It currently conforms to the Julia Tabular file IO interface at a very basic level. It needs more work to add support for critical elements that would make Parquet.jl usable for fast large scale parallel data processing. Each of these goals can be targeted as a single, short duration (175 hrs) project.

Lazy loading and support for out-of-core processing, with Arrow.jl and Tables.jl integration. Improved usability and performance of Parquet reader and writer for large files.
Reading from and writing data on to cloud data stores, including support for partitioned data.
Support for missing data types and encodings making the Julia implementation fully featured.

Resources:

The Parquet file format (there are also many articles and talks on the Parquet storage format on the internet)
A tour of the data ecosystem in Julia
Tables.jl
Arrow.jl

Recommended skills: Good knowledge of Julia language, Julia data stack and writing performant Julia code.

Expected Results: Depends on the specific projects we would agree on.

Mentors: Tanmay Mohapatra

DataFrames.jl join enhancements

Difficulty: Hard

Duration: 175 hours

DataFrames.jl is one of the more popular implementations of a tabular data type for Julia. One of the features it supports is data frame joining. However, more work is needed to improve this functionality. The specific targets for this project are listed below (the final list of targets included in the scope of the project can be decided later):

fully implement multi-threading support for joins, reduce the memory requirements of the join algorithms in use (which should additionally improve their performance), verify the efficiency of alternative joining strategies in comparison to those currently used, and implement them along with an adaptive algorithm that chooses the right joining strategy depending on the passed data;
implement a join allowing for efficient matching on non-equal keys; special attention should be paid to matching on keys that are date/time and spatial objects;
implement a join allowing for an in-place update of columns of one data frame by values stored in another data frame, based on a matching key and a condition specifying when an update should be performed;
implement a more flexible mechanism than currently available for defining output data frame column names when performing a join.
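
One common form of matching on non-equal keys is an "as-of" join, where each left row is paired with the most recent right row whose key does not exceed the left key. The Python sketch below shows the idea with a sort and a binary search. It is an illustration under that assumption only; the function name `asof_join` is hypothetical and this is not the DataFrames.jl API or its algorithm.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Pair each (key, value) row of `left` with the value of the `right`
    row that has the largest key <= the left key, or None if there is none."""
    right = sorted(right, key=lambda row: row[0])
    right_keys = [key for key, _ in right]
    joined = []
    for key, left_value in left:
        pos = bisect_right(right_keys, key)  # number of right keys <= key
        right_value = right[pos - 1][1] if pos > 0 else None
        joined.append((key, left_value, right_value))
    return joined


# Keys could be timestamps; plain integers keep the example short.
trades = [(5, "a"), (10, "b"), (1, "c")]
quotes = [(3, 100), (9, 101)]
result = asof_join(trades, quotes)
```

Sorting the right table once and searching it per left row costs O((n + m) log m), instead of the O(n · m) of comparing every pair of rows.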

Resources:

Recommended skills: Good knowledge of the Julia language, the Julia data stack, and writing performant multi-threaded Julia code. Experience with benchmarking code and writing tests. Knowledge of join algorithms (e.g., as used in databases like DuckDB or in other tabular data manipulation ecosystems such as Polars or data.table).

Expected Results: Depends on the specific projects we would agree on.

Mentors: Bogumił Kamiński

Taija Projects

Taija is an organization that hosts software geared towards Trustworthy Artificial Intelligence in Julia. Taija currently covers a range of approaches towards making AI systems more trustworthy:

Model Explainability (CounterfactualExplanations.jl)
Algorithmic Recourse (CounterfactualExplanations.jl, AlgorithmicRecourseDynamics.jl)
Predictive Uncertainty Quantification (ConformalPrediction.jl, LaplaceRedux.jl)
Effortless Bayesian Deep Learning (LaplaceRedux.jl)
Hybrid Learning (JointEnergyModels.jl)

Various meta packages can be used to extend the core functionality:

Plotting (TaijaPlotting.jl)
Datasets for testing and benchmarking (TaijaData.jl)
Interoperability with other programming languages (TaijaInteroperability.jl)

There is a high overlap with other organizations; you might also be interested in:

Projects with MLJ.jl - For more traditional machine learning projects
Projects with FluxML - For projects around Flux.jl, the backbone of Julia's deep learning ecosystem

Project 1: Conformal Prediction meets Bayes (Predictive Uncertainty)

Project Overview: ConformalPrediction.jl is a package for Predictive Uncertainty Quantification through Conformal Prediction for Machine Learning models trained in MLJ. This project aims to enhance ConformalPrediction.jl by adding support for Conformal(ized) Bayes.

Mentor: Patrick Altmeyer and/or Mojtaba Farmanbar

Project Difficulty: Medium

Estimated Duration: 175 hours

Ideal Candidate Profile:

Basic knowledge of Julia or strong knowledge of similar programming languages (R, Python, MATLAB, ...)
Good understanding of Bayesian methods
Basic knowledge of Conformal Prediction

Project Goals and Deliverables:

Implement support for conformalizing predictive distributions (#109)
Implement support for Conformal Bayes through Add-One-In Importance Sampling (#110)
Implement other recent approaches combining Bayes with Conformal Prediction that you find interesting
Comprehensively test and document your work
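
For background, the basic split conformal recipe that the goals above build on can be sketched in a few lines of Python. This is a generic illustration for regression with absolute-residual scores, not the ConformalPrediction.jl API and not the Conformal Bayes method from the issues; the function names are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile of the calibration scores:
    the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this coverage level.
        return math.inf
    return sorted(scores)[k - 1]


def predict_interval(point_prediction, qhat):
    """Symmetric interval around a point prediction."""
    return (point_prediction - qhat, point_prediction + qhat)


# Calibration scores |y - prediction| on held-out data (made-up numbers).
calibration_scores = [0.1, 0.5, 0.2, 0.4, 0.3, 0.6, 0.7, 0.8, 0.9]
qhat = conformal_quantile(calibration_scores, alpha=0.5)
interval = predict_interval(2.0, qhat)
```

Conformalizing a Bayesian model would replace the absolute residual with a score derived from the predictive distribution, which is the part this project is about.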

Project 2: Counterfactual Regression (Model Explainability)

Project Overview: CounterfactualExplanations.jl is a package for Counterfactual Explanations and Algorithmic Recourse in Julia. This project aims to extend the package functionality to regression models.

Mentor: Patrick Altmeyer

Project Difficulty: Hard

Estimated Duration: 350 hours

Ideal Candidate Profile:

Experience with Julia and multiple dispatch is an advantage, but not crucial
Good knowledge of machine learning and statistics
Solid understanding of supervised models (classification and regression)

Project Goals and Deliverables:

Carefully think about architecture choices: how can we fit support for regression models into the existing code base?
Add support for the following approaches: ad-hoc thresholding, Bayesian optimisation, information-theoretic saliency.
Comprehensively test and document your work

Project 3: Counterfactuals for LLMs (Model Explainability and Generative AI)

Mentor: Patrick Altmeyer (Taija) and Jan Siml (JuliaGenAI)

Project Difficulty: Medium

Estimated Duration: 175 hours

Ideal Candidate Profile:

Experience with Julia and multiple dispatch is an advantage, but not crucial
Good knowledge of machine learning and statistics
Good understanding of Large Language Models (LLMs)
Ideally previous experience with Transformers.jl

Project Goals and Deliverables:

Carefully think about architecture choices: how can we fit support for LLMs into the existing code base of CounterfactualExplanations.jl?
Implement current state-of-the-art approaches such as MiCE and CORE
Comprehensively test and document your work

Project 4: From Counterfactuals to Interventions (Recourse through Minimal Causal Interventions)

Project Overview: This extension aims to enhance the CounterfactualExplanations.jl package by incorporating a module for generating actionable recourse through minimal causal interventions.

Mentor: Patrick Altmeyer (Taija) and Moritz Schauer (CausalInference.jl)

Project Difficulty: Hard

Estimated Duration: 350 hours

Ideal Candidate Profile:

Experience with Julia
Background in causality and familiarity with counterfactual reasoning.
Basic knowledge of minimal interventions and causal graph building.

Project Goals and Deliverables:

Carefully think about architecture choices: how can we fit support for causal interventions into the existing code base?
Develop code that could integrate causal graph building with other Julia libs such as Graphs.jl, GraphPlot.jl and CausalInference.jl.
Implement current state-of-the-art approaches for minimal interventions using structured causal models (SCMs).
Comprehensively test and document your work.

About Us

Patrick Altmeyer is a PhD Candidate in Trustworthy Artificial Intelligence at Delft University of Technology working on the intersection of Computer Science and Finance. He has presented work related to Taija at JuliaCon 2022 and 2023. In the past year, Patrick has mentored multiple groups of students at Delft University of Technology who have made major contributions to Taija.

How to Contact Us

We'd love to hear your ideas and discuss potential projects with you.

Probably the easiest way is to join the JuliaLang Slack and then the #taija channel. You can also post a GitHub Issue on our organization repo.

TopOpt Projects – Summer of Code

TopOpt.jl is a topology optimization package written in pure Julia. Topology optimization is an exciting field at the intersection of shape representation, physics simulations and mathematical optimization, and the Julia language is a great fit for this field. To learn more about TopOpt.jl, check the following JuliaCon talk.

The following is a tentative list of projects in topology optimization that you could be working on in the coming Julia Season of Contributions or Google Summer of Code. If you are interested in exploring any of these topics or if you have other interests related to topology optimization, please reach out to the main mentor Mohamed Tarek via email.

Testing and benchmarking of TopOpt.jl

Project difficulty: Medium

Work load: 350 hours

Description: The goal of this project is to improve the unit test coverage and reliability of TopOpt.jl by testing its implementations against other software's outputs. Testing and benchmarking stress and buckling constraints and their derivatives will be the main focus of this project. Matlab scripts from papers may have to be translated to Julia for correctness and performance comparison.

Knowledge prerequisites: structural mechanics, optimization, Julia programming

Machine learning in topology optimization

Project difficulty: Medium

Work load: 350 hours

Description: There are numerous ways to use machine learning for design optimization in topology optimization. The following are all recent papers with applications of neural networks and machine learning in topology optimization. There are also exciting research opportunities in this direction.

In this project you will implement one of the algorithms discussed in any of these papers.

Knowledge prerequisites: neural networks, optimization, Julia programming

Optimization on a uniform rectilinear grid

Project difficulty: Medium

Work load: 350 hours

Description: Currently in TopOpt.jl, only unstructured meshes are supported. This is a very flexible type of mesh, but it is not as memory efficient as a uniform rectilinear grid, where all the elements are assumed to have the same shape. The uniform rectilinear grid is the most common grid used in topology optimization in practice. Currently in TopOpt.jl, a uniform rectilinear grid is stored as an unstructured mesh, which is unnecessarily inefficient. In this project, you will optimize the finite element analysis and topology optimization codes in TopOpt.jl for uniform rectilinear grids.
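
The memory argument can be made concrete with a small Python sketch. It assumes a 2D grid of quadrilateral elements with row-major node numbering; the function names are hypothetical and this is not how TopOpt.jl represents meshes. On a uniform grid, the nodes of an element follow from its indices, so the per-element connectivity table that an unstructured mesh must store can be computed on demand instead.

```python
def element_nodes(i, j, nx):
    """Node ids (counterclockwise) of element (i, j) on a grid with nx elements
    per row, assuming nodes are numbered row by row with nx + 1 nodes per row."""
    n0 = j * (nx + 1) + i
    return (n0, n0 + 1, n0 + nx + 2, n0 + nx + 1)


def unstructured_connectivity(nx, ny):
    """What an unstructured mesh stores explicitly: one node tuple per element."""
    return [element_nodes(i, j, nx) for j in range(ny) for i in range(nx)]


table = unstructured_connectivity(2, 2)  # 4 elements, 16 stored integers
on_the_fly = element_nodes(1, 1, 2)      # same information, nothing stored
```

The stored table grows with the number of elements, while the uniform-grid version needs only the grid dimensions. The same observation applies to element stiffness matrices, which are identical for all elements of a uniform grid with uniform material.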

Knowledge prerequisites: knowledge of mesh types, Julia programming

Adaptive mesh refinement for topology optimization

Project difficulty: Medium

Work load: 350 hours

Description: Topology optimization problems with more mesh elements take longer to simulate and to optimize. In this project, you will explore the use of adaptive mesh refinement starting from a coarse mesh, optimizing and only refining the elements that need further optimization. This is an effective way to accelerate topology optimization algorithms.

Knowledge prerequisites: adaptive mesh refinement, Julia programming

Heat transfer design optimization

Project difficulty: Medium

Work load: 175 or 350 hours

Description: All of the examples in TopOpt.jl and problem types are currently of the linear elasticity, quasi-static class of problems. The goal of this project is to implement more problem types and examples from the field of heat transfer. Both steady-state heat transfer problems and linear elasticity problems make use of elliptic partial differential equations so the code from linear elasticity problems should be largely reusable for heat transfer problems with minimum changes.

Knowledge prerequisites: finite element analysis, heat equation, Julia programming

Modern computational fluid dynamics with Trixi.jl

Trixi.jl is a Julia package for adaptive high-order numerical simulations of conservation laws. It is designed to be simple to use for students and researchers, extensible for research and teaching, as well as efficient and suitable for high-performance computing.

Compiler-based automatic differentiation with Enzyme.jl

Difficulty: Medium (up to hard, depending on the chosen subtasks)

Project size: 175 hours or 350 hours, depending on the chosen subtasks

Enzyme.jl is the Julia frontend of Enzyme, a modern automatic differentiation (AD) framework working at the level of LLVM code. It can provide fast forward and reverse mode AD and - unlike some other AD packages - supports mutating operations. Since Trixi.jl relies on mutating operations and caches for performance, this feature is crucial to obtain an implementation that works efficiently for both simulation runs and AD.

The overall goal of this project is to create a working prototype of Trixi.jl (or a subset thereof) using Enzyme.jl for AD, and to support as many of Trixi's advanced features as possible, such as adaptive mesh refinement, shock capturing etc.

Possible subtasks in this project include

Explore and implement forward/backward mode AD via Enzyme.jl for a simplified simulation for the 1D advection equation or the 1D compressible Euler equations (e.g., compute the Jacobian of the right-hand side evaluation Trixi.rhs! on a simple mesh in serial execution)
Explore and implement forward mode AD via Enzyme.jl of semidiscretizations provided by Trixi.jl, mimicking the functionality that is already available via ForwardDiff.jl
Explore and implement reverse mode AD via Enzyme.jl of semidiscretizations provided by Trixi.jl as required for modern machine learning tasks
Explore and implement AD via Enzyme.jl of full simulations combining semidiscretizations of Trixi.jl with time integration methods of OrdinaryDiffEq.jl
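
To show what "the Jacobian of the right-hand side evaluation" means in the first subtask, here is a Python sketch of forward-mode AD with dual numbers. Several things are assumptions made for illustration: the `rhs` function is a hypothetical first-order upwind discretization of the 1D advection equation with periodic boundaries, not `Trixi.rhs!`, and dual numbers are an operator-overloading technique (closer in spirit to ForwardDiff.jl), whereas Enzyme differentiates at the LLVM level.

```python
class Dual:
    """A value together with its derivative along one seed direction."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __mul__(self, other):
        other = _lift(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__  # multiplication is commutative here


def _lift(x):
    return x if isinstance(x, Dual) else Dual(x)


def rhs(u, a=1.0, dx=0.5):
    """Upwind semidiscretization of u_t + a u_x = 0 (a > 0), periodic in space."""
    return [-(a / dx) * (u[i] - u[i - 1]) for i in range(len(u))]


def jacobian(f, u):
    """One forward pass per input: seed u[j] and read off column j."""
    n = len(u)
    columns = []
    for j in range(n):
        seeded = [Dual(u[i], 1.0 if i == j else 0.0) for i in range(n)]
        columns.append([out.der for out in f(seeded)])
    return [[columns[j][i] for j in range(n)] for i in range(n)]


J = jacobian(rhs, [1.0, 2.0, 3.0])
```

With these defaults each row of `J` has −2 on the diagonal and +2 in the column of the upwind neighbour, which is the upwind difference operator itself because `rhs` is linear in `u`.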

Related subtasks that do not directly involve Enzyme.jl but use other packages include

Explore and implement means to improve the current handling of caches to simplify AD and differentiable programming with semidiscretizations of Trixi.jl in general, e.g., via PreallocationTools.jl.
Extend the current AD support based on ForwardDiff.jl to other functionality of Trixi.jl, e.g., shock capturing discretizations, MPI parallel simulations, and other features currently not supported

This project is good for both software engineers interested in the fields of numerical analysis and scientific machine learning as well as those students who are interested in pursuing graduate research in the field.

Recommended skills: Good knowledge of at least one numerical discretization scheme (e.g., finite volume, discontinuous Galerkin, finite differences); initial knowledge in automatic differentiation; preferably the ability (or eagerness to learn) to write fast code

Expected results: Contributions to state of the art and production-quality automatic differentiation tools for Trixi.jl

Mentors: Hendrik Ranocha, Michael Schlottke-Lakemper

Advanced visualization and in-situ visualization with ParaView

Difficulty: Medium

Project size: 175 hours or 350 hours, depending on the chosen subtasks

Visualizing and documenting results is a crucial part of the scientific process. In Trixi.jl, we rely for visualization on a combination of pure Julia packages (such as Plots.jl and Makie.jl) and the open-source scientific visualization suite ParaView. While the Julia solutions are excellent for visualizing 1D and 2D data, ParaView is the first choice for creating publication-quality figures from 3D data.

Currently, visualization with ParaView is only possible after a simulation is finished and requires an additional postprocessing step, where the native output files of Trixi.jl are converted to VTK files using Trixi2Vtk.jl. This extra step makes it somewhat inconvenient to use, especially when the current state of a numerical solution is to be checked during a long, multi-hour simulation run.

The goal of this project is therefore to make such visualizations easier by introducing two significant improvements:

Add the capability to write out native VTKHDF files directly during a simulation, in serial and parallel.
Enable parallel in-situ visualization of the results, i.e., to visualize results by connecting ParaView to a currently running, parallel Trixi.jl simulation using the Catalyst API.

Both tasks are related in that they require the student to familiarize themselves with both the data formats internally used in Trixi.jl as well as the visualization pipelines of VTK/ParaView. However, they can be performed independently and thus this project is suitable for either a 175 hour or a 350 hour commitment, depending on whether one or both tasks are to be tackled.

This project is good for both software engineers interested in the fields of visualization and scientific data analysis as well as those students who are interested in pursuing graduate research in the field of numerical analysis and high-performance computing.

Recommended skills: Some knowledge of at least one numerical discretization scheme (e.g., finite volume, discontinuous Galerkin, finite differences) is helpful; initial knowledge about visualization or parallel processing; preferably the ability (or eagerness to learn) to write fast code.

Expected results: Scalable, production quality visualization of scientific results for Trixi.jl.

Mentors: Michael Schlottke-Lakemper, Benedict Geihe, Johannes Markert

Turing Projects - Summer of Code

Turing is a universal probabilistic programming language embedded in Julia. Turing allows the user to write models in standard Julia syntax, and provides a wide range of sampling-based inference methods for solving problems across probabilistic machine learning, Bayesian statistics, data science, and related fields. Since Turing is implemented in pure Julia code, its compiler and inference methods are amenable to hacking: new model families and inference methods can be easily added.

Below is a list of ideas for potential projects, though you are welcome to propose your own to the Turing team. If you are interested in exploring any of these projects, please reach out to the listed project mentors or Xianda Sun (at xs307[at]cam.ac.uk). You can find their contact information here.

Implementing models from PosteriorDB in Turing / Julia

Mentors: Seth Axen, Tor Fjelde, Kai Xu, Hong Ge

Project difficulty: Medium

Project length: 175 hrs or 350 hrs

Description: posteriordb is a database of 120 diverse Bayesian models implemented in Stan (with 1 example model in PyMC) with reference posterior draws, data, and metadata. For performance comparison and for showcasing best practices in Turing, it is useful to have Turing implementations of these models. The goal of this project is to implement a large subset of these models in Turing/Julia.

For each model, we consider the following tasks:

Correctness test: when reference posterior draws and sampler configuration are available in posteriordb, correctness of the implementation and consistency can be tested by sampling the model with the same configuration and comparing the samples to the reference draws.
Best practices: all models must be checked to be differentiable with all Turing-supported AD frameworks.

Improving the integration between Turing and Turing’s MCMC inference packages

Mentors: Tor Fjelde, Jaime Ruiz Zapatero, Cameron Pfiffer, David Widmann

Project difficulty: Easy

Project length: 175 hrs

Description: Most samplers in Turing.jl implement the AbstractMCMC.jl interface, allowing a unified way for the user to interact with the samplers. The interface of AbstractMCMC.jl is currently very bare-bones and does not lend itself nicely to interoperability between samplers.

For example, it’s completely valid to compose two MCMC kernels, e.g. taking one step using the RWMH from AdvancedMH.jl, followed by taking one step using NUTS from AdvancedHMC.jl. Unfortunately, implementing such a composition requires explicitly defining conversions from the state returned by RWMH to the state returned by NUTS, and from the state of NUTS back to the state of RWMH. Doing this for one such sampler pair is generally very easy, but once you have to do it for N samplers, the number of pairwise conversions grows on the order of N² and the amount of work quickly becomes unmanageable.

One way to alleviate this issue would be to add a simple interface for interacting with the states of the samplers, e.g. a method for getting the current values in the state and a method for setting the current values in the state, in addition to a set of glue methods which can be overridden in the specific case where more information can be shared between the states.
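
The following Python sketch illustrates the get/set idea with two toy samplers whose states have different layouts. Everything in it is hypothetical: the state classes, `getparams`, `setparams` and both step functions are stand-ins written for this example, not the AbstractMCMC.jl interface, and both kernels are plain random-walk Metropolis rather than RWMH and NUTS.

```python
import math
import random


def logdensity(x):
    return -0.5 * x * x  # standard normal target, up to a constant


class RWState:
    """State of the first toy sampler: just the position."""
    def __init__(self, x):
        self.x = x


class CachedState:
    """State of the second toy sampler: position plus cached log density."""
    def __init__(self, position):
        self.position = position
        self.logp = logdensity(position)


def getparams(state):
    if isinstance(state, RWState):
        return state.x
    if isinstance(state, CachedState):
        return state.position
    raise TypeError(f"unsupported state type: {type(state).__name__}")


def setparams(state, value):
    """Return a state of the same type holding `value` (caches are rebuilt)."""
    if isinstance(state, (RWState, CachedState)):
        return type(state)(value)
    raise TypeError(f"unsupported state type: {type(state).__name__}")


def metropolis(rng, x, scale):
    proposal = x + rng.gauss(0.0, scale)
    log_u = math.log(1.0 - rng.random())  # argument lies in (0, 1]
    return proposal if log_u < logdensity(proposal) - logdensity(x) else x


def step_a(rng, state):
    return RWState(metropolis(rng, state.x, 0.5))


def step_b(rng, state):
    return CachedState(metropolis(rng, state.position, 2.0))


def composed_step(rng, state_a, state_b):
    """One step of kernel A followed by one step of kernel B."""
    state_a = step_a(rng, state_a)
    state_b = step_b(rng, setparams(state_b, getparams(state_a)))
    state_a = setparams(state_a, getparams(state_b))
    return state_a, state_b
```

With this design each sampler implements two small methods once, so composing N samplers needs N implementations instead of a conversion for every ordered pair. The glue methods mentioned above would be the place to pass along extra information, such as a cached gradient, when two specific state types can share it.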

An example of ongoing work that takes a step in this direction is: https://github.com/TuringLang/AbstractMCMC.jl/pull/86

GPU support for NormalizingFlows.jl and Bijectors.jl

Mentors: Tor Fjelde, Tim Hargreaves, Xianda Sun, Kai Xu, Hong Ge

Project difficulty: Hard

Project length: 175 hrs or 350 hrs

Description: Bijectors.jl, a package that facilitates transformations of distributions within Turing.jl, currently lacks full GPU compatibility. This limitation stems partly from the implementation details of certain bijectors and also from how some distributions are implemented in the Distributions.jl package. NormalizingFlows.jl, a newer addition to the Turing.jl ecosystem built atop Bijectors.jl, offers a user-friendly interface and utility functions for training normalizing flows but shares the same GPU compatibility issues.

The aim of this project is to enhance GPU support for both Bijectors.jl and NormalizingFlows.jl.

Batched support for NormalizingFlows.jl and Bijectors.jl

Mentors: Tor Fjelde, Xianda Sun, David Widmann, Hong Ge

Project difficulty: Medium

Project length: 350 hrs

Description: This project aims to introduce a batched mode to Bijectors.jl and to NormalizingFlows.jl, which is built on top of Bijectors.jl.

Put simply, we want to enable users to provide multiple inputs to the model simultaneously by “stacking” the parameters into a higher-dimensional array.

The implementation can take various forms. As a team of developers who care about both performance and user experience, we are open to different approaches and discussions. One possible approach is to develop a mechanism that signals the code to process the given input as a batch rather than as individual entries. A preliminary implementation can be found here.
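
A minimal Python sketch of such a signalling mechanism is shown below. It is only an illustration of the design idea: the `Batch` wrapper and the `transform` function are hypothetical and are not taken from Bijectors.jl or from the preliminary implementation. The wrapper removes the ambiguity between "one input that happens to be a collection" and "a collection of independent inputs".

```python
import math


class Batch:
    """Marker wrapper: the items are independent inputs, not one large input."""
    def __init__(self, items):
        self.items = list(items)


def exp_transform(x):
    """Elementwise exp on one vector input; returns (y, log|det J|).
    The Jacobian is diagonal with entries exp(x_i), so log|det J| = sum(x)."""
    return [math.exp(v) for v in x], sum(x)


def transform(x):
    if isinstance(x, Batch):
        results = [exp_transform(item) for item in x.items]
        outputs = Batch([y for y, _ in results])
        logdets = [logdet for _, logdet in results]  # one value per batch entry
        return outputs, logdets
    return exp_transform(x)


single_y, single_logdet = transform([0.0, 1.0])
batch_y, batch_logdets = transform(Batch([[0.0, 1.0], [2.0, 3.0]]))
```

The unwrapped call returns one log-determinant, while the batched call returns one per entry. This sketch loops over the batch; a performance-oriented implementation would instead operate on a stacked array in a single vectorized call.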

Targets for Benchmarking Samplers with vectorization, GPU and high-order derivative support

Mentors: Kai Xu, Hong Ge

Project difficulty: Medium

Project length: 175 hrs

Description: The project aims to develop a comprehensive collection of target distributions designed to study and benchmark Markov Chain Monte Carlo (MCMC) samplers in various computational environments. This collection will be an extension and enhancement of the existing Julia package, VecTargets.jl, which currently offers limited support for vectorization, GPU acceleration, and high-order derivatives. The main objectives of this project include:

Ensuring that the target distributions fully support vectorization and GPU acceleration
Making high-order derivatives (up to 3rd order) seamlessly integrable with the target distributions
Creating clear and comprehensive documentation that outlines the capabilities and limitations of the project, including explicit details on cases where vectorization, GPU acceleration, or high-order derivatives are not supported.
Investigating and documenting how different Automatic Differentiation (AD) packages available in Julia can be combined or utilized to achieve efficient and accurate computation of high-order derivatives.

By achieving these goals, the project aims to offer a robust framework that can significantly contribute to the research and development of more efficient and powerful MCMC samplers, thereby advancing the field of computational statistics and machine learning.

VS Code projects

VS Code extension

We are generally looking for folks that want to help with the Julia VS Code extension. We have a long list of open issues, and some of them amount to significant projects.

Required Skills: TypeScript, Julia, web development.

Expected Results: Depends on the specific projects we would agree on.

Mentors: David Anthoff

Package installation UI

The VSCode extension for Julia could provide a simple way to browse available packages and view what's installed on a user's system. To start with, this project could simply provide a GUI that reads in package data from a Project.toml/Manifest.toml and shows some UI elements to add/remove/manage those packages.

This could also be extended by having metadata about the package, such as a readme, GitHub stars, activity and so on (somewhat similar to the VSCode-native extension explorer).

Expected Results: A UI in VSCode for package operations.

Recommended Skills: Familiarity with TypeScript and Julia development.

Mentors: Sebastian Pfitzner

Also take a look at Pluto - VS Code integration!

Web Platform Projects – Summer of Code

Julia has early support for targeting WebAssembly and running in the web browser. Please note that this is a rapidly moving area (see the project repository for a more detailed overview), so if you are interested in this work, please make sure to inform yourself of the current state and talk to us to scope out an appropriate project. The below is intended as a set of possible starting points.

Mentor for these projects is Keno Fischer unless otherwise stated.

Code generation improvements and async ABI

Because Julia relies on an asynchronous task runtime and WebAssembly currently lacks native support for stack management, Julia needs to explicitly manage task stacks in the wasm heap and perform a compiler transformation to use this stack instead of the native WebAssembly stack. The overhead of this transformation directly impacts the performance of Julia on the wasm platform. Additionally, since all code Julia uses (including arbitrary C/C++ libraries) must be compiled using this transformation, it needs to cover a wide variety of inputs and be coordinated with other users having similar needs (e.g. the Pyodide project to run Python on the web). The project would aim to improve the quality, robustness and flexibility of this transformation.

Recommended Skills: Experience with LLVM.

Wasm threading

WebAssembly is in the process of standardizing threads. Simultaneously, work is ongoing to introduce a new threading runtime in Julia (see #22631 and related PRs). This project would investigate enabling threading support for Julia on the WebAssembly platform, implementing runtime parallel primitives on the WebAssembly platform and ensuring that high-level threading constructs are correctly mapped to the underlying platform. Please note that both the WebAssembly and Julia threading infrastructures are still in active development and may continue to change over the duration of the project. An informed understanding of the state of these projects is a definite prerequisite for this project.

Recommended Skills: Experience with C and multi-threaded programming.

High-performance, low-level integration of JS objects

WebAssembly is in the process of adding first-class references to native objects to its specification. This capability should allow very high-performance integration between Julia and JavaScript objects. Since it is not possible to store references to JavaScript objects in regular memory, adding this capability will require several changes to the runtime system and code generation (possibly including at the LLVM level) in order to properly track these references and emit them either as direct references or as indirect references to the reference table.

Recommended Skills: Experience with C.

DOM Integration

While Julia now runs on the web platform, it is not yet a language that's suitable for first-class development of web applications. One of the biggest missing features is integration with and abstraction over more complicated JavaScript objects and APIs, in particular the DOM. Inspiration may be drawn from similar projects in Rust or other languages.

Recommended Skills: Experience with writing libraries in Julia, experience with JavaScript Web APIs.

Porting existing web-integration packages to the wasm platform

Several Julia libraries (e.g. WebIO.jl, Escher.jl) provide input and output capabilities for the web platform. Porting these libraries to run directly on the wasm platform would enable a number of existing UIs to automatically work on the web.

Recommended Skills: Experience with writing libraries in Julia.

Native dependencies for the web

The Julia project uses BinaryBuilder to provide binaries of native dependencies of Julia packages. Experimental support exists to extend this to the wasm platform, but few packages have been ported. This project would consist of attempting to port a significant fraction of the binary dependencies of the Julia ecosystem to the web platform by improving the toolchain support in BinaryBuilder or, if necessary, porting upstream packages to fix assumptions not applicable on the wasm platform.

Recommended Skills: Experience with building native libraries in Unix environments.

Distributed computing with untrusted parties

The Distributed computing abstractions in Julia provide convenient abstractions for implementing programs that span many communicating Julia processes on different machines. However, the existing abstractions generally assume that all communicating processes are part of the same trust domain (e.g. they allow messages to execute arbitrary code on the remote). With some of the nodes potentially running in the web browser (or multiple browser nodes being part of the same distributed computing cluster via WebRPC), this assumption no longer holds true and new interfaces need to be designed to support multiple trust domains without overly restricting usability.

Recommended Skills: Experience with distributed computing and writing libraries in Julia.

Deployment

Currently supported use cases for Julia on the web platform are primarily geared towards providing interactive environments to support exploration of the full language. Of course, this leads to significantly larger binaries than would be required for using Julia as part of a production deployment. By disabling dynamic language features (e.g. eval) one could generate small binaries suitable for deployment. Some progress towards this exists in packages like PackageCompiler.jl, though significant work remains to be done.

Recommended Skills: Interest in or experience with Julia internals.