I can summarize my HPC experience as a nonlinear journey from problem formulation to efficient implementation on various platforms, with different algorithmic and computational paradigms in between. There is always an effort (typical and/or personal) involved in switching, even briefly, to another fundamental approach or application context. Despite the step-by-step intellectual and time overhead of moving from an intermediate level to a more mature one in each area, the journey has proved rewarding so far, since all these skills are additive. Although it is a general consensus, learning and investigating really do appear incremental to me.



  Journal Papers
  1. Claude Tadonki, and Bernard Philippe,
    Parallel multiplication of a vector by a Kronecker product of matrices,
    Parallel Distributed Computing Practices PDCP, volume 2(4), 1999.
    Abstract. We provide a parallel algorithm for the Kronecker product of matrices based on a cyclic partition of loops. Our general model unifies one scheme with redundant computations but no communication and an opposite scheme without redundant computation but with interprocessor communications. Both schemes can be mixed with a certain balance influenced by the volume of floating-point operations and the characteristics of the target architecture. This work has been validated experimentally (CENJU and INTEL PARAGON).
  2. Claude Tadonki, and Bernard Philippe,
    Parallel multiplication of a vector by a Kronecker product of matrices (part II),
    Parallel Distributed Computing Practices PDCP, volume 3(3), 2000.
    Abstract. This paper presents a revised and generalized version of our previous parallel algorithm for the Kronecker product of matrices. We show how to map the computation on any number of processors that is a divisor of the product of the matrix sizes (instead of being a prefix of this product as previously). Moreover, we show that the minimum number of parallel communications with p processors is log(p) whatever the algorithm, and that our algorithm achieves this optimal performance. The work is validated by significant efficiencies obtained from experimental measurements on the CRAY machine.
  3. Sanjay Rajopadhye, Tanguy Risset, and Claude Tadonki,
    Algebraic Path Problem on linear arrays,
    Techniques et Sciences Informatiques TSI, 20 (5), 2001.
    Abstract. We seek a linear SPMD implementation of the Warshall algorithm for the Algebraic Path Problem (unified model of the transitive closure, shortest paths, Gauss elimination, ...). Our parallel algorithm is systolic-like in its original version, and offers the important advantage of being able to run on a smaller number of processors by a natural round-robin remapping. We derive and validate a blocked version for standard distributed memory parallel machines, as the cost of data communication would be severe otherwise. We show experimental validations on the CRAY machine.
  4. Claude Tadonki,
    A Recursive Method for Graph Scheduling,
    Scientific Annals of Cuza University, Vol 11, p.121-131, 2002
    Abstract. This paper presents a recursive graph scheduling method. Our paradigm applies to any acyclic graph that can be partitioned into isomorphic subgraphs. Indeed, many common problem domains are 3D graphs that can be organized as a chain of isomorphic 2D subgraphs. Starting from any valid schedule of the source subgraph of the chain, we use the generic isomorphism to systematically derive a valid global schedule, with a local pipeline between two consecutive subgraphs. Our technique is then characterized by the nature of the partition and the source schedule. We discuss the impact of these characteristics on the complexity of the generated parallel schedules.
  5. C. Beltran, C. Tadonki and J.-Ph. Vial,
    Solving the p-median problem with a semi-Lagrangian relaxation,
    Computational Optimization and Applications, Volume 35(2), October 2006. (pdf)
    Abstract. This paper deals with operations research and nondifferentiable optimization. The so-called P-median problem is the problem of locating P "facilities" relative to a set of "customers" such that the sum of the shortest demand-weighted distances between "customers" and "facilities" is minimized. This is a classical combinatorial optimization problem with a huge set of potential solutions. Using a semi-Lagrangian relaxation, we tackle the problem in its associated continuous formulation and report our non-smooth convex optimization engineering results.
  6. F. Babonneau, C. Beltran, A. Haurie, C. Tadonki and J.-P. Vial,
    Proximal-ACCPM: a versatile oracle based optimization method,
    Computational and Management Science, Volume 9, 2007. (pdf)
    Abstract. Oracle Based Optimization (OBO) conveniently designates an approach to handle a class of convex optimization problems in which the information pertaining to the function to be minimized and/or to the feasible domain takes the form of a linear outer approximation revealed by an oracle. We show how difficult problems can be cast in this format, and then solved within our context. We present our method, called Proximal-ACCPM, to trigger the OBO approach and give a snapshot of numerical results. This paper summarizes several contributions with the OBO approach and aims to give, in a single report, enough information on the method and its implementation to facilitate new applications.
  7. C. Tadonki,
    Mathematical and Computational Engineering in X-Ray Crystallography,
    International Journal of Advanced Computer Engineering, volume 1(2) 2008.
    Abstract. The main purpose of X-Ray Crystallography is to predict a macromolecular structure from X-Ray synchrotron radiation. Among existing paradigms to achieve this task, analytical approaches come up as good candidates for automation through mathematical approximations and computational engineering. In addition to the latter, statistical processing is required in order to refine the data according to the physical model and the border effects of the experiments. We revisit the basis of the problem and focus on a more precise effect, so-called radiation damage, which has also been proven to be artificially manageable so that it becomes instructive.
  8. C. Tadonki,
    Integer Programming Heuristic for the Dual Power Setting Problem in Wireless Sensors Networks,
    Int. Journal of Advanced Research in Computer Engineering, vol 3(1) 2009.
    Abstract. We seek an integer programming based heuristic for solving the dual power management problem in wireless sensor networks. For a given network with two possible transmission powers (low and high), the problem is to find a minimum size subset of nodes such that if they are assigned high transmission power while the others are assigned low transmission power, the network will be strongly connected. The main purpose behind this efficient setting is to minimize the total communication power consumption while maintaining the network connectivity. From a theoretical point of view, the problem is known to be difficult to solve exactly. An approach to approximate the solution is to work with a spanning tree of clusters. Each cluster is a strongly connected component when considering low transmission power. We follow the same approach, and we formulate the node selection problem inside clusters as an integer programming problem which is solved exactly using specialized codes. Experimental results show that our algorithm is efficient both in execution time and solution quality.
  9. C. Tadonki, G. Grosdidier, O. Pene,
    An efficient CELL library for lattice quantum chromodynamics,
    ACM SIGARCH Computer Architecture News, vol 38(4) 2011.
    Abstract. Quantum chromodynamics (QCD) is the theory of subnuclear physics, aiming at modeling the strong nuclear force, which is responsible for the interactions of nuclear particles. Numerical QCD studies are performed through a discrete formalism called LQCD (Lattice Quantum Chromodynamics). Typical simulations involve very large volumes of data and numerically sensitive entities, hence the crucial need for high performance computing systems. We propose a set of CELL-accelerated routines for basic LQCD calculations. Our framework is provided as a unified library and is particularly optimized for iterative use. Each routine is parallelized among the SPUs, and each SPU achieves its task by looping on small chunks of arrays from the main memory. Our SPU implementation is vectorized with double precision data, and the cooperation with the PPU shows a good overlap between data transfers and computations. Moreover, we permanently keep the SPU context and use mailboxes to synchronize between consecutive calls. We validate our library by using it to derive a CELL version of an existing LQCD package (tmLQCD). Experimental results on individual routines show a significant speedup compared to a standard processor, 11 times better than a 2.83 GHz INTEL processor for instance (without SSE). This ratio is around 9 (with a QS22 blade) when considering a more cooperative context like solving a linear system of equations (usually referred to as Wilson-Dirac inversion). Our results clearly demonstrate that the CELL is a very promising way for high-scale LQCD simulations.
  10. T. Saidani, L. Lacassagne, J. Falcou, C. Tadonki, Samir Bouaziz,
    Parallelization Schemes for Memory Optimization on the Cell Processor : A Case Study on the Harris Corner Detector,
    Transactions on High-Performance Embedded Architectures and Compilers, volume 3(3) 2011.
    Abstract. The Cell processor is a typical example of a heterogeneous multiprocessor on-chip architecture that uses several levels of parallelism to deliver high performance. Despite its efficiency potential, the execution mode and some hardware specificities make it nontrivial to deal with. Indeed, reducing the gap between peak performance and effective performance is the challenge for compiler design and efficient implementations. Image processing and media applications are typical "mainstream" applications one could consider while investigating Cell benchmarks. Our investigations, through various implementations of the Harris detection algorithm, reveal that DMA-controlled data transfers and synchronizations between SPEs are key factors for global performance.
  11. El Wardani Dadi, El Mostafa Daoudi, C. Tadonki,
    Improving 3D Shape Retrieval Methods based on Bag-of-Feature Approach by using Local Codebooks,
    International Journal of Future Generation Communication and Networking, Vol. 5, No. 4, December, 2012.
    Abstract. Recent investigations illustrate that view-based methods with pose normalization preprocessing achieve better performance in retrieving rigid models than other approaches, and are still the most popular and practical methods in the field of 3D shape retrieval [1, 2, 3, 4, 5]. In this paper we present an improvement of 3D shape retrieval methods based on the bag-of-features approach. These methods use this approach to integrate a set of features extracted from 2D views of the 3D objects using the SIFT (Scale Invariant Feature Transform [6]) algorithm into histograms using vector quantization, which is based on a global visual codebook. In order to improve the retrieval performance, we propose to associate to each 3D object its local visual codebook instead of a unique global codebook. Experimental results obtained on the Princeton Shape Benchmark database, for the BF-SIFT method proposed by Ohbuchi et al. and CM-BOF proposed by Zhouhui et al., show that the proposed approach performs better than the original methods.
  12. D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki,
    Automated Code Generation for Lattice Quantum Chromodynamics and beyond,
    Journal of Physics: Conference Series, Institute of Physics: Open Access Journals, 510, pp.012005, 2014.
    Abstract. We present here our ongoing work on a Domain Specific Language which aims to simplify Monte-Carlo simulations and measurements in the domain of Lattice Quantum Chromodynamics. The tool-chain, called Qiral, is used to produce high-performance OpenMP C code from LaTeX sources. We discuss conceptual issues and details of implementation and optimization. The comparison of the performance of the generated code to the well-established simulation software is also made.
  13. C. Tadonki, F. Meyer, and F. Irigoin,
    Dendrogram Based Algorithm for Dominated Graph Flooding,
    Procedia Computer Science, vol(29), pp. 586-598, 2014.
    Abstract. In this paper, we are concerned with the problem of flooding undirected weighted graphs under ceiling constraints. We provide a new algorithm based on a hierarchical structure called dendrogram, which offers the significant advantage that it can be used for multiple flooding with various scenarios of the ceiling values. In addition, when exploring the graph through its dendrogram structure in order to calculate the flooding levels, independent sub-dendrograms are generated, thus offering a natural way for parallel processing. We provide an efficient implementation of our algorithm through suitable data structures and optimal organisation of the computations. Experimental results show that our algorithm outperforms well established classical algorithms, and reveal that the cost of building the dendrogram highly predominates over the total running time, thus validating both the efficiency and the hallmark of our method. Moreover, we exploit the potential parallelism exposed by the flooding procedure to design a multi-thread implementation. As the underlying parallelism is created on the fly, we use a queue to store the list of the sub-dendrograms to be explored, and then use a cyclic distribution to assign them to the participating threads. This yields a load balanced and scalable process as shown by additional benchmark results. Our program runs in a few seconds on an ordinary computer to flood graphs with more than 20 million nodes.
  14. A. Ferreira Leite, A. Boukerche, A. C. Magalhaes Alves de Melo, C. Eisenbeis, C. Tadonki, and C. Ghedini Ralha,
    Power-Aware Server Consolidation for Federated Clouds,
    Concurrency and Computation: Practice and Experience (CCPE), ISSN: 1532-0626, Wiley Press, New York, USA, 2016.
    Abstract. Cloud computing has evolved to provide computing resources on-demand through a virtualized infrastructure, allowing applications, computing power, data storage, and network resources to be provisioned and managed over private networks or over the Internet. Cloud services normally run on large data centers and demand a huge amount of electricity. Consequently, the electricity cost represents one of the major concerns of data centers, since it is sometimes nonlinear with the capacity of the data centers, and it is also associated with a high amount of carbon emission (CO2). However, energy-saving schemes that result in too much degradation of the system performance or in violations of service-level agreement (SLA) parameters would eventually cause the users to move to another cloud provider. Thus, there is a need to reach a balance between energy savings and the costs incurred by these savings in the execution of the applications. Therefore, in this paper we propose and evaluate a power and SLA-aware application consolidation solution for cloud federations. It comprises a multi-agent system (MAS) for server consolidation, taking into account service-level agreement, power consumption, and carbon footprint. Unlike similar solutions available in the literature, in our solution, when a cloud is overloaded its data center needs to negotiate with other data centers before migrating the workload to another cloud. Simulation results show that our approach can reduce power consumption by up to 46% while trying to meet performance requirements. Furthermore, we show that it can provide an adequate solution to deal with power consumption in the clouds.
  15. A. Ferreira Leite, V. Alves, G. Nunes Rodrigues, C. Tadonki, C. Eisenbeis, A. C. Magalhaes Alves de Melo,
    Dohko: An Autonomic System for Provision, Configuration, and Management of Inter-Cloud Environments based on a Software Product Line Engineering Method,
    Cluster Computing Special, 2017.
    Abstract. Configuring and executing applications across multiple clouds is a challenging task due to the various terminologies used by the cloud providers. Therefore, we advocate the use of autonomic systems to do this work automatically. Thus, in this paper, we propose and evaluate Dohko, an autonomic and goal-oriented system for inter-cloud environments. Dohko implements self-configuration, self-healing, and context-awareness properties. Likewise, it relies on a hierarchical P2P overlay (a) to manage the virtual machines running on the clouds and (b) to deal with inter-cloud communication. Furthermore, it depends on a software product line engineering (SPLE) method to enable applications’ deployment and reconfiguration, without requiring pre-configured virtual machine images. Experimental results show that Dohko can free the users from the duty of executing non-native cloud applications on a single cloud or across many clouds. In particular, it tackles the lack of middleware prototypes that can support different scenarios when using simultaneous services from multiple clouds.
  16. Y. Samadi, M. Zbakh, C. Tadonki,
    Performance comparison between Hadoop and Spark frameworks using Hibench benchmarks,
    Concurrency and Computation: Practice and Experience (CCPE), 2017.
    Abstract. Big data has become one of the major areas of research for cloud service providers due to the large amount of data produced every day, and the inefficiency of traditional algorithms and technologies in handling these large amounts of data. Big data, with characteristics such as Volume, Variety, and Veracity (3V), requires efficient technologies for real-time processing. To solve this problem and to process and analyze this vast amount of data, there are many powerful tools like Hadoop and Spark, which are mainly used in the context of Big Data. They work following the principles of parallel computing. The challenge is to specify which Big Data tool is better depending on the processing context. In this paper, we present and discuss a performance comparison between two popular Big Data frameworks deployed on virtual machines. Hadoop MapReduce and Apache Spark are used to efficiently process a vast amount of data in parallel and distributed mode on large clusters, and both are suited to Big Data processing. We also present the execution results of Apache Hadoop in Amazon EC2, a major Cloud Computing environment. To compare the performance of these two frameworks, we use the HiBench benchmark suite, which is an experimental approach for measuring the effectiveness of any computer system. The comparison is made based on three criteria: execution time, throughput and speedup. We test the Wordcount workload with different data sizes for more accurate results. Our experimental results show that the performance of these frameworks varies significantly based on the use case implementation. Furthermore, from our results we draw the conclusion that Spark is more efficient than Hadoop at dealing with a large amount of data in most cases. However, Spark requires higher memory allocation, since it loads the data to be processed into memory and keeps them in caches for a while, just like standard databases. So, the choice depends on performance level and memory constraints.
  17. O. Haggui, C. Tadonki, L. Lacassagne, F. Sayadi, B. Ounid,
    Harris Corner Detection on a NUMA Manycore,
    Future Generation Computer Systems (DOI: 10.1016/j.future.2018.01.048), 2018.
    Abstract. Corner detection is a key kernel for many image processing procedures including pattern recognition and motion detection. The latter, for instance, mainly relies on the corner points for which spatial analyses are performed, typically on (probably live) videos or temporal flows of images. Thus, highly efficient corner detection is essential to meet the real-time requirement of associated applications. In this paper, we consider the corner detection algorithm proposed by Harris, whose main workflow is a composition of basic operators represented by their approximations using 3 × 3 matrices. The corresponding data access patterns follow a stencil model, which is known to require careful memory organization and management. Cache misses and other hindering factors with NUMA architectures need to be skillfully addressed in order to reach an efficient scalable implementation. In addition, with increasingly wide vector registers, an efficient SIMD version should be designed and explicitly implemented. In this paper, we study a direct and explicit implementation of common and novel optimization strategies, and provide a NUMA-aware parallelization. Experimental results on a dual-socket INTEL Broadwell-E/EP show noticeably good scalability.
  18. Y. Samadi, M. Zbakh, and C. Tadonki,
    Graph-based Model and Algorithm for Minimizing Big Data Movement in a Cloud Environment,
    Int. J. High Performance Computing and Networking, 2018.
    Abstract. In this paper, we discuss load balancing and data placement strategies in heterogeneous Cloud environments. Load balancing is crucial in large-scale data processing applications, especially in a distributed heterogeneous context like the Cloud. The main goal in data placement strategies is to improve the overall performance through the reduction of data movements among the participating datacenters, taking into account the dependencies. Typically, datacenters are geographically distributed based on their characteristics such as the processing speed and the storage capacity, among other technical considerations. Load balancing and efficient data placement on Cloud systems are critical problems that are difficult to cope with simultaneously, especially in the emerging heterogeneous clusters. In this context, we propose a threshold-based load balancing algorithm, which first balances the load between datacenters, and afterwards minimizes the overhead of data exchanges. The proposed approach is divided into three phases. First, the dependencies between the datasets are identified. Second, the load threshold of each datacenter is estimated based on the processing speed and the storage capacity. Third, the load balancing between the datacenters is managed through the threshold parameters. The heterogeneity of the datacenters and the dependencies between the datasets are both taken into account. Our experimental results show that our approach can efficiently reduce the frequency of data movement and keep a good load balancing between the datacenters.
  19. Y. Samadi, M. Zbakh, and C. Tadonki,
    DT-MG: many-to-one matching game for tasks scheduling towards resources optimization in cloud computing,
    International Journal of Computers and Applications (DOI: 10.1080/1206212X.2018.1519630), 2018.
    Abstract. The increasing demand for cloud computing motivates researchers to make cloud environments more efficient for their users and more profitable for the providers. More and more datacenters are being built to cater to customers' needs. However, datacenters consume large amounts of energy, and this draws negative attention. Therefore, cloud providers are confronted with great pressure to reduce the energy consumed by datacenters. To address this issue, efficient algorithms to reduce energy consumption and to guarantee the quality of service are needed. In this paper, we propose a load balancing algorithm named DT-MG, which aims to reduce energy consumption and maximize the efficiency of the available resources. First, we use the Matching Game Theory model for assigning tasks to datacenters. We then study the optimal operation of the resources by migrating all the tasks of the physical machines under sub-regime to other physical machines, followed by their systematic switch to standby mode. Experimental results show that the proposed approach reduces energy consumption and the number of task migrations while maintaining the service level agreement in comparison with some existing techniques.
  20. A. Susungi and C. Tadonki,
    Intermediate Representations for Explicitly Parallel Programs,
    ACM Computing Surveys, Volume 54, Issue 5, May 2021.
    Abstract. While compilers generally support parallel programming languages and APIs, their internal program representations are mostly designed from the sequential programs standpoint (exceptions include source-to-source parallel compilers, for instance). This makes the integration of compilation techniques dedicated to parallel programs more challenging. In addition, parallelism has various levels and different targets, each of them with specific characteristics and constraints. With the advent of multi-core processors and general purpose accelerators, parallel computing is now a common and pervasive consideration. Thus, software support to parallel programming activities is essential to make this technical transition more realistic and beneficial. The case of compilers is fundamental as they deal with (parallel) programs at a structural level, thus the need for intermediate representations. This article surveys and discusses attempts to provide intermediate representations for the proper support of explicitly parallel programs. We highlight the gap between available contributions and their concrete implementation in compilers and then exhibit possible future research directions.
  21. L. Bouhouch, C. Tadonki, and M. Zbakh,
    Dynamic Data Replication and Placement Strategy in Geographically Distributed Data centers,
    Concurrency and Computation: Practice and Experience (CCPE) - 10.1002/cpe.6858, 2022.
    Abstract. With the evolution of geographically distributed data centers in the Cloud Computing landscape, along with the exponentially growing amount of data being processed in these data centers, processing massive data applications has become an important topic. Since a given task may require many datasets for its execution and the datasets are spread over several different data centers, finding an efficient way to manage the datasets storage across nodes of a Cloud system is a difficult problem. In fact, the execution time of a task might be influenced by the cost of data transfers, which mainly depends on two criteria. The first one is the initial placement of the input datasets during the build-time phase, while the second is the replication of the datasets during the runtime phase. The replication is explicitly considered when datasets are being migrated over the data centers in order to make them locally available wherever needed. Data placement and data replication are important challenges in Cloud Computing. Nevertheless, many studies focus on data placement or data replication exclusively. In this paper, a combination of a data placement strategy followed by a dynamic data replication management strategy is proposed, with the purpose of reducing the associated cost of all data transfers between the (distant) data centers. Our proposed data placement approach considers the main characteristics of a data center such as storage capacity and read/write speeds to efficiently store the datasets, while our dynamic data replication management approach considers three parameters: the number of replicas in the system, the dependency between datasets and tasks, and the storage capacity of data centers. The decision of when and whether to keep or to delete replicas is determined by the fulfillment of those three parameters. Our approach estimates the total execution time of the tasks as well as the monetary cost, considering the data transfers activity. Our experiments are conducted using the CloudSim simulator. The obtained results show that our proposed strategies produce an efficient data management by reducing the overheads of the data transfers, compared to both a data placement without replication (by 76%) and the selected data replication approach from Kouidri et al. (by 52%), and by improving the financial cost.
 Conference/Symposium/Workshop Papers (grouped by topic)

1. Parallel Computing (algorithm, scheduling, complexity, implementation, dynamic system)
Overview. This subset of my outputs is related to general purpose parallel computing. It includes ad hoc parallel algorithms, methodology for systematic parallel scheduling, efficient parallel implementation, and parallel dynamic systems. My current focus in this topic is on the design and analysis of powerful methodologies specific to multicore processors and accelerator-based architectures (CELL, GPU, ...), both for domain specific considerations and a wider audience. I keep investigating fundamental aspects, since new hypotheses arise with emerging architectures and the increasing and pervasive HPC demand.
  1. Claude Tadonki,
    Système d'équations récurrentes et multiplication parallèle d'un vecteur par un produit tensoriel de matrices,
    Rencontres Francophones de Parallelisme Renpar'11, Rennes (France), 1999.
  2. Sanjay Rajopadhye, Tanguy Risset, and Claude Tadonki,
    The algebraic path problem revisited,
    European Conference on Parallel Computing Europar99, Toulouse (France), LNCS Springer-Verlag, No. 1685, p. 698-707, August 1999.
  3. Claude Tadonki,
    Ordonnancements canoniques,
    Rencontres Francophones de Parallelisme, Besançon (France), June 2000.
  4. Claude Tadonki,
    Parallel Cholesky Factorization,
    Parallel Matrix Algorithms and Applications PMAA Workshop, Neuchatel (Switzerland), August 2000.
  5. Claude Tadonki, and Bernard Philippe,
    Méthodologie de conception d'algorithmes efficaces pour le produit tensoriel,
    CARI2000, Tananarive (Madagascar), October 2000.
  6. Patrice Quinton, Claude Tadonki, and Maurice Tchuente,
    Un échéancier systolique et son utilisation dans l'ATM,
    CARI2000, Tananarive (Madagascar), October 2000.
  7. Claude Tadonki,
    Complexité des ordonnancements canoniques et dérivation d'architecture,
    Rencontres Francophones de Parallelisme Renpar13, Paris (France), April 2001.
  8. Claude Tadonki,
    A Recursive Method for Graph Scheduling,
    International Symposium on Parallel and Distributed Computing (SPDC), Iasi, Romania, July 2002.
  9. R. Ndoundam, C. Tadonki, and M. Tchuente,
    Parallel chip firing game associated with n-cube orientation,
    International Conference on Computational Science, ICCS04 (LNCS/Springer), Krakow, Poland, June 2004.
  10. T. Saidani, J. Falcou, C. Tadonki, L. Lacassagne, and D. Etiemble,
    Algorithmic Skeletons within an Embedded Domain Specific Language for the CELL Processor,
    Parallel Architectures and Compilation Techniques (PACT), PACT09, Raleigh, North Carolina (USA), September 12-16, 2009. (pdf)
  11. C. Tadonki, G. Grosdidier, and O. Pene,
    An efficient CELL library for Lattice Quantum Chromodynamics,
    International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies (HEART) in conjunction with the 24th ACM International Conference on Supercomputing (ICS), pp. 67-71, Epochal Tsukuba, Tsukuba, Japan, June 1-4, 2010. (ACM Computer Architecture News)
  12. C. Tadonki, L. Lacassagne, T. Saidani, J. Falcou, K. Hamidouche,
    The Harris algorithm revisited on the CELL processor,
    International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies (HEART) in conjunction with the 24th ACM International Conference on Supercomputing (ICS), pp. 97-100, Epochal Tsukuba, Tsukuba, Japan, June 1-4, 2010. (ACM Computer Architecture News)
  13. C. Tadonki,
    Ring pipelined algorithm for the algebraic path problem on the CELL Broadband Engine,
    Workshop on Applications for Multi and Many Core Architectures (WAMMCA 2010) in conjunction with the International Symposium on Computer Architecture and High Performance Computing (SBAC PAD 2010), Petropolis, Rio de Janeiro, Brazil, October 27-30, 2010. (IEEE digital library) - abstract - slides - pdf - code
  14. C. Tadonki,
    Large Scale Kronecker Product on Supercomputers,
    2nd Workshop on Architecture and Multi-Core Applications (WAMCA 2011) in conjunction with the International Symposium on Computer Architecture and High Performance Computing (SBAC PAD 2011), Vitoria, Espirito Santo, Brazil, October 26-29, 2011. (IEEE digital library) - abstract - slides - pdf - code
  15. D. Barthou, G. Grosdidier, M. Kruse, O. Pene and C. Tadonki,
    QIRAL: A High Level Language for Lattice QCD Code Generation,
    Programming Language Approaches to Concurrency and Communication-cEntric Software (PLACES'12) in conjunction with the European joint Conference on Theory & Practice of Software (ETAPS), Tallinn, Estonia, March 24-April 1, 2012.
  16. C. Tadonki,
    Basic parallel and distributed computing curriculum,
    Second NSF/TCPP Workshop on Parallel and Distributed Computing Education (EduPar'12) in conjunction with the 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 21-25, 2012.
  17. C. Tadonki, L. Lacassagne, E. Dadi, M. Daoudi,
    Accelerator-based implementation of the Harris algorithm,
    5th International Conference on Image and Signal Processing (ICISP 2012), Agadir, Morocco, June 28-30, 2012.
  18. P.-L. Caruana and C. Tadonki,
    Seamless Parallelism in MATLAB,
    Parallel Distributed Computing and Networks, Innsbruck, Austria, Feb 16-18, 2014.
  19. F. Meyer, C. Tadonki, and F. Irigoin,
    Dendrogram Based Algorithm for Dominated Graph Flooding,
    International Conference on Computational Science (ICCS 2014), Cairns, Australia, June 10-12, 2014.
  20. A. Susungi, A. Cohen, and C. Tadonki,
    More Data Locality for Static Control Programs on NUMA Architectures,
    7th International Workshop on Polyhedral Compilation Techniques (IMPACT 2017), Stockholm, Sweden, January 23, 2017.
  21. C. Tadonki,
    Scalable NUMA-Aware Wilson-Dirac on Supercomputers,
    International Conference on High Performance Computing & Simulation (HPCS 2017), Genoa, Italy, July 17-21, 2017.
  22. A. Susungi, N. A. Rink, J. Castrillon, I. Huismann, A. Cohen, C. Tadonki, J. Stiller, J. Frohlich,
    Towards Compositional and Generative Tensor Optimizations,
    16th International Conference on Generative Programming: Concepts & Experience (GPCE 2017), Vancouver, Canada, October 23-24 2017.
  23. N. A. Rink, A. Susungi, J. Castrillon, I. Huismann, A. Cohen, J. Stiller, and C. Tadonki,
    CFDlang: High-level code generation for high-order methods in fluid dynamics,
    International Workshop on Real World Domain Specific Languages 2018 (RWDSL 2018) in conjunction with the CGO'18 International Symposium on Code Generation and Optimization, DOI 10.1145/3183895.3183900, Vienna, Austria, February 24, 2018.
  24. O. Haggui, C. Tadonki, F. Sayadi, B. Ouni,
    Evaluation of an OpenMP Parallelization of Lucas-Kanade on a NUMA-Manycore,
    9th Workshop on Architecture and Multi-Core Applications (WAMCA 2018) in conjunction with the 30th International Symposium on Computer Architecture and High Performance Computing (SBAC PAD 2018), Ecole Normale Superieure de Lyon, Lyon, France, September 24-27, 2018.
  25. A. Susungi, N. A. Rink, A. Cohen, J. Castrillon, C. Tadonki,
    Meta-programming for Cross-Domain Tensor Optimizations,
    17th International Conference on Generative Programming: Concepts & Experience (GPCE 2018) - (copy of the paper), Boston, Massachusetts, USA, November 5-6, 2018.
  26. O. Haggui, C. Tadonki, F. Sayadi, B. Ouni,
    Efficient GPU Implementation of Lucas-Kanade through OpenACC,
    14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP 2019), Prague, Czech Republic, February 25-27, 2019.
  27. O. Haggui, C. Tadonki, F. Sayadi, B. Ouni,
    Memory Efficient Deployment of an Optical Flow Algorithm on GPU Using OpenMP,
    20th International Conference on Image Analysis and Processing (ICIAP 2019), Trento, Italy, September 9-13, 2019.

2. Operations Research (algorithm, modeling, method, tool)
Overview. The main concern here is operations research and convex optimization. My work on this topic covers the design and implementation of efficient solvers for both continuous and discrete optimization. The connection between the two through geometric, analytic, and algebraic techniques (global optimization, semidefinite programming, spectral theory, ...) makes the topic very exciting, since this synergy has proven to be a good way of tackling difficult combinatorial problems.
  1. L. Drouet, A. Dubois, A. Haurie and C. Tadonki,
    A MARKAL-Lite Model for Sustainable Urban Transportation,
    Optimization Days, Montreal, Canada, May 2003.
  2. Claude Tadonki,
    ProxAccpm: A convex optimization solver,
    International Symposium on Mathematical Programming, ISMP2003, Copenhagen, Denmark, August 2003 ( get it! ).
  3. O. Briant, C. Lemarechal, K. Monneris, N. Perrot, C. Tadonki, F. Vanderbeck, J.-P. Vial, C. Beltran, P. Meurdesoif,
    Comparison of various approaches for column generation,
    Eighth Aussois Workshop on Combinatorial Optimization, January 5-9, 2004.
  4. Claude Tadonki and Jean-Philippe Vial,
    Efficient algorithm for linear pattern separation,
    International Conference on Computational Science, ICCS04 (LNCS/Springer), Krakow, Poland, June 2004.
  5. Cesar Beltran, Claude Tadonki, Jean-Philippe Vial,
    Semi-Lagrangian relaxation,
    Computational Management Science Conference and Workshop on Computational Econometrics and Statistics, Link, Neuchatel, Switzerland, April 2004.
  6. Claude Tadonki, Cesar Beltran and Jean-Philippe Vial,
    Portfolio management with integrality constraints,
    Computational Management Science Conference and Workshop on Computational Econometrics and Statistics, Link, Neuchatel, Switzerland, April 2004.
  7. C. Beltran, C. Tadonki and J.-Ph. Vial,
    The p-median problem solved by semi-Lagrangian relaxation,
    First Mathematical Programming Society International Conference on Continuous Optimization (ICCOPT I), Troy, USA, August 2-4, 2004.

3. Scientific and Technical Computing (sensor networks, power-aware computing, program comprehension, data analysis, image processing)
Overview. This group is related to specialized algorithms, program optimization, and data analysis. My investigations on sensor networks mainly focus on the network topology (disk graph) and the cooperation among sensors (distributed algorithms). Concerning power-aware computing, the problem is to reduce the energy dissipated by the execution of a given program, particularly in a context where energy is a critical resource (embedded systems). My contribution includes combinatorial and analytical methodologies for modeling energy complexity and rescheduling the algorithm accordingly. Data refinement relies on statistical and approximation approaches to improve the match between experimental data and the model; further steps in experimental research are sensitive to this agreement.
  1. Claude Tadonki, Mitali Singh, Jose Rolim and Viktor K. Prasanna,
    Combinatorial Techniques for Memory Power State Scheduling in Energy Constrained Systems,
    Workshop on Approximation and Online Algorithms (WAOA), WAOA2003 (LNCS/Springer), Budapest, Hungary, September 2003.
  2. Claude Tadonki and Jose Rolim,
    An analytical model for energy minimization,
    III Workshop on Efficient and Experimental Algorithms, WEA04 (LNCS/Springer), Angra dos Reis, Rio de Janeiro, Brazil, May 2004.
  3. Claude Tadonki,
    Universal Report: A Generic Reverse Engineering Tool,
    12th IEEE International Workshop on Program Comprehension, IWPC 2004 (IEEE), University of Bari, Bari, Italy, June 2004.
  4. Claude Tadonki and Jose Rolim,
    An integer programming heuristic for the dual power management problem in wireless sensor networks,
    2nd International Workshop on Managing Ubiquitous Communications and Services, MUCS2004, Dublin, Ireland, December 13, 2004.
  5. Claude Tadonki,
    Refinement experiments with RADDAM data,
    EMBL bilateral meeting, Hamburg, Germany, June 26-28, 2006.
  6. Claude Tadonki,
    Off-line settings in wireless networks,
    3rd International Symposium on Computational Intelligence and Intelligent Informatics, ISCIII2007, Agadir, Morocco, March 28-30, 2007.
  7. E. Dadi, M. Daoudi, C. Tadonki,
    3D Shape Retrieval using Bag-of-feature method basing on local codebooks,
    5th International Conference on Image and Signal Processing (ICISP 2012), Agadir, Morocco, June 28-30, 2012.
  8. E. Dadi, M. Daoudi, C. Tadonki,
    Fast 3D shape retrieval method for classified databases,
    International Conference on Complex Systems (ICCS'12), Agadir, Morocco, November 5-6, 2012.
  9. A. Leite, C. Tadonki, C. Eisenbeis, T. Raiol, M.E. Walter, and A. de Melo,
    Excalibur: An Autonomic Cloud Architecture for Executing Parallel Applications,
    Fourth International Workshop on Cloud Data and Platforms (CloudDP 2014), Amsterdam, Netherlands, April 13, 2014.
  10. A. Leite, C. Tadonki, C. Eisenbeis, and A. de Melo,
    A Fine-grained Approach for Power Consumption Analysis and Prediction,
    International Conference on Computational Science (ICCS 2014), Cairns, Australia, June 10-12, 2014.
  11. A. F. Leite, V. Alves, G. N. Rodrigues, C. Tadonki, C. Eisenbeis, A. C. M. A. de Melo,
    Automating Resource Selection and Configuration in Inter-clouds through a Software Product Line Method,
    8th IEEE International Conference on Cloud Computing, CLOUD 2015, New York City, NY, USA, June 27 - July 2, 2015.
  12. Y. Samadi, M. Zbakh, C. Tadonki,
    Comparative study between Hadoop and Spark based on Hibench benchmarks,
    2nd International Conference on Cloud Computing Technologies and Applications (CloudTech 2016), Marrakesh, Morocco, 24-26 May, 2016.
  13. A. F. Leite, V. Alves, G. N. Rodrigues, C. Tadonki, C. Eisenbeis, A. C. M. A. de Melo,
    ADohko: An Autonomic System for Provision, Configuration, and Management of Inter-Cloud Environments based on a Software Product Line Engineering Method,
    IEEE International Conference on Cloud and Autonomic Computing, ICCAC 2016, Augsburg, Germany, September 12-16, 2016.
  14. P. Kiepas, J. Kozlak, C. Tadonki and C. Ancourt,
    Profile-based Vectorization for MATLAB,
    5th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming ARRAY 2018, Philadelphia, USA - June 19, 2018.
  15. Yassir Samadi, Mostapha Zbakh, and C. Tadonki,
    Workflow Scheduling Issues and Techniques in Cloud Computing: A Systematic Literature Review,
    Cloud Computing and Big Data: Technologies, Applications and Security, Zbakh, M., Essaaidi, M., Manneback, P., Rong, C. (Eds.), ISBN 978-3-319-97719-5, Springer, 2018.
  16. Y. Samadi, M. Zbakh, C. Tadonki,
    Analyzing fault tolerance mechanism of Hadoop Mapreduce under different type of failures,
    4th International Conference on Cloud Computing Technologies and Applications (CloudTech 2018), Brussels, Belgium, 26-28 November, 2018.
  17. Patryk Kiepas (MINES ParisTech / PSL University), Corinne Ancourt, C. Tadonki, and Jarosław Koźlak (AGH University of Science and Technology, Kraków),
    Using performance event profiles to deduce an execution model of MATLAB with Just-In-Time compilation,
    32nd Workshop on Languages and Compilers for Parallel Computing (LCPC 2019), Atlanta, USA, October 22-24, 2019.
  18. L. Bouhouch, M. Zbakh, C. Tadonki,
    Data Migration - Cloudsim Extension,
    3rd International Conference on Big Data Research (ICBDR 2019), Paris - France, 20-22 November, 2019.

 Technical Reports
  1. Claude Tadonki, and Bernard Philippe, Parallel multiplication of a vector by a Kronecker product of matrices, IRISA report n° 1194, 1998.
    Optimal Parallel Algorithm for the Kronecker Product in log(p) communication steps
  2. Patrice Quinton, Claude Tadonki, and Maurice Tchuente, Un échéancier systolique et son utilisation dans l'ATM, IRISA report n° 1348, 2000.
  3. Claude Tadonki, Synthèse d'ordonnancements parallèles par reproduction canonique, IRISA report n° 1349, also INRIA report n° 3996, 2000.
  4. David Cachera, Sanjay Rajopadhye, Tanguy Risset, and Claude Tadonki, Parallelization of the algebraic path problem on linear SIMD/SPMD arrays, IRISA report n° 1409, 2001.
  5. Claude Tadonki and Jean-Philippe Vial, The linear separation problem revisited with ACCPM, Cahier de Recherche n° 2002.11, University of Geneva, June 2002. ( get it! )
  6. F. Babonneau, C. Beltran, O. du Merle, C. Tadonki and J.-P. Vial, The proximal analytic center cutting plane method, Technical report, Logilab, HEC, University of Geneva, 2003.
  7. Cesar Beltran, Claude Tadonki, and Jean-Philippe Vial, Semi-Lagrangian relaxation, Technical report, Logilab, HEC, University of Geneva, 2004.
  Book Chapters
  1. Ordonnancement pour l'informatique parallele, A. Moukrim and C. Picouleau (Eds.), Hermes, ( details! ).
 Unpublished/Submitted material
  1. A. Dubois, A. Haurie, C. Tadonki, and D. Zachary, An Operational Energy Modelling System, 2003 ( get it! )
  2. R. Ndoundam, C. Tadonki, and M. Tchuente, Parallel chip firing game associated with n-cube orientation, 2000 ( get it! ).
  3. F. Babonneau, C. Beltran, O. du Merle, C. Tadonki, and J.-P. Vial, The proximal analytic center cutting plane method, 2003 ( get it! ).
  4. C. Tadonki, Universal Report: A generic reverse engineering tool, 2003 ( get it! ).
  5. C. Tadonki, M. Singh, J. Rolim, and V. Prasanna, Combinatorial techniques for memory power state scheduling in energy-constrained systems, 2003 ( get it! ).
  6. D. Cachera, S. Rajopadhye, T. Risset, and C. Tadonki, Algorithmic tiling for efficient parallel APP implementation, 2003 ( get it! ).
  7. C. Tadonki and J.-P. Vial, Portfolio Selection with Cardinality and Bound Constraint, 2003 ( get it! ).
  8. C. Tadonki and J. Rolim, An analytical model for energy minimization, 2003 ( get it! ).
  9. C. Tadonki and J. Rolim, An integer programming heuristic for the dual power management problem in wireless sensor networks, 2004 ( get it! ).
 Work in progress
  1. Portfolio Selection with Cardinality Constraints
  2. Efficient Matrix Computations in Cutting Planes Algorithms
  3. Algorithmic Techniques for Designing Energy-Efficient Algorithms
  4. Structural Method for Lower Bounds in Complexity
  5. Improved Graph Model for Optical Network of Sensors
  6. Dynamic Behavior of Parallel Chip Firing Game on Regular Graphs
  7. Chapter of a book on "Parallel Scheduling" published by Hermes
  8. Book on "Scientific Computation"
How to use CPLEX with Matlab


Download my CV in PDF