
Claude Tadonki and Bernard Philippe,
Parallel multiplication of a vector by a Kronecker product of matrices,
Parallel Distributed Computing Practices (PDCP), volume 2(4), 1999.
Abstract. We provide a parallel algorithm for the Kronecker product of matrices based on a cyclic partition of loops. Our general model unifies two opposite schemes: one with redundant computations but no communication, and one without redundant computation but with interprocessor communications. Both schemes can be mixed, with a balance determined by the volume of floating point operations and the characteristics of the target architecture. This work is supported by experimental validations (CENJU and INTEL PARAGON).
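For readers unfamiliar with the kernel named in the title, here is a minimal sequential sketch in Python. It is not the parallel algorithm of the paper; it only illustrates the standard way of multiplying a vector by a Kronecker product of square matrices one factor at a time, without ever forming the full product. The function name `kron_matvec` and the row-major index convention are assumptions made for this illustration.

```python
def kron_matvec(mats, x):
    """Return (A1 (x) A2 (x) ... (x) Ak) * x for square matrices A1..Ak.

    Illustrative sketch: each factor is applied in turn as
    I_left (x) A_m (x) I_right, so the full product is never built.
    """
    sizes = [len(a) for a in mats]
    n = 1
    for s in sizes:
        n *= s
    if len(x) != n:
        raise ValueError("vector length must equal the product of the matrix sizes")

    y = list(x)
    left, right = 1, n          # products of the sizes before / after the current factor
    for a, s in zip(mats, sizes):
        right //= s
        z = [0.0] * n
        for l in range(left):
            for r in range(right):
                base = l * s * right + r
                # Multiply the strided sub-vector of length s by the factor a.
                for i in range(s):
                    acc = 0.0
                    for j in range(s):
                        acc += a[i][j] * y[base + j * right]
                    z[base + i * right] = acc
        y = z
        left *= s
    return y


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(kron_matvec([A, B], [1, 2, 3, 4]))  # [10.0, 7.0, 22.0, 15.0]
```

The independent (l, r) iterations inside each factor are the natural unit of work to distribute among processors; how they are partitioned, and whether data is exchanged or recomputed between factors, is the subject of the paper and is not modeled here.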

Claude Tadonki and Bernard Philippe,
Parallel multiplication of a vector by a Kronecker product of matrices (part II),
Parallel Distributed Computing Practices (PDCP), volume 3(3), 2000.
Abstract. This paper presents a revised and generalized version of our previous parallel algorithm for the Kronecker product of matrices. We show how to map the computation on any number of processors that is a divisor of the product of the matrix sizes (instead of being a prefix of this product as previously). Moreover, we show that the minimum number of parallel communications with p processors is log(p) whatever the algorithm, and that our algorithm achieves this optimal performance. The work is validated by significant efficiencies obtained from experimental measurements on the CRAY machine.

Sanjay Rajopadhye, Tanguy Risset, and Claude Tadonki,
Algebraic Path Problem on linear arrays,
Techniques et Sciences Informatiques (TSI), 20 (5), 2001.
Abstract. We seek a linear SPMD implementation of the Warshall algorithm for the Algebraic Path Problem (a unified model of transitive closure, shortest paths, Gauss elimination, ...). Our parallel algorithm is systolic-like in its original version, and offers the important advantage of being able to run on a smaller number of processors through a natural round-robin remapping. We derive and validate a blocked version for standard distributed memory parallel machines, as the cost of data communication would be severe otherwise. We show experimental validations on the CRAY machine.
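To make the "unified model" concrete, the following is a minimal sequential sketch of a Warshall-style triple loop parameterized by the two semiring operations. It is an illustration only, not the systolic or blocked algorithm of the paper. The function name `algebraic_path` is an assumption, and the sketch deliberately omits the closure of the pivot element, which is trivial for shortest paths with non-negative weights and for transitive closure but is required for cases such as Gauss elimination.

```python
def algebraic_path(a, add, mul):
    """Warshall-style sweep over an n x n matrix for a given semiring.

    add: the semiring 'sum' (e.g. min, or logical or)
    mul: the semiring 'product' (e.g. +, or logical and)
    Simplifying assumption: the closure of the pivot d[k][k] is the
    semiring identity, so it does not appear in the update.
    """
    n = len(a)
    d = [row[:] for row in a]   # work on a copy of the input matrix
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = add(d[i][j], mul(d[i][k], d[k][j]))
    return d


INF = float("inf")
weights = [[0, 3, INF],
           [INF, 0, 1],
           [2, INF, 0]]
# Shortest paths: (min, +)
print(algebraic_path(weights, min, lambda u, v: u + v))
# [[0, 3, 4], [3, 0, 1], [2, 5, 0]]

edges = [[False, True, False],
         [False, False, True],
         [False, False, False]]
# Transitive closure: (or, and)
print(algebraic_path(edges, lambda u, v: u or v, lambda u, v: u and v))
# [[False, True, True], [False, False, True], [False, False, False]]
```

Swapping the pair of operations is all that changes between the two instances, which is what makes a single parallel schedule reusable across these problems.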

Claude Tadonki,
A Recursive Method for Graph Scheduling,
Scientific Annals of Cuza University,
Vol. 11, p. 121-131, 2002.
Abstract. This paper presents a recursive graph scheduling method. Our paradigm applies to any acyclic graph that can be partitioned into isomorphic subgraphs. Indeed, many common problem domains are 3D graphs that can be organized as a chain of isomorphic 2D subgraphs. Starting from any valid schedule of the source subgraph of the chain, we use the generic isomorphism to systematically derive a valid global schedule, with a local pipeline between two consecutive subgraphs. Our technique is then characterized by the nature of the partition and the source schedule. We discuss the impact of these characteristics on the complexity of the generated parallel schedules.

C. Beltran, C. Tadonki and J.Ph. Vial,
Solving the p-median problem with a semi-Lagrangian relaxation,
Computational Optimization and Applications, Volume 35(2), October 2006.
Abstract. This paper deals with operations research and non-differentiable optimization. The so-called P-median problem is the problem of locating P "facilities" relative to a set of "customers" such that the sum of the shortest demand-weighted distances between "customers" and "facilities" is minimized. This is a classical combinatorial optimization problem with a huge set of potential solutions. Using a semi-Lagrangian relaxation, we tackle the problem in its associated continuous formulation and report our nonsmooth convex optimization engineering results.

F. Babonneau, C. Beltran, A. Haurie, C. Tadonki and J.P. Vial,
Proximal-ACCPM: a versatile oracle based optimization method,
Computational and Management Science, Volume 9, 2007.
Abstract. Oracle Based Optimization (OBO) conveniently designates an approach to handle a class of convex optimization problems in which the information pertaining to the function to be minimized and/or to the feasible domain takes the form of a linear outer approximation revealed by an oracle. We show how difficult problems can be cast in this format, and then solved within our context. We present our method, the so-called Proximal-ACCPM, to carry out the OBO approach and give a snapshot of numerical results. This paper summarizes several contributions with the OBO approach and aims to give, in a single report, enough information on the method and its implementation to facilitate new applications.

C. Tadonki,
Mathematical and Computational Engineering in X-Ray Crystallography,
International Journal of Advanced Computer Engineering, volume 1(2), 2008.
Abstract. The main purpose of X-Ray Crystallography is to predict a macromolecular structure from X-Ray synchrotron radiation. Among existing paradigms to achieve this task, analytical approaches come up as good candidates for automation through mathematical approximations and computational engineering. In addition to the latter, statistical processing is required in order to refine the data according to the physical model and the border effects of the experiments. We revisit the basis of the problem and focus on a more precise effect, the so-called radiation damage, which has also been proven to become instructive when it is artificially managed.

C. Tadonki,
Integer Programming Heuristic for the Dual Power Setting Problem in Wireless Sensors Networks,
Int. Journal of Advanced Research in Computer Engineering, vol 3(1), 2009.
Abstract. We seek an integer programming based heuristic for solving the dual power management problem in wireless sensor networks. For a given network with two possible transmission powers (low and high), the problem is to find a minimum size subset of nodes such that if they are assigned high transmission power while the others are assigned low transmission power, the network will be strongly connected. The main purpose behind this efficient setting is to minimize the total communication power consumption while maintaining the network connectivity. From a theoretical point of view, the problem is known to be difficult to solve exactly. An approach to approximate the solution is to work with a spanning tree of clusters. Each cluster is a strongly connected component when considering low transmission power only. We follow the same approach, and we formulate the node selection problem inside clusters as an integer programming problem which is solved exactly using specialized codes. Experimental results show that our algorithm is efficient both in execution time and solution quality.

C. Tadonki, G. Grosdidier, O. Pene,
An efficient CELL library for lattice quantum chromodynamics,
ACM SIGARCH Computer Architecture News, vol 38(4), 2011.
Abstract. Quantum chromodynamics (QCD) is the theory of subnuclear physics, aiming at modeling the strong nuclear force, which is responsible for the interactions of nuclear particles. Numerical QCD studies are performed through a discrete formalism called LQCD (Lattice Quantum Chromodynamics). Typical simulations involve very large volumes of data and numerically sensitive entities, hence the crucial need for high performance computing systems. We propose a set of CELL-accelerated routines for basic LQCD calculations. Our framework is provided as a unified library and is particularly optimized for iterative use. Each routine is parallelized among the SPUs, and each SPU achieves its task by looping on small chunks of arrays from the main memory. Our SPU implementation is vectorized with double precision data, and the cooperation with the PPU shows a good overlap between data transfers and computations. Moreover, we permanently keep the SPU context and use mailboxes to synchronize between consecutive calls. We validate our library by using it to derive a CELL version of an existing LQCD package (tmLQCD). Experimental results on individual routines show a significant speedup compared to a standard processor, for instance 11 times better than a 2.83 GHz INTEL processor (without SSE). This ratio is around 9 (with a QS22 blade) when considering a more cooperative context like solving a linear system of equations (usually referred to as Wilson-Dirac inversion). Our results clearly demonstrate that the CELL is a very promising way for high-scale LQCD simulations.

T. Saidani, L. Lacassagne, J. Falcou, C. Tadonki, Samir Bouaziz,
Parallelization Schemes for Memory Optimization on the Cell Processor: A Case Study on the Harris Corner Detector,
Transactions on High-Performance Embedded Architectures and Compilers, volume 3(3), 2011.
Abstract. The Cell processor is a typical example of a heterogeneous multiprocessor on-chip architecture that uses several levels of parallelism to deliver high performance. Despite its efficiency potential, its execution mode and some of its hardware specificities make it non-trivial to deal with. Indeed, reducing the gap between peak performance and effective performance is the challenge for compiler design and efficient implementations. Image processing and media applications are typical "mainstream" applications one could consider while investigating Cell benchmarks. Our investigations, through various implementations of the Harris detection algorithm, reveal that the impact of DMA controlled data transfers and synchronizations between SPEs are key points for global performance.
El Wardani Dadi, El Mostafa Daoudi, C. Tadonki,
Improving 3D Shape Retrieval Methods based on Bag-of-Feature Approach by using Local Codebooks,
International Journal of Future Generation Communication and Networking,
Vol. 5, No. 4, December, 2012.
Abstract. Recent investigations illustrate that view-based methods, with pose normalization preprocessing, get better performance in retrieving rigid models than other approaches and are still the most popular and practical methods in the field of 3D shape retrieval [1, 2, 3, 4, 5]. In this paper we present an improvement of 3D shape retrieval methods based on the bag-of-features approach. These methods use this approach to integrate a set of features, extracted from 2D views of the 3D objects using the SIFT (Scale Invariant Feature Transform [6]) algorithm, into histograms using vector quantization based on a global visual codebook. In order to improve the retrieval performance, we propose to associate to each 3D object its local visual codebook instead of a unique global codebook. Experimental results obtained on the Princeton Shape Benchmark database, for the BF-SIFT method proposed by Ohbuchi et al. and CM-BOF proposed by Zhouhui et al., show that the proposed approach performs better than the original methods.
D. Barthou, O. Brand-Foissac, O. Pene, G. Grosdidier, R. Dolbeau, C. Eisenbeis, M. Kruse, K. Petrov, C. Tadonki,
Automated Code Generation for Lattice Quantum Chromodynamics and beyond,
Journal of Physics: Conference Series, Institute of Physics: Open Access Journals,
510, pp.012005, 2014.
Abstract. We present here our ongoing work on a Domain Specific Language which aims to simplify Monte-Carlo simulations and measurements in the domain of Lattice Quantum Chromodynamics. The toolchain, called Qiral, is used to produce high-performance OpenMP C code from LaTeX sources. We discuss conceptual issues and details of implementation and optimization. We also compare the performance of the generated code to that of well-established simulation software.
C. Tadonki, F. Meyer, and F. Irigoin,
Dendrogram Based Algorithm for Dominated Graph Flooding,
Procedia Computer Science,
vol. 29, pp. 586-598, 2014.
Abstract. In this paper, we are concerned with the problem of flooding undirected weighted graphs under ceiling constraints. We provide a new algorithm based on a hierarchical structure called dendrogram, which offers the significant advantage that it can be used for multiple flooding with various scenarios of the ceiling values. In addition, when exploring the graph through its dendrogram structure in order to calculate the flooding levels, independent sub-dendrograms are generated, thus offering a natural way for parallel processing. We provide an efficient implementation of our algorithm through suitable data structures and optimal organisation of the computations. Experimental results show that our algorithm outperforms well established classical algorithms, and reveal that the cost of building the dendrogram highly predominates over the total running time, thus validating both the efficiency and the hallmark of our method. Moreover, we exploit the potential parallelism exposed by the flooding procedure to design a multi-thread implementation. As the underlying parallelism is created on the fly, we use a queue to store the list of the sub-dendrograms to be explored, and then use a cyclic distribution to assign them to the participating threads. This yields a load balanced and scalable process as shown by additional benchmark results. Our program runs in a few seconds on an ordinary computer to flood graphs with more than 20 million nodes.
A. Ferreira Leite, A. Boukerche, A. C. Magalhaes Alves de Melo, C. Eisenbeis, C. Tadonki, and C. Ghedini Ralha,
Power-Aware Server Consolidation for Federated Clouds,
J Concurrency and Computation: Practice and Experience (CCPE), ISSN: 1532-0626, Wiley Press, New York, USA, 2016.
Abstract. Cloud computing has evolved to provide computing resources on-demand through a virtualized infrastructure, letting applications, computing power, data storage, and network resources be provisioned and managed over private networks or over the Internet. Cloud services normally run on large data centers and demand a huge amount of electricity. Consequently, the electricity cost represents one of the major concerns of data centers, since it is sometimes nonlinear with the capacity of the data centers, and it is also associated with a high amount of carbon emission (CO2). However, energy-saving schemes that result in too much degradation of the system performance or in violations of service-level agreement (SLA) parameters would eventually cause the users to move to another cloud provider. Thus, there is a need to reach a balance between energy savings and the costs incurred by these savings in the execution of the applications. Therefore, in this paper we propose and evaluate a power and SLA-aware application consolidation solution for cloud federations. It comprises a multi-agent system (MAS) for server consolidation, taking into account service-level agreement, power consumption, and carbon footprint. Unlike similar solutions available in the literature, in our solution, when a cloud is overloaded its data center needs to negotiate with other data centers before migrating the workload to another cloud. Simulation results show that our approach can reduce the power consumption by up to 46% while trying to meet performance requirements. Furthermore, we show that it can provide an adequate solution to deal with power consumption in the clouds.
A. Ferreira Leite, V. Alves, G. Nunes Rodrigues, C. Tadonki, C. Eisenbeis, A. C. Magalhaes Alves de Melo,
Dohko: An Autonomic System for Provision, Configuration, and Management of Inter-Cloud Environments based on a Software Product Line Engineering Method,
Cluster Computing Special, 2017.
Abstract. Configuring and executing applications across multiple clouds is a challenging task due to the various terminologies used by the cloud providers. Therefore, we advocate the use of autonomic systems to do this work automatically. Thus, in this paper, we propose and evaluate Dohko, an autonomic and goal-oriented system for inter-cloud environments. Dohko implements self-configuration, self-healing, and context-awareness properties. Likewise, it relies on a hierarchical P2P overlay (a) to manage the virtual machines running on the clouds and (b) to deal with inter-cloud communication. Furthermore, it depends on a software product line engineering (SPLE) method to enable applications’ deployment and reconfiguration, without requiring preconfigured virtual machine images. Experimental results show that Dohko can free the users from the duty of executing non-native cloud applications on a single cloud and over many clouds. In particular, it tackles the lack of middleware prototypes that can support different scenarios when using simultaneous services from multiple clouds.
Y. Samadi, M. Zbakh, C. Tadonki,
Performance comparison between Hadoop and Spark frameworks using HiBench benchmarks,
Concurrency and Computation: Practice and Experience (CCPE), 2017.
Abstract. Big data has become one of the major areas of research for cloud service providers due to the large amount of data produced every day, and the inefficiency of traditional algorithms and technologies in handling such large amounts of data. Big data, with characteristics such as Volume, Variety, and Veracity (3V), requires efficient technologies for real-time processing. To solve this problem and to process and analyze this vast amount of data, there are many powerful tools like Hadoop and Spark, which are mainly used in the context of Big Data. They work following the principles of parallel computing. The challenge is to specify which Big Data tool is better depending on the processing context. In this paper, we present and discuss a performance comparison between two popular Big Data frameworks deployed on virtual machines. Hadoop MapReduce and Apache Spark are used to efficiently process a vast amount of data in parallel and distributed mode on large clusters, and both of them are suitable for Big Data processing. We also present the execution results of Apache Hadoop in Amazon EC2, a major Cloud Computing environment. To compare the performance of these two frameworks, we use the HiBench benchmark suite, which is an experimental approach for measuring the effectiveness of any computer system. The comparison is made based on three criteria: execution time, throughput and speedup. We test the Wordcount workload with different data sizes for more accurate results. Our experimental results show that the performance of these frameworks varies significantly based on the use case implementation. Furthermore, from our results we draw the conclusion that Spark is more efficient than Hadoop in dealing with a large amount of data in most cases. However, Spark requires higher memory allocation, since it loads the data to be processed into memory and keeps them in caches for a while, just like standard databases. So, the choice depends on performance level and memory constraints.
O. Haggui, C. Tadonki, L. Lacassagne, F. Sayadi, B. Ouni,
Harris Corner Detection on a NUMA Manycore,
Future Generation Computer Systems (DOI: 10.1016/j.future.2018.01.048), 2018.
Abstract. Corner detection is a key kernel for many image processing procedures including pattern recognition and motion detection. The latter, for instance, mainly relies on the corner points for which spatial analyses are performed, typically on (probably live) videos or temporal flows of images. Thus, highly efficient corner detection is essential to meet the real-time requirement of associated applications. In this paper, we consider the corner detection algorithm proposed by Harris, whose main workflow is a composition of basic operators represented by their approximations using 3 × 3 matrices. The corresponding data access patterns follow a stencil model, which is known to require careful memory organization and management. Cache misses and other additional hindering factors with NUMA architectures need to be skillfully addressed in order to reach an efficient scalable implementation. In addition, with increasingly wide vector registers, an efficient SIMD version should be designed and explicitly implemented. In this paper, we study a direct and explicit implementation of common and novel optimization strategies, and provide a NUMA-aware parallelization. Experimental results on a dual-socket INTEL Broadwell-E/EP show a noticeably good scalability performance.
Y. Samadi, M. Zbakh, and C. Tadonki,
Graph-based Model and Algorithm for Minimizing Big Data Movement in a Cloud Environment,
Int. J. High Performance Computing and Networking, 2018.
Abstract. In this paper, we discuss load balancing and data placement strategies in heterogeneous Cloud environments. Load balancing is crucial in large-scale data processing applications, especially in a distributed heterogeneous context like the Cloud. The main goal in data placement strategies is to improve the overall performance through the reduction of data movements among the participating datacenters, taking into account the dependencies. Typically, datacenters are geographically distributed based on their characteristics such as the processing speed and the storage capacity, among other technical considerations. Load balancing and efficient data placement on Cloud systems are critical problems that are difficult to cope with simultaneously, especially in the emerging heterogeneous clusters. In this context, we propose a threshold-based load balancing algorithm, which first balances the load between datacenters, and afterwards minimizes the overhead of data exchanges. The proposed approach is divided into three phases. First, the dependencies between the datasets are identified. Second, the load threshold of each datacenter is estimated based on the processing speed and the storage capacity. Third, the load balancing between the datacenters is managed through the threshold parameters. The heterogeneity of the datacenters and the dependencies between the datasets are both taken into account. Our experimental results show that our approach can efficiently reduce the frequency of data movement and keep a good load balancing between the datacenters.
Y. Samadi, M. Zbakh, and C. Tadonki,
DTMG: many-to-one matching game for tasks scheduling towards resources optimization in cloud computing,
International Journal of Computers and Applications (DOI: 10.1080/1206212X.2018.1519630), 2018.
Abstract. The increasing demand for cloud computing motivates researchers to make cloud environments more efficient for their users and more profitable for the providers. More and more datacenters are being built to cater to customers' needs. However, datacenters consume large amounts of energy, and this draws negative attention. Therefore, cloud providers are confronted with great pressure to reduce the energy consumed by datacenters. To address this issue, efficient algorithms to reduce energy consumption and to guarantee the quality of service are needed. In this paper, we propose a load balancing algorithm named DTMG, which aims to reduce energy consumption and maximize the efficiency of the available resources. First, we use the Matching Game Theory model for assigning tasks to datacenters. We then study the optimal operation of the resources by migrating all the tasks of the physical machines under subregime to other physical machines, followed by their systematic switch to standby mode. Experimental results show that the proposed approach reduces energy consumption and the number of task migrations while maintaining the service level agreement, in comparison with some existing techniques.
A. Susungi and C. Tadonki,
Intermediate Representations for Explicitly Parallel Programs,
ACM Computing Surveys, Volume 54, Issue 5 (DOI: https://doi.org/10.1145/3452299), May 2021.
Abstract. While compilers generally support parallel programming languages and APIs, their internal program representations are mostly designed from the standpoint of sequential programs (exceptions include source-to-source parallel compilers, for instance). This makes the integration of compilation techniques dedicated to parallel programs more challenging. In addition, parallelism has various levels and different targets, each of them with specific characteristics and constraints. With the advent of multi-core processors and general purpose accelerators, parallel computing is now a common and pervasive consideration. Thus, software support for parallel programming activities is essential to make this technical transition more realistic and beneficial. The case of compilers is fundamental as they deal with (parallel) programs at a structural level, hence the need for intermediate representations. This article surveys and discusses attempts to provide intermediate representations for the proper support of explicitly parallel programs. We highlight the gap between available contributions and their concrete implementation in compilers and then exhibit possible future research directions.
L. Bouhouch, C. Tadonki, and M. Zbakh,
Dynamic Data Replication and Placement Strategy in Geographically Distributed Data centers,
Concurrency and Computation: Practice and Experience (CCPE) (DOI: 10.1002/cpe.6858), 2022.
Abstract. With the evolution of geographically distributed data centers in the Cloud Computing landscape, along with the amount of data being processed in these data centers, which is growing at an exponential rate, processing massive data applications has become an important topic.
Since a given task may require many datasets for its execution and the datasets are spread over several different data centers, finding an efficient way to manage the dataset storage across nodes of a Cloud system is a difficult problem. In fact, the execution time of a task might be influenced by the cost of data transfers, which mainly depends on two criteria. The first one is the initial placement of the input datasets during the build-time phase, while the second is the replication of the datasets during the runtime phase. The replication is explicitly considered when datasets are being migrated over the data centers in order to make them locally available wherever needed.
Data placement and data replication are important challenges in Cloud Computing. Nevertheless, many studies focus on data placement or data replication exclusively. In this paper, a combination of a data placement strategy followed by a dynamic data replication management strategy is proposed, with the purpose of reducing the associated cost of all data transfers between the (distant) data centers.
Our proposed data placement approach considers the main characteristics of a data center, such as storage capacity and read/write speeds, to efficiently store the datasets, while our dynamic data replication management approach considers three parameters: the number of replicas in the system, the dependency between datasets and tasks, and the storage capacity of data centers. The decision of when and whether to keep or to delete replicas is determined by the fulfillment of those three parameters.
Our approach estimates the total execution time of the tasks as well as the monetary cost, considering the data transfer activity.
Our experiments are conducted using the CloudSim simulator. The obtained results show that our proposed strategies produce an efficient data management by reducing the overheads of the data transfers, compared to both a data placement without replication (by 76%) and the selected data replication approach from Kouidri et al. (by 52%), and by improving the financial cost.
1. Parallel Computing (algorithm, scheduling, complexity, implementation, dynamic system)
Overview. This subset of my outputs is related to general purpose parallel computing. It includes ad hoc parallel algorithms, methodology for systematic parallel scheduling, efficient parallel implementation, and parallel dynamic systems. My current focus in this topic is on the design and analysis of powerful methodologies specific to multicore processors and accelerator-based architectures (CELL, GPU, ...), both for domain specific considerations and for a wider audience. I keep investigating fundamental aspects, since new hypotheses come up with emerging architectures and the increasing and pervasive HPC demand.

Claude Tadonki,
Système d'équations récurrentes et multiplication parallèle d'un vecteur par un produit tensoriel de matrices,
Rencontres Francophones de Parallelisme Renpar'11,
Rennes (France), 1999.

Sanjay Rajopadhye, Tanguy Risset, and Claude Tadonki,
The algebraic path problem revisited,
European Conference on Parallel Computing Europar99,
Toulouse (France), LNCS Springer-Verlag, No. 1685, p. 698-707, August 1999.

Claude Tadonki,
Ordonnancements canoniques,
Renpar12,
Rencontres Francophones de Parallelisme, Besançon (France), June 2000.

Claude Tadonki,
Parallel Cholesky Factorization,
Parallel Matrix Algorithms and Applications PMAA Workshop,
Neuchatel (Switzerland), August 2000.

Claude Tadonki and Bernard Philippe,
Méthodologie de conception d'algorithmes efficaces pour le produit tensoriel,
CARI2000,
Tananarive (Madagascar), October 2000.

Patrice Quinton, Claude Tadonki, and Maurice Tchuente,
Un échéancier systolique et son utilisation dans l'ATM,
CARI2000,
Tananarive (Madagascar), October 2000.

Claude Tadonki,
Complexité des ordonnancements canoniques et dérivation d'architecture,
Rencontres Francophones de Parallelisme Renpar13,
Paris (France), April 2001.

Claude Tadonki,
A Recursive Method for Graph Scheduling,
International Symposium on Parallel and Distributed Computing (SPDC),
Iasi, Romania, July 2002.

R. Ndoundam, C. Tadonki, and M. Tchuente,
Parallel chip firing game associated with n-cube orientation,
International Conference on Computational Science,
ICCS04 (LNCS/Springer), Krakow, Poland, June 2004.

T. Saidani, J. Falcou, C. Tadonki, L. Lacassagne, and D. Etiemble,
Algorithmic Skeletons within an Embedded Domain Specific Language for the CELL Processor,
Parallel Architectures and Compilation Techniques (PACT),
PACT09, Raleigh, North Carolina (USA), September 12-16, 2009.

C. Tadonki, G. Grosdidier, and O. Pene,
An efficient CELL library for Lattice Quantum Chromodynamics,
International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies (HEART) in conjunction with the 24th ACM International Conference on Supercomputing (ICS), pp. 67-71,
Epochal Tsukuba, Tsukuba, Japan, June 14, 2010. (ACM Computer Architecture News)

C. Tadonki, L. Lacassagne, T. Saidani, J. Falcou, K. Hamidouche,
The Harris algorithm revisited on the CELL processor,
International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies (HEART) in conjunction with the 24th ACM International Conference on Supercomputing (ICS), pp. 97-100,
Epochal Tsukuba, Tsukuba, Japan, June 14, 2010. (ACM Computer Architecture News)

C. Tadonki,
Ring pipelined algorithm for the algebraic path problem on the CELL Broadband Engine,
Workshop on Applications for Multi and Many Core Architectures (WAMMCA 2010) in conjunction with the International Symposium on Computer Architecture and High Performance Computing (SBAC PAD 2010),
Petropolis, Rio de Janeiro, Brazil, October 27-30, 2010. (IEEE digital library)

C. Tadonki,
Large Scale Kronecker Product on Supercomputers,
2nd Workshop on Architecture and Multi-Core Applications (WAMCA 2011) in conjunction with the International Symposium on Computer Architecture and High Performance Computing (SBAC PAD 2011),
Vitoria, Espirito Santo, Brazil, October 26-29, 2011. (IEEE digital library)

D. Barthou, G. Grosdidier, M. Kruse, O. Pene and C. Tadonki,
QIRAL: A High Level Language for Lattice QCD Code Generation,
Programming Language Approaches to Concurrency and Communication-cEntric Software (PLACES'12) in conjunction with the European joint Conference on Theory & Practice of Software (ETAPS),
Tallinn, Estonia, March 24-April 1, 2012.

C. Tadonki,
Basic parallel and distributed computing curriculum,
Second NSF/TCPP Workshop on Parallel and Distributed Computing Education (EduPar'12) in conjunction with the 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS),
Shanghai, China, May 21-25, 2012.

C. Tadonki, L. Lacassagne, E. Dadi, M. Daoudi,
Accelerator-based implementation of the Harris algorithm,
5th International Conference on Image Processing (ICISP 2012),
Agadir, Morocco, June 28-30, 2012.

P.L. Caruana and C. Tadonki
Seamless Parallelism in MATLAB,
Parallel Distributed Computing and Networks,
Innsbruck, Austria, Feb 16-18, 2014.

F. Meyer, C. Tadonki, and F. Irigoin
Dendrogram Based Algorithm for Dominated Graph Flooding,
International Conference on Computational Science (ICCS 2014),
Cairns, Australia, June 10-12, 2014.

A. Susungi, A. Cohen, and C. Tadonki,
More Data Locality for Static Control Programs on NUMA Architectures,
7th International Workshop on Polyhedral Compilation Techniques (IMPACT 2017),
Stockholm, Sweden, January 23, 2017.

C. Tadonki,
Scalable NUMA-Aware Wilson-Dirac on Supercomputers,
International Conference on High Performance Computing & Simulation (HPCS 2017),
Genoa, Italy, July 17-21, 2017.

A. Susungi, N. A. Rink, J. Castrillon, I. Huismann, A. Cohen, C. Tadonki, J. Stiller, J. Frohlich,
Towards Compositional and Generative Tensor Optimizations,
16th International Conference on Generative Programming: Concepts & Experience (GPCE 2017), Vancouver, Canada, October 23-24, 2017.

N. A. Rink, A. Susungi, J. Castrillon, I. Huismann, A. Cohen, J. Stiller, and C. Tadonki,
CFDlang: High-level code generation for high-order methods in fluid dynamics,
International Workshop on Real World Domain Specific Languages 2018 (RWDSL 2018) in conjunction with the CGO'18 international symposium on Code Generation and Optimisation, DOI 10.1145/3183895.3183900, Vienna, Austria, February 24, 2018.

O. Haggui, C. Tadonki, F. Sayadi, B. Ouni,
Evaluation of an OpenMP Parallelization of Lucas-Kanade on a NUMA-Manycore,
9th Workshop on Architecture and Multi-Core Applications (WAMCA 2018) in conjunction with the 30th International Symposium on Computer Architecture and
High Performance Computing (SBAC-PAD 2018),
Ecole Normale Superieure de Lyon, Lyon, France, September 24-27, 2018.

A. Susungi, N. A. Rink, A. Cohen, J. Castrillon, C. Tadonki,
Meta-programming for Cross-Domain Tensor Optimizations,
17th International Conference on Generative Programming: Concepts & Experience (GPCE 2018), Boston, Massachusetts, USA, November 5-6, 2018.

O. Haggui, C. Tadonki, F. Sayadi, B. Ouni,
Efficient GPU Implementation of Lucas-Kanade through OpenACC,
14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP 2019), Prague, Czech Republic, February 25-27, 2019.

O. Haggui, C. Tadonki, F. Sayadi, B. Ouni,
Memory Efficient Deployment of an Optical Flow Algorithm on GPU Using OpenMP,
20th International Conference on Image Analysis and Processing (ICIAP 2019), Trento, Italy, 9-13 September, 2019.

2. Operations Research (algorithm, modeling, method, tool)
Overview. The main concern here is operations research and convex optimization. My work on this topic covers the design and implementation of efficient solvers for both continuous and discrete optimization. The connection between the two universes through geometric, analytic, and algebraic techniques (global optimization, semidefinite programming, spectral theory, ...) makes the topic very exciting, since this synergy has proven to be a good way of tackling difficult combinatorial problems.

L. Drouet, A. Dubois, A. Haurie and C. Tadonki,
A MARKAL-Lite Model for Sustainable Urban Transportation,
Optimization days, Montreal, Canada, May, 2003.

Claude Tadonki,
ProxAccpm: A convex optimization solver,
International Symposium on Mathematical Programming,
ISMP2003, Copenhagen, Denmark, August
2003.

O. Briant, C. Lemarechal, K. Monneris, N. Perrot, C. Tadonki,
F. Vanderbeck, J.-P. Vial, C. Beltran, P. Meurdesoif,
Comparison of various approaches for column generation,
Eighth Aussois Workshop on Combinatorial Optimization, 5-9 January 2004.

Claude Tadonki and Jean-Philippe Vial,
Efficient algorithm for linear pattern separation,
International Conference on Computational Science,
ICCS04 (LNCS/Springer), Krakow, Poland, June
2004.

Cesar Beltran, Claude Tadonki, Jean-Philippe Vial,
Semi-Lagrangian relaxation,
Computational Management Science Conference and Workshop on Computational Econometrics and Statistics,
Neuchatel, Switzerland, April
2004.

Claude Tadonki, Cesar Beltran and Jean-Philippe Vial,
Portfolio management with integrality constraints,
Computational Management Science Conference and Workshop on Computational Econometrics and Statistics,
Neuchatel, Switzerland, April
2004.

C. Beltran, C. Tadonki and J.-Ph. Vial,
The p-median problem solved by semi-Lagrangian relaxation,
First Mathematical Programming Society International Conference on Continuous Optimization (ICCOPT I),
Troy, USA, August 2-4, 2004.

3. Scientific and Technical Computing (sensor networks, power-aware computing, program comprehension, data analysis, image processing)
Overview. This group covers specialized algorithms, program optimization, and data analysis. My investigations on sensor networks mainly focus on the network topology (disk graph) and the cooperation among sensors (distributed algorithms). In power-aware computing, the problem is to reduce the energy dissipated by the execution of a given program, particularly in a context where energy is a critical resource (embedded systems). My contribution includes combinatorial and analytical methodologies for modeling energy complexity and for rescheduling the algorithm accordingly. Data refinement relies on statistical and approximation approaches to improve the match between experimental data and the model. Further steps in experimental research are sensitive to this agreement.

Claude Tadonki, Mitali Singh, Jose Rolim and Viktor K. Prasanna,
Combinatorial Techniques for Memory Power State Scheduling in Energy Constrained
Systems,
Workshop on Approximation and Online Algorithms
(WAOA),
WAOA2003 (LNCS/Springer), Budapest, Hungary, September
2003.

Claude Tadonki and Jose Rolim,
An analytical model for energy minimization,
III Workshop on Efficient and Experimental Algorithms,
WEA04 (LNCS/Springer), Angra dos Reis, Rio de Janeiro, Brazil, May
2004.

Claude Tadonki,
Universal Report: A Generic Reverse Engineering Tool,
12th IEEE International Workshop on Program
Comprehension,
IWPC 2004 (IEEE), University of Bari, Bari, Italy, June
2004.

Claude Tadonki and Jose Rolim,
An integer
programming heuristic for the dual power management problem in wireless
sensor networks,
2nd International Workshop on Managing Ubiquitous Communications and Services,
MUCS2004, Dublin, Ireland,
December 13, 2004.

Claude Tadonki,
Refinement experiments with RADDAM data,
EMBL bilateral meeting,
Hamburg, Germany,
June 26-28, 2006.

Claude Tadonki,
Offline settings in
wireless networks,
3rd International Symposium on Computational Intelligence and Intelligent Informatics,
ISCIII2007, Agadir, Morocco,
March 28-30, 2007.

E. Dadi, M. Daoudi, C. Tadonki,
3D Shape Retrieval using Bag-of-feature method basing on local codebooks,
5th International Conference on Image Processing (ICISP 2012),
Agadir, Morocco, June 28-30, 2012.

E. Dadi, M. Daoudi, C. Tadonki
Fast 3D shape retrieval method for classified databases,
International Conference on Complex Systems (ICCS'12),
Agadir, Morocco, November 5-6, 2012.

A. Leite, C. Tadonki, C. Eisenbeis, T. Raiol, M.E. Walter, and A. de Melo
Excalibur: An Autonomic Cloud Architecture for Executing Parallel Applications,
Fourth International Workshop on Cloud Data and Platforms (CloudDP 2014),
Amsterdam, Netherlands, April 13, 2014.

A. Leite, C. Tadonki, C. Eisenbeis, and A. de Melo
A Fine-grained Approach for Power Consumption Analysis and Prediction,
International Conference on Computational Science (ICCS 2014),
Cairns, Australia, June 10-12, 2014.

A. F. Leite, V. Alves, G. N. Rodrigues, C. Tadonki, C. Eisenbeis, A. C. M. A. de Melo
Automating Resource Selection and Configuration in Inter-clouds through a Software Product Line Method,
8th IEEE International Conference on Cloud Computing, CLOUD 2015, New York City, NY, USA, June 27 - July 2, 2015.

Y. Samadi, M. Zbakh, C. Tadonki
Comparative study between Hadoop and Spark based on Hibench benchmarks,
2nd International Conference on Cloud Computing Technologies and Applications (CloudTech 2016), Marrakesh, Morocco, 24-26 May, 2016.

A. F. Leite, V. Alves, G. N. Rodrigues, C. Tadonki, C. Eisenbeis, A. C. M. A. de Melo
ADohko: An Autonomic System for Provision, Configuration, and Management of Inter-Cloud Environments based on a Software Product Line Engineering Method,
IEEE International Conference on Cloud and Autonomic Computing, CICCAC 2016, Augsburg, Germany, September 12-16, 2016.

P. Kiepas, J. Kozlak, C. Tadonki and C. Ancourt,
Profile-based Vectorization for MATLAB,
5th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming ARRAY 2018, Philadelphia, USA, June 19, 2018.

Yassir Samadi, Mostapha Zbakh, and C. Tadonki,
Workflow Scheduling Issues and Techniques in Cloud Computing: A Systematic Literature Review,
Cloud Computing and Big Data: Technologies, Applications and Security, Zbakh, M., Essaaidi, M., Manneback, P., Rong, C. (Eds.), ISBN 9783319977195, Springer, 2018.

Y. Samadi, M. Zbakh, C. Tadonki
Analyzing fault tolerance mechanism of Hadoop Mapreduce under different type of failures,
4th International Conference on Cloud Computing Technologies and Applications (CloudTech 2018), Brussels, Belgium, 26-28 November, 2018.

Patryk Kiepas (MINES ParisTech / PSL University), Corinne Ancourt, C. Tadonki, and Jarosław Koźlak (AGH University of Science and Technology, Kraków),
Using performance event profiles to deduce an execution model of MATLAB with Just-In-Time compilation,
32nd Workshop on Languages and Compilers for Parallel Computing (LCPC 2019), Atlanta, USA, October 22-24, 2019.

L. Bouhouch, M. Zbakh, C. Tadonki
Data Migration  Cloudsim Extension,
3rd International Conference on Big Data Research (ICBDR 2019), Paris, France, 20-22 November, 2019.

The Kronecker product, also called tensor product, is
a fundamental matrix algebra operation, which is widely
used as a natural formalism to express a convolution of
many interactions or representations. Given a set of matrices,
we need to multiply their Kronecker product by a
vector. This operation is a critical kernel for iterative algorithms
and thus needs to be computed efficiently. In a previous
work, we proposed a cost-optimal parallel algorithm
for the problem, both in terms of floating point computation
time and interprocessor communication steps. However, the
lower bound on data transfers can only be achieved if the
(local) broadcasts are truly logarithmic. In practice,
each broadcast is implemented as a communication loop instead,
so the real cost of each broadcast matters.
As this local broadcast is performed simultaneously
by each processor, the situation gets worse on a large
number of processors (supercomputers). We address the
problem in this paper in two ways. On the one hand, we propose
a way to build a virtual topology which has the lowest
gap to the theoretical lower bound. On the other hand, we
consider a hybrid implementation, which has the advantage
of reducing the number of communicating nodes. We illustrate
our work with some benchmarks on a large SMP
8-core supercomputer.
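
The kernel described in the abstract above, multiplying a Kronecker product of matrices by a vector, can be illustrated with a minimal sequential sketch in Python. It is not the parallel algorithm of the paper: it only shows the standard trick of applying each factor in turn, so that the full Kronecker product is never formed. The function names (`kron`, `dense_matvec`, `kron_matvec`) are hypothetical, and square factors are assumed for simplicity.

```python
def kron(A, B):
    """Dense Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def dense_matvec(M, x):
    """Plain matrix-vector product, used here only as a reference."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def kron_matvec(mats, x):
    """Compute (A1 (x) A2 (x) ... (x) Ak) * x without building the product.

    Assumption: every factor is square. The vector is viewed as a tensor
    of shape (n1, ..., nk) stored in row-major order, and each factor is
    applied along its own dimension.
    """
    total = 1
    for A in mats:
        total *= len(A)
    if len(x) != total:
        raise ValueError("vector length must equal the product of the sizes")

    y = list(x)
    left, right = 1, total
    for A in mats:
        n = len(A)
        right //= n              # number of entries to the right of this mode
        z = [0] * total
        for l in range(left):    # index over the modes already processed
            base = l * n * right
            for r in range(right):
                for i in range(n):
                    z[base + i * right + r] = sum(
                        A[i][j] * y[base + j * right + r] for j in range(n)
                    )
        y = z
        left *= n
    return y


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[0, 1], [1, 0]]
    x = [1, 2, 3, 4]
    print(kron_matvec([A, B], x))            # factor-by-factor product
    print(dense_matvec(kron(A, B), x))       # reference via the dense product
```

For k factors of size n, the factor-by-factor version needs on the order of k * n^(k+1) multiplications, against n^(2k) for the dense product, which is why this kernel is usually computed one factor at a time.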