Research in Cybersecurity (Investigación en Ciberseguridad)
Proceedings of the VI Jornadas Nacionales de Investigación en Ciberseguridad (JNIC2021 LIVE)

Online, 9-10 June 2021
Universidad de Castilla-La Mancha

Editors:
Manuel A. Serrano
Eduardo Fernández-Medina
Cristina Alcaraz
Noemí de Castro
Guillermo Calvo

Cuenca, 2021
© of the texts and illustrations: their authors
© of the edition: Universidad de Castilla-La Mancha

Published by: Ediciones de la Universidad de Castilla-La Mancha.

Collection JORNADAS Y CONGRESOS No. 34

This publisher is a member of the UNE, which guarantees the dissemination and marketing of its publications nationally and internationally.

I.S.B.N.: 978-84-9044-463-4
D.O.I.: http://doi.org/10.18239/jornadas_2021.34.00

This work is licensed under a Creative Commons CC BY 4.0 international licence. Any form of reproduction, distribution, public communication or transformation of this work not covered by the Creative Commons CC BY 4.0 licence may only be carried out with the express authorisation of the rights holders, unless otherwise provided by law. The full text of the licence is available at this link:
https://creativecommons.org/licenses/by/4.0/deed.es

Made in Spain (E.U.)
Welcome from the Organising Committee

    After the pause caused by the pandemic in 2020, the VI Jornadas Nacionales de Investigación en Ciberseguridad (JNIC) return on 9 and 10 June 2021 with renewed energy and, for the first time in their history, in a 100% online format. This edition of JNIC is organised by the GSyA and Alarcos groups of the Universidad de Castilla-La Mancha in Ciudad Real, with the active collaboration of the executive committee, the chairs of the various programme committees and the Instituto Nacional de Ciberseguridad (INCIBE). It continues the consolidation of a conference series that was first held in León in 2015, followed by Granada, Madrid, San Sebastián and Cáceres in consecutive years up to 2019, and that under normal conditions would have been held in Ciudad Real in 2020.
    These conferences have become a meeting forum for the most relevant actors in cybersecurity in Spain. They not only present some of the leading scientific work in the various areas of cybersecurity, but also pay special attention to training and educational innovation in cybersecurity, and to the connection with industry through technology transfer proposals. Accordingly, this year the Transfer Programme introduces some changes to its operation and development, designed to improve it and make it more valuable for the whole cybersecurity research community.
    In addition, JNIC will feature exceptional speakers (Soledad Antelada, of Lawrence Berkeley National Laboratory; Ramsés Gallego, of Micro Focus; and Mónica Mateos, of the Mando Conjunto de Ciberdefensa) in three invited talks, and two round tables will be held. The round tables will involve the most relevant organisations in the industrial, social and entrepreneurial landscape related to cybersecurity, analysing and debating the role that cybersecurity is taking on in various relevant areas.
    This edition of JNIC established three types of research contribution: the classic long papers presenting original research, short papers presenting research at a more preliminary stage, and extended abstracts of highly relevant, high-impact cybersecurity publications published between 2019 and 2021. For training and educational innovation contributions, and for transfer contributions, only long papers were considered. A total of 86 contributions were received for evaluation, comprising 26 long papers, 27 short papers and 33 abstracts of already published work, of which the respective programme committees accepted 21, 19 and 27, respectively. Overall, the acceptance rate was 77%. These figures indicate that participation in the conference continues to grow, and reflect the maturity of the Spanish cybersecurity sector, which already has a significant volume of high-impact publications.
    The online format of this edition led us to organise the conference more compactly, distinguishing for the first time between plenary activities (invited talks, round tables, the training and educational innovation session, the technology transfer session, and the opening and closing) and parallel sessions for presenting scientific papers. Specifically, 10 paper presentation sessions were organised in two parallel tracks on the following topics: intrusion detection and anomaly management (I and II), cyberattacks and threat intelligence, forensic analysis and cybercrime, industrial cybersecurity, artificial intelligence and cybersecurity, governance and risk, emerging technologies and training, cryptography, and finally privacy.
    For this edition, two special issues of journals with high impact factors were arranged so that the authors of the scientific papers rated most highly by the scientific programme committee can submit extended versions of those papers. Additionally, awards were given to the best paper in each category. JNIC also included the participation of the Red de Excelencia Nacional de Investigación en Ciberseguridad (RENIC), which promotes cybersecurity through its awards for the Best Master's Thesis in Cybersecurity and the Best Doctoral Thesis in Cybersecurity. We also sought to bring young cybersecurity talent closer to JNIC through a CTF (Capture The Flag) organised by the Universidad de Extremadura and sponsored by Viewnext.
    On behalf of the team that organised JNIC2021, we would like to thank all the people and entities that made it possible: first, the authors of the submitted papers and the conference attendees; the three invited speakers; the people and organisations that took part in the two round tables; the members of the programme committees, for their insightful comments during the review process and for their collaboration during the internal discussion and debate phases; the session chairs; the Universidad de Extremadura, for organising the CTF, and the company Viewnext, for sponsoring it; the technical staff of the UCLM ICT area, for their support with the communication platform; the UCLM volunteers; and the remaining sponsoring organisations and entities, including the Escuela Superior de Informática, the Departamento de Tecnologías y Sistemas de Información and the Instituto de Tecnologías y Sistemas de Información, all of them at the Universidad de Castilla-La Mancha; the RENIC network; the sponsored chairs (cátedras: Telefónica and Indra) and classrooms (aulas: Avanttic and Alpinia) of the Escuela Superior de Informática; the company Cojali; and, very especially, INCIBE itself for its support and contribution.

                                            Manuel A. Serrano, Eduardo Fernández-Medina
                                            Chairs of the Organising Committee
                                            Cristina Alcaraz
                                            Chair of the Scientific Programme Committee
                                            Noemí de Castro
                                            Chair of the Training and Educational Innovation Programme Committee
                                            Guillermo Calvo Flores
                                            Chair of the Technology Transfer Committee
General Index

Executive Committee
Organising Committee
Scientific Programme Committee
Training and Educational Innovation Programme Committee
Technology Transfer Committee

Contributions
Research Session A1: Intrusion detection and anomaly management I
Research Session A2: Intrusion detection and anomaly management II
Research Session A3: Cyberattacks and threat intelligence
Research Session A4: Forensic analysis and cybercrime
Research Session A5: Industrial cybersecurity and applications
Research Session B1: Artificial intelligence in cybersecurity
Research Session B2: Governance and risk management
Research Session B3: Emerging technologies and cybersecurity training
Research Session B4: Cryptography
Research Session B5: Privacy
Technology Transfer Session
Training and Educational Innovation Session

RENIC Awards

Sponsors
Executive Committee

Juan Díez González                 INCIBE
Luis Javier García Villalba        Universidad Complutense de Madrid
Eduardo Fernández-Medina Patón     Universidad de Castilla-La Mancha
Guillermo Suárez-Tangil            IMDEA Networks Institute
Andrés Caro Lindo                  Universidad de Extremadura
Pedro García Teodoro               Universidad de Granada (representative of the RENIC network)
Noemí de Castro García             Universidad de León
Rafael María Estepa Alonso         Universidad de Sevilla
Pedro Peris López                  Universidad Carlos III de Madrid
Organising Committee

Chairs of the Organising Committee
   Eduardo Fernández-Medina Patón       Universidad de Castilla-La Mancha
   Manuel Ángel Serrano Martín          Universidad de Castilla-La Mancha

Finance
   David García Rosado                  Universidad de Castilla-La Mancha
   Luis Enrique Sánchez Crespo          Universidad de Castilla-La Mancha

Proceedings
   Antonio Santos-Olmo Parra            Universidad de Castilla-La Mancha

Publicity
   Julio Moreno García-Nieto            Universidad de Castilla-La Mancha
   José Antonio Cruz Lemus              Universidad de Castilla-La Mancha
   María A Moraga de la Rubia           Universidad de Castilla-La Mancha

Webmaster
   Aurelio José Horneros Cano           Universidad de Castilla-La Mancha

Logistics and Organisation
   Ignacio García-Rodríguez de Guzmán   Universidad de Castilla-La Mancha
   Ismael Caballero Muñoz-Reja          Universidad de Castilla-La Mancha
   Gregoria Romero Grande               Universidad de Castilla-La Mancha
   Natalia Sánchez Pinilla              Universidad de Castilla-La Mancha
Scientific Programme Committee

Chair
   Cristina Alcaraz Tello                 Universidad de Málaga

Members
  Aitana Alonso Nogueira                  INCIBE
  Marcos Arjona Fernández                 ElevenPaths
  Ana Ayerbe Fernández-Cuesta             Tecnalia
  Marta Beltrán Pardo                     Universidad Rey Juan Carlos
  Carlos Blanco Bueno                     Universidad de Cantabria
  Jorge Blasco Alís                       Royal Holloway, University of London
  Pino Caballero-Gil                      Universidad de La Laguna
  Andrés Caro Lindo                       Universidad de Extremadura
  Jordi Castellà Roca                     Universitat Rovira i Virgili
  José M. de Fuentes García-Romero de Tejada   Universidad Carlos III de Madrid
  Jesús Esteban Díaz Verdejo              Universidad de Granada
  Josep Lluis Ferrer Gomila               Universitat de les Illes Balears
  Dario Fiore                             IMDEA Software Institute
  David García Rosado                     Universidad de Castilla-La Mancha
  Pedro García Teodoro                    Universidad de Granada
  Luis Javier García Villalba             Universidad Complutense de Madrid
  Iñaki Garitano Garitano                 Mondragon Unibertsitatea
  Félix Gómez Mármol                      Universidad de Murcia
  Lorena González Manzano                 Universidad Carlos III de Madrid
  María Isabel González Vasco             Universidad Rey Juan Carlos
  Julio César Hernández Castro            University of Kent
  Luis Hernández Encinas                  CSIC
  Jorge López Hernández-Ardieta           Banco Santander
  Javier López Muñoz                      Universidad de Málaga
  Rafael Martínez Gasca                   Universidad de Sevilla
  Gregorio Martínez Pérez                 Universidad de Murcia
  David Megías Jiménez                    Universitat Oberta de Catalunya
  Luis Panizo Alonso                      Universidad de León
  Fernando Pérez González                 Universidad de Vigo
  Aljosa Pasic                            ATOS
  Ricardo J. Rodríguez                    Universidad de Zaragoza
  Fernando Román Muñoz                    Universidad Complutense de Madrid
  Luis Enrique Sánchez Crespo             Universidad de Castilla-La Mancha
  José Soler                              Technical University of Denmark-DTU
  Miguel Soriano Ibáñez                   Universidad Politécnica de Cataluña
  Victor A. Villagrá González             Universidad Politécnica de Madrid
  Urko Zurutuza Ortega                    Mondragon Unibertsitatea
  Lilian Adkinson Orellana                Gradiant
  Juan Hernández Serrano                  Universitat Politècnica de Catalunya
Training and Educational Innovation Programme Committee

Chair
   Noemí de Castro García                Universidad de León

Members
   Adriana Suárez Corona                 Universidad de León
   Raquel Poy Castro                     Universidad de León
   José Carlos Sancho Núñez              Universidad de Extremadura
   Isaac Agudo Ruiz                      Universidad de Málaga
   Ana Isabel González-Tablas Ferreres   Universidad Carlos III de Madrid
   Xavier Larriva                        Universidad Politécnica de Madrid
   Ana Lucila Sandoval Orozco            Universidad Complutense de Madrid
   Lorena González Manzano               Universidad Carlos III de Madrid
   María Isabel González Vasco           Universidad Rey Juan Carlos
   David García Rosado                   Universidad de Castilla-La Mancha
   Sara García Bécares                   INCIBE
Technology Transfer Committee

Chair
   Guillermo Calvo Flores        INCIBE

Members
   José Luis González Sánchez    COMPUTAEX
   Marcos Arjona Fernández       ElevenPaths
   Victor Villagrá González      Universidad Politécnica de Madrid
   Luis Enrique Sánchez Crespo   Universidad de Castilla-La Mancha
Research Session B4: Cryptography
http://doi.org/10.18239/jornadas_2021.34.54

Homomorphic SVM Inference for Fraud Detection

A. Vázquez-Saavedra, G. Jiménez-Balsa, J. Loureiro-Acuña
Gradiant
Carretera de Vilar, 56, Vigo, Pontevedra, 36214, Spain
{avsaavedra, gjimenez, jloureiro}@gradiant.org

M. Fernández-Veiga, A. Pedrouzo-Ulloa
atlanTTic Research Center, Universidade de Vigo
E.E. Telecomunicación, Vigo 36310, Spain
mveiga@det.uvigo.es, apedrouzo@gts.uvigo.es

   Abstract—Nowadays, cloud computing has become a very promising solution for almost all companies, as it offers the possibility of saving costs by outsourcing computation on-demand. However, some companies deal with private information, which must be protected before outsourcing. Banks, whose financial information is highly sensitive, are one remarkable example of this problem. Their typical processes must be run on their own systems for security and regulation reasons, which prevents them from taking advantage of the scalability and flexibility benefits introduced by the cloud. A relevant example on which we focus in this work is the case of fraud detection systems, for which we propose the use of modern lattice-based homomorphic encryption for their secure execution. To this end, we implement and validate the performance of a homomorphic SVM (Support Vector Machine) classifier for secure fraud detection, showing the feasibility of securely outsourcing fraud detection inference.

   Index Terms—Support Vector Machines, Fraud Detection, Homomorphic Encryption, Lattice-based Cryptography

   Type of contribution: Ongoing research

This work was funded by the Ayudas Cervera para Centros Tecnológicos, grant of the Spanish Centre for the Development of Industrial Technology (CDTI) under the project EGIDA (CER-20191012). Also funded by the Agencia Estatal de Investigación (Spain) and the European Regional Development Fund (ERDF) under project RODIN (PID2019-105717RB-C21), and by the Xunta de Galicia and ERDF under project ED431G2019/08.

I. INTRODUCTION

   The impact of cloud computing on industry and also end users is difficult to estimate: many aspects of everyday life have been transformed by the omnipresence of software running on cloud networks. On the one hand, cloud computing allows companies to optimise costs and increase their offerings, without needing to purchase and manage all the hardware and software. This allows them to launch globally-available apps and online services without having to spend resources on the platform on which they will run. On the other hand, end users can access these applications immediately and without any specific requirements, as all these services are easily available on the Internet. Unfortunately, not all companies can benefit from the cloud computing approach.
   A major disadvantage which is present in cloud-based applications is their underlying security. The use of cloud-based services always leads to the storage of information in third party systems, which can often be considered untrusted environments. Although, in general, this is not a problem for many applications, the situation is considerably aggravated in those use cases which deal with especially privacy-sensitive information, where security is a priority and must be maximised.
   The list of related use cases encompasses companies that must handle sensitive information, such as financial and medical data, religious information, etc. This type of data is protected by law and, as companies could be exposed to different sanctions, they are forced to guarantee a certain level of protection. Even so, although conventional cryptographic techniques allow for secure storage in the cloud, data protection is not so easy if some sort of computation has to be applied on the data. Thus, in view of all these shortcomings, companies which handle sensitive data should seemingly avoid cloud-based solutions.
   The field of privacy-preserving machine learning (PPML) [1] deals precisely with the different security and privacy threats appearing in machine learning (ML). Focusing on the cloud computing scenario, and among the broad set of available tools inside PPML, homomorphic encryption techniques, which enable secure processing on encrypted data, seem to be a perfect fit for secure outsourcing.

A. Our Contributions

   Our scenario is set inside a banking fraud context [2], in which banks make use of fraud detection systems that require a large amount of resources to operate. Bank transactions do not follow a uniform time distribution, so a good solution for banks is to outsource their fraud detection systems, which allows them to take advantage of the flexibility and scalability provided by the cloud.
   Our main contribution is to implement and showcase the feasibility of a solution based on homomorphic cryptography (see Section II), which enables fraud detection systems to be securely outsourced. In particular, we have implemented an SVM (Support Vector Machine) inference algorithm in the encrypted domain (see Sections III and IV); this classifier is tailored for bank fraud detection. Our homomorphic SVM classifier allows a bank to make secure encrypted predictions in an untrusted environment.
   In order to test its feasibility: (1) we have designed a proof of concept based on a client-server model (see Section IV) and, (2) we have evaluated the runtime and classification performance of the system (see Section V).
   Notation and Structure: We represent vectors and matrices by boldface lowercase and uppercase letters, respectively. Polynomials are denoted with regular lowercase letters, ignoring the polynomial variable (e.g., a instead of a(z)). Finally, the Hadamard product of two vectors is a ◦ b.
   The rest of the paper is organized as follows: Section II briefly reviews the homomorphic encryption scheme used. Section III details the bank fraud detection scenario, the dataset and the SVM classifier. Section IV introduces the design of our system and the homomorphic classifier. Finally,
Section V evaluates the performance of our system, and Section VI discusses some future work lines.

TABLE I
HIGH LEVEL DESCRIPTION OF THE CKKS SCHEME [9]

Parameters: Let $R_q[z]$ be the quotient polynomial ring $\mathbb{Z}_q[z]/(1 + z^n)$. Then, ciphertexts from the $E_{CKKS}$ scheme are composed of two polynomial elements from $R_q[z]$, and plaintexts belong to the set $\mathbb{C}^P$ such that $P = n/2$. Finally, $q$ and $n$ (for simplicity we omit some internal parameters) are chosen in terms of the security parameter $\lambda$.

$E_{CKKS}$.SecKeyGen
    Input: Security parameter $\lambda$
    Output: Secret key sk
$E_{CKKS}$.PubKeyGen
    Input: Secret key sk
    Output: Public key pk, evaluation key evk and rotation key rtk
$E_{CKKS}$.Enc
    Input: Plaintext $m \in \mathbb{C}^P$
    Output: A ciphertext $c = (c_0, c_1) \in R_q^2[z]$
$E_{CKKS}$.Dec
    Input: A ciphertext $c = (c_0, c_1) \in R_q^2[z]$ encrypting a plaintext $m \in \mathbb{C}^P$
    Output: A plaintext $m \in \mathbb{C}^P$
$E_{CKKS}$.Add
    Input: Ciphertexts $c$ and $c'$ encrypting, respectively, $m$ and $m'$
    Output: A ciphertext $c'' = (c''_0, c''_1) \in R_q^2[z]$ encrypting $m + m' = m'' \in \mathbb{C}^P$
$E_{CKKS}$.LinHadMult
    Input: A plaintext $m \in \mathbb{C}^P$ and a ciphertext $c'$ encrypting $m'$
    Output: A ciphertext $c'' = (c''_0, c''_1) \in R_q^2[z]$ encrypting $m \circ m' = m'' \in \mathbb{C}^P$
$E_{CKKS}$.HadMult
    Input: Evaluation key evk, and ciphertexts $c$ and $c'$ encrypting, respectively, $m$ and $m'$
    Output: A ciphertext $c'' = (c''_0, c''_1) \in R_q^2[z]$ encrypting $m \circ m' = m'' \in \mathbb{C}^P$
$E_{CKKS}$.Rot
    Input: A ciphertext $c$ encrypting $m \in \mathbb{C}^P$
    Output: A ciphertext $c' = (c'_0, c'_1) \in R_q^2[z]$ encrypting $m' \in \mathbb{C}^P$, which is the result of applying a rotation over the components of $m$

banks could outsource this information securely and take advantage of the benefits provided by the cloud.

A. Dataset

   The dataset used for this paper is called IEEE-CIS Fraud Detection [10]. It is composed of real-world e-commerce transactions provided by Vesta, a leading payment service company. The dataset is divided into 4 files: train_transaction.csv, train_identity.csv, test_transaction.csv and test_identity.csv. However, we only use the files train_transaction.csv and train_identity.csv, as the other two are unlabeled and we could not use the samples they contain to test the results of the inference. Consequently, these two training files are combined, finally obtaining a dataset with 590540 rows and 434 features.

B. Dataset pre-processing

   Given the large dimensionality of the training dataset, an adequate pre-processing is required before training. As our main objective in this work focuses on implementing the SVM inference function in the encrypted domain, we have followed the guidelines for data pre-processing and feature selection proposed in [10]. According to their recommendations:
   • The number of features is reduced from 434 to 115.
   • The columns with missing values are filled with the mean, mode or the most frequent category among the present values in the corresponding column.

C. SVM linear kernel

II. PRELIMINARIES: HOMOMORPHIC ENCRYPTION

   Homomorphic encryption appears as a very promising tool for secure processing [3]. One relevant example is the Paillier cryptosystem [4], which has been used in a broad range of different applications [5]. It presents a group homomorphism which enables additions between two encrypted values, and multiplications between a ciphertext and a plaintext.
   Consequently, Paillier could be a perfect candidate to                                                    We have chosen a Support Vector Machine (SVM) clas-
implement our proposed encrypted detector for bank fraud                                                  sifier due to its good performance for the fraud detection
detection. However, modern lattice-based cryptosystems out-                                               scenario [10]. SVMs correspond to a type of binary classifiers
perform Paillier in several aspects: (1) more complex applica-                                            which, given a set of training samples, maximize the gap
tions are possible as they present a ring homomorphism which                                              between the two considered groups (e.g., 1 fraud vs −1 non-
enables multiplication between two ciphertexts (e.g., the train-                                          fraud in our scenario).
ing of a SVM [6]), and (2) better performance, as Paillier is                                                Specifically, in this work we are interested in the expression
slower even when considering only linear operations [7], [8].                                             of the SVM classifier:
   In view of the above points, we make use of the CKKS                                                                                                
scheme [9], which is a lattice-based cryptosystem especially                                                                   L                     
adapted to work with approximate arithmetic operations.                                                                                               
                                                                                                                          sign     yi αi K(xi , x) + b,              (1)
                                                                                                                               i=1                    
A. A concrete homomorphic encryption scheme                                                                                                        
                                                                                                                                          score
   We give a brief description of the CKKS scheme ECKKS in
Table I. It is defined as a conventional public-key encryption                                            where x is the input sample, and the rest of parameters are
scheme E = {SecKeyGen, PubKeyGen, Enc, Dec}, but ex-                                                      obtained through the SVM training: b is the bias, αi are the
tended with Add, HadMult, LinHadMult and Rot procedures.                                                  dual coefficients, and xi and yi are, respectively, the support
For further details of the scheme we refer the reader to [9].                                             vectors together with their associated label (1 or −1). We refer
                                                                                                          to [11] for more details on the training of SVMs.
                                     III. U SE CASE                                                          However, due to the limitations of homomorphic encryption
   Banks use fraud detection systems that require a large                                                 schemes, two additional points must be considered:
amount of computational resources. This results in significant                                               • The sign(·) function can be very costly to be homomor-
costs for banks to keep their systems up and running. One                                                      phically computed. As it does not introduce a relevant
possible approach for banks is to outsource these systems to                                                   overhead, we have moved it to the client side, which
the cloud where, in addition to saving costs, they would obtain                                                calculates it after decryption.
more flexible systems.                                                                                       • Among the possible choices for the kernel K(·, ·) func-
   However, banking information is sensitive and traditional                                                   tion, we make use of a linear kernel K(u, v) = u · v;
techniques only allow to work with data in clear, what can be a                                                mainly due to its simplicity and good behaviour for
security problem. Consequently, we propose to implement the                                                    the fraud detection problem [10]. Additionally, it brings
evaluation phase of an important machine learning algorithm                                                    about an important efficiency advantage thanks to the fact
such as Support Vector Machines (SVM) in the encrypted                                                         that the inner product is distributive over vector addition,
domain. Therefore, by directly working with encrypted data,                                                    which enables to considerably simplify the classifier.
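The simplification enabled by the linear kernel can be illustrated with a minimal sketch in plain Python. It uses toy, randomly generated values, and the function names (`score_dual`, `collapse_model`, `score_collapsed`) are hypothetical and not taken from the prototype. It checks that the dual-form score of Eq. (1) with K(u, v) = u · v equals the score obtained after folding all support vectors into a single weight vector:

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_dual(support_vectors, labels, alphas, bias, x):
    """Score of Eq. (1) with a linear kernel: sum_i y_i * alpha_i * (x_i . x) + b."""
    total = 0.0
    for sv, y, a in zip(support_vectors, labels, alphas):
        total += y * a * dot(sv, x)
    return total + bias


def collapse_model(support_vectors, labels, alphas):
    """Fold the support vectors into one weight vector w = sum_i y_i * alpha_i * x_i."""
    n_features = len(support_vectors[0])
    w = [0.0] * n_features
    for sv, y, a in zip(support_vectors, labels, alphas):
        for j in range(n_features):
            w[j] += y * a * sv[j]
    return w


def score_collapsed(w, bias, x):
    """Score after collapsing the model: a single inner product w . x + b."""
    return dot(w, x) + bias


# Toy model (assumed values, for illustration only): 5 support vectors, 4 features.
rng = random.Random(0)
n_support, n_features = 5, 4
support_vectors = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_support)]
labels = [rng.choice([-1, 1]) for _ in range(n_support)]
alphas = [rng.uniform(0, 1) for _ in range(n_support)]
bias = 0.25
x = [rng.uniform(-1, 1) for _ in range(n_features)]

w = collapse_model(support_vectors, labels, alphas)
s_dual = score_dual(support_vectors, labels, alphas, bias, x)
s_collapsed = score_collapsed(w, bias, x)

# The sign is applied in the clear (on the client side, after decryption in the paper's setting).
predicted_label = 1 if s_collapsed >= 0 else -1
```

Under these assumptions the two scores agree up to floating-point rounding, so only one inner product between the collapsed model and the input sample has to be evaluated per inference, regardless of the number of support vectors.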

Sesión de Investigación B4: Criptografía

   Consequently, taking into account the above changes, our homomorphic classifier turns out to be:

$$\text{score} = \underbrace{\Big(\sum_{i} y_i \alpha_i x_i\Big)}_{\text{linear SVM model}} \cdot \underbrace{x}_{\text{input sample}} + b. \qquad (2)$$

For more details on its homomorphic computation and the different packing methods, we refer the
reader to Section IV-B.

IV. SVM HOMOMORPHIC INFERENCE

   So as to showcase the potential and practicability of homomorphic encryption for PPML, and more
specifically, for the encrypted execution of SVM inference, we have implemented a client-server
prototype. Starting from the use case, bank fraud detection with an SVM, the needed homomorphic
operations for both encryption/decryption and for the encrypted inference using a linear kernel
have been implemented.

A. Design

   The design of the prototype follows a client-server approach to emulate the client and cloud
side for a scenario of outsourced computation in an untrusted environment. The client is
responsible for encryption and decryption, so the input data is protected from the moment it leaves
the client. The server is responsible for processing the encrypted data, and for finally sending
the encrypted result back to the client. Figure 1 depicts the main components of the secure SVM
prototype.

                    Fig. 1.   System architecture

   As shown in Figure 1, the prototype is composed of two different clients (one is web based and
the other one command line based) and one server. All three components make use of a database and
of a cryptographic library that implements the needed cryptographic primitives for each component.
The client database stores the user's homomorphic cryptographic keys and the transaction data
samples that are classified with the Homomorphic SVM Inference. The server database stores the
users' authentication details, the users' public homomorphic cryptographic keys and parameters, and
the SVM model. The client cryptographic library implements the generation of cryptographic keys and
parameters, and also the data packing/unpacking and encryption/decryption primitives, while the
server cryptographic library implements the encrypted kernel evaluation primitives. In order to
implement the primitives for encrypted processing, the cryptographic libraries make use of the
Microsoft SEAL [12] and Lattigo [13] libraries. For running the homomorphic inference, the following
steps must be followed:
   1) The Client authenticates to the server.
   2) The Client creates a key pair and sends its public part to the server.
   3) The Client packs and encrypts the data samples on which it wants to run the inference, and
      sends them to the server.
   4) The Server retrieves the SVM model and public keys from the database.
   5) The Server performs the Homomorphic SVM Inference and sends the result to the client.
   6) The Client decrypts the result and shows it to the user.

B. Data packing

   As the CKKS scheme [9] allows for homomorphic approximate additions and Hadamard products between
encrypted vectors (see Section II), finding the best way of packing the data before encryption is
fundamental to optimize the performance of our encrypted SVM classifier.
   With this aim, we have explored two different packing methods [2], which we denote in this work
as column packing and flattened packing, respectively. Let $X$ be a matrix of size $N \times M$,
where $N$ represents the number of input samples we want to query for detection, and $M$ corresponds
to the number of features for each input sample. We also assume that the instantiated CKKS scheme
has a packing capacity of $P$ complex slots. We briefly describe these two methods next:
   • Column packing: It encodes the columns of $X$ separately in different ciphertexts. This packing
     naturally fits into the SIMD (Single Instruction, Multiple Data) paradigm, which makes it
     convenient for those cases where the number of input samples $N$ in the client query is large.
   • Flattened packing: We reshape the matrix $X$ by concatenating all its rows into a "flattened"
     vector of length $N \cdot M$, i.e., $[\mathrm{row}_1(X), \ldots, \mathrm{row}_N(X)]$. Then, we
     encrypt all the $\mathrm{row}_i$ blocks using the minimal possible number of ciphertexts, i.e.,
     each ciphertext holds $\lfloor P/M \rfloor$ different rows from $X$.
   In contrast to the former, the use of a flattened packing optimizes the storage in those cases in
which $N$ is small in relation to the ciphertext packing capacity $P$.
   Homomorphic inference: Regarding the concrete algorithms in the encrypted domain, there are also
some important differences for each packing:
   • Inference with column packing: It follows the same structure as its counterpart in the clear.
     We homomorphically compute Eq. (2) by means of both the $E_{\mathrm{CKKS}}$.LinHadMult and
     $E_{\mathrm{CKKS}}$.Add primitives. It is worth noting that, as a different score is calculated
     by default in parallel for $P$ complex slots, we directly obtain the inference for $P$ different
     input samples in each output ciphertext.
   • Inference with flattened packing: It requires a more cumbersome algorithm in which not only the
     $E_{\mathrm{CKKS}}$.LinHadMult and $E_{\mathrm{CKKS}}$.Add primitives are needed, but also
     $E_{\mathrm{CKKS}}$.Rot must be applied to relocate the partial results of the inner product. In
     this case each output ciphertext contains $\lfloor P/M \rfloor$ different score values.
   Tables II and III include a summary of the existing trade-offs between both packing methods.
Note that the flattened


packing also requires generating the rotation key matrices rtk, which are used for the homomorphic
slot rotations.

                             TABLE II
            COMPUTATIONAL COST FOR EACH PACKING METHOD

 Method               Ciphertext operations
 Column packing       $\lceil N/P \rceil$ (M mult. + M add.)
 Flattened packing    $\lceil N / \lfloor P/M \rfloor \rceil$ (1 mult. + $\lceil \log_2 M \rceil$ add. + $\lceil \log_2 M \rceil$ rot.)

                             TABLE III
            STORAGE REQUIREMENTS FOR EACH PACKING METHOD

 Method               # of ciphertexts
 Column packing       $M \lceil N/P \rceil$
 Flattened packing    $\lceil N / \lfloor P/M \rfloor \rceil$

V. IMPLEMENTATION RESULTS

   In this section we show the results obtained in our implementation. All the experiments were
conducted using a test-bed composed of a physical device with a CPU i7-4710HQ, 12GB of RAM and
Ubuntu Desktop 20.04.
   In the first place, we evaluate the quality of the inference of our implementation against the
inference in the clear. These results are shown in Table IV. In order to test the classification
performance, we have considered three different metrics:
   • Accuracy: % of correct predictions.
   • Precision: % of correctly detected positives among all detected positives.
   • Recall: % of correctly detected positives among all positives.

                             TABLE IV
            INFERENCE RESULTS: MACHINE LEARNING CONTEXT

 Library    Packing      Accuracy    Precision    Recall
 Plain      -            96.6%       5.44%        46.8%
 SEAL       Flattened    96.4%       4.39%        32.5%
 SEAL       Column       96.6%       3.72%        70.4%
 Lattigo    Flattened    96.6%       3.72%        70.4%
 Lattigo    Column       96.6%       3.72%        70.4%

                             TABLE V
            INFERENCE RESULTS: PERFORMANCE CONTEXT
                 (N = 147635, M = 114, P = 4096)

 Library    Packing      $P_S$ time (ms)    $P_C$ time (ms)    Size (KB)
 SEAL       Flattened    4.03               3.66               31.3
 SEAL       Column       0.12               1.69               14.0
 Lattigo    Flattened    2.34               16.8               41.9
 Lattigo    Column       0.03               8.29               18.7

   Another important aspect that we have assessed is the runtime performance. In this case, Table V
includes:
   • Server ($P_S$) runtime: the time required to perform the inference function in the server.
   • Client ($P_C$) runtime: total time taken by the client to encode/decode and encrypt/decrypt the
     different samples.
   • Size: size of the information sent from client to server.
   Note that our results consider the execution of several inferences in parallel, so the included
runtimes are normalized to measure the estimated runtime for a single inference of each type of
packing. As the transmission times have not been taken into account, we include the size of the
transmitted information per inference.

VI. CONCLUSIONS AND FUTURE WORK

   This work shows that homomorphic encryption is a key enabling technology for migrating highly
sensitive processes to the cloud. Our proof-of-concept prototype demonstrates the feasibility of
running the fraud detection inference in the untrusted domain, while maintaining the prediction
performance and also with a feasible execution performance. As future work, we envision several
extensions for our prototype:
   • A more complete comparison considering other alternative cryptographic schemes/techniques such
     as Paillier [4], BFV [14], and functional encryption [15].
   • Deploy the server in a real cloud environment in order to measure the instantiation costs in a
     realistic scenario.
   • Protect the model in the execution environment. This corresponds to the case in which the
     server infrastructure provider and the model owner are not the same entity.
   • Extend the prototype considering more complex kernels; e.g., polynomial and RBF (Radial Basis
     Function).
   • The implementation in the untrusted domain of the sign(·) function. Currently, there exist
     mechanisms for its homomorphic approximation with CKKS [16]. Note this adds an additional
     degree of protection to the model, as it is harder for the client to reverse engineer the
     model.

REFERENCES

 [1] M. Al-Rubaie and J. M. Chang, "Privacy-preserving machine learning: Threats and solutions,"
     IEEE Secur. Priv., vol. 17, no. 2, pp. 49–58, 2019.
 [2] A. Vázquez-Saavedra, "Study and applications of homomorphic encryption algorithms to privacy
     preserving SVM inference for a bank fraud detection context," Master's thesis,
     Telecommunication Engineering School, University of Vigo, 2021. [Online]. Available:
     http://castor.det.uvigo.es:8080/xmlui/handle/123456789/533
 [3] Homomorphic encryption standardization. [Online]. Available: https://homomorphicencryption.org/
 [4] P. Paillier, "Public-key cryptosystems based on composite degree residuosity classes," in
     EUROCRYPT, ser. LNCS, vol. 1592. Springer, 1999, pp. 223–238.
 [5] R. L. Lagendijk, Z. Erkin, and M. Barni, "Encrypted signal processing for privacy protection:
     Conveying the utility of homomorphic encryption and multiparty computation," IEEE Signal
     Process. Mag., vol. 30, no. 1, pp. 82–105, 2013.
 [6] S. Park, J. Byun, J. Lee, J. H. Cheon, and J. Lee, "HE-friendly algorithm for
     privacy-preserving SVM training," IEEE Access, vol. 8, pp. 57414–57425, 2020.
 [7] A. Pedrouzo-Ulloa, J. R. Troncoso-Pastoriza, and F. Pérez-González, "Number theoretic
     transforms for secure signal processing," IEEE Trans. Inf. Forensics Secur., vol. 12, no. 5,
     pp. 1125–1140, 2017.
 [8] J. R. Troncoso-Pastoriza, A. Pedrouzo-Ulloa, and F. Pérez-González, "Secure genomic
     susceptibility testing based on lattice encryption," in IEEE ICASSP. IEEE, 2017,
     pp. 2067–2071.
 [9] J. H. Cheon, A. Kim, M. Kim, and Y. S. Song, "Homomorphic encryption for arithmetic of
     approximate numbers," in ASIACRYPT, ser. LNCS, vol. 10624. Springer, 2017, pp. 409–437.
[10] Z. Wu. (2020) Kaggle - IEEE Fraud Detection (EDA). [Online]. Available:
     https://blog.csdn.net/Tinky2013/article/details/104215953
[11] I. Steinwart and A. Christmann, "Support vector machines," in Information science and
     statistics, 2008.
[12] "Microsoft SEAL (release 3.0)," http://sealcrypto.org, Oct. 2018, Microsoft Research,
     Redmond, WA.
[13] C. Mouchet, J.-P. Bossuat, J. Troncoso-Pastoriza, and J. Hubaux, "Lattigo: a multiparty
     homomorphic encryption library in Go," 2020.
[14] J. Fan and F. Vercauteren, "Somewhat practical fully homomorphic encryption," IACR Cryptol.
     ePrint Arch., vol. 2012, p. 144, 2012.
[15] T. Marc, M. Stopar, J. Hartman, M. Bizjak, and J. Modic, "Privacy-enhanced machine learning
     with functional encryption," in ESORICS, ser. LNCS, vol. 11735. Springer, 2019, pp. 3–21.
[16] E. Lee, J. Lee, J. No, and Y. Kim, "Minimax approximation of sign function by composite
     polynomial for homomorphic comparison," IACR Cryptol. ePrint Arch., vol. 2020, p. 834, 2020.
