GDPR – Federated Search Technology Pattern

The EU General Data Protection Regulation (GDPR) requires that, by 25 May 2018, European citizens' personally identifiable information is controlled, secured, available on request and able to be erased. Additionally, customers have the right to easily change their marketing contact preferences within a reasonable time. For most organisations this is a challenge because they are dealing with legacy systems under departmental control, and the only data warehouse is provisioned for analytical purposes.

One way of achieving compliance is to create an enterprise-wide Data Lake containing all the organisation's customer information, sourced from the different business units' operational systems and from core systems such as email, intranet and shared drives. The Data Lake also contains all the original systems' meta-data (data about the data) plus provenance information such as back-links to the systems of record in the individual business units. These back-links to the operational systems allow the right to erasure to be exercised if requested by a customer. The meta-data contains the creation and last-accessed dates from the system of record, along with security information that allows the correct access controls to be applied to the Data Lake.
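As a rough illustration of how provenance back-links support an erasure request, the sketch below models a Data Lake record and collects the systems of record that would need to be contacted. The class names, field names and the `erasure_targets` helper are all hypothetical and chosen for this example only; a real Data Lake would hold this information in its own catalogue format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BackLink:
    """Pointer from a Data Lake record to its system of record (hypothetical shape)."""
    system: str      # name of the operational system, e.g. "crm"
    record_id: str   # key of the record inside that system


@dataclass
class LakeRecord:
    """A customer record copied into the Data Lake together with its meta-data."""
    customer_id: str
    payload: dict
    back_link: BackLink       # provenance: where the data came from
    created: datetime         # creation date in the system of record
    last_accessed: datetime   # last-accessed date in the system of record
    allowed_roles: frozenset  # security information used for access control


def erasure_targets(lake, customer_id):
    """Return each distinct back-link an erasure request must be forwarded to."""
    targets = []
    for record in lake:
        if record.customer_id == customer_id and record.back_link not in targets:
            targets.append(record.back_link)
    return targets
```

Each returned back-link identifies one operational record, so the erasure can be carried out in the system of record as well as in the Data Lake copy.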

Ensuring a single view of the customer is much easier if all the enterprise data is within the Data Lake, including data from third-party SaaS (Software-as-a-Service) providers such as Salesforce or Microsoft Dynamics. The addition of Master Data Management technologies from Informatica or IBM can provide data cleansing either before the data reaches the Data Lake or during a search on the Data Lake. The Data Lake model also allows enterprises to resolve the issues around marketing contact preferences, which can be different in each customer relationship management system or account. Allowing the customer to change their contact status or contact channels becomes easier if they can be found in the Data Lake.
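To make the contact-preference problem concrete, here is a minimal sketch that spots channels on which systems disagree and applies a customer's choice everywhere. The function names and the data shape (a mapping of system name to per-channel opt-in flags) are assumptions made for this example, not part of any product mentioned above.

```python
def conflicting_channels(preferences_by_system):
    """Return the channels whose opt-in value differs between systems."""
    values_seen = {}
    for prefs in preferences_by_system.values():
        for channel, opted_in in prefs.items():
            values_seen.setdefault(channel, set()).add(opted_in)
    return sorted(ch for ch, values in values_seen.items() if len(values) > 1)


def apply_customer_choice(preferences_by_system, channel, opted_in):
    """Return a new mapping with the customer's choice applied to every system."""
    return {
        system: {**prefs, channel: opted_in}
        for system, prefs in preferences_by_system.items()
    }
```

For example, if one CRM holds `{"email": True}` and another holds `{"email": False}`, `conflicting_channels` reports `"email"`, and `apply_customer_choice(..., "email", False)` produces a consistent opt-out that can then be written back to each system.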

However, creating Data Lakes can be very challenging if the security or data models are heavily embedded within the operational systems, or if local jurisdictional systems have to be used for access control and monitoring.

The alternative model is Federated Search, which for some organisations is a better solution. Federated Search shares the minimum amount of data because it uses a ‘merge on query’ approach to inter-departmental data, which potentially allows greater compliance with ‘privacy-by-design’ constraints on the systems. Additionally, a Federated Search can cross the organisational boundary into external data processors' systems in real time.

The Federated Search model requires each departmental operational system to provide a full-text search index, either re-using an existing index technology or deploying a bespoke search capability using, for instance, Apache Solr or Elasticsearch. An Enterprise Search Service provides a central service or portal from which queries can be made. The Enterprise Search Service cascades any customer lookup queries down to departmental federated query engines, which then search for the data in their local index. If customer data is found, a specific query is constructed against the operational system. The returned data is correlated, matched and linked to provide a single view of the customer's data. The actual search uses the security credentials of the user requesting the information from the Enterprise Search Service, so the security controls and logging are preserved. In addition, the business unit data owner retains real-time control over access to the data and can see the data access patterns within their existing context. Another benefit is that the local index protects the operational system from unexpected load or logging, as the resulting queries from the federated query engines can be optimised for extracting specific information rather than searching.
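The flow described above can be sketched in a few lines of Python. This is a simplified in-memory illustration, not a reference implementation: `User`, `DepartmentEngine` and `federated_search` are hypothetical names, the local index is a plain dictionary standing in for Solr or Elasticsearch, and matching records by lower-cased email address is an assumption made purely to show the ‘merge on query’ step.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    roles: frozenset


class DepartmentEngine:
    """Hypothetical departmental federated query engine."""

    def __init__(self, department, required_role, index, system_of_record, audit_log):
        self.department = department
        self.required_role = required_role
        self.index = index                        # search term -> record ids
        self.system_of_record = system_of_record  # record id -> record fields
        self.audit_log = audit_log                # the department's own access log

    def lookup(self, user, term):
        # Access is decided locally, using the credentials of the requesting user.
        allowed = self.required_role in user.roles
        self.audit_log.append((self.department, user.name, term, allowed))
        if not allowed:
            return []
        # Step 1: search only the local index.
        record_ids = self.index.get(term.lower(), [])
        # Step 2: fetch the matching records from the operational system by key.
        return [
            dict(self.system_of_record[rid], source=self.department, record_id=rid)
            for rid in record_ids
        ]


def federated_search(user, term, engines):
    """Cascade the lookup to every engine and merge the results at query time."""
    merged = {}
    for engine in engines:
        for record in engine.lookup(user, term):
            merged.setdefault(record["email"].lower(), []).append(record)
    return merged
```

Nothing is copied to a central store: each department answers from its own index, logs the access itself, and can refuse the request, while the caller still receives one linked view per customer.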

From a delivery perspective, the Federated Search option means the organisation is not running a big programme at the centre of the organisation, with the issues of communication, governance, funding and additional dependencies that brings to an already stretched enterprise. The individual business units have the freedom to define the indexing technologies and subsequent queries, and only need to comply with a well-defined API for data query and for security authentication and authorisation information. The system owner for the Active Directory (or equivalent identity and access management service) is not required to implement consolidation of permissions from various systems. The Security Operations Centre does not need to take on new feeds from a new system and try to correlate them with the existing operational systems to determine access patterns.

The central technology programme is therefore responsible for defining the Customer Search API and the Federation Services, alongside an on-boarding plan that can match the pace of the overall organisation and of the individual business units.

Data Lakes are a powerful technology for organisations to deploy. However, with the deadline for GDPR compliance (25 May 2018) looming, some organisations may need to take a more expedient approach.

For more information about GDPR, please see the UK Information Commissioner’s Office website:

I encourage you to watch the video and provide feedback in the comments for suggestions, improvements, alternative approaches and critique.

Technology Security Lamination

The key theme for protecting IT systems from unauthorised access is to offer multiple layers of protection in terms of people, technology and physical environment. This is known as defence-in-depth when referring to technology, separation-of-concerns when referring to people and compartmentalization when referring to the physical environment. Ultimately all these techniques resemble lamination as applied to bulletproof glass or car windshields where multiple layers of different materials are used each with different physical characteristics of strength, hardness and brittleness which are stronger in a composite form.

Defence in depth is the use of different technologies to offer layers of protection, each guarding against a particular weakness such as vulnerable protocols, incorrect content, hard-to-resolve bugs or a single person being able to compromise the entire system. Such technologies include firewalls (web, layer-7 or network), DMZs (network zones between two firewalls), content-inspecting proxies (anti-malware and data loss prevention) and Virtual Private Networks. They work with IDAM (Identity and Access Management) solutions, which also include authorisation, authentication, auditing and logging.

Separation of concerns for people involves dividing roles between developers, administrators (including DevOps) and security operations staff. The developers design and write code, which is in turn deployed and managed by administrators, and both of these roles are monitored by security operations staff. The logging and monitoring provided by the IDAM solutions need de-duplication, correlation and analysis for behavioural changes in order to detect intrusion and infiltration by new techniques (zero-day threats) which emerge or are deliberately engineered to overcome the defence-in-depth solutions. The people aspects of security need a strong security policy, good background checks on candidates, regular training against social engineering and a culture of continuous improvement.
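The de-duplication, correlation and behavioural analysis mentioned above can be illustrated with a deliberately small sketch. The event fields (`ts`, `user`, `system`, `action`), the function names and the simple "three times the usual volume" threshold are all assumptions for this example; real security operations tooling uses far richer models.

```python
from collections import Counter


def deduplicate(events):
    """Drop exact repeats of the same (timestamp, user, system, action) event."""
    seen = set()
    unique = []
    for event in events:
        key = (event["ts"], event["user"], event["system"], event["action"])
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


def correlate_by_user(events):
    """Count events per user across all systems."""
    return Counter(event["user"] for event in events)


def unusual_users(baseline, current, factor=3.0):
    """Flag users with no baseline, or with more than `factor` times their usual activity."""
    flagged = []
    for user, count in current.items():
        usual = baseline.get(user, 0)
        if usual == 0 or count > factor * usual:
            flagged.append(user)
    return sorted(flagged)
```

Running `deduplicate` before `correlate_by_user` stops the same event, reported by two feeds, from inflating a user's activity count and triggering a false alert.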

Physical security is still a very important aspect of the overall security defence regime, and technologies such as strong encryption only mitigate the risks and do not end them. Strongrooms, multiple doors, biometric security, multiple physical sites, CCTV, intrusion detection alarms and in some cases TEMPEST/SCIF techniques are the foundation of good information management. Managing and monitoring physical security in the same way as digital security is paramount to achieving the right level of control. Cloud environments enable some of this responsibility to be shifted to third parties for Storage, Compute, Network and Backup, who are constantly improving by achieving ISO 27001, PCI-DSS, HIPAA and FedRAMP certification, which benefits all their customers.

Technology security lamination using different technologies/techniques/people at each layer provides the best approach to meet the challenge of continual improvement in the arms race that is Cyber-Security.