By Craig Stewart, SVP Product, SnapLogic
As the pace of AI uptake increases, it’s clear that it will have a considerable impact on the future of business, finance and our wider society. While flourishing AI innovation is a positive thing, we must also manage the potential risks and do our part to ensure that it advances in a responsible way.
In fact, our new research shows that ethical and responsible AI development is a top concern, with 94% of IT leaders believing that more attention needs to be paid to corporate responsibility and ethics in the application of AI.
There is ample evidence that the insights AI offers can be highly beneficial, but we must also recognise that it cannot provide perfect answers. Concerns about data quality, security and privacy are real, so the debate over AI regulation will continue.
A key threat to effective use of AI is the phenomenon of AI bias. This occurs when an algorithm delivers prejudiced results based on incorrect assumptions in the AI development process. These are often ‘built-in’ through the unconscious preferences of the human being who created the process or selected the training data. They can also reflect issues in the data gathering stage where weighting procedures cause incorrect conclusions to be made about certain data sets.
As we become increasingly reliant on AI, it’s essential to eliminate bias as far as possible, because it can often cause undesirable decisions and outcomes. Legal cases have already taken place in which groups have forced the disclosure of how algorithmic processes reach decisions. Many have won compensation when either the algorithm or the underlying data was found to introduce bias. A recent example involved teachers who were not paid performance bonuses and who won damages when it emerged that the algorithm assessing eligibility for the bonus did not take class sizes into account, a factor found to be highly significant in pupil attainment.
How does AI perpetuate bias?
Bias is not always purely a matter of training data. It can enter the system at any stage of the deep learning process, whether that is collecting data, setting objectives or preparing the data for training or operation.
The most commonly acknowledged bias concerns the initial process of collecting, selecting and cleaning data. Here bias can arise in training data if decisions to reject outliers, or data perceived as irrelevant, are not tested and so accidentally introduce prejudices. This can result in the AI mistakenly favouring certain factors over others that could be more relevant to the desired outcome.
Take, for example, a growing male-dominated business looking to use AI to screen candidates. If the AI is trained on the CVs and employment data of current successful employees, it is likely to develop a bias towards selecting male applicants for interview, as they fit the pattern of the company as it exists. Simple fixes such as removing the sex of the employees from the training data may not work, because the algorithm may instead identify patterns such as male-dominated hobbies as indicators of desirable employees.
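To illustrate why removing the sex column may not be enough, here is a minimal sketch of a proxy check. It measures how strongly each value of a remaining feature is tied to one value of the removed attribute. The records, the field names `sex` and `hobby`, and the function `proxy_strength` are all hypothetical, invented for this illustration. A real check would need far more data and a proper statistical test.

```python
from collections import defaultdict

# Hypothetical toy records: 'sex' is the attribute removed before training,
# 'hobby' is a feature that remains in the training data.
records = [
    {"sex": "M", "hobby": "rugby"},
    {"sex": "M", "hobby": "rugby"},
    {"sex": "M", "hobby": "chess"},
    {"sex": "F", "hobby": "netball"},
    {"sex": "F", "hobby": "chess"},
    {"sex": "M", "hobby": "rugby"},
]

def proxy_strength(rows, feature, protected):
    """For each feature value, return the share of its rows that belong
    to the single most common protected group (1.0 = perfect proxy)."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        counts[row[feature]][row[protected]] += 1
    return {
        value: max(by_group.values()) / sum(by_group.values())
        for value, by_group in counts.items()
    }

strength = proxy_strength(records, "hobby", "sex")
for hobby, share in sorted(strength.items()):
    print(f"{hobby}: {share:.2f}")
```

In this toy data every "rugby" record belongs to a male employee, so the hobby still carries the removed attribute, while "chess" is evenly split and carries none.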
Secondly, when setting objectives for a deep learning model (i.e. what the designers want it to achieve), the objective needs to be set in context for recommendations to be computed correctly. For example, if the objective is to increase profits, with no context or boundaries relating to customer satisfaction, the output will be skewed. The AI could pursue the goal through short-term decisions that deliver profit at the expense of the long-term viability of the business.
Lastly, bias can be introduced at the stage where data is prepared for processing. This often results in certain attributes being prioritised over others. The choice of which attributes to consider or ignore has a significant impact on how accurately a model can predict, so it’s imperative to grade and rank them correctly. It’s also important to avoid dropping data simply because it is hard to process. Designing a data pipeline that handles exceptions well is essential to ensure there is sufficient data for good training outcomes.
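As a rough sketch of what handling exceptions well could look like, the example below parses a messy field in several formats and sets aside the rows it cannot read for review, rather than silently dropping them. The field name `salary`, the accepted formats and the functions `parse_amount` and `prepare` are assumptions made for illustration only.

```python
def parse_amount(raw):
    """Parse a money-like string such as '45,000' or '45k'.
    Returns None when the value cannot be read."""
    text = raw.strip().lower().replace(",", "").replace("£", "")
    multiplier = 1
    if text.endswith("k"):
        multiplier = 1000
        text = text[:-1]
    try:
        return float(text) * multiplier
    except ValueError:
        return None

def prepare(rows):
    """Split rows into cleaned records and records held back for review."""
    cleaned, review = [], []
    for row in rows:
        amount = parse_amount(row["salary"])
        if amount is None:
            # Keep the row visible instead of discarding it silently.
            review.append(row)
        else:
            cleaned.append({**row, "salary": amount})
    return cleaned, review

rows = [
    {"id": 1, "salary": "45,000"},
    {"id": 2, "salary": "45k"},
    {"id": 3, "salary": "n/a"},
]
cleaned, review = prepare(rows)
print(len(cleaned), "cleaned;", len(review), "held for review")
```

Keeping a review queue means someone can later check whether the unreadable rows come disproportionately from one group, which a silent drop would hide.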
The above processes highlight that in many cases, bias can easily leak into the system. It’s often only discovered when a system goes fully live, by which time the issue can become far more difficult to address. Therefore, testing the system against expectations as it develops and involving a diverse group of stakeholders in the evaluation is critical.
So how can we mitigate bias within AI?
There are many examples of how the industry is working to address the bias conundrum. Most have involved revisiting and updating data after bias has been discovered, and the source of that bias is often the human element (or the personalities that feed into the underlying systems).
Once the human factor is addressed, developers need to check the underlying data thoroughly. The aim is to ensure the data fully represents all the factors that could inform the business decision, because missing or erroneous data can skew the algorithm. As an example, would it matter if data were rejected from subjects who did not include a mobile phone number or email address? That decision may make future sales contact easier, but could it introduce a generational bias that skews the analysis?
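One way to test a filtering decision like this is to compare the make-up of the data before and after the filter is applied. The sketch below does so for invented subject records. The `age_band` and `email` fields and the function `share_by_group` are hypothetical, and the numbers are illustrative rather than drawn from any research.

```python
def share_by_group(rows, key):
    """Return each group's share of the rows, as a fraction of the total."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + 1
    return {group: count / len(rows) for group, count in totals.items()}

# Hypothetical subjects; None marks a missing email address.
subjects = [
    {"age_band": "18-34", "email": "a@example.com"},
    {"age_band": "18-34", "email": "b@example.com"},
    {"age_band": "18-34", "email": "e@example.com"},
    {"age_band": "35-64", "email": "c@example.com"},
    {"age_band": "35-64", "email": None},
    {"age_band": "65+", "email": None},
    {"age_band": "65+", "email": None},
    {"age_band": "65+", "email": "d@example.com"},
]

# The filter under test: keep only subjects who supplied an email address.
kept = [s for s in subjects if s["email"]]

before = share_by_group(subjects, "age_band")
after = share_by_group(kept, "age_band")
shift = {group: after.get(group, 0.0) - before[group] for group in before}

for group in sorted(shift):
    print(f"{group}: {before[group]:.3f} -> {after.get(group, 0.0):.3f}")
```

In this invented sample the oldest band shrinks from 37.5% of the data to 20% once the filter is applied, which is the kind of generational skew the question above is probing for.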
It’s important to ensure that the algorithms processing an AI system’s underlying data minimise bias as far as possible. Those involved in AI research and implementation have had considerable success in addressing the bias challenge, and many have created algorithms that can detect and reduce bias. Progress has also been made in the regulatory environment to help mitigate the potentially negative effects of AI bias.
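As a simple illustration of what detecting bias can mean in practice, the sketch below compares selection rates between groups in a set of decisions and reports the ratio of the lowest rate to the highest. This is one basic fairness measure among many, not the method of any particular tool. The group labels, the decisions and the function names are invented for the example.

```python
def selection_rates(decisions):
    """Given (group, was_selected) pairs, return each group's selection rate."""
    totals, selected = {}, {}
    for group, chosen in decisions:
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + (1 if chosen else 0)
    return {group: selected[group] / totals[group] for group in totals}

def rate_ratio(rates):
    """Ratio of the lowest selection rate to the highest (1.0 = parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0

# Hypothetical screening outcomes for two groups of five candidates each.
decisions = [
    ("A", True), ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = selection_rates(decisions)
ratio = rate_ratio(rates)
print(rates, round(ratio, 2))
```

A ratio well below 1.0, as here, does not prove the system is unfair, but it is a signal that the data and the model deserve a closer look.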
The EU’s GDPR, for example, gives consumers a right to an explanation of how automated decisions have been made based on their data. How much impact this has had to date is unclear, but those rights, and the penalties for not recognising them, should help ensure that efforts are made to design out bias.
A problem from the top
There’s also an argument that we need to rethink AI and the way it is so often approached from the top of the global status hierarchy. It is too heavily influenced by the biases of first-world cultures. Too often this results in AI systems’ outputs reflecting those biases, to the detriment of the less well off in society. Recent real-world examples include the high-profile facial recognition issues of a certain blue-chip tech company. Situations like this should be addressed by ensuring that a widely diversified spread of society is included. Incorporating inputs from a wider, more globalised and diverse data set can achieve this.
Additionally, building AI models from original data can help to eliminate bias. This allows far more scope for ideas and actual evidence to feed into AI systems that can evolve to offer insights beyond the typical first-world perspective. This inversion of the scientific principle, looking for patterns of interest to explore rather than testing a hypothesis, remains a powerful application of data science for identifying new behaviours and groups of customers.
This data-driven approach would also build far more flexibility and responsiveness into AI systems, and open them up to a more comprehensive and diversified set of global considerations and imperatives.
Breaking down the data silos
However, to be data-driven and to make use of the greatest possible diversity of data, it is essential to focus on developing systems that break down data silos and enable the integration of the broadest possible range of data sets. The tools already exist to rapidly configure integrations between a huge range of systems without heavy lifting by software developers, so data teams have this diverse data within their grasp.
Businesses need to consider new sources of data and new ways of interpreting and understanding it. Ultimately, those who invest in access to a wide and diverse pool of data will benefit the most, because they will be best placed to identify the true value of AI.