The 100 Shadows of Algorithms: After Reading

In this book, applied mathematician David Sumpter, author of One Hundred Shadows of Algorithms (published in English as Outnumbered), explains in a casual tone how algorithms analyze us, gives his statistical perspective on how they affect us, and asks whether AI can replace humans.

Algorithms that analyze and influence people

Today's algorithmic analysis of human beings effectively started with Google, which pioneered the business model of monetizing internet traffic through advertising. By collecting data from the words we search and the websites we visit, Google eventually came to know us better than we know ourselves, so its algorithms could decide which relevant advertisements to push. Later, Facebook invented the Like button, which collects our preferences and interactions.

The author gives a more detailed explanation of the mathematical model in the book. The main point can be summarized as follows: to classify people mathematically, we represent them as points in a space with many dimensions (gender is one dimension, height is another, and so on). In the real world there are countless dimensions, but mathematical analysis can extract, from the available data, the dimensions that are most effective for categorization, leaving a few hundred. Our scores along those few hundred most important dimensions then define the advertisements we will see, or, more sensationally, what kind of people we are in the algorithm's mind.
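As a minimal sketch of this idea, here is a toy dimensionality reduction using PCA via SVD. The numbers and the choice of PCA are my own illustration, not necessarily the exact technique the book describes: 200 "people" measured on 50 raw features, where most of the variation really comes from just 3 underlying traits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "people" data: 200 people, 50 raw dimensions, but most of the
# variation actually lives in just 3 underlying traits.
latent = rng.normal(size=(200, 3))     # 3 hidden traits per person
mixing = rng.normal(size=(3, 50))      # how traits show up across 50 features
people = latent @ mixing + 0.1 * rng.normal(size=(200, 50))

# PCA via SVD: keep the directions that explain the most variance.
centered = people - people.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = (s ** 2) / (s ** 2).sum()

# The first 3 components capture almost all the variance, so 3 numbers
# per person are enough to "define" them for a targeting model.
scores = centered @ vt[:3].T           # each person as a 3-D point
print(round(explained[:3].sum(), 3))
```

The point is the compression: hundreds of raw measurements collapse into a handful of scores, and those scores are what the algorithm "thinks" you are.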

The 100 Shadows of Algorithms: Discrimination, Inaccuracy, and Echo Chambers

Discrimination

Algorithms, which started out as a means of collecting data, have also been criticized for discrimination. To put it plainly, it is not the algorithm that is discriminatory, but the data taken as input. When 80% of the engineers in the input data are male, the algorithm naturally concludes that engineers are more likely to be male, and if a recruitment algorithm then recommends more male candidates, it may make it harder for women to become engineers. The nature of the problem is familiar from statistics: a mathematical correlation between two things in the raw data does not mean there is a causal relationship.
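To make the point concrete, here is a deliberately simplistic "recruitment algorithm" of my own invention (not a model from the book): it scores candidates purely by how often their group appears among past hires, so the skew in the data becomes the skew in the recommendations.

```python
from collections import Counter

# Hypothetical training data: past hires, 80% male, mirroring a biased field.
past_hires = ["M"] * 80 + ["F"] * 20

# A naive scorer: a candidate's score is just the frequency of their group
# among past hires. There is no prejudice in the code, only in the input.
freq = Counter(past_hires)

def score(candidate_gender):
    return freq[candidate_gender] / len(past_hires)

# The algorithm faithfully reproduces the correlation it was fed.
print(score("M"), score("F"))   # 0.8 0.2
```

Nothing in the code "hates" anyone; it simply mistakes a historical correlation for a useful signal, which is exactly the correlation-versus-causation trap described above.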

Inaccuracy

Prediction algorithms use statistical results to make predictions, and doing so has a prerequisite: the population being predicted must be large enough. Applied to a single person, the predictions become very inaccurate. This is in fact a common limitation of all predictive algorithms.
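A small simulation of my own (not from the book) shows the gap between group-level and individual-level accuracy. If each person independently does something with probability 0.6, the group rate can be estimated almost perfectly, yet the best possible prediction for any single person is still wrong about 40% of the time.

```python
import random

random.seed(0)

# Hypothetical scenario: each person independently does X with probability 0.6.
p = 0.6
population = [random.random() < p for _ in range(10_000)]

# Group level: the estimated rate lands within a fraction of a percent of p.
group_rate = sum(population) / len(population)
group_error = abs(group_rate - p)

# Individual level: the best rule ("predict X for everyone") is still wrong
# for roughly 40% of individuals -- accurate crowds, unreliable persons.
individual_error = 1 - group_rate

print(round(group_error, 3), round(individual_error, 3))
```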

The Echo Chamber Problem

The concept of the "echo chamber" is probably familiar to you by now, and one of the chapters in this book discusses the so-called "filter bubble", which refers to the same thing. Bubbles arise from the visibility formula used by social networking sites: the more you interact with a person or topic, the more likely you are to see related posts.

In fact, there is nothing wrong with the concept itself; even before the internet, people lived in echo chambers. However, in recent years it has become clear that what spreads online is deeply shaped by this algorithm. For example, suppose someone posts a fake message that makes no sense, and many people leave comments criticizing it; that counts as interaction. Those commenters then become more likely to see similar messages, because the algorithm thinks they are "interested". Over time, users see fewer and fewer of the messages they don't interact with.
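A toy version of such a visibility formula makes the perverse incentive visible. This is an assumption about how interaction-based ranking works in general, not any site's real algorithm: posts are ranked by your past interaction with their topic, and critical comments count exactly as much as enthusiastic ones.

```python
# Your past interaction counts per topic (including angry comments).
interactions = {"politics": 40, "cats": 5, "gardening": 0}

posts = [
    {"topic": "politics", "text": "outrageous fake claim"},
    {"topic": "cats", "text": "cat photo"},
    {"topic": "gardening", "text": "tomato tips"},
]

def visibility(post):
    # More past interaction with a topic -> higher score, regardless of
    # whether that interaction was praise or criticism.
    return interactions[post["topic"]]

feed = sorted(posts, key=visibility, reverse=True)
print([p["topic"] for p in feed])   # politics first, gardening last
```

Criticizing the fake claim only pushes more of the same topic to the top of the feed, which is exactly the feedback loop described above.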

The problem of fake news

Similar to the echo chamber problem, fake news refers mainly to news with untrue content. Some news is deliberately faked to ride a story's popularity and work its way into the mainstream of some people's feeds. The problem with fake news is that it reinforces false memories. The term "Mandela effect" refers to the fact that people can be fooled by their own false memories. If you don't have a strong memory of an event, seeing fabricated "evidence" a few times can easily create a false memory, and it never occurs to you to doubt the event. A common scenario: a post is written sensationally to attract clicks; after a few clicks, the uninformed are drawn in, their feeds fill with similar posts, and they believe what they keep seeing.

Are algorithms good or bad?

In addition to showing us the limitations and problems of algorithms, the author spends a lot of time reflecting on those limitations. There is a lot worth thinking about, so I'll just note my personal take here.

Algorithms vs. people: do algorithms have bigger problems than people?

Algorithms are simply a reflection of the biases of the world we live in.

Few people think about this when they discuss these problems. Since algorithms are developed because they are more efficient than human labor, it is important to weigh the pros against the cons.

Take the discrimination problem as an example. Since the algorithm's "discrimination" comes from its input data, it reflects widespread human discrimination in society (at least within the scope of the data set used). So the natural question to ask is: is algorithmic discrimination worse than human discrimination? Statistically, at least, algorithms do not seem to be worse than people. And since algorithms are fast, this may mean that if we compare only the accuracy and speed of human and algorithmic judgment, algorithms may still come out ahead.

Nevertheless, it is difficult for statistics to give us direct answers to qualitative questions. If we leave everything to the algorithms, will we not create a worse vicious circle of discrimination? This part of the question is not addressed in this book, and I believe it is an interesting topic for social science research.

Algorithmic limitations or human weaknesses?

Improvements are unlikely to come from a rigorous methodology, so they mostly rely on the modeler's own intellectual skills.

The book points out something very important, and easily overlooked by non-professionals: because data always has certain limitations, the person executing a statistical method always needs to find ways to fine-tune around and correct those flaws. As someone who reads data analysis for a living, this is common knowledge to me, but I believe many people do not really understand it. It is a common phenomenon in stock market analysis or election analysis, where analysts try, within reason, to make the data conform to "common sense" as much as possible.

These algorithms are based on the notion that we can learn from the recommendations and decisions of others.

Many websites with recommendation features, such as Netflix and Amazon, use the same concept: other people's recommendations help us make decisions. The need for these features comes from human nature: we don't want to wade through too much information. So the site gives us a few choices and we pick the ones we like, and we are easily "steered" by the choices offered.

As a result, various ways to optimize and increase exposure have been created. Because of the herd-like nature of human beings, if a product or a movie gets a lot of attention when it first comes out, it easily stays at the top of the charts. It sounds scary at first, but on reflection, people behaved this way before the internet too.
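The herd effect can be sketched with a preferential-attachment simulation (my own illustration, not a model from the book): each new viewer picks a movie with probability proportional to how many people have already picked it, so an early random lead snowballs.

```python
import random

random.seed(1)

views = [1, 1, 1, 1, 1]          # five movies start out exactly equal

for _ in range(10_000):
    # Pick a movie with probability proportional to its current view count.
    total = sum(views)
    r = random.uniform(0, total)
    cum = 0
    for i, v in enumerate(views):
        cum += v
        if r <= cum:
            views[i] += 1
            break

# The final counts are far from equal: early luck compounds into dominance.
print(sorted(views))
```

The movies are identical in "quality"; the inequality in the final counts comes entirely from the feedback between popularity and exposure.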

Are echo chambers that bad?

The echo chamber problem sounds serious, but the author presents research with a different view. Some researchers have looked at whether the filtering caused by echo chambers actually affects users, and found that statistically the effect is very small. That is, there is an effect, but not one large enough to change the world (election results, for example). Another study, which roughly calculated how many fake news items people remembered before an election, found it was only one or two, and people were unlikely to believe them.

Despite what these two studies say, I personally have reservations about whether the evidence is strong enough, because the book mentions only two studies in this area. The first focuses mainly on the impact on conservative and liberal voters, and its theme does not seem to center on the number of "ill-informed people". The second is only a "rough" calculation. After all, as I mentioned earlier, one has to look very carefully at whether the statistics and methodology used are consistent with the message the author is trying to convey. I haven't read these papers in full, so I think the seriousness of the echo chamber issue needs deeper analysis.

Will machines overtake humans?

The last half of the book discusses machine learning. Much of the explosion in machine learning and AI in recent years has come from a technique called convolutional neural networks. The technique is based on networks modeled loosely on neurons, arranged in multiple layers. Data is split into many input components that are fed into the model, and each layer performs operations on them to arrive at a final answer. For example, an image is split into pixels and fed in, and the output is a classification of "dog" or "cat". The key point is that the model adjusts its parameters according to the "consequences" of its judgments, so each training step changes it a little until it becomes more and more correct. Technically speaking, a machine still needs a human to tell it a lot of things, and much machine learning research tries to minimize what humans need to tell the model.
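The "adjust parameters according to consequences" loop can be shown in miniature with a single artificial neuron learning the AND function via the perceptron rule. This is a toy illustration of the training idea, far simpler than a real convolutional network:

```python
# Training data: the AND function on two binary inputs.
inputs  = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]

w1, w2, b = 0.0, 0.0, 0.0     # the model's adjustable parameters
lr = 0.5                      # how big each correction is

for _ in range(100):          # each pass nudges the parameters a little
    for (x1, x2), t in zip(inputs, targets):
        y = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        error = t - y         # the "consequence" of the judgment
        w1 += lr * error * x1
        w2 += lr * error * x2
        b  += lr * error

predictions = [1 if w1 * x1 + w2 * x2 + b > 0 else 0 for x1, x2 in inputs]
print(predictions)   # [0, 0, 0, 1]
```

Note what the human still supplies here: the target answers and the notion of error. Much of the research mentioned above is about reducing exactly that kind of hand-holding.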

To summarize, there are two main challenges that must be addressed before machine learning can become truly human-like.

  • Can a model avoid being retrained from scratch for every new problem?
  • Can a model learn without a human having to specify its target function for it?

To put it simply: machines don't have the ability to decide for themselves what they want to learn. If a model could develop into a generalist and determine its own goals, it might actually become very close to human, and then, because machines learn faster, AI would outperform humans. However, the author seems firmly of the opinion that AI outperforming humans is still a long way off, because we have not yet seen solutions to these challenges.

Summary of Insights

This is a book written in a very relaxed manner; the popular-science treatment goes a bit deep in places, but it is not difficult to follow if you read carefully. In recent years, with general elections in various countries (especially the United States), more and more people believe that social networking sites and other interested parties use big data to influence people's decisions and cause social division, but few really look into the theory behind it. Reading The 100 Shadows of Algorithms is a good way to understand the common online phenomena that have shaped our daily lives, to a greater or lesser extent, over the past two decades. If you know nothing about recommendation algorithms or machine learning, and you are concerned about how fake news affects you, this is a great book to read.

Lastly, the words quoted by the author are worth pondering:

The real threat is not that computer intelligence has increased dramatically, but that we are using the tools available to us to benefit only a few rather than to improve the lives of the many, and that we are only interested in providing stewardship for the super-rich rather than solving the problems of the masses.

