
The tradeoff

That is at times the question, and the BBC is introducing us to a hell of a tradeoff. The story (at https://www.bbc.com/news/articles/c0kglle0p3vo) gives us ‘Meta considers charging for ad-free Facebook and Instagram in the UK’, and the setting is not really a surprise. On April 10th 2018 we were clearly given “Senator, we run ads” and we all laughed. Congress tried to be smart over and over again, and Mark Zuckerberg showed them the ropes. Every single time. There was little or no question about how they were making money. Yet now the game changes. You see, in the past Facebook (say META) was the captain of its data vessel, a system where it had the power and the collective security of our data in hand. There was no question on any setting, and even I assumed that they had firm hands on a data repository a lot larger than the vault of the Bank of England. That was until Cambridge Analytica: in March 2018 their business practices were dragged into the limelight, and it also meant that Facebook no longer had control of its ship of data, which meant that its ‘treasure’ was fading.

So now we get “Facebook and Instagram owner Meta is considering a paid subscription in the UK which would remove adverts from its platforms. Under the plans, people using the social media sites could be asked to pay for an ad-free experience if they do not want their data to be tracked.” It makes perfect sense that, under the guise of no advertising, paid services get a mention. This is given to us via the setting of “It comes as the company agreed to stop targeting ads at a British woman last week following a protracted legal battle.” I don’t get it; the protracted legal battle seems odd, as this was the tradeoff for a free service. Is this a woke thing? You get a free service and the advertising is the price of it. As such I do not get the issue of “Guidance issued by the regulator in January states that users must be presented with a genuine free choice.” This makes some kind of sense: either pay for the service or suffer the consequences of advertising.

And let’s be clear, the value of META relies on targeted advertising. What is the use of targeting everyone for a car ad when that includes the 26% of people who do not have a driver’s licence? On top of that, the remaining people need an income of over $45,000 to afford the $90,350 2025 Lexus RX, which is about 30% of them. We can (presumptively) assume that this gets us a population of about 20%-25%, so does it make any sense for Lexus to address the 100% whilst only one in four or one in five is optionally in the market? Makes no sense, does it? As such META needs to rely on as much targeted advertising as it can get.

And as you can see, the advertising model, known as “consent or pay”, has become increasingly popular. At some point they were giving the people “But it reduced its prices and said it would provide a way for users not willing to pay to opt to see adverts which are “less personalised”, in response to regulatory concerns.” That is partially acceptable, but I have a different issue. You see, I foresee issues with “less personalised”. Apart from gambling sites, there is a larger concern: even as Facebook (or META) isn’t capturing some data, there is the larger fear that some parties will offer services with no care beyond capturing and collecting data. For example, sites outside the EU (or UK). Sites in China and Russia like their social sites that collect this data and optionally sell it to META. There is, as I currently see it, no defence against this. Like in the 90’s, when American providers made some agreement but some of them did not qualify what happened to the data backups; those were not considered, and when they were addressed it was years later and the data had left the barn (almost everywhere).
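Going back to the Lexus numbers for a moment: a back-of-the-envelope sketch (assuming, as I imply above, that licence and income are independent factors, and using my presumptive percentages) shows where the 20%-25% comes from:

```python
# Back-of-the-envelope sketch of the addressable audience, using the
# percentages from the paragraph above. Independence of the two factors
# is my assumption; the figures themselves are presumptive.
no_licence = 0.26                 # share without a driver's licence
can_afford = 0.30                 # share earning enough for the Lexus RX

addressable = (1.0 - no_licence) * can_afford
print(f"Addressable share: {addressable:.0%}")  # ~22%, one in four or five
```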

There is a fear (a personal fear) that the so-called captains of industry have not considered (intentionally, I reckon) the need to replace and protect aggregated data and aggregated results, which allow for a whole battery of additional statistics. Another personal fear is the approach to data and what they laughingly call AI. It is hard to set a stage, but I will try.

To get this I will refer to a program called SPSS (now IBM SPSS Statistics), which states: “In SPSS, cluster analysis groups similar data points into clusters, while discriminant analysis classifies data points into pre-defined groups based on predictor variables.”

So grouping data points into clusters, like grouping households by income into household types, is a cluster analysis.

And assigning data points to pre-defined groups, like predicting household type from income, is called a discriminant analysis. Now, as I personally see it (I am definitely not a statistician), if one direction is determined, the other one should always fail. It is a one-direction solution. So if a cluster analysis is proven, a discriminant analysis on income will always fail, and vice versa. Now with NIP (Near Intelligent Parsing, which is what these AI firms do) they will try to set a stage to make this work. And that is how the wheels come off the wagon and we get a whole range of weird results. But now, as people set the stage for contributing to third-party parsing and resource aggregation, I feel that a dangerous setting could evolve and there is no defence against that. As I see it, the ‘data boys’ need to isolate the chance of us being aggregated through third parties, and as I see it META needs to be isolated from that level of data ‘intrusion’. A dangerous level of data, to say the least.
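To make that distinction concrete, here is a minimal sketch using scikit-learn as a stand-in for the SPSS procedures (the income data is synthetic, and the library choice is mine, not SPSS):

```python
# Minimal sketch of the two directions, using scikit-learn as a stand-in
# for SPSS's procedures. The data is synthetic and purely illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
income = np.concatenate([rng.normal(30_000, 5_000, 100),
                         rng.normal(90_000, 10_000, 100)]).reshape(-1, 1)

# Cluster analysis: no labels go in; the groups come OUT of the data.
clusters = KMeans(n_clusters=2, n_init="auto").fit_predict(income)

# Discriminant analysis: pre-defined groups go IN; the model learns to
# assign new cases to them from the predictor (income).
household_type = np.array([0] * 100 + [1] * 100)   # known labels
lda = LinearDiscriminantAnalysis().fit(income, household_type)
print(lda.predict([[75_000]]))                      # classify a new case
```

The point of the sketch is the direction of travel: KMeans is handed no groups and finds them, while the discriminant model is handed the groups up front and only learns to sort new cases into them.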

There is always a downside to a tradeoff, and too many people aren’t aware of it. So have a great day and try to have a half cup of good coffee (data boys get that old premise).


Filed under Finance, IT, Media, Science

When one door closes

Yes, that is the stage I find myself in. However, I could say that when one door closes, someone gets to open a window. Yet even as I am eager to give you that story now, I will await the outcome at Twitter (which blocked my account); the outcome there will support the article. Which is nice, because it makes for an entertaining story. It did, however, make me wonder about a few parts. You see, AI does not exist. It is machine learning and deeper learning, and that is an issue for the following reasons.

Deep learning requires large amounts of data. Furthermore, the more powerful and accurate models will need more parameters, which, in turn, require more data. Once trained, deep learning models become inflexible and cannot handle multitasking.
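To make the parameter point concrete, a small sketch (the layer sizes are invented, and the network is a plain fully-connected one):

```python
# Sketch: counting the trainable weights of a small fully-connected
# network. The layer sizes are made up; the point is how quickly the
# count grows, and every parameter must be pinned down by training data.
layers = [784, 256, 128, 10]            # input, two hidden layers, output

params = sum((n_in + 1) * n_out         # +1 per unit for the bias term
             for n_in, n_out in zip(layers, layers[1:]))
print(f"{params:,} trainable parameters")   # 235,146 for these sizes
```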

This leads to: 

Massive Data Requirement. As deep learning systems learn gradually, massive volumes of data are necessary to train them. This gives us a rather large setting: as people are more complex, it will require more data to train for them, and the educational result is, as many say, an inflexible one. I personally blame the absence of shallow circuits, but what do I know? There is also the larger issue of paraphrasing. There is an old joke: “Why can a program like SAP never succeed? Because it is about a stupid person with stress, anxiety and pain.” Until someone teaches the system that SAP is also shorthand for Stress, Anxiety and Pain, and that ‘sap’ in the urban dictionary is a stupid, foolish and gullible person, the joke falls flat.

And that gets me to my setting (I could not wait that long). The actor John Barrowman hinted that he will be in the new Game of Thrones series (House of the Dragon); he did this by showing an image of the flag of House Stark.

I could not resist and asked him whether we will see his head on a pike, and THAT got me thrown from Twitter (or taken from the throne of Twitter). Yet ANYONE who followed Game of Thrones will know that Sean Bean’s head was placed on a pike at the end of season 1; as such I thought it was funny, and when you think of it, it is. But that got me banned. So was it John Barrowman who felt threatened? I doubt that, but I cannot tell, because the reason why this tweet caused the block is currently unknown. If it is machine learning and deeper learning, we see its failure. Putting one’s head on a pike could be threatening behaviour, but it came from a previous tweet, and either the investigator didn’t get it, the system didn’t get it, or the actor didn’t do his homework. I leave it up to you to figure it out. Optionally my sense of humour sucks; that too is an option. But if you see the emojis after the text you could figure it out.

High Processing Power. Another issue with deep learning is that it demands a lot of computational power. This is another side: with each iteration of data the demand increases. If you did statistics in the 90’s you would know that CLUSTER analysis had a few setbacks, the memory needs being one of them; it resulted in the creation of QUICK CLUSTER, something that could manage a lot more data. So why use the cluster example?

Cluster analysis is a way of grouping cases of data based on the similarity of responses to several variables. There are two types of measure: similarity coefficients and dissimilarity coefficients. And especially in the old days, memory was hard to get, and it all needs to be done in memory (a rough sketch of that memory problem follows below). And here we see the first issue: ‘the similarity of responses to several variables’, where we determine the variables of response. But in the SAP example, the response depends on someone with medical knowledge and someone with urban knowledge of English, and if these are two different people, the joke quickly falls flat, especially when these two elements do not exchange information. In my example of John Barrowman WE ALL assume that he does his homework (he has done so in so many instances, so why not now), so we are willing to blame the algorithm. But did that algorithm see the image John Barrowman gave us all? Does the algorithm know the ins and outs of Game of Thrones? All elements, and I would jest (yes, I cannot stop) that these are all elements of dissimilarity; as such, 50% of the cluster fails right off the bat.
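As promised, a rough sketch of the memory problem (the case count, cluster count and variable count are all mine, purely illustrative): the old hierarchical CLUSTER procedure needs a full case-by-case distance matrix, while QUICK CLUSTER, being k-means, only has to keep the cluster centres.

```python
# Rough sketch of why hierarchical clustering hit memory limits and
# k-means (QUICK CLUSTER) did not. All sizes here are assumed.
n_cases = 100_000
bytes_per_value = 8                       # one double-precision distance

# Hierarchical clustering: every case compared with every other case.
distance_matrix = n_cases * n_cases * bytes_per_value
print(f"Distance matrix: {distance_matrix / 1e9:.0f} GB")    # 80 GB

# k-means: only k cluster centres of n_vars values each stay in memory.
k, n_vars = 5, 10
centroids = k * n_vars * bytes_per_value
print(f"k-means centroids: {centroids} bytes")               # 400 bytes
```

And that gets us to…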

Struggles With Real-Life Data. Yes, deeper learning struggles with real-life data, because data is given in the width of the field of observation. For example, if we were to ask a plumber, a butcher and a veterinarian to describe the uterus of any animal, we get three very different answers, and there is every chance that the three people do not understand the explanation of the other two. A real-life example of real-life settings, and that is before paraphrasing comes into play; it merely makes the water a lot more muddy.

Black Box Problems. And here the plot thickens. You see, at the most basic level, “black box” just means that, for deep neural networks, we don’t know how all the individual neurons work together to arrive at the final output. A lot of the time it isn’t even clear what any particular neuron is doing on its own. Now I tend to call this “a precise form of fuzzy logic”, and I could be wrong on many counts, but that is how I see it. You see, why did deeper learning learn it like this? It is an answer we will never get. It becomes too complex, and now consider: “a black box exists due to bizarre decisions made by intermediate neurons on the way to making the network’s final decision. It’s not just complex, high-dimensional non-linear mathematics; the black box is intrinsically due to non-intuitive intermediate decisions.” There is no right, no wrong. It is how it is, and that is how I see what I now face: the person or system just doesn’t get it for whatever reason. A real AI could have seen a few more angles, and as it grows it will see all the angles and get to the right conclusion faster and faster. A system on machine learning or deeper learning will never get it; it will get more and more wrong, because it is adjusted by a person, and if that person misses the point, the system will miss the point too. Like a place like Gamespot, all flawed, because a conclusion came based on flawed information. This is why we have no AI: the elements of shallow circuits and quantum computing are still in their infancy. But salespeople do not care; the term AI sells and they need sales. This is why things go wrong, no one will muzzle the salespeople.
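A tiny illustration of that black box, as I understand it (the task and the network are mine, purely to have something to poke at): even for a four-unit network you can print every hidden activation and still not say what any unit “means”.

```python
# Minimal sketch: train a tiny network on XOR, then recompute its hidden
# layer by hand and look at it. The numbers are exact, yet they carry no
# obvious human-readable meaning -- the black box in miniature.
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                     # XOR labels

net = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    max_iter=5000, random_state=1).fit(X, y)

# Hidden layer = tanh(inputs @ weights + bias), straight from the model.
hidden = np.tanh(X @ net.coefs_[0] + net.intercepts_[0])
print(hidden)   # four values per case; which decision does each encode?
```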

In the end shit happens, that is the setting, but the truth of the matter is that too many people embrace AI, a technology that does not exist. They call it AI, but it is a fraction of AI and as such it is flawed, but that is a side they do not want to hear. It is a technology in development. This is what you get when the ‘fake it until you make it’ crowd is in charge. A flaw that evolves into a larger flaw until the system buckles.

But it gave me something to write about, so it is not all a loss, merely that my Twitter peeps will have to do without me for a little while. 


Filed under IT, movies, Science