
Ghost in the Deus Ex Machina

James Bridle is treating the readers of the Guardian to a spotlight event. It is a fantastic article that you must read (at https://www.theguardian.com/books/2018/jun/15/rise-of-the-machines-has-technology-evolved-beyond-our-control-?). It starts with “Technology is starting to behave in intelligent and unpredictable ways that even its creators don’t understand. As machines increasingly shape global events, how can we regain control?” I am not certain that this is correct; it is merely a very valid point of view. This setting is being pushed even further by places like Microsoft Azure, Google Cloud and AWS: we are moving into new territories and the experts required have not been schooled yet. It is (as I personally see it) the consequence of next generation programming on the framework of cloud systems that have thousands of additional unused or unmonitored parameters (read: some of them mere properties), and the scope of these systems is growing. Each developer is making their own app-box and these boxes work together, yet in many cases hundreds of properties are ignored, giving us weird results. There is actually (going by the description James Bridle gives) an early 90’s example, which is not the same, but it illustrates the event.

A program had window settings and sometimes there would be a ghost window. There was no explanation, and no one could figure out why it happened, because it did not always happen, yet it could be replicated. In the end, it turned out that the programmer had been lazy and had created a global variable with the identical name as a visibility property, and due to a glitch that setting got copied. When the system did a reset on the window, all but very specific properties were reset. You see, those elements were not ‘true’; they should have been either ‘true’ or ‘false’ and that was not the case: they had the initial value of ‘null’, yet the reset would not allow for that, so once given a reset they would not return to the ‘null’ setting but would keep the value they last had. It was fixed at some point, but the logic remains: a value could not return to ‘null’ unless specifically programmed. Over time these systems got to be more intelligent and that issue has not returned; such is the evolution of systems. Now it becomes a larger issue, as we have systems that are better, larger and in some cases isolated. Yet, is that always the issue? What happens when an error level spans two systems? Is that even possible? Now, most people will state that I do not know what I am talking about. Yet they forget that any system is merely as stupid as the maker allows it to be, and in 2010 Sha Li and Xiaoming Li from the Dept. of Electrical and Computer Engineering at the University of Delaware gave us ‘Soft error propagation in floating-point programs’, which gives us exactly that. You see, the abstract gives us “Recent studies have tried to address soft errors with error detection and correction techniques such as error correcting codes and redundant execution. However, these techniques come at a cost of additional storage or lower performance. In this paper, we present a different approach to address soft errors. 
We start from building a quantitative understanding of the error propagation in software and propose a systematic evaluation of the impact of bit flip caused by soft errors on floating-point operations”. We can translate this into ‘An option to deal with shoddy programming’, which is not entirely wrong, but the essential truth is that hardware makers, OS designers and application makers all have their own error system; each of them has a much larger system than any requires, and some overlap and some do not. The issue is optionally (speculatively) seen in ‘these techniques come at a cost of additional storage or lower performance’. Now consider the greed driven makers that do not want to sacrifice storage and will not hand over performance; not one way, not the other way, but a system that tolerates either way. Yet this still has a level one setting (Cisco joke) that hardware is ruler, so the settings will remain, and it merely takes one third party developer to use some specific uncontrolled error hit with automated assumption driven slicing and dicing to avoid the cost in storage as well as performance. Once given to the hardware, it will not forget, so now we have some speculative ‘ghost in the machine’, a mere collection of error settings and properties waiting to be interacted with. Don’t think that this is not in existence; the paper sheds light on this in part with: “some soft errors can be tolerated if the error in results is smaller than the intrinsic inaccuracy of floating-point representations or within a predefined range. We focus on analysing error propagation for floating-point arithmetic operations. Our approach is motivated by interval analysis. We model the rounding effect of floating-point numbers, which enable us to simulate and predict the error propagation for single floating-point arithmetic operations for specific soft errors. 
In other words, we model and simulate the relation between the bit flip rate, which is determined by soft errors in hardware, and the error of floating-point arithmetic operations“. That I can illustrate with my earliest errors in programming (decades ago). With Borland C++ I got my first taste of programming and I was in assumption mode to make my first calculation, which gave in the end: 8/4=2.0000000000000003, at that point (1991) I had no clue about floating point issues. I did not realise that this was merely the machine and me not giving it the right setting. So now we all learned that part, we forgot that all these new systems all have their own quirks and they have hidden settings that we basically do not comprehend as the systems are too new. This now all interacts with an article in the Verge from January (at https://www.theverge.com/2018/1/17/16901126/google-cloud-ai-services-automl), the title ‘Google’s new cloud service lets you train your own AI tools, no coding knowledge required‘ is a bit of a giveaway. Even when we see: “Currently, only a handful of businesses in the world have access to the talent and budgets needed to fully appreciate the advancements of ML and AI. There’s a very limited number of people that can create advanced machine learning models”, it is not merely that part, behind it were makers of the systems and the apps that allow you to interface, that is where we see the hidden parts that will not be uncovered for perhaps years or decades. That is not a flaw from Google, or an error in their thinking. The mere realisation of ‘a long road ahead if we want to bring AI to everyone‘, that in light of the better programmers, the clever people and the mere wildcards who turn 180 degrees in a one way street cannot be predicted and there always will be one that does so, because they figured out a shortcut. Consider a sidestep
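Returning for a moment to the soft-error paper quoted above: the relation between a single bit flip and the error in a floating-point value can be made visible with a few lines of Python. This is only a minimal sketch and not the method of the paper; the helper names `flip_bit` and `relative_error` are made up for this illustration, and it assumes the usual IEEE 754 double encoding that Python floats use.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its 64-bit IEEE 754 encoding flipped.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent and bit 63 is the sign bit.
    """
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def relative_error(original: float, corrupted: float) -> float:
    """Size of the damage relative to the original (non-zero) value."""
    return abs(corrupted - original) / abs(original)


if __name__ == "__main__":
    x = 2.0
    # Hypothetical selection of bit positions, from harmless to severe.
    for bit in (0, 26, 51, 52, 63):
        y = flip_bit(x, bit)
        print(f"bit {bit:2d}: {x} -> {y!r} (relative error {relative_error(x, y):.3e})")
```

A flip in the lowest mantissa bit stays within the rounding noise the paper calls tolerable, while a flip in the exponent or sign changes the value completely. As a side note on the rounding anecdote: in IEEE 754 double precision 8/4 is exactly 2.0, and the classic surprise is rather something like 0.1 + 0.2, which Python evaluates to 0.30000000000000004.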

A small sidestep

When we consider risk based thinking and development, we tend to think in opposition, because it is not the issue of Risk, or the given of opportunity. We start with the flaw that we each see differently what constitutes risk. Even as the makers all think the same, the users do not always behave that way. For this I need to go back to the late 80’s, when I discovered that certain books in the Port of Rotterdam were cooked. No one had figured it out, but I recognised one part through my Merchant Naval education. It was the one rule no one looked at in those days; programmers just were not given that element. In a port there is one rule that computers could not comprehend in those days: the concept of ‘Idle Time’ cannot ever be a linear one. Once I saw that, I knew where to look. So when we get back to risk management issues, we see ‘An opportunity is a possible action that can be taken, we need to decide. So this opportunity requires we decide on taking action and that risk is something that actions enable to become an actual event to occur but is ultimately outside of your direct control’. Now consider that risk changes with the tide at a seaport, but we forgot that in opposition to a King tide, there is also at times a Neap tide. A ‘supermoon’ is an event that makes the low tide even lower. So now we see the risk of getting beached for up to 6 hours, because the element was forgotten. The fact that it can happen once every 18 months makes the risk low and it does not impact everyone everywhere, but that setting shows that the dangers (read: risks) of events are intensified when a clever person takes a shortcut. So when NASA gives us “The farthest point in this ellipse is called the apogee. Its closest point is the perigee. During every 27-day orbit around Earth, the Moon reaches both its apogee and perigee. 
Full moons can occur at any point along the Moon’s elliptical path, but when a full moon occurs at or near the perigee, it looks slightly larger and brighter than a typical full moon. That’s what the term “supermoon” refers to“. So now the programmer needed a space monkey (or tables) and when we consider the shortcut, he merely needed them for once every 18 months, in the life cycle of a program that means he merely had a risk 2-3 times during the lifespan of the application. So tell me, how many programmers would have taken the shortcut? Now this is the settings we see in optional Machine Learning. With that part accepted and pragmatic ‘Let’s keep it simple for now‘, which we all could have accepted in this. But the issue comes when we combine error flags with shortcuts.

So we get to the Guardian with two parts. The first: “Something deeply weird is occurring within these massively accelerated, opaque markets. On 6 May 2010, the Dow Jones opened lower than the previous day, falling slowly over the next few hours in response to the debt crisis in Greece. But at 2.42pm, the index started to fall rapidly. In less than five minutes, more than 600 points were wiped off the market. At its lowest point, the index was nearly 1,000 points below the previous day’s average”, the second being “In the chaos of those 25 minutes, 2bn shares, worth $56bn, changed hands. Even more worryingly, many orders were executed at what the Securities Exchange Commission called “irrational prices”: as low as a penny, or as high as $100,000. The event became known as the “flash crash”, and it is still being investigated and argued over years later”. In 8 years the algorithms and the systems have advanced and the original settings no longer exist. Yet the entire setting of error flagging and the use of elements and properties is still on the board, even as they evolved and the systems became stronger; new systems interacted with much faster and stronger hardware, changing the calculating events. So when we see “While traders might have played a longer game, the machines, faced with uncertainty, got out as quickly as possible”, they were uncaught elements in a system that was truly clever (read: had more data to work with). And as we are introduced to “Among the various HFT programs, many had hard-coded sell points: prices at which they were programmed to sell their stocks immediately. As prices started to fall, groups of programs were triggered to sell at the same time. As each waypoint was passed, the subsequent price fall triggered another set of algorithms to automatically sell their stocks, producing a feedback effect”, we get the mere realisation that the machine wins every time in a man versus machine setting, but only with regard to the calculations. 
The initial part I mentioned regarding really low tides was ignored, so as the person realises that at some point the tide goes back up, no matter what, the machine never learned that part, because the ‘supermoon cycle’ was avoided due to pragmatism and we see that in the Guardian article with: ‘Flash crashes are now a recognised feature of augmented markets, but are still poorly understood‘. That reason remains speculative, but what if it is not the software? What if there is merely one set of definitions missing because the human factor auto corrects for that through insight and common sense? I can relate to that by setting the ‘insight’ that a supermoon happens perhaps once every 18 months and the common sense that it returns to normal within a day. Now, are we missing out on the opportunity of using a Neap Tide as an opportunity? It is merely an opportunity if another person fails to act on such a Neap tide. Yet in finance it is not merely a neap tide, it is an optional artificial wave that can change the waves when one system triggers another, and in nano seconds we have no way of predicting it, merely over time the option to recognise it at best (speculatively speaking).
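The feedback effect in the Guardian quote (hard-coded sell points triggering one another) can be illustrated with a toy simulation. This is a deliberately crude sketch: the function `simulate_cascade`, the sell points and the price impact per sale are all hypothetical and say nothing about how real HFT programs or the 2010 market behaved.

```python
def simulate_cascade(start_price, shock, sell_points, impact_per_sale):
    """Return the price path after an initial shock.

    sell_points: hard-coded trigger prices, one per hypothetical program.
    impact_per_sale: price drop caused by each program dumping its stock.
    """
    price = start_price - shock
    path = [start_price, price]
    pending = sorted(sell_points, reverse=True)
    # Keep going as long as the current price has reached the highest
    # remaining trigger; each round of selling pushes the price lower.
    while pending and price <= pending[0]:
        triggered = [p for p in pending if price <= p]
        pending = [p for p in pending if price > p]
        price -= impact_per_sale * len(triggered)
        path.append(price)
    return path


if __name__ == "__main__":
    triggers = [99, 98, 97, 96, 95]
    print("small shock:", simulate_cascade(100, 0.5, triggers, 1))
    print("larger shock:", simulate_cascade(100, 1, triggers, 1))
```

With these made-up numbers a shock of 0.5 does nothing further, while a shock of 1 trips the first trigger and the price ends 6 points down. No rule in the model says that the tide comes back in, which is exactly the piece of common sense a human trader brings along.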

We see a variation of this in the Go-game part of the article. When we see “AlphaGo played a move that stunned Sedol, placing one of its stones on the far side of the board. “That’s a very strange move,” said one commentator”, you see it opened us up to something else. So when we see “AlphaGo’s engineers developed its software by feeding a neural network millions of moves by expert Go players, and then getting it to play itself millions of times more, developing strategies that outstripped those of human players. But its own representation of those strategies is illegible: we can see the moves it made, but not how it decided to make them”, that is where I personally see the flaw. You see, it did not decide; it merely played every variation possible, including the ones a person will never consider. Because it played millions of games, which at 2 games a day represents 1,370 years for every million games, the computer ‘learned’ that the human never countered ‘a weird move’ before. Some such moves can be corrected for, but that one offers opportunity, whilst at the same time exposing its opponent to additional risks. Now it is merely a simple calculation and the human loses. And as every human player lacks the ability to play for a millennium, the hardware wins, always, after that. The computer never learned desire, or human time constraints; as long as it has energy it never stops.
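The 1,370 years can be checked with simple arithmetic. The helper below is only an illustration; note that the figure corresponds to one million games at 2 games a day, so ‘millions of games’ would be a multiple of it.

```python
def years_of_play(games: int, games_per_day: float, days_per_year: int = 365) -> float:
    """How long a human would need to play the given number of games."""
    return games / (games_per_day * days_per_year)


if __name__ == "__main__":
    print(f"{years_of_play(1_000_000, 2):,.0f} years for one million games")
```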

The article is amazing and showed me a few things I only partially knew, and one I never knew. It is an eye opener in many ways, because we are at the dawn of what is advanced machine learning and as soon as quantum computing is an actual reality we will get systems with the setting that we see in the Upsilon meson (Y). Leon Lederman discovered it in 1977, so now we have a particle that is not merely off or on, it can be: null, off, on or both. An essential setting for something that will be close to true AI, a new way of computers to truly surpass their makers and an optional tool to unlock the universe, or perhaps merely a clever way to integrate hardware and software on the same layer?

What I got from the article is the realisation that the entire IT industry is moving faster and faster and most people have no chance to stay up to date with it. Even when we look at publications from 2 years ago. These systems have already been surpassed by players like Google, reducing storage to a mere cent per gigabyte and that is not all, the media and entertainment are offered great leaps too, when we consider the partnership between Google and Teradici we see another path. When we see “By moving graphics workloads away from traditional workstations, many companies are beginning to realize that the cloud provides the security and flexibility that they’re looking for“, we might not see the scope of all this. So the article (at https://connect.teradici.com/blog/evolution-in-the-media-entertainment-industry-is-underway) gives us “Cloud Access Software allows Media and Entertainment companies to securely visualize and interact with media workloads from anywhere“, which might be the ‘big load’ but it actually is not. This approach gives light to something not seen before. When we consider makers from software like Q Research Software and Tableau Software: Business Intelligence and Analytics we see an optional shift, under these conditions, there is now a setting where a clever analyst with merely a netbook and a decent connection can set up the work frame of producing dashboards and result presentations from that will allow the analyst to produce the results and presentations for the bulk of all Fortune 500 companies in a mere day, making 62% of that workforce obsolete. In addition we see: “As demonstrated at the event, the benefits of moving to the cloud for Media & Entertainment companies are endless (enhanced security, superior remote user experience, etc.). And with today’s ever-changing landscape, it’s imperative to keep up. 
Google and Teradici are offering solutions that will not only help companies keep up with the evolution, but to excel and reap the benefits that cloud computing has to offer“. I take it one step further, as the presentation to stakeholders and shareholders is about telling ‘a story’, the ability to do so and adjust the story on the go allows for a lot more, the question is no longer the setting of such systems, it is not reduced to correctly vetting the data used, the moment that falls away we will get a machine driven presentation of settings the machine need no longer comprehend, and as long as the story is accepted and swallowed, we will not question the data. A mere presented grey scale with filtered out extremes. In the end we all signed up for this and the status quo of big business remains stable and unchanging no matter what the economy does in the short run.

Cognitive thinking will come from the AI through the use of data, merely because we can no longer catch up, and in that we lose the reasoning and comprehension of data at the high levels we should have.

I wonder as a technocrat how many victims we will create in this way.




Filed under Finance, IT, Media, Science

Data illusions

Yesterday was an interesting day for a few reasons; one of the primary reasons was an opinion piece in the Guardian by Jay Watts (@Shrink_at_Large). As with many articles, I initially considered myself to be in opposition, yet when I reread it, this piece has all kinds of hidden gems and I had to ponder a few items for an hour or so. I love that! Any piece, article or opinion that makes me rethink my position is a piece well worth reading. So this piece called ‘Supermarkets spy on them now’ (at https://www.theguardian.com/commentisfree/2018/may/31/benefits-claimants-fear-supermarkets-spy-poor-disabled) has several sides that require us to think and rethink issues. As we see a quote like “some are happy to brush this off as no big deal” we identify with too many parts; to me and to many it is just that, no big deal, but behind the issues are secondary issues that are ignored by the masses (en masse as we might giggle), and the truth is far from nice.

So what do we see in the first as primary and what is behind it as secondary? In the first we see the premise “if a patient with a diagnosis of paranoid schizophrenia told you that they were being watched by the Department for Work and Pensions (DWP), most mental health practitioners would presume this to be a sign of illness. This is not the case today.” It is not whether this is true or not, it is not a case of watching, being a watcher or even watching the watcher. It is what happens behind it all. So, recollect that dead dropped donkey called Cambridge Analytica, which was all based on interacting and engaging on fear, and consider what IBM and Google are able to do now through machine learning. This we see in an addition to a book from O’Reilly called ‘The Evolution of Analytics’ by Patrick Hall, Wen Phan, and Katie Whitson. Here we see the direct impact of programs like SAS (Statistical Analysis System) in the application of machine learning; we see this on page 3 of Machine Learning in the Analytic Landscape (not a page 3 of the Sun by the way). Here we see for the government “Pattern recognition in images and videos enhance security and threat detection while the examination of transactions can spot healthcare fraud”, and you might think it is no big deal. Yet you are forgetting that it is more than the so called implied ‘healthcare fraud’. It is the abused setting of fraud in general and the eagerly awaited setting for ‘miscommunication’, whilst the people en masse are now set in a wrongly categorised world, a world where assumption takes control and scores of people are now pushed into the defence of their actions, an optional change towards ‘guilty until proven innocent’. Meanwhile those making the assumptions, clueless on many occasions, are now in an additional setting where they believe that they know exactly what they are doing. We have seen these kinds of bungles that impacted thousands of people in the UK and Australia. 
It seems that Canada has a better system where every letter with the content: ‘I am sorry to inform you, but it seems that your system made an error‘ tends to overthrow such assumptions (Yay for Canada today). So when we are confronted with: “The level of scrutiny all benefits claimants feel under is so brutal that it is no surprise that supermarket giant Sainsbury’s has a policy to share CCTV “where we are asked to do so by a public or regulatory authority such as the police or the Department for Work and Pensions”“, it is not merely the policy of Sainsbury, it is what places like the Department for Work and Pensions are going to do with machine learning and their version of classifications, whilst the foundation of true fraud is often not clear to them, so you want to set a system without clarity and hope that the machine will constitute learning through machine learning? It can never work, that evidence is seen as the initial classification of any person in a fluidic setting is altering on the best of conditions. Such systems are not able to deal with the chaotic life of any person not in a clear lifestyle cycle and people on pensions (trying to merely get by) as well as those who are physically or mentally unhealthy. These are merely three categories where all kind of cycles of chaos tend to intervene with their daily life. Those are now shown to be optionally targeted with not just a flawed system, but with a system where the transient workforce using those methods are unclear on what needs to be done as the need changes with every political administration. A system under such levels of basic change is too dangerous to get linked to any kind of machine learning. I believe that Jay Watts is not misinforming us; I feel that even the writer here has not yet touched on many unspoken dangers. 
There is no fault here by the one who gave us the opinion piece, I personally believe that the quote “they become imprisoned in their homes or in a mental state wherein they feel they are constantly being accused of being fraudulent or worthless” is incomplete, yet the setting I refer to is mentioned at the very end. You see, I believe that such systems will push suicide rates to an all-time high. I do not agree with “be too kind a phrase to describe what the Tories have done and are doing to claimants. It is worse than that: it is the post-apocalyptic bleakness of poverty combined with the persecution and terror of constantly feeling watched and accused“. I believe it to be wrong because this is a flaw on both sides of the political aisle. Their state of inaction for decades forced the issue out and as the NHS is out of money and is not getting any money the current administration is trying to find cash in any way that they can, because the coffers are empty, which now gets us to a BBC article from last year.

At http://www.bbc.com/news/election-2017-39980793, we saw “A survey in 2013 by Ipsos Mori suggested people believed that £24 out of every £100 spent on benefits was fraudulently claimed. What do you think – too high, too low? Want to know the real answer? It is £1.10 for every £100”. That is the dangerous political setting as we should see it: the assumption and belief that 24% is lost to fraud when it is more realistic that 1% might be the actual figure. Let’s not be coy about it, because out of £172.3bn a 1% amount still remains a serious amount of cash, yet when you set it against the size of the UK population the amount becomes a mere £25 per person. It merely takes one prescription to get to that amount; one missed on the government side and one wrongly entered on the patient’s side and we are there. Yet in all that, how many prescriptions did you the reader require in the last year alone? When we get to that nitty gritty level we are confronted with the task where machine learning will not offer anything but additional resources to double check every claimant and offence. Now, we should all agree that machine learning and analyses will help in many ways, yet when it comes to ‘Claimants often feel unable to go out, attempt voluntary work or enjoy time with family for fear this will be used against them’ we are confronted with a new level of data, and when we merely look at the fear of voluntary work or being with family we need to consider what we have become. So in all this we see a rightful investment into a system that in the long run will help automate all kinds of things and help us to see where governments failed their social systems; we see a system that costs hundreds of millions, to look into an optional 1% loss, which at 10% of the losses might make perfect sense. Yet these systems are flawed from the very moment they are implemented because the setting is not rational, not realistic and in the end will bring more costs than any have considered from day one. 
So in the setting of finding ways to justify a 2015 ‘The Tories’ £12bn of welfare cuts could come back to haunt them‘, will not merely fail, it will add a £1 billion in costs of hardware, software and resources, whilst not getting the £12 billion in workable cutbacks, where exactly was the logic in that?
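To put the figures above side by side, here is a small back-of-the-envelope calculation. The £172.3bn and the fraud shares come from the text; the population of 66 million is my own assumption (roughly the UK population around 2017), so the per-person amounts are indicative only.

```python
BENEFIT_SPEND = 172.3e9      # pounds, the figure quoted in the text
UK_POPULATION = 66_000_000   # assumption: roughly the UK population around 2017


def per_person(total_spend: float, fraud_share: float, population: int) -> float:
    """Fraud loss in pounds per head of population."""
    return total_spend * fraud_share / population


if __name__ == "__main__":
    scenarios = (
        ("public belief (24%)", 0.24),
        ("BBC figure (1.1%)", 0.011),
        ("rounded (1%)", 0.01),
    )
    for label, share in scenarios:
        loss = BENEFIT_SPEND * share
        each = per_person(BENEFIT_SPEND, share, UK_POPULATION)
        print(f"{label:22s} loss £{loss / 1e9:6.2f}bn, about £{each:,.0f} per person")
```

Under that population assumption the 1% case comes out at roughly £26 per person, close to the £25 mentioned, while the believed 24% would be over £600 per person.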

So when we are looking at the George Orwell edition of ‘Twenty Eighteen’, we all laugh and think it is no great deal, but the danger is actually twofold. The first is one I used and taught to students, and it gets us the loss of choice.

The setting is that a supermarket needs to satisfy the needs of its customers, and based on the survey it runs it will keep the items in a category (lollies for example) that are rated ‘fantastic value for money’ and ‘great value for money’, or the top 25th percentile of the products, whichever is the largest. In the setting with 5,000 responses, the issue was that the 25th percentile now also included ‘decent value for money’, so an additional 35 articles were kept in stock for the lollies category. This was the setting where I showed the value of what is known as User Missing Values. There were 423 people who had no opinion on lollies, who for whatever reason never bought those articles. This led to removing them from consideration, a choice merely based on actual responses. Now the same situation, with the remaining 4,577 people, gave us that the top 25th percentile only had ‘fantastic value for money’ and ‘great value for money’, and within that setting 35 articles were removed from that supermarket. Here we see the danger! What about those people who really loved one of those 35 articles, yet were not interviewed? The average supermarket does not have 5,000 visitors; depending on the location it has up to a thousand a day. More important, what happens when we add a few elements and it is no longer about supermarkets but government institutions, and it is not about lollies but Fraud classification? What happens when we are set in a category of ‘Most likely to commit Fraud’ and ‘Very likely to commit Fraud’, whilst those people with a job and bankers are not included in the equation? We get a diminished setting of Fraud from the very beginning.
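The effect of User Missing Values on such a cut-off can be sketched in a few lines. Only the 5,000 responses and the 423 ‘no opinion’ answers come from the example above; the split over the other ratings, the rating scale and the function `kept_categories` are invented purely to make the mechanism visible.

```python
from collections import Counter

SCALE = ["fantastic", "great", "decent", "poor"]  # best to worst (hypothetical scale)
NO_OPINION = "no opinion"


def kept_categories(responses, treat_no_opinion_as_missing, share=0.25):
    """Rating levels whose articles stay in stock.

    Rule from the example: always keep the top two levels, and extend
    downwards until the kept levels cover the top `share` of the base.
    """
    counts = Counter(responses)
    if treat_no_opinion_as_missing:
        base = sum(counts[level] for level in SCALE)   # valid answers only
    else:
        base = len(responses)                          # everybody counts
    target = share * base
    kept, covered = [], 0
    for level in SCALE:
        kept.append(level)
        covered += counts[level]
        if covered >= target and len(kept) >= 2:
            break
    return kept


# Hypothetical split that adds up to the 5,000 responses of the example.
responses = (["fantastic"] * 500 + ["great"] * 700 + ["decent"] * 2000
             + ["poor"] * 1377 + [NO_OPINION] * 423)

if __name__ == "__main__":
    print("all 5,000 in the base:", kept_categories(responses, False))
    print("423 set to missing:   ", kept_categories(responses, True))
```

With everyone in the base the top quarter needs 1,250 respondents and spills over into ‘decent’; with the 423 treated as missing it needs about 1,144 and the top two levels suffice. The same data gives a different stocking decision depending on one setting that most users never look at.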

Hold Stop!

What did I just say? Well, there is method to my madness. Two sources, the first called Slashdot.org (no idea who they were), gave us a reference to a 2009 book called ‘Insidious: How Trusted Employees Steal Millions and Why It’s So Hard for Banks to Stop Them‘ by B. C. Krishna and Shirley Inscoe (ISBN-13: 978-0982527207). Here we see “The financial crisis appears to be exacerbating fraud by bank employees: a new survey found that 72 percent of financial institutions say that in the last 12 months they have experienced a case of data theft by one of their workers“. Now, it is important to realise that I have no idea how reliable these numbers are, yet the book was published, so there will be a political player using this at some stage. This already tumbles to academic reliability of Fraud in general, now for an actual reliable source we see KPMG, who gave us last year “KPMG survey reveals surge in fraud in Australia“, with “For the period April 2016 to September 2016, the total value of frauds rose by 16 percent to a total of $442m, from $381m in the previous six month period” we see number, yet it is based on a survey and how reliable were those giving their view? How much was assumption, unrecognised numbers and based on ‘forecasted increases‘ that were not met? That issue was clearly brought to light by the Sydney Morning Herald in 2011 (at https://www.smh.com.au/technology/piracy-are-we-being-conned-20110322-1c4cs.html), where we see: “the Australian Content Industry Group (ACIG), released new statistics to The Age, which claimed piracy was costing Australian content industries $900 million a year and 8000 jobs“, yet the issue is not merely the numbers given, the larger issue is “the report, which is just 12 pages long, is fundamentally flawed. 
It takes a model provided by an earlier European piracy study (which itself has been thoroughly debunked) and attempts to shoe-horn in extrapolated Australian figures that are at best highly questionable and at worst just made up“, so the claim “4.7 million Australian internet users engaged in illegal downloading and this was set to increase to 8 million by 2016. By that time, the claimed losses to piracy would jump to $5.2 billion a year and 40,000 jobs” was a joke to say the least. There we see the issue of Fraud in another light, based on a different setting, the same model was used, and that is whilst I am more and more convinced that the European model was likely to be flawed as well (a small reference to the Dutch Buma/Stemra setting of 2007-2010). So not only are the models wrong, the entire exercise gives us something that was never going to be reliable in any way shape or form (personal speculation), so in this we now have the entire Machine learning, the political setting of Fraud as well as the speculated numbers involved, and what is ‘disregarded’ as Fraud. We will end up with a scenario where we get 70% false positives (a pure rough assumption on my side) in a collective where checking those numbers will never be realistic, and the moment the parameters are ‘leaked’ the actual fraudulent people will change their settings making detection of Fraud less and less likely.
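The 70% is a rough assumption, but the mechanism behind a figure of that order is the base rate: when fraud is rare, even a decent classifier flags far more innocent people than fraudsters. The sketch below uses a prevalence of 1.1% (borrowing the BBC money figure as if it were the share of claimants, which is an assumption) and a hypothetical sensitivity and specificity; none of these describe any real DWP system.

```python
def flagged_breakdown(population, fraud_rate, sensitivity, specificity):
    """Return (true positives, false positives, share of flags that are false)."""
    fraudsters = population * fraud_rate
    honest = population - fraudsters
    true_pos = fraudsters * sensitivity        # fraudsters correctly flagged
    false_pos = honest * (1 - specificity)     # honest claimants wrongly flagged
    share_false = false_pos / (true_pos + false_pos)
    return true_pos, false_pos, share_false


if __name__ == "__main__":
    tp, fp, share = flagged_breakdown(
        population=100_000,   # hypothetical number of claimants
        fraud_rate=0.011,     # assumption: 1.1% of claimants commit fraud
        sensitivity=0.95,     # hypothetical: catches 95% of fraudsters
        specificity=0.97,     # hypothetical: clears 97% of honest claimants
    )
    print(f"flagged fraudsters: {tp:,.0f}")
    print(f"flagged innocents:  {fp:,.0f}")
    print(f"share of flags that are false positives: {share:.0%}")
```

Under these assumptions about three out of four flagged people are innocent, and every one of them has to be checked by hand or lives with the accusation.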

How will this fix anything other than the revenue need of those selling machine learning? So when we look back at the chapter of Modern Applications of Machine Learning we see “Deploying machine learning models in real-time opens up opportunities to tackle safety issues, security threats, and financial risk immediately. Making these decisions usually involves embedding trained machine learning models into a streaming engine”. That is actually true, yet when we also consider “review some of the key organizational, data, infrastructure, modelling, and operational and production challenges that organizations must address to successfully incorporate machine learning into their analytic strategy”, the element of data and data quality is overlooked on several levels, making the entire setting, especially in light of the piece by Jay Watts, a very dangerous one. So the full title, which I intentionally did not use in the beginning, ‘No wonder people on benefits live in fear. Supermarkets spy on them now’, is set wholly on the known and almost guaranteed premise that data quality is lacking, and on knowing that the players in this field are slightly too happy to generalise and trivialise the issue of data quality. The moment that comes to light and the implementers are held accountable for data quality is when all those now hyping machine learning will change their tune instantly and give us all kinds of ‘party line’ issues that they are not responsible for. Issues that I personally expect they did not really highlight when they were all about selling that system.

Until data cleaning and data vetting get a much higher position on the analysis ladder, we are confronted with aggregated, weighted and ‘expected likelihood’ generalisations, and those who are ‘flagged’ via such systems will live in constant fear that their shallow way of life stops because an overpaid analyst stuffed up a weighting factor, condemning a few thousand people to be tagged for all kinds of reasons, not merely because they could optionally be part of a 1% that the government is trying to clamp down on, or was that 24%? We can believe the BBC, but can we believe their sources?

And if there is even a partial doubt on the BBC data, how unreliable are the aggregated government numbers?

Did I oversimplify the issue a little?



Filed under Finance, IT, Media, Politics, Science

The sting of history

There was an interesting article on the BBC (at http://www.bbc.com/news/business-43656378) a few days ago. I missed it initially, as I tend not to dig too deep into the BBC past the breaking news points. Yet there it was, staring at me, and I thought it was rather funny. You see, ‘Google should not be in business of war, say employees’ is fair enough. Apart from the issue of them not being too great at waging war and roughing it out, it makes perfect sense to stay away from war. Yet is that possible? The quote is funny when you see ‘No military projects’, whilst we are all aware that the internet itself is an invention of DARPA, who came up with it as a solution that addressed “A network of such [computers], connected to one another by wide-band communication lines [which provided] the functions of present-day libraries together with anticipated advances in information storage and retrieval and [other] symbiotic functions”, which led to ARPANET and became the Internet. So now that the cat is out of the bag, we can continue. The objection they give is fair enough. When you are an engineer who is destined to create a world where everyone communicates with one another, the last thing you want to see is “Project Maven involves using artificial intelligence to improve the precision of military drone strikes”. I am not sure if Google could achieve it, but the goal is clear and so is the objection.
The BBC article shows merely one side. When we go to the source itself (at https://www.defense.gov/News/Article/Article/1254719/project-maven-to-deploy-computer-algorithms-to-war-zone-by-years-end/), we see the words of Marine Corps Colonel Drew Cukor: “Cukor described an algorithm as about 75 lines of Python code “placed inside a larger software-hardware container.” He said the immediate focus is 38 classes of objects that represent the kinds of things the department needs to detect, especially in the fight against the Islamic State of Iraq and Syria”. You see, I think he has been talking to the wrong people. Perhaps you remember the project SETI screensaver. “In May 1999 the University of California launched SETI@Home. SETI stands for the “Search for Extraterrestrial Intelligence,” Originally thought that it could at best recruit only a thousand or so participants, more than a million people actually signed up on the day and in the process overwhelmed the meager desktop PC that was set aside for this project”. I remember it because I was one of them. It is in that trend that “SETI@Home was built around the idea that people with personal computers who often leave them to do something else and then just let the screensaver run are actually wasting good computing resources. This was a good thing, as these ‘idle’ moments can actually be used to process the large amount of data that SETI collects from the galaxy” (source: Manila Times). They were right. The design was brilliant and simple, and it worked better than even the SETI people thought it would. Here we now see the application, where any Android (OK, iOS too) device created after 2016 is pretty much a supercomputer at rest. You see, Drew Cukor is trying to look where he needs to look; it is a ‘flaw’ he shares with the bulk of the military. He is looking for a target that is 1 in 10,000, so he needs to hit the 0.01% mark.
This is his choice and that is what he needs to do; I am merely stating that by figuring out where NOT to look, I am upping his chances. If I can set the premise of eliminating 7,500 false potentials in a few seconds, his job goes from a 0.01% chance to 0.04%, making his work 4 times easier and optionally faster. Perhaps the change could eliminate 8,500 or even 9,000 flags. Now we are talking about the chances and the time frame we need. You see, it is the memo of Bob Work that does remain an issue. I disagree with “As numerous studies have made clear, the department of defense must integrate artificial intelligence and machine learning more effectively across operations to maintain advantages over increasingly capable adversaries and competitors,”. The clear distinction is that those adversaries tend not to rely on a smartphone; they rely on a simple Nokia 2100 burner phone and as such there will be a complete absence of data, or will there be? As I see it, to tackle that, you need to be able to engage in what might be regarded as a ‘Snippet War’, a war based on (a lot of) ‘small pieces of data or brief extracts’. It is in one part cell tower connection patterns, in one part tracking IMEI (International Mobile Equipment Identity) codes, and in one part SIM switching. It is a jumble of patterns, and normally getting anything done will be insane. Now what happens when we connect 100 supercomputers to one cell tower and mine all available tags? What happens when we can disseminate these packages and let all those supercomputers do the job? Merely 100 smartphones, or even 1,000 smartphones, per cell tower. At that point the war changes, because now we have an optional setting where on-the-spot data is offered in real time. Some might call it ‘the wet dream’ of Marine Corps Col. Drew Cukor, and he was never even aware that he was allowed to adult dream to that degree on the job, was he?
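
The arithmetic behind figuring out where NOT to look can be worked through in a few lines. This is only a sketch: the pool of 10,000 candidates with a single real target comes from the figures above, while the function names and the assumption that the filter never throws away the real target are mine.

```python
TOTAL_CANDIDATES = 10_000  # size of the pool, taken from the text
REAL_TARGETS = 1           # one real target in that pool, taken from the text


def hit_rate(targets, candidates):
    """Chance that a randomly picked candidate is a real target."""
    return targets / candidates


def after_filter(eliminated):
    """Return (new hit rate, improvement factor) once `eliminated`
    false candidates are removed.

    Assumption: the filter only removes false candidates, never the target.
    """
    remaining = TOTAL_CANDIDATES - eliminated
    new_rate = hit_rate(REAL_TARGETS, remaining)
    base_rate = hit_rate(REAL_TARGETS, TOTAL_CANDIDATES)
    return new_rate, new_rate / base_rate


for eliminated in (0, 7_500, 8_500, 9_000):
    rate, factor = after_filter(eliminated)
    print(f"eliminated {eliminated:>5}: chance {rate:.4%}, {factor:.1f}x easier")
```

Removing 7,500 false flags leaves 1 in 2,500 (0.04%), a factor of 4; removing 9,000 leaves 1 in 1,000 (0.1%), a factor of 10. The gain grows fastest in the last few thousand eliminations, which is why the filtering step matters more than the final detection step.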

Even as these people are throwing AI around like it is Steven Spielberg’s chance to make a Kubrick movie, in the end it is a new scale and a new level of machine learning, a combination of clustered flags and decentralised processing on a level that is not linked to any synchronicity. Part of this solution is not in the future, it is in the past. For that we need to read the original papers by Paul Baran from the early 60’s. I think we pushed forward too fast (a likely involuntary reaction). His concept of packet switching was not taken far enough, because the issues of then are nowhere near the issues of now. Consider raw data as a package, with the transmission itself setting the foundation of the data path that is to be created. So basically the package becomes the data entry point of raw data and the mobile phone processes this data on the fly, resetting the data parameters on the fly (giving instant rise to what is unlikely to be a threat and optionally what is), a setting where 90% could be parsed by the time it gets to the mining point. The interesting side is that the container for processing this could be set in the memory of most mobile phones without installing anything, as it is merely processing parsed data. It is not a nice solution, but essentially an optional one, to get a few hundred thousand mobiles to do in mere minutes what takes most data centres a day; the data centres merely receive the first-level processed data. Now it gets a lot more interesting: as thousands of phones are near a cell tower, that data keeps on being processed on the fly by supercomputers at rest all over the place.

So, we are not, as Drew states, ‘in an AI arms race’; we are merely in a race to be clever on how we process data, and we need to be clever on how to get these things done a lot faster. The fact that the foundation of that solution is 50 years old and still counts as an optional way of getting things done merely shows the brilliance of those who came before us. You see, that is where the military forgot the lessons of limitations. We shun the old games of the CBM 64 and applaud the now of Ubisoft. We forget that Ubisoft shows itself to be graphically brilliant whilst having the resources of 4K cameras, whereas those on the CBM 64 (like Sid Meier) were actually brilliant for getting a workable interface that looked decent when they had a mere 0.000076293% of the resources that Ubisoft gets to work with now. I am not here to attack Ubisoft, they are working with the resources available; I am addressing the utter brilliance of people like Sid Meier, David Braben, Richard Garriott, Peter Molyneux and a few others for being able to do what they did with the little they had. It is in that simplicity, with SETI@Home added, that we see the solutions that separate the children from the clever machine learning programmers. It is not about “an algorithm of about 75 lines of Python code “placed inside a larger software-hardware container.””, it is about where to set the slicer and how to do it whilst no one is able to say it is happening, whilst remaining reliable in what it reports. It is not about a room or a shopping mall with 150 servers walking around the place, it is about the desktop no one notices that is able to keep tabs on those servers merely to keep the shops safe; that is the part that matters. The need for brilliance is shown again in limitations when we realise why SETI@Home was designed.
It directly opposes the quote “The colonel described the technology available commercially, the state-of-the-art in computer vision, as “frankly … stunning,” thanks to work in the area by researchers and engineers at Stanford University, the University of California-Berkeley, Carnegie Mellon University and Massachusetts Institute of Technology, and a $36 billion investment last year across commercial industry”. The people at SETI had to get clever fast because they did not get access to $36 billion. How many of these players would have remained around if it was 0.36 billion, or even 0.036 billion? Not too many, I reckon; the entire ‘the technology available commercially’ would instantly fall away the moment the optional payoff remains null, void and unavailable. A $36 billion investment implies that those ‘philanthropists’ are expecting a $360 billion payout at some point. Call me a sceptic, but that is how I expect those people to roll.

The final ‘mistake’ that Marine Corps Col. Drew Cukor makes is one that he cannot be blamed for. He forgot that computers should again be taught to rough it out, just like the old computers did. The mistake I am referring to is not an actual mistake; it is more accurately the view, the missed perception he unintentionally has. The quote I am referring to is “Before deploying algorithms to combat zones, Cukor said, “you’ve got to have your data ready and you’ve got to prepare and you need the computational infrastructure for training.””. He is not stating anything incorrect or illogical, he is merely wrong. You see, we need to remember the old days, the days of the mainframe. I got treated in the early 80’s to an ‘event’. A ‘box’ was delivered. It was the size of an A3 flatbed scanner, it had the weight of a small office safe (rather weighty that fucker was) and it looked like a print board on a metal box with a starter engine on top. It was priced like a middle class car. It was a 100MB Winchester drive. Yes, 100MB, the mere size of 4 iPhone X photographs. In those days data was super expensive, so the users and designers had to be really clever about data. That approach is needed again, not because we have no storage, we have loads of it. We have to get clever again because there is too much data and we have to filter through too much of it. We need to get better fast, because 5G is less than 2 years away and by that time we will drown in all that raw, untested data. We need to reset our views, comprehend how the old ways of data worked and prevent exabytes of junk per hour slowing us down; we need to redefine how tags can be used to set different markers, different levels of records. The old ways of hierarchical data were too cumbersome, but they were fast. The same is seen with B-tree data (a really antiquated database approach), discarding half of the remaining data (or more) in every iteration.
In this, machine learning could be the key, and the next person that comes up with that data solution would surpass the wealth of Mark Zuckerberg pretty much overnight. Data systems need to stop being ‘static’; they need to be fluid and dynamic systems that evolve as data is added. Not because that is cleverer, but because the amount of data we need to get through is growing near exponentially per hour. It is there that we see that Google has a very good reason to be involved, not because of the song ‘Here come the drones’, but because this level of data evolution is pushed upon nearly all, and getting in the thick of things is how one remains the top dog. Google is very much about being top dog in that race, as it is servicing the ‘needs’ of billions and as such its own data centres will require loads of evolution. The old ways are getting closer and closer to becoming obsolete and Google needs to be ahead before that happens. Of course, when that happens IBM will give a clear memo that they have been on top of it for years, whilst trying to figure out how to best present the delays they are currently facing.
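
The remark above about the old tree-style lookups cutting the data in half with every iteration can be illustrated with a plain binary search over a sorted list. This is a stand-in and not a real B-tree (a B-tree keeps many keys per node and discards even more per step); the function name `lookup_steps` and the one million sequential keys are purely illustrative assumptions.

```python
def lookup_steps(sorted_keys, wanted):
    """Binary search: return (index or None, number of comparison steps).

    Each step discards half of the remaining keys, which is the
    'halving' idea, shown here on a sorted list rather than a B-tree.
    """
    low, high = 0, len(sorted_keys) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_keys[mid] == wanted:
            return mid, steps
        if sorted_keys[mid] < wanted:
            low = mid + 1   # the wanted key can only be in the upper half
        else:
            high = mid - 1  # the wanted key can only be in the lower half
    return None, steps


keys = list(range(1_000_000))  # hypothetical data set of one million sorted keys
index, steps = lookup_steps(keys, 765_432)
print(f"found at index {index} after {steps} steps")
```

One million records never need more than 20 steps this way, where a straight scan could need a million. That is the kind of cleverness about data that the old systems were forced into.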

Filed under IT, Media, Military, Science