The Technology Sounding Board

E2 - Machine Learning for the Rest of Us

Michael R. Gilbert Episode 2


A layman's guide to the world of Machine Learning, how it can be leveraged in the Enterprise, and a peek at the risks of proceeding without practical expertise, deliberate purpose, and sound guiding principles.

Michael R. Gilbert:

A few years ago, my world and the world of machine learning collided, and with it came a whole host of new concepts and words that were really quite confusing: AI, Machine Learning, Deep Learning, Neural Networks, Stochastic Gradient Descent, things I couldn't even pronounce. What I really wanted at the time was a gentle introduction, in plain English, that would explain what all these things were, how they fit together, and what they could mean to me. And I really couldn't find it. If you're in that situation, and you suddenly need to know about machine learning but aren't an expert, well, let's talk about it.

Hello, and welcome to The Technology Sounding Board. I'm your host, Michael R. Gilbert. In this episode, we're going to talk about Machine Learning. Now, this is going to be a little different from some of our other episodes. There's no guest on this one, it's just my viewpoint, and it's really meant to be a setup for some episodes we'll have in the future, when we talk to real-world practitioners using ML in their space to drive improvements in their enterprises. But, as I outlined at the beginning, there's a lot going on in this space and it's quite confusing. If it's not something you do on a day-to-day basis, it can be hard to understand what is what, what matters, and how it all hangs together. So what I want to do is lay out what the terminology means, what ML is, what it isn't, and what it can do for you. As I said in the title, this is meant to be machine learning for the rest of us: no PhD required, I promise you there will be no math, and we will keep it in plain English wherever we possibly can. So with that, let's begin.

If you want to define Machine Learning, we should probably start by asking: what is learning? It may seem like a silly question, but when you think about it, it's harder to pin down than you might first think. So I'll put it out there that learning isn't memorization. What do I mean by that? Let's start by thinking about the old card game we all played when we were younger, the memory card game. In case you haven't, just imagine a set of cards, let's say 64 cards, carrying 32 pictures, so that every picture appears on exactly two cards; each card has a twin with the same picture. We put all these cards out on the floor, face down. If we're the organized sort, we might lay them out in an eight-by-eight grid; if not, maybe it's all higgledy-piggledy. It doesn't really matter. Each person takes a turn: the first player turns over a card, and then tries to pick another card, turning that one over to see if it matches. If it doesn't match, both cards get turned back again and it's the next player's turn. If it does match, you keep those cards and you get to try again. The aim of the game, obviously, is to find all the pairs, and at the end of the game, the person with the most pairs wins. Now clearly, memory helps you win this game: being able to remember where something you turned over, or something your opponent turned over, was, so that when you find its twin on your next turn, you can remember where the original was and turn it over. But memory will not help you win the game in concept. Winning one game and memorizing all the pairs doesn't make you any more likely to win the next game. We haven't learned anything, we've simply memorized. Now, for sure, having a memory is important.
Otherwise, we can't learn anything, but memory alone is insufficient. So what is learning? I'm going to propose that learning is really about pattern recognition: recognizing patterns when we see them, remembering those patterns, so that we can then view the data in the world around us and infer new knowledge from the patterns we've learned. For example, if you remember that far back, you'll recall that when we teach kids to multiply, we usually make them learn the times tables: two times two is four, three times two is six, four times two is eight, and so on. It's effective. They learn how to multiply in the sense that they can apply the table. But the table gets very large very quickly, and remembering all of it is difficult. Very soon, whether we teach them the tricks or not, a child will figure out that there are certain patterns in the data. Take the nine times table. Two times nine is 18, and one and eight add up to nine. Three times nine is 27, and two and seven add up to nine. Well, wait a minute, is that always true? We can test it and see that, all the way up to ten, this works out very nicely. So we don't need to remember the nine times table anymore; we can figure it out. Take whatever multiple of nine we're being asked for, let's say five: take one off to get four, find the number that makes four up to nine, which is five, so the answer must be 45. And suddenly we've got much, much faster. As well as being faster, we're no longer having to literally memorize huge amounts of data.

So that's another way we could look at learning: in effect, it's data compression. Instead of just remembering everything we could ever remember, we're remembering a set of rules, a set of patterns about that data, which can stand in place of all that data, and from which we can infer what's going on. Great. So if we have a handle on what learning might be, what does it mean for machine learning? Well, that would be teaching a machine to recognize patterns and infer new data from old data. Okay, but isn't that what every computer program does? When we write a program, we have encoded an algorithm which takes input data in and gives some output data out, inferring the right answer from the input we were given. And we're not storing all of this in some massive lookup table; it's an algorithm. So that's another distinction I want to draw here: the difference, in this world, between learning and teaching. Writing computer programs we can think of as teaching the computer how to recognize a pattern, how to infer the new data. What we want instead is to have the computer learn that for itself, figure it out on its own: look at the data and then infer a pattern, without us having to write the code. So machine learning, in a way, is a way of automating the production of programming, so that we can get machines to make themselves smarter. Now that touches on another subject: Artificial Intelligence.

Machine Learning, ML; Artificial Intelligence, AI. You see AI and ML used almost interchangeably in a lot of the literature. Are they the same thing? Well, again, I'm going to argue no, they aren't. Artificial Intelligence is about the idea of making a computer behave in a more human way: allowing it to deal with a world of imprecision, to recognize things in natural language where the language itself doesn't necessarily make complete structured sense, possibly even to emote, so that it can relate to us, and we can relate to it, in more natural ways. As examples today, the personal assistants, Alexa and Siri, obviously come to mind. But this concept has been around for a long time. It really started in earnest, I would say, in the late '50s and made some big progress in the '60s; the first commercial successes of artificial intelligence came in the 1980s. And at that time, artificial intelligence really meant expert systems. These were systems that could take a set of rules, created by actually listening to experts and writing down how they understood the world, and then let the machine interpret the data it sees to come up with new information. And though there were some commercial successes in the '80s, they did not take off the way we thought they would. That's because actually encapsulating those rules turned out to be really hard. You have to spend a lot of time with experts, you only know what you know, and you can only capture what can be clearly defined. So I think we could say that machine learning was a prerequisite for artificial intelligence to explode: we had to find a way of generating vast amounts of knowledge, vast amounts of new learning, for these things to sit on top of. That's the area machine learning addresses. It helps and supports artificial intelligence, but it isn't the same thing. And indeed, machine learning can be useful even without artificial intelligence. I think of artificial intelligence as sort of replacing humans in the decision process, whereas machine learning is really decision support in human processes. That's my perspective; I'm not sure everyone would agree with it, but I think it's a good way to slice the line between what is AI and what is ML. Okay, having drawn that line, we're not really going to discuss AI any further. The focus here is on the ML, machine learning, side of things.

One last point I want to make before we get into the details: there are two forms of machine learning we can look at, supervised and unsupervised. Now, supervised is where most of the work is really being done; unsupervised is what we would really like it to be. The difference? Well, with unsupervised learning I'd like to be able to say: look, here's some data I've collected from the world around me, tell me something interesting, tell me something I didn't know that I can use. The example that's quite commonly used in this space is clustering. Say we've got a website, we do a lot of e-commerce, and we've got a lot of information about what our customers actually buy, their behavior on the site, and so on. And we want to build a marketing campaign to target them with things that might be of interest to them. Marketing campaigns are expensive, so we don't want to create a huge number of them.
On the other hand, the more specific a marketing campaign is to the end user, the target, the more effective it's going to be. So I want to be able to say: okay, I've got enough budget for five different marketing campaigns, let's take this data and split these customers into five groups that are themselves different in some meaningful way. And, you know, hey, let me know what that meaningful way is, so I can build something that targets those groups, which will hopefully be more successful than just a broadcast, but still allows me to cut the audience down into meaningful chunks. And we can do that today; clustering algorithms are well established, and that can be done.
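To make that concrete, here's a minimal sketch of the kind of clustering just described, using scikit-learn's KMeans. The customer features and numbers are entirely invented for illustration; a real project would use whatever behavioral data the site actually collects.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer features: [orders per year, average basket value, days since last visit]
rng = np.random.default_rng(0)
customers = rng.normal(loc=[12, 40, 30], scale=[5, 15, 20], size=(500, 3))

# Ask for exactly five groups, one per marketing campaign we can afford.
segments = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(customers)

# Inspect each group's averages to see what "meaningfully different" turned out to mean.
for k in range(5):
    group = customers[segments == k]
    print(f"segment {k}: {len(group)} customers, feature averages = {group.mean(axis=0).round(1)}")
```

Note that the algorithm doesn't name the segments or tell us what to say to them; it only guarantees the groups differ in the data, so a human still has to look at those averages and decide what each campaign should be.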
That's not where we are, though, with most machine learning. Most machine learning today follows the supervised model, and that is more like traditional teaching: giving you some data, asking you to answer some homework, grading that homework, and then letting you get better based on the things you got right and the things you didn't. The more we do it, the better you get. That's supervised learning, and that's really where the state of the art is today.

Okay, to briefly recap where we are: we've said that machine learning is about getting a computer to build inference patterns from the data it sees, to in a sense program itself; that it isn't AI, it helps AI, but it isn't AI, and we're not talking about AI here; and that specifically we're looking at supervised learning rather than unsupervised learning, i.e. we help the machine learn by giving it graded answers to its initial guesses. Well, let's talk about how we do that. And let's again go back to: what would we do as humans, and can we take that as a model to help a computer get to the same place? Let's think about what we would do if we wanted to make inferences based on data we could observe, and how we would approach it.

Let's take a really simple example. Say we commute to work. Obviously this example refers back to a time when we actually did commute to work, but let's suppose we can still remember what it's like to get up in the morning, have breakfast, get in the car, drive to work, and try to be there by a certain time. We don't want to get there too early, that wastes our time, and we definitely don't want to be late. The problem is that the time it takes to get from here to there is variable, and it would be nice to be able to predict how long it's going to take. We might guess that the travel time depends on the amount of traffic on the road: the more traffic, the longer it takes; the less traffic, the quicker it is. Okay, sounds reasonable. So we could take a number of trips and record them on a plain old piece of graph paper: on the y axis, how long the trip took, and on the x axis, some measure of how much traffic there was. What we would see after a while is that the dots we put on the graph kind of line up. Ideally they would line up in a perfectly straight line, and we could just draw the line through them; then, for any given amount of traffic, we would simply read off the line from the x axis up to the y axis, and that would tell us exactly, down to the second, how long it's going to take to get to work. It doesn't work that way, of course, we know that. But we'll see a cluster of dots which generally sits higher the further along we go, i.e. the more traffic, the higher the dots. We'll squint at it a bit and draw an imaginary line that cuts more or less through the middle of it, and that line will give us an estimate. And that estimate will be, frankly, good enough. So, hey, we just learned how to infer from a pattern.

Let's see if we can translate that same idea into allowing a computer to do it. It turns out, of course, that that's pretty easy to do; the phrase for this is regression. As humans, it helps us to visualize it, which is why we draw it on graph paper, but the computer doesn't need the visualization. It's easy for it to relate one set of numbers to another set of numbers. The difficulty, the complication shall we say, is in how we make that straight line appear between the dots. As humans, like I said, we can squint at it and say, well, that fits about right. But "that looks about right", or TLAR as we used to call it, doesn't formalize well for a computer. How are we going to get more precise than TLAR? What we need now is the concept of an error function. And what do I mean by that? We have this imaginary line that we think is the best fit through the cloud of dots, and what we can do is measure how far away from that line each dot is. Now there are various rules we could use, mean squared error, L1 errors, and so on; it doesn't really matter, as long as we have a reason for using a particular error measure. So let's just say we're going to measure the distance of each dot from the line, take the absolute value, add them all up, and average them. That's a measure of how good or bad the line is. We can move that line around, and as we do, that error function will either go up or go down. If we keep moving in the direction that takes it down, we will eventually find the lowest point it can reach, and that's going to be the best line. It won't be a perfect fit; if it were a perfect fit, the dots would all line up in a straight line on their own and it would be obvious. But it's going to be the best fit, and that's the concept we're looking for. This is going to come up again and again and again in machine learning: what we're really trying to do is take an approximation, fit a line to some data, and then find something that minimizes the error, or minimizes the loss as we sometimes say, in that approximation. And that's what we're going to do.
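As a concrete illustration of that error function, here is a tiny sketch in plain Python. The commute numbers are invented, and the "line" is just a slope and an intercept we're free to move around; nothing here comes from a real data set.

```python
# Hypothetical observations: (traffic level on some 0-10 scale, commute minutes)
trips = [(2, 22), (4, 27), (5, 31), (7, 38), (8, 41), (9, 47)]

def predict(traffic, slope, intercept):
    """The straight line we're proposing: estimated minutes for a given traffic level."""
    return slope * traffic + intercept

def average_error(slope, intercept):
    """Measure how far each dot sits from the line, take the absolute value, and average it."""
    errors = [abs(minutes - predict(traffic, slope, intercept)) for traffic, minutes in trips]
    return sum(errors) / len(errors)

# Two candidate lines: moving the line around changes the error, and lower is better.
print(average_error(slope=2.0, intercept=20.0))   # one guess at the line
print(average_error(slope=3.5, intercept=15.0))   # a different guess; compare the two numbers
```

Moving the slope and intercept around and keeping whichever move lowers that number is exactly the "find the best line" search described above.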
Before I go any further with loss functions, let's add a little more complexity to our little problem. So far we've said we're trying to estimate the commute time based on the level of traffic on the road, and it's not a perfect line, the dots more or less line up but not exactly; it's a big cloud. How could we get more accurate? Well, we can get more accurate by adding more information about the problem. Let's say the weather probably has some influence on how long the commute takes, and let's put a number on it, where one is really good weather and ten is really awful, I don't know, hurricane-level nasty weather. If we then want to plot not only the time the trip took, but also the amount of traffic on the road and the weather on that day, we can't do that in two dimensions. But being human, we can understand things in three dimensions, and we can draw a picture that shows that data in three dimensions using a little bit of art. Now what we see is hills and peaks and valleys rather than just a line, but hopefully we can still get a reasonably straight interpretation of it, now in three dimensions. The beauty of letting machines do this is that they don't need any visualization, because to them it's all just data. It doesn't matter how many dimensions there are: they could do this with a hundred inputs, with a thousand, with a million. There's no way we could draw that, so we can no longer use visualizations to represent it, but the computer doesn't care; it wasn't using the visualizations anyway. And so we start to see some of the power machine learning has compared with how we humans would tackle the problem.

Okay, so we've now got the idea that in order to help a computer learn to infer from the data, the challenge is really going to be how it minimizes the loss. It can make a guess, and we can calculate an error on that guess; how does it minimize the loss? To answer that, I want to paint a picture for you, and we're going to use a three-dimensional problem just like the one we talked about before. This time, imagine that you've been out for a walk in the hills, and the task before you now is to get back to base camp, back to where you started. And you started, conveniently, at the lowest point in the hills, in the valley, at the very lowest point. This is obviously representing minimizing, in this case, a loss function. In order to minimize the loss function, i.e. to get back to base camp, all you've got to do is walk down the hill to the bottom. Well, let's make it just a little bit more complicated and say you're actually shrouded in fog. Now you don't know which way to go; you can't see the base station. So how are you going to get there? One technique you might use would be to reach out with one foot and tap the ground around you, trying to find the steepest gradient, i.e. the direction that goes down the most, and take a small step in that direction. Then you might stop there, tap with your foot again, find from there which is the steepest downward direction, the direction of steepest gradient, and take another small step. If you did this repeatedly, eventually you'd get to the bottom, back to base camp. You would have minimized, in this case, your altitude, but what we're representing here, of course, is minimizing our loss function. That's how we're going to teach the computer to do this. And there's a name for this technique: it's called gradient descent, and you can see where the name comes from. You may often hear "stochastic gradient descent"; that's just a variant of the same thing where we use a randomized subset of the input data rather than all of it on each step. That tends to be faster, but it doesn't matter, it's a detail.
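Here is a minimal sketch of that "tap the ground and step downhill" idea, applied to the commute line from before. It estimates which way is downhill by probing a tiny step either way in each parameter, a numerical stand-in for computing the slope exactly; the data and step sizes are invented for illustration, not a recipe.

```python
trips = [(2, 22), (4, 27), (5, 31), (7, 38), (8, 41), (9, 47)]  # (traffic, minutes), made up

def loss(slope, intercept):
    # Average absolute distance of every dot from the proposed line.
    return sum(abs(m - (slope * t + intercept)) for t, m in trips) / len(trips)

slope, intercept = 0.0, 0.0        # start with a hopeless guess at the line
step, probe = 0.01, 1e-3           # how far we move, and how gently we "tap the ground"

for _ in range(20000):
    # Tap with one foot in each direction to feel which way the loss slopes.
    grad_slope = (loss(slope + probe, intercept) - loss(slope - probe, intercept)) / (2 * probe)
    grad_intercept = (loss(slope, intercept + probe) - loss(slope, intercept - probe)) / (2 * probe)
    # Take a small step in the downhill direction.
    slope -= step * grad_slope
    intercept -= step * grad_intercept

print(f"fitted line: minutes = {slope:.2f} * traffic + {intercept:.2f} (loss {loss(slope, intercept):.2f})")
```

Real libraries work out those slopes exactly rather than by probing, and the stochastic variant uses a random subset of the trips on each step, but the downhill walk is the same idea.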
Gradient descent is the concept; stochastic gradient descent, or SGD, is the variant of the technique that gets used most often in practice. So how does the computer know which way is the steepest way down the slope? There's a mathematical technique called differentiation, which takes a formula and gives you the slope of that formula, and I'm thankfully not going to get into differentiation in this discussion. But it's doable in most cases. There are some problems for which the function can't be differentiated, and then we have to find other techniques, but in the vast majority of cases, that's how we solve that problem.

Okay, so our algorithm is looking pretty good now. We collect some data, and for each row of data we make sure we have a label saying what the right answer would be. We let the computer basically guess how the data might map to the answer, and in reality it doesn't matter how good or bad that initial guess is, because it can then go through and see how well it did: it uses a loss function to measure how good or bad the guess is, and then it uses an algorithm, SGD or some variant like it, to find the way to minimize that loss, and it gets better and better and better. So we've solved machine learning, right? No problems? Well, there is one slight snag we have yet to deal with, which is, bizarrely, that it can get too good. You may have heard of the problem of overfitting. What does that mean? Again, go back to the human world, and imagine that we're teaching a class of students how to solve physics problems. We're doing this by giving them a test bank of 1,000 questions, and for each question they're given the right answer. We say: okay, go learn from these, try to work out the answers; when you get one right, great; when you get one wrong, adjust your approach so you get better and better. Soon enough, they'll be able to answer any question they're given from that bank of 1,000 and get it right. But it's entirely possible that they get it right because they're simply remembering the answer to every question. If they do that, then when they're faced with an exam containing questions that were not in the test bank, they won't get them right, because they haven't generalized what they learned from the specific data to the real world. That's a real danger in machine learning. How do we solve it? Well, there are techniques to make the most out of smaller sets of data, and I'm not going to get into the weeds of those, but the basic principle is a much, much larger training set. At some point, the data being fed in becomes so large that it's too large for the model to memorize, and it has to start extracting the general answer. You know this is true when you look at humans: give them the set of 1,000 questions and it's possible they could remember all of them; give them 100,000 and there's no way they can. Now they have to start generalizing, picking out the rules that will help them succeed.
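The episode doesn't go into the techniques, but one standard way practitioners catch this memorization problem, offered here only as a hedged aside, is to hold back some labelled examples the model never trains on and compare the errors. A sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(1)
traffic = rng.uniform(0, 10, size=16)
minutes = 3.5 * traffic + 15 + rng.normal(0, 2, size=16)      # a noisy, made-up "true" relationship

train_x, train_y = traffic[:12], minutes[:12]                  # the "test bank" the model studies
exam_x, exam_y = traffic[12:], minutes[12:]                    # questions it has never seen

for degree in (1, 11):
    # A degree-11 polynomial has enough freedom to all but memorize 12 training points.
    coefficients = np.polyfit(train_x, train_y, degree)
    train_error = np.mean(np.abs(np.polyval(coefficients, train_x) - train_y))
    exam_error = np.mean(np.abs(np.polyval(coefficients, exam_x) - exam_y))
    print(f"degree {degree}: error on studied questions {train_error:.2f}, on unseen questions {exam_error:.2f}")
```

A model that scores brilliantly on the questions it studied but poorly on the ones it has never seen is memorizing, not learning, which is exactly the student-with-the-test-bank problem in miniature.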
That brings us to one of the core drivers behind the cost and expense of machine learning. If it's as easy as we've just said, why is it so expensive, and why do we need such well-educated, and costly, people to do it? The number one reason is the size of the data: getting hold of enough data, and remember, this data can't just be something you pull at random, it has to be labeled data, so we have to know what the right answer is. That's an exercise in and of itself. Then, given that we're going to chew through an awfully large amount of data, and do it in an iterative process where we make our guess, grade our homework, and improve, you can see that depending on the problem, this can take a long time too. So you've got a lot of expensive equipment and a lot of people tied up for a long time chewing through data, and those things, collectively, drive the cost.

The third problem I want to touch on very briefly is the nature of the problem itself, and its relation to the amount of data we need. I don't mean here the number of examples, but the number of inputs. In our really easy example, we were looking at two inputs: how much traffic was on the road, and what the weather was like. Now suppose we're trying to do some analysis on photographs, facial recognition say, and we're using 4K photographs, which are not uncommon today. Then we're talking about something in the region of 8.3 million pixels, and each of those pixels carries three color values if it's a color photograph, which means roughly 25 million numbers to analyze for a single full-color 4K picture. That's a lot of data points. Some problems scale, in terms of how long they take to compute, in order n, meaning the more data points, the longer the time, in proportion. Some scale with n squared: as the number goes up, the time goes up with the square of that number, and if you start squaring 25 million you get a very big number. And some problems scale not with the square, or even the cube, but exponentially: some constant, say ten, raised to the power of n. Ten to the power of 25 million, I don't know what that number is, but it's so large that it wouldn't matter how many computers you had, you could run them all for as long as the universe has existed and you still wouldn't solve the problem. Those problems are essentially intractable. So there are limits to what this can do; the problems themselves have to be solvable in what we call polynomial time in order to be tackled at all, and even then they can take some serious time.

Before we get into what we can and can't do with Machine Learning, I want to talk very briefly about Deep Learning. What is Deep Learning? How does it relate to Neural Networks, which seem to come up in the conversation any time Deep Learning does? And how does all of that relate to Machine Learning as we've just been discussing it? To discuss Deep Learning, we really want to take a moment and think of what we've just been talking about as shallow learning, with the caveat that just about no one in the universe uses the phrase shallow learning. Maybe they should, but they don't. What do I mean by that?
Well, we're making a couple of really important assumptions that we haven't made explicit yet. Let's go back to our example, where we're trying to estimate the commute time based on a couple of variables. We've said we're going to look at the traffic density and the weather, and use those two to predict the outcome. Number one, we've said that, collectively, they relate to the outcome via some kind of line, a straight line. It's a linear problem, as we would say. We accept that it isn't really a linear problem, but it's going to be straight enough that our approximation is useful. That's one assumption that may or may not hold up. And two, we're basically saying we can leap to the answer by looking directly at the data. In this case we can: we can see the traffic density, we can see the weather, and those give us a good indication of the expected commute time. But real-world problems that we as humans solve all the time are often a lot more complex than that. If you were to do something as simple, simple in terms of human capabilities, as looking at a series of pictures and identifying which of them contain a cat or a dog, you would find it very, very easy, even sorting a whole pile of pictures, some of cats and some of dogs, into which is which. Trivial. When you think about what you're really doing, however, it's actually quite complex. You're identifying that an area in the photograph represents an animal, and you're doing that by saying: okay, this is a texture that looks like fur, this is a texture that looks like skin, this is a texture that looks like scales, whatever. Here are some shapes that look like eyes, this is a shape that looks like a mouth, here's a shape that looks like a nose. Then we can relate the distance between the eyes and the nose and the mouth and say: okay, that geometry turns up in cats a lot, so this looks like a cat, versus a slightly different geometry that looks like a dog. So we didn't just look at the picture and make the decision from the raw data as we saw it. We recognized a pattern in the data, and a pattern in the pattern of patterns, and a pattern in that pattern of patterns, and so on and so forth; we built up a whole stack of reasoning, from the raw data through multiple layers, in order to reach the conclusion. Those multiple layers are what we're referring to when we talk about deep learning: laying one layer of learning on top of another, on top of the next, to get deeper into understanding.

Neural networks are a technique, currently almost certainly the predominant technique, used in deep learning to construct this layer of learning on top of layer of learning on top of layer of learning. And in a neural network, we're doing the same sort of processing we were doing before: constructing a linear reasoning step from a layer of data to give us an output, then feeding that output into another linear layer, and feeding that into another linear layer. We're using the same techniques we've just discussed and stacking them on top of each other.
But in a neural network, we do something very interesting, which is to insert, between each of these linear layers, a nonlinear layer. What do I mean by that, and why is it interesting? Well, a nonlinear function; there are lots to choose from, but the one probably most commonly used is a ReLU, R-E-L-U, which stands for rectified linear unit, in case you ever wanted to know, though you probably never did. What it does is very simple: for any input that is negative, the answer is zero. So if I send minus five into a ReLU, I get zero; if I send in minus ten, I get zero. For any input greater than zero, I get that input back out. So if I send in three, the answer is three; if I send in ten, the answer is ten. Now, this is a very, very simple function, but it does something very interesting: it basically acts as a switch. Any number coming out of my first layer of lines that is negative disappears, it's turned off, and any number that stays positive stays on. As we go through these layers, with a switch between each one, we see streaks of activity occurring between the layers, which looks very reminiscent of the streaks of activity we see as neurons fire in a real brain. That's where the neural network gets its name. What this does mathematically is quite astonishing: if we stack these linear layers together with nonlinear layers in between them, we can model any arbitrary function. It doesn't matter whether it's linear, or curvy, or so curvy there aren't words to describe how curvy it is, we can model it. And that means we can answer arbitrary questions using neural networks. Notwithstanding that, we still have the same issues we had before: we need a lot of data, we need a lot of processing power, it can take a long time, and there are some problems for which, no matter how much time we have, it would take more time than the universe has to answer them. But the problem space is now much, much larger. The things we can solve with this technique are really complicated, like facial recognition, or identifying animals in a piece of scenery. So, as I'm sure you can imagine, that's where all the excitement is in the machine learning space these days: deep learning, specifically neural networks, and the things they can drive. This really is the cutting edge, and our understanding of how to construct neural networks so that they perform well is, I would say, more art than science at the moment. It takes very clever people, a lot of experience, and sometimes simply a lot of trial and error to find the best approach.
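For the curious, the ReLU really is as simple as described. Here is a minimal sketch of one tiny "layer of lines, then switches, then another layer of lines"; the weights are arbitrary numbers chosen only to show the shape of the computation, not anything trained or meaningful.

```python
def relu(x):
    # The "switch": negative inputs become zero, positive inputs pass straight through.
    return max(0.0, x)

print(relu(-5), relu(-10), relu(3), relu(10))   # 0.0 0.0 3 10, exactly as described above

def tiny_network(traffic, weather):
    # First linear layer: two little "lines" combining the raw inputs (weights picked arbitrarily).
    h1 = 0.9 * traffic + 0.4 * weather - 2.0
    h2 = -0.5 * traffic + 1.2 * weather + 1.0
    # Nonlinear layer in between: the switches decide which of those signals stays on.
    h1, h2 = relu(h1), relu(h2)
    # Second linear layer: combine whatever survived into the final estimate.
    return 4.0 * h1 + 2.5 * h2 + 12.0

print(tiny_network(traffic=7, weather=3))
```

Stacked deep enough, with those switches between every pair of linear layers, this same shape of computation can approximate the wildly curvy functions just described; the hard part, the part that is still more art than science, is choosing how many layers and weights to use and then training them.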
The last concept I want to talk about before I get on to applications of all this is the generative adversarial network, and that's a mouthful, right? A GAN. The idea here is: what would happen if you took two of these neural networks and played them off against each other? The example I'm going to give you is this. Imagine you created a neural network whose purpose was to take in a picture and determine whether or not it is actually a piece of artwork by Picasso, picked at random. We train it just like we did before: we give it lots of pictures, and we label those pictures as yes, this is a Picasso, or no, this isn't. We include in our training set some pictures of really good fakes, but we label them as fakes. And it learns, and it gets pretty good at it, and soon enough it can tell the difference between a Picasso and something that isn't a Picasso.

Now let's say we also create a generative network. What do I mean by that? This is something that takes a picture as an input and changes it into a picture that looks like it was done by Picasso. You can imagine that at first it might just make random changes to the picture; it doesn't matter what the changes are at this point. It feeds its output into the Picasso detector, and of course, when it first does this, the detector is going to say: no, that's not even close, sorry, that's not a real Picasso. But now we've got something that can generate labels to feed back to our Picasso generator, and we can use exactly the same techniques to change the way it modifies the picture and get closer and closer to something that passes the Picasso test. Initially it won't get very close, but maybe the detector becomes 99% certain this is a fake rather than 100% certain. Well, that's a move in the right direction, so we make more of those types of changes. Maybe then it's only 95% certain it's a fake rather than a real Picasso. Right direction again, so we make more of those changes, and so on and so forth. Before long, we have trained this generative network to create pictures in the Picasso style that are so convincing, the Picasso detector can't tell they're fakes.

Does the story end there? Well, no, and this is where the "adversarial" comes in. We can now generate good Picasso-looking pictures which we know are fakes, so we can create more training data for the Picasso detector: with lots of labeled examples of good Picasso fakes, we can feed that into the detector and retrain it, and it gets better and better, and before long it's no longer fooled by the generated pictures; it knows the difference between real ones and fake ones. Does the story end there? Of course not: we can flip it around again and train our generator against the new Picasso detector, and make that better and better. So is this all just a game? Not really, no. Obviously, what we're trying to do is create a better Picasso detector. And, okay, take the Picasso idea and translate it into some kind of fraud detection system, or spam detection system, or a system that detects attacks on our computers. You can see how having the computer generate more and more convincing attacks, more and more convincing spam, more and more convincing fraud simulations helps us build a better defense and better protection. That's the idea behind this use of machine learning: with one learner being the adversary of the other, we in fact improve both.
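None of the real systems are remotely this simple, but here is a deliberately tiny toy of that alternating pattern, offered only to show the back-and-forth: "real Picassos" are just numbers near 10, the "detector" is refit each round on labelled reals and fakes, and the "forger" then adjusts its one style knob against the frozen detector. Everything here is invented; a real GAN uses neural networks and proper gradients on both sides.

```python
import random

def real_picasso():
    return random.gauss(10.0, 1.0)           # stand-in for genuine works: numbers clustered near 10

def forgery(style):
    return random.gauss(style, 1.0)           # the generator "paints" around its current style knob

def looks_real(x, real_centre, fake_centre):
    # Crude detector: a work looks real if it sits closer to the centre of known reals than of known fakes.
    return abs(x - real_centre) < abs(x - fake_centre)

style = 0.0                                   # the forger starts nowhere near Picasso
for round_number in range(60):
    # 1) Retrain the detector on freshly labelled data: real works plus the forger's current fakes.
    real_centre = sum(real_picasso() for _ in range(300)) / 300
    fake_centre = sum(forgery(style) for _ in range(300)) / 300
    # 2) Train the forger against the frozen detector: probe a small change in each direction
    #    and keep whichever fools the detector more often (the same downhill-step idea as before).
    def fooled_count(candidate_style):
        return sum(looks_real(forgery(candidate_style), real_centre, fake_centre) for _ in range(300))
    style += 0.25 if fooled_count(style + 0.25) >= fooled_count(style - 0.25) else -0.25

print(f"after the back-and-forth, the forger's style sits at {style:.2f} (real works centre on 10.0)")
```

Swap "Picasso-ness of a number" for "does this transaction look legitimate" and you have the shape of the fraud, spam, and attack arms races just described.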
So what does this mean for the enterprise? I think we can split this into two. There are companies whose product, basically, is the transformation of digital signals, and sometimes you might need to stretch a little to see that that's the business they're in, but it's not that hard to imagine. I'm delivering a podcast, for example. Obviously what I'm delivering is digital speech, but I'm using products which record signals from a microphone and turn them into something that is easier to listen to. One of the tasks that has to happen there is noise reduction: there will be background noise, maybe from the air conditioning system or a printer or whatever else is in the area, and that doesn't need to be in the podcast; what you want to focus on is just what I'm saying. In the past, what we've done is use all sorts of frequency-band analysis to take out certain frequencies that tend to be more problematic: we might see some hum at 50 hertz or 100 hertz, coming from some piece of machinery, and we'll dip that band down so it's less audible. The problem is that we're messing with the waveform, so the result is lower noise, but it also changes the actual speech, and sometimes makes it harder to understand than it was when the noise was there. Now, though, we have products which understand how to recognize human speech. Because they can recognize human speech, they can separate the sound into two streams, if you like, speech and not-speech, and they can amplify the bit that's speech and suppress the bit that isn't. As if by magic, all the background noise disappears. That's a very specific use of machine learning, driven by neural networks in this case. If you're in the business of producing software to process audio, this is obviously a technique you would want to invest in.

Okay, but what if you aren't? What if you're, picking something at random, an online retailer? Should you be investing in machine learning expertise, building a team of people who are probably quite expensive, with a lot of infrastructure, in order to do this? Let's take fraud, for example. We want to maximize the number of customers we reach, and minimize the amount of fraud we have to deal with; obviously, selling something to somebody when we're not going to get the money is bad business, and we don't want to do that. But how do you tell the difference between a transaction that is odd but still legitimate and a transaction that is odd because it's actually fraud? Clearly, we could use machine learning, some deep algorithms, to discover patterns of behavior that separate the fraudulent from the legitimate. So is it worth us investing in growing that kind of capability in-house? I'm going to suggest it probably isn't. What we want to be doing is investing in products that leverage this kind of capability, and ideally products that can be automatically trained on our data, to make them very specific to our business, without our having to reinvent, or even deeply understand, the processes used to do it. And that's the split I think is going to be very important: understanding where in the value chain your real application of the technology sits. Is it part of something you actually sell, something you monetize? In that case, investing in adding this kind of technology to it is, like any new technology, part of your product chain analysis. If that's not your business, then you want to be looking for partners with expertise in this space to provide you with the technologies you need. Nobody wants to build their own ERP; that's just not very clever. Same here: you want to be buying products that leverage state-of-the-art neural networks and that are trainable on your data.

All right, so we did talk about some of the limits on what can be done.
We talked about the fact that those limits are generally driven by the data: is enough data available, and do we have labels for that data so we can train in the first place? Is the problem itself essentially tractable or not? And do we have the expertise to do this? But there is also a whole class of problems where we have to ask ourselves whether we should use these techniques at all, because there are some built-in traps here. We have to remember that what machine learning is doing is teaching a computer to look at data and make inferences based on the data we've given it, and on the interpretations we've already made of that data. It's not learning to infer things from the real world; it's learning to infer things from a subset of the real world that we have already put our interpretation on. Why is that problematic? Well, if we give it biased data, it's going to end up encoding that bias as a set of assumptions, and that's going to cause us a lot of problems. It would be hard to overstate how serious that problem can be.

To put it in perspective, let's talk a little bit about natural language processing, and a game we might play with a computer to see how well it has understood text: the association game. For example, we might ask it, man is to king as woman is to blank, and ask it to fill in the blank. And actually, we've gotten our algorithms very, very good at this, and it will come up with the answer: queen. Correct. But, and given that I've raised the question you know there's going to be a but, a 2019 paper in the Association for Computational Linguistics was titled "Black is to Criminal as Caucasian is to Police." The title is outrageous, and it's clearly the academic equivalent of clickbait, but it was pulled directly from an exercise done on the same kind of natural language processing algorithm, trained on a corpus that almost everybody uses for natural language processing. The paper was really about how we might de-bias the information we're feeding into the things we create, but it underlines the point. Imagine, for example, that you're in the business of selling mortgages, and you're trying to use machine learning to determine who you should be giving mortgages to and who you should avoid. If the data set on which the model was trained is systematically skewed to trust some people less than others, you're going to do yourself serious harm in multiple ways. Putting aside the fact that, morally, it's just not right, we have at least three issues in front of us that are purely financial. One, we want to sell our product to anybody who's capable of and willing to buy it, so if we're erroneously avoiding a segment of the market, we're missing out on an opportunity. Two, if it gets out that we are not treating a certain segment of the market fairly, and you know it will get out, the reputational damage could destroy the company. And three, of course, we could be sued, criminally prosecuted, or at least civilly prosecuted, for those actions, depending on where we're based. And yet nothing we did was deliberately aimed at excluding any part of the population; it happened because the data sets we were using had this baked into them. How do we fight that? I've got to tell you, there's a lot of work being done.
You can imagine that a huge amount of research is going into how we might de-bias this data. The results are not conclusive, shall we say. So this is a significant problem we need to be aware of, and maybe these are areas we step into very cautiously. Another problem with neural networks, and not all machine learning techniques suffer from this, but remember, neural networks are where the majority of the progress is being made, is that they're not explainable. We can build something that comes up with inferences, but we can't really look inside the neural network to see how it deduced what it deduced. It's very difficult to see, if you like, what it's "thinking", and I use that word advisedly. That makes it really difficult to know how trustworthy the answer is. And by trustworthy, I don't just mean: is it inferring correct information from the data? But also: how skewed is the data, and is what the data tells us actually correct or not? The example I gave you is pretty outrageous, and obviously something like that is going to come out one way or another. But what if there were a subtly different assumption baked into the data, still not right, and financially damaging to the organization over the long term, and you just wouldn't know? That's one of the frightening things about deep learning algorithms. Again, a lot of research is going into making these models more explainable, into understanding what's going on inside, but for now we need to be cautious about what we aim our neural networks at, so that we don't fall into that trap.

Leaving the question of bias aside for a minute, there's also the question of the line between valuable and creepy. As these systems learn more and more about our behavior, they can tailor their responses to us more and more precisely, which you can certainly see is very helpful, but it can also get quite unnerving. That's an area that may change over time, in terms of how readily our customers accept that kind of helpfulness versus feeling it's walking the edge of being creepy, because we get used to technology, we get used to the way things behave, really quite quickly. But it's something we definitely want to watch; we want to put our foot in that water very carefully, measure the response in our customer base, and adjust accordingly.

So, the goal for this episode was to outline what machine learning is, where it fits in the AI, machine learning, deep learning, neural networks spectrum, to help you understand what some of those terms mean, and to give you a very top-level view of how this all hangs together and how it might be useful. I hope we've done that. As I explained at the beginning, this is a setup for a series of episodes we want to have in the future, where we speak to individual practitioners about how this is affecting their business, how it's being implemented in their world, and get some real-world feedback on where we're going next. I hope you enjoyed this episode. If you did, please leave us a review; if you didn't, please let us know either way. Get involved in the discussion on the website at

https://thetechnologysoundingboard.com or leave us a review on your podcast streaming service of choice. Thank you very much for listening, and see you next time.
