Data Analysis Workshop Series

Introduction to Python

Instructor: Chengyin Eng, Data Science Consultant at Databricks
Chengyin Eng is a Data Science Consultant at Databricks, where she implements data science solutions and delivers machine learning trainings to cross-functional clients. She received her M.S. in Computer Science from the University of Massachusetts, Amherst. Prior to that, she completed her B.A. in Environmental Studies and Statistics at Mount Holyoke College and spent her college years applying statistical modeling techniques to tree research. Thereafter, she worked in the life insurance industry and provided pro-bono data science services to NGOs. Outside of data science, Chengyin enjoys reading, photography, and leafing through outdoor markets for food and craft.

Workshop Details

This workshop is part one of four in our Introduction to Data Analysis for Aspiring Data Scientists Workshop Series.

In this workshop, we will show you the simple steps needed to program in Python using a notebook environment on the free Databricks Community Edition. Python is a popular programming language because of its wide applications including but not limited to data analysis, machine learning, and web development. This workshop covers major foundational concepts necessary for you to start coding in Python, with a focus on data analysis. You will learn about different types of variables, for loops, functions, and conditional statements. No prior programming knowledge is required.

Who should attend this workshop:
Anyone and everyone: CS students and even non-technical folks are welcome to join. No prior programming knowledge is required. *Please note: if you have taken Python courses in the past, this may be too introductory for you.

What you need:
Sign up for Community Edition here and access the workshop presentation materials and sample notebooks here.

Video Transcript

that we’re bringing to you, Introduction to Data Analysis for Aspiring Data Scientists. So, this workshop is gonna be an Introduction to Python on Databricks.

Introduction to Data Analysis for Aspiring Data Scientists

And we have two more workshops coming up: the next one, part two, is gonna be next week on April 15, and then the third part is Machine Learning on April 22. So, visit the Data Plus AI online meetup page and RSVP for those events so you get the notifications. We’d love for all of you to join us, especially if you find this workshop useful.

Upcoming online events

So, just a few quick reminders. We have a bunch of different upcoming online meetups, interviews, workshops, and tech talks, and we launch them all from the Data Plus AI online meetup group. I’ll drop these links in the chat and also include them in a follow-up message, so make sure to join that group so you can check out what’s upcoming. I know we have a bunch of folks joining us from YouTube Live, so thank you for that. We always broadcast there as well, so make sure to subscribe on YouTube and turn your notifications on: whenever we launch a new online offering, it syncs to YouTube, and you can set a reminder for yourself to join us live there if you’d like to. Joining this Zoom broadcast also lets you participate in the chat and ask questions in the Q&A. I’ll drop these links in the chat as well.

Workshop Resource Links:

And I did just send out a message to everyone who RSVP’d with two links that will be useful for this session. Again, I’ll drop them in the chat in a few minutes once I hand over to Chengyin, but I just wanted to put them up here for a second in case anybody wanted to grab them.

Meet your Instructor & Teaching Assistants

So, now I’d like to have our instructor Chengyin and the TAs do a quick introduction, maybe just 30 seconds to wave and say hi. We have a bunch of TAs who are gonna be answering questions in the Q&A, so that’s where you’re gonna wanna drop your questions, and they’ll be helping throughout the presentation. So, Amir, maybe we can start with you, if you wouldn’t mind saying a quick hello.

– Sure, can you hear me now? – Yeah. – Okay, hi everyone. My name is Amir Issaei, I’m a senior data science consultant at Databricks. I work on the same team as Conor, Brooke, and Chengyin. I spend about 40% of my time training customers and 56% of the time implementing machine learning solutions. I’m gonna be a TA, as Karen mentioned, so if you have questions, just drop them in the chat, and Conor, Brooke, and I will try to help you understand things and answer your questions. Thank you. – Thanks, Conor.

– Hi folks, my name is Conor Murphy, and I also work with Amir, Brooke, and Chengyin. We all work very closely together on consulting engagements for data science, and we do some training as well. If you have any questions, feel free to ping us; we’re happy to answer them.

– My name is Brooke Wenig. I lead the machine learning practice team along with Chengyin, Amir, and Conor. Just as a side note, Conor will be teaching the next session on pandas and Amir will be teaching the following session on machine learning, so I’m really excited: this is a whole team effort to put together this series for you.

– Thanks everyone. Now I’ll pass it to Chengyin Eng, who is our instructor for today. I’ll stop sharing my screen and let you take it away. – Sounds good. Hi, everyone, my name is Chengyin Eng; you can call me Chengyin. I think all my coworkers have introduced my responsibilities pretty well: I work with customers on consulting projects, helping them implement data science solutions, and I also deliver machine learning trainings. We have a packed schedule here, so I’m just gonna go ahead and get started. I’m gonna share my screen. Oops, sorry, that was the wrong screen.

There, this one.

Okay. So to get started with this workshop, you will need a Community Edition account to log into the Databricks platform. If you don’t have an account yet, please go to databricks.com/try-databricks and sign up for your account. You will receive an email to verify your account, and after that you can log into the Community Edition portal. I’m gonna give all of you a few minutes to get set up, and if you don’t mind, go to the chat function and tell me when you’re done; that’s how I’ll know I can proceed to logging into Community Edition.

Cool, I’m just gonna give it a couple more minutes.

Yeah, so you can also see the link on my screen here, try-databricks. Yep, we are using Community Edition, so make sure you sign up for a Community Edition account.

Okay, I think some of us have already completed the sign-up process. Don’t worry if you are lagging behind a little bit, because the notebook will also be available later for you to work on in your own time. So let’s go ahead and log into Community Edition. Quick login.

Okay, so if you are seeing this page, it means that you have successfully logged into the Databricks platform, which is a cloud-based big data platform. There are a bunch of icons here in the sidebar on the left-hand side, and I’ll be introducing some of them during the session. But first, let’s go ahead and click on the cluster icon, which is this icon that looks like a graph or a tree.

So during today’s session, rather than using your computer’s local computing resources, we’ll all be using Databricks computing resources. To do that, we need to create a cluster first. So let’s go ahead and hit the Create Cluster button. I’ll explain what a cluster is in a second, and what these details mean as well, but for now, let’s put in a name for our cluster, and you can hit Create Cluster.

So we go to the cluster page. When you have successfully set up the cluster, you should see a spinning icon telling you that its status is pending, because it’s still being set up. If you click into this cluster, you can look at its configuration. So what is a Databricks cluster? A Databricks cluster is a set of computation resources. The edition that we’re using now is Community Edition, which means that it is free. It comes with 15 GB of memory, and it will terminate automatically two hours after your last command is executed. So Community Edition gives you access to a small cluster to run your code in a notebook environment. It is good for prototyping simple applications, but it’s not meant for production.

Community Edition also uses Amazon Web Services (AWS) under the hood, so it will take a few minutes to spin up. On your page right now, you should still see a spinning green icon. You will also see a Databricks Runtime version over here; what this means is that this runtime version is backed by Apache Spark, and that’s all you need to know for now. In this introductory class, we will just be using regular Python, so we won’t be getting into the nitty-gritty of distributed computing with Spark. So that introduces what a cluster is: the computing resources you are using on Databricks.

So if you want to work together with your peers on your notebook, you can invite them to your workspace, and that’s when the workspace icon comes in handy. If we click on the workspace icon, you can see that there’s Shared and Users; under Users, there should be just your email, because you’re the only person in the workspace. If you want to invite your friends to this workspace, you can click on the person icon in the top right-hand corner and click on Admin Console. That’s where you can add users to invite friends. In Community Edition, you can invite up to two users, which means that your workspace can have three user accounts altogether. Now I’m gonna show you how to access the notebook that we’re going to use during this session. Let’s go to the GitHub page, which is github.com/databricks/techtalk.

If you are not familiar with GitHub, it’s like Google Drive for code: it keeps track of your code and can host it for you. I would encourage you to follow along with what I’m doing if you want to run the code cells while I conduct this session, but you can also choose to sit back, watch my demo, and play with the notebooks afterwards in your own time.

You can see that there are a bunch of folders here; all the resources for the upcoming workshops will also be posted in this GitHub repo. For today’s workshop, you will be interacting with this folder, Introduction to Python, so let’s click into it. You’ll see different files over here. There is a lab for you to do on your own time, but during today’s session, we’ll just be using the Python Fundamentals file. So let’s click into it and grab this link: Command + C to copy it. You will also notice that this file has the extension .ipynb, which means that it’s a Jupyter Notebook, so you can also access this notebook in your local Jupyter environment.

But I’m gonna show you how you can import a regular Jupyter Notebook into the Databricks environment. Now that you have copied this link, go to the Databricks homepage again and click on the Home button in the sidebar. You can see that your name is up here with an arrow next to it; click on it and you’ll see a dropdown menu. You can choose to create a new notebook, library, or folder, but for today’s purpose, because we already have a link, let’s click Import. You can import either from a file or from a URL, so you could download the file and upload it, but because we have a link, I’m just gonna choose URL, paste it in here, and click Import.

It should take you automatically to the notebook that you wanted to access. Beneath the title of this notebook, you will see the word Detached, which means the notebook is not attached to a cluster yet, so you need to select a cluster to attach it to. You’ll also notice that this icon looks familiar, because it is the cluster icon that we just clicked into. So we are going to attach this notebook to a cluster by clicking on this dropdown and choosing the cluster that you just set up.

So attaching a notebook to a cluster means you can execute code in this notebook on that particular cluster: the notebook uses the resources of the cluster. In this session, we’ll be covering numbers, strings, variables, print statements, lists, for loops, functions, conditional statements, and checking different types. There is a good reference sheet for you to bookmark; if you click into it, you can see it’s a Python cheat sheet that shows you the different Python syntax. There is another link to the official Python tutorial, developed by the Python developers, if you want to learn more about Python after this; you can follow along with their lessons as well.

You may also notice that this notebook has a combination of text and code, and you might wonder what a markdown cell is. If I double-click into this cell, you can see that it starts with %md; MD stands for markdown. So I can write my text and render it as text rather than code. If I hit Shift + Enter, it will run the cell and render it as text. You’ll also notice that I started the headings with different numbers of pound signs: the top header has only one pound sign, and the second header has two pound signs, so the top-level header is bigger. That’s how you can write a markdown cell. If you hover over the top right-hand part of a cell, you can also see a delete button, a minimize button, and an edit menu to perform different operations on that cell.

So now let’s go ahead and interact with Python in this Databricks notebook environment. Let’s use it as a calculator, really simple: we’re just gonna type in 1 + 1. You can either press Shift + Enter to run this cell, or go to the top right corner over here and hit the run button, Run Cell. So you can see here 1 + 1; that was our first Python code in this environment. We already have our output, 2, so as a sanity check, that was correct. You can also interact with strings. For example, if I type in ice cream and wrap it in quotes (it doesn’t matter if you use single quotes or double quotes, they do the same thing) and hit Shift + Enter, you can see that it outputs ice cream for me. I can also concatenate strings together: here’s the first part of the string, here is the second part, and I’m going to concatenate them using the plus sign. When I hit Shift + Enter again, you can see that it’s now printing ice cream is paradise, but we don’t get a space. That’s because Python doesn’t figure out that we need a space in between, so if you do want a space, you have to enter one in there yourself. So notice that with numbers, Python knows to add them, but with strings, it concatenates them.
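A minimal plain-Python sketch of the calculator and string examples above. The variable names are assumptions for illustration; in the notebook, each expression was simply typed into its own cell.

```python
# Numbers: + adds them
total = 1 + 1

# Single and double quotes create exactly the same string
same_string = 'ice cream' == "ice cream"

# Strings: + concatenates them, with no automatic space in between
no_space = "ice cream" + "is paradise"
with_space = "ice cream " + "is paradise"  # the space is typed in explicitly

print(total, same_string)  # 2 True
print(no_space)            # ice creamis paradise
print(with_space)          # ice cream is paradise
```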

So now let’s move on to variables. A variable in Python is really just a named unit of data: you can assign a value, like this one, to any name that you want. Of course, the more intuitive your variable names, the more helpful they are when you look back at your own code. For example, look at this line, I like ice cream, with the pound sign in front: this means that the line is commented out. To comment or uncomment a line of code, all you need to do is press Command + / if you are on a Mac. If I uncomment it and try to run this cell, it gives me an error that says invalid syntax, because Python tries to read I, like, ice, and cream as separate names, and that is not valid code. To make it work, we need to assign the value to a variable, and all you need for that is an equal sign. You can see that now I’m going to assign ice cream as the best food, using the equal sign.
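A small sketch of the commenting and assignment points above, assuming a variable named best_food (the notebook’s exact name isn’t shown). It calls the built-in compile() only so the example can demonstrate the SyntaxError without stopping; in the notebook you would simply see the error message.

```python
# I like ice cream   <- a comment: Python ignores everything after the #

# Without a comment sign or an assignment, the same words are not valid Python
try:
    compile("I like ice cream", "<cell>", "exec")
    is_valid = True
except SyntaxError as err:
    is_valid = False
    print("SyntaxError:", err.msg)

# Assigning the text to a name with a single = is valid
best_food = "ice cream"
print(best_food)  # ice cream
```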

So we rerun the cell with Shift + Enter again, and you will see that the best food is now ice cream. Notice that I can also update this best food variable whenever I want. For example, I can update my best food to be pizza, and notice that here I’m using double quotes and here single quotes, but they really work the same. So what best food prints is actually pizza, because Python remembers that the latest value of the variable is pizza rather than ice cream. If I run this, you can see that it’s showing us pizza rather than ice cream. Okay, because I like ice cream better, I’m just gonna comment out the pizza line and run the cell again, so that the best food stays ice cream.
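Reassignment, sketched under the same best_food naming assumption: the name always refers to the most recently assigned value.

```python
best_food = "ice cream"
print(best_food)          # ice cream

best_food = 'pizza'       # single quotes behave exactly like double quotes
print(best_food)          # pizza
latest_value = best_food  # keep the latest value for comparison

best_food = "ice cream"   # set it back, like commenting out the pizza line
print(best_food)          # ice cream
```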

So moving on, you can also use print statements. You may wonder why we even need to specify print, because here I am not using print and it displays things just fine.

The utility of a print statement is that it forces Python to print out every statement that you want. For example, if I don’t have print over here, the cell will only display best food, because that is the last line. So for both lines to print, I need to add print to the first line as well. Then you can see that it explicitly forces Python to print out both lines rather than just the last one.
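A sketch of the print behavior described above, assuming a notebook cell (Databricks and Jupyter display only the last bare expression of a cell by default). The second variable, drink, is hypothetical.

```python
best_food = "ice cream"
drink = "milk"  # hypothetical second variable for this example

# In a notebook cell, a bare expression is displayed only if it is the last
# line of the cell, so these two lines would show just 'milk'.
# (Run as a plain script, bare expressions display nothing at all.)
best_food
drink

# print() writes each value out, wherever it appears in the cell
print(best_food)
print(drink)
```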

You can also be more explicit about what you are printing. For example, if you want to remind yourself what the best food is without typing two lines, you can put this variable inside a string. The style that I’m using here is f-string formatting: all you need to do is add an f in front of the quotes and wrap the variable that you want to print in curly brackets. Now Python automatically knows that it should retrieve the best food variable, and it prints the statement correctly.
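The f-string pattern from the example above, as a minimal sketch (the message wording is an assumption):

```python
best_food = "ice cream"

# An f before the opening quote; the variable name goes inside {}
message = f"My best food is {best_food}"
print(message)  # My best food is ice cream
```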

Now, let’s take a look at lists. I’m gonna try to make a list of what I think everybody ate for breakfast this morning. Say that you went really fancy and had pancakes, eggs, and waffles. You can see that again I’m using strings over here, and I’m wrapping them within square brackets; the square brackets mean that this is a list. Then I use variable assignment, the equal sign, to assign this list of strings to a variable called breakfast list.

Again, I can name this however I want, but I want to remember what the list actually contains, so I name it breakfast list. I can also add more items to this list. For example, if you tell me, oh, you actually had milk too, I’m gonna append milk to this list as well, using the append function over here. When I print this, you should see that the breakfast list has four items: pancakes, eggs, waffles, and milk. Now let’s try to get the first breakfast item from this list. We know just by looking at the list that the first item is pancakes, but because Python is zero-indexed, the first element is at position zero. So if you want the first item, instead of using the number one, you use the number zero, and you use square brackets again to index into the list. Let’s run this cell: yes, I’m getting pancakes, so that is correct. What if I want the last item from this list? All I need to do is add a minus sign: -1 counts from the very end, so it gets the last item, milk. I can also print the second breakfast item and onward. Remember that Python is zero-indexed, so if I want the second item, I need index one, and to include everything after it, you also need a colon. So that’s the second breakfast item and onward. I’m gonna press Shift + Enter again to get eggs, waffles, and milk. You can see that pancakes is excluded here because it is the first item.
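The list operations from this part of the lesson, sketched with an assumed variable name, breakfast_list:

```python
breakfast_list = ["pancakes", "eggs", "waffles"]
breakfast_list.append("milk")  # add one item to the end of the list
print(breakfast_list)          # ['pancakes', 'eggs', 'waffles', 'milk']

first_item = breakfast_list[0]     # indexing starts at 0
last_item = breakfast_list[-1]     # -1 counts from the end
from_second = breakfast_list[1:]   # the colon means "and everything after"

print(first_item)   # pancakes
print(last_item)    # milk
print(from_second)  # ['eggs', 'waffles', 'milk']
```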

So now let’s move on to conditionals.

Sometimes, depending on conditions, we want to execute certain lines of logic, and we can control this by using if, elif, and else statements. You can really think about conditionals as a type of branching statement. For example, if you have enough sleep, then you have enough energy for tomorrow; if not, you feel tired. So it’s kinda like: if A, do this; else if B, do that; otherwise, do C. Say that we want to print plural forms of food. In a really simple example, we just want to add an s to the end of a string if it doesn’t already have an s there, to indicate that it’s a plural form. Say that now I changed my mind and my best food is actually chocolate. I’m checking if this best food ends with s; here you can see there’s another built-in function that I’m using, and I’m just checking if chocolate ends with an s. If it does, it prints the word as is; if it doesn’t, it adds an s to it. So here I’m expecting the output to be chocolates: it’s gonna go through the second branch of logic and add an s to this chocolate string.
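A sketch of the pluralizing example, assuming it uses the string method endswith() (the notebook’s exact code isn’t shown):

```python
best_food = "chocolate"

# str.endswith() returns True if the string ends with the given suffix
if best_food.endswith("s"):
    plural = best_food         # already ends in s: keep it as is
else:
    plural = best_food + "s"   # otherwise add an s
print(plural)                  # chocolates
```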

We can also make if else statements a little bit more complicated by adding the elif.

So here, for example, I say that the best food is ice cream and the number of ice cream cones is a thousand. If the best food is equal to ice cream, then I want a thousand cones of ice cream. If the best food is blank, then I want you to tell me what your favorite food is. Else: really, isn’t ice cream better? Let’s try this: you can see that it’s printing out a thousand cones of ice cream. Now say I can’t make up my mind and I just leave it blank. Python goes to the if statement first and checks if this equals ice cream, but it doesn’t, so it jumps to the next line and checks if the best food is equal to blank. It is, so it asks for my favorite food. And if I say it’s something else, it jumps to the third branch and says, oh, really? Isn’t ice cream better? We can check the equality of variables by using the double equal sign, or, to check that they are not equal, the exclamation point and equal sign. You see that I secretly already used this equality concept by checking if best food is equal to ice cream with the double equal sign. So now I can check if ice cream is indeed the best food, but it’s not, because I just assigned it to pizza.
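A sketch of the if/elif/else example, with an empty string standing in for "blank". The variable name ice_cream_count and the exact message wording are assumptions.

```python
ice_cream_count = 1000  # hypothetical name for the number of cones
best_food = "ice cream"

if best_food == "ice cream":
    message = f"{ice_cream_count} cones of ice cream, please!"
elif best_food == "":
    message = "Tell me your favorite food!"
else:
    message = "Really? Isn't ice cream better?"
print(message)
```

Changing best_food to "" or to "pizza" sends the check down the second or third branch instead.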

So again, remember that for variable assignment you use one equal sign, but to check equality you use two equal signs, and to check inequality you use an exclamation point and an equal sign.
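A minimal sketch contrasting assignment with the two comparison operators:

```python
best_food = "pizza"                       # one = assigns a value

is_ice_cream = best_food == "ice cream"   # == checks equality
not_ice_cream = best_food != "ice cream"  # != checks inequality
print(is_ice_cream, not_ice_cream)        # False True
```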

So now let’s move on to for loops. For example, I really want to print out every breakfast item that we had this morning. There’s a really simple way to do that: I would not want to write print waffles, print eggs, print pancakes, because that is really cumbersome. A much easier way is to use a loop, which repeats a block of code until a certain condition is satisfied; here, until we have gone through every item in the sequence. So here is the breakfast list, and I want to print every single food in it. All I need to write is: for food in breakfast list, print the food. Python knows that it should iterate over the sequence: it prints the first item, then goes to the second item, the third item, and the fourth item.
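A sketch of the loop described above. The visited list is a hypothetical addition that records the order in which the loop reaches each item.

```python
breakfast_list = ["pancakes", "eggs", "waffles", "milk"]
visited = []

# The indented body runs once per item, in list order
for food in breakfast_list:
    print(food)
    visited.append(food)
```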

What if I want to count the number of letters in each word? Again, you can see that I’m incorporating different Python concepts in this print statement: I’m using the variable and an f-string over here, and I’m using a function called len, which stands for length, to count the number of letters in each word. Now it loops through the breakfast list, and you can see that it executed everything; all we needed was one line of code rather than four different lines to count the letters for every single food item.
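Counting letters with len() inside the loop, as a sketch; the lengths list is a hypothetical addition that keeps the counts for later use.

```python
breakfast_list = ["pancakes", "eggs", "waffles", "milk"]
lengths = []

for food in breakfast_list:
    # len() returns the number of characters in a string
    print(f"{food} has {len(food)} letters")
    lengths.append(len(food))

print(lengths)  # [8, 4, 7, 4]
```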

So now let’s move on to functions. You may ask why we even need functions, because honestly, this is already doing what we wanted. But functions are really helpful when we want our code to be more repeatable and more organized while accomplishing the same task. If I want to generalize this printing logic into a function that works for other lists, I can build a function using the def keyword, short for define, followed by the function name that you want. Again, the more intuitive the function name, the easier it is for you to interact with it later, because it’s easier to remember. Then you add a parameter name wrapped in parentheses, followed by a colon. You can see here that I defined a really simple function that does essentially the same thing as the loop above: this line and this line are exactly the same. I’m gonna name it print length, and now I can execute the function by passing the list to the print length function. If I comment out the function call, you will see that the cell does nothing, because all it does now is define a function. To execute a function, I need to call it and supply it with the list, or parameter, that I am interested in. Here, my parameter of interest is breakfast list, so I pass it in and execute the cell with Shift + Enter again, and it does exactly what we expected it to do.
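A sketch of the function described above. The name print_length and its parameter name items are assumptions based on the transcript, not the notebook’s exact code.

```python
def print_length(items):
    """Print each string in a list along with its number of letters."""
    for item in items:
        print(f"{item} has {len(item)} letters")


breakfast_list = ["pancakes", "eggs", "waffles", "milk"]

# Defining the function does nothing by itself; it runs only when called
print_length(breakfast_list)
print_length(["pizza", "chocolate"])  # the same function works for any list
```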

We can also make functions a little bit more complicated by giving them more arguments. This is a single-argument function, but I can have multiple arguments. Here I’m just gonna keep it simple and have only two arguments; you can have as many as you want, but probably not too many, because the function becomes hard to read. Say that now I want to count the favorite food that I have: I have an ice cream count and a chocolate count, and I want to add them together to count how much food I have (mumbles). You can see here that I am naming one parameter ice cream and the other chocolate. How you name these parameters is really up to you; I could name them x and y, and it doesn’t really matter as long as you supply the right variables, but the more explicit the names, the easier they are to recall later. You can see here that it’s now doing the sum for me, because I specified the addition over here with the return statement and the plus sign.
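A two-argument version, sketched with a hypothetical function name and the counts used elsewhere in this lesson (1000 ice cream cones, 500 chocolates):

```python
def count_favorite_food(ice_cream, chocolate):
    """Return the total number of favorite food items."""
    return ice_cream + chocolate


total = count_favorite_food(1000, 500)
print(total)  # 1500
```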

You may also wonder: what if I know that one of the arguments is always going to be 500, or some other number? You can set a default value, so you don’t have to keep supplying the same argument over and over again. Here, say that I know chocolate is always gonna be 500: I can supply this default value within the definition of the function itself, and then I only need to pass in the ice cream count, because that is the only thing that changes. The function already knows that chocolate is 500; it’s smart enough to use the default value. If I run this, you should see that the output is exactly the same.
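The default-value version of the same hypothetical function, as a sketch:

```python
def count_favorite_food(ice_cream, chocolate=500):
    """Return the total; chocolate defaults to 500 if not supplied."""
    return ice_cream + chocolate


print(count_favorite_food(1000))       # 1500: chocolate uses its default
print(count_favorite_food(1000, 200))  # 1200: an explicit value overrides it
```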

Now say that we are really crazy about data and want to know how much we like chocolate. Here I can calculate a percentage. It’s the same idea again, but instead of a two-line function, I have a little more logic between the def and the return statement. You can see that I’m calculating the percentage of chocolate by dividing the chocolate count by the sum of chocolate and ice cream and multiplying by a hundred. I’m also using another built-in function here, round, to round off the number. If I run this (again, you can see that the print statement here uses an f-string with the variable right here), it’s telling me that I like chocolate 33% of the time. If you forget the parameters of a function, you can always call help: just call help on the function that you want, hit Shift + Enter again, and it tells you that this parameter is ice cream and this one is chocolate. This is another reason why you want to give your functions and parameters intuitive names.
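A sketch of the percentage calculation, assuming the formula chocolate / (ice cream + chocolate) * 100 rounded to two decimal places, which matches the 33.33 value mentioned later in the lesson. The function name is hypothetical.

```python
def calculate_chocolate_percent(ice_cream, chocolate):
    """Return chocolate's share of all favorite food, as a percentage."""
    percent = chocolate / (ice_cream + chocolate) * 100
    return round(percent, 2)  # round() keeps two decimal places here


percent = calculate_chocolate_percent(1000, 500)
print(f"I like chocolate {percent}% of the time")  # ... 33.33% ...

# help() prints the function's signature (its parameter names) and docstring
help(calculate_chocolate_percent)
```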

We have been defining quite a number of variables, so there is a chance that we will forget what our variables are, but we don’t have to worry, because we can always check them. As a quick recap, we have defined the percent, best food, and breakfast list variables. We can check the type of percent, and it is a float. Float just means a number with decimal points; if you remember, the percent up here is 33.33, which is why it is a float. If I check the type of best food, it is a string. I don’t remember whether it’s chocolate, pizza, or ice cream anymore, but we know that it is a string; str indicates string.
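Checking types with the built-in type(), using values assumed from earlier in the lesson:

```python
percent = 33.33
best_food = "ice cream"
breakfast_list = ["pancakes", "eggs", "waffles", "milk"]

print(type(percent))         # <class 'float'>
print(type(best_food))       # <class 'str'>
print(type(breakfast_list))  # <class 'list'>
```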

And for the breakfast list, we probably know from the name itself that this is a list, which is why naming is really important; and yes, it is a list. So in summary, we have gone through different types of variables. An int is numeric: a whole number without decimal points. For example, 1 + 1 equals 2, which is an integer rather than a float. A float is a numeric variable with decimal places, like percent, 33.33. A string is a sequence of characters, like chocolate, pizza, and ice cream; they are all strings. A string can contain any characters: I can enclose text and also numbers within double or single quotes, and they are still strings. Let’s try that real quick. For example, I’ll make a variable called number, set it to 123 in quotes, and check the type of this number.

It says string. If I take out the quotes, is that an integer? Yes. And if I add hello in there, it is also a string. As long as your text is wrapped in quotes, double or single, it gives you a string. We also hinted at the Boolean type at the beginning: either True or False. For example, I can define a Boolean variable by saying you are cool equals True, and if I check the type of you are cool, it is a Boolean. A Boolean variable has just two possible values: True or False. Now let me check the equality of this Boolean variable using a double equal sign.

So it's gonna check whether you are cool equals True. So what are you expecting here? We should expect this result to be True, because indeed this variable is True. So let's check: it is True. And if we check whether it equals False, then because it's not equal to False, it's gonna return False. So yes, you have completed your first lesson on Python. You can see that the foundational concepts of Python are really just how to interact with different types of variables and how to print them, and from there you can figure out how to work with for loops, functions, conditional statements, and checking types. A good practice is always to name your variables intuitively, so that it helps you later when you look at your own code. So that is the wrap-up for this lesson. If you go back to the tech talk page on GitHub and go into the same folder, Introduction to Python, you can see that there is also a lab, the FizzBuzz lab. I'm gonna click into it to show you. It's a really common interview question, so I definitely recommend trying it out. There is a solution in there as well, so you can take a look at it after you have taken a stab at it.
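A minimal sketch of the equality check described above, assuming the Boolean variable is spelled you_are_cool.

```python
you_are_cool = True

# == compares two values and itself returns a Boolean
print(you_are_cool == True)   # True
print(you_are_cool == False)  # False
```

Comparing against True with == matches the demo, but in everyday code style guides such as PEP 8 recommend testing the variable directly, for example `if you_are_cool:`.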

So now we have 20 more minutes, so we are open to any questions.

– [Karen] Sorry, I was just gonna give a reminder for everyone to post in the Q&A, but if you can grab them from the chat, go for it. – [Chengyin] Yeah, so I see a question here: what happens when you assign a variable a number and then reassign the same variable a string? So let's try that.

So here, I assign a variable a number, which I assume is what you mean, so we'll say that it's 120, and then I'll assign the same variable a string.

Does anybody wanna take a guess at what this will output? Can you answer in the chat?

Okay, I see string. Two people have answered string so far.

Yes, that is correct. It will turn out to be a string. And why is that? Because Python remembers the latest variable assignment, which is this one rather than this one.
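A small sketch of the reassignment question. The name my_variable and the values are hypothetical; the point is only that a name refers to whatever was assigned to it most recently.

```python
# First assignment: an integer
my_variable = 120
print(type(my_variable))  # <class 'int'>

# Reassigning the same name to a string replaces the old value
my_variable = "hello"
print(type(my_variable))  # <class 'str'>
```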

So we can definitely use the export function in the Databricks notebook environment. Here I can export it as an IPython Notebook, which means that you can use it later in Jupyter as well. You can export it as a source file, which means that it will be written out as a .py Python file that you can access locally. You can also export it as HTML, or you can export it as a DBC archive, which is a Databricks notebook archive. If you export it as a DBC file, you do need to upload it back to a Databricks environment to use it.

Does this file get saved to GitHub automatically? No, it doesn't. You can choose to integrate with GitHub, but right now I am not.

So if I show you the Home button right here, you can see the files that you have uploaded. And if you have multiple files and want to go back to the most recent file that you worked on, you can go to the Recent tab over here. You can see there is a Python Fundamentals file right here, so you can use it to retrieve the latest file that you have worked on. There's a lot of documentation on how you can integrate with GitHub, and we can definitely send that out as well.

Is there a difference between Databricks and Azure Databricks? Let me defer that question to later, because I'm gonna focus on Python fundamentals for now. Can we call one notebook from another? You mean like sourcing a notebook from another one?

Yes, Brooke just answered that. – [Presenter] Actually, Chengyin, I think there are a few questions about why you use Databricks for this when Jupyter exists? – [Chengyin] Yeah, so in the Databricks environment, you probably remember that in the beginning I showed that you can invite your peers to collaborate with you on the notebook. That is a really handy feature: you can just invite users to the same workspace and collaborate on the notebook very easily. What is also different about Databricks is the cluster management feature over here. If you are using Databricks to run Python or any other code, Databricks will manage the cluster, the environment, for you.

On Community Edition, you can only run one cluster at a time.

Can you show one more time how you got stuff from GitHub into the Databricks environment? Yes, I can. So let me go to this GitHub link again, and I'm just gonna show you the same thing. Here you can see that I have the link over here in the top bar, and I can press Command + C or Control + C to copy it. Then go back to the page that you have on Databricks, hit the Home button right here, open the drop-down menu under your email, and click Import. Again, you can import a file or a URL. A file just means something in your downloads folder or whichever directory you have it in (mumbles), and you can upload it. Or you can choose to import via URL: I put in this URL and just click Import. Because both notebooks have the same name, this one gets a suffix, 1; I click on it, and this is the one.

Yes, so to create a cluster: you can see that here I have the cluster running, so I wouldn't be able to create a second cluster within Community Edition. But all you need to do is go to this sidebar, which you see everywhere in your environment, and click on this icon, which takes you to this page. Then there's a blue Create Cluster button right here; you can create a cluster and put in your cluster name right here.

So if you want to go back to the landing page of the Databricks environment, you can also just hit the Databricks icon at the top left, right here, and you can create a notebook from the homepage, rather than going to the Home icon and clicking to create a new notebook. So these are the two different ways to create a notebook: either from the Home icon, via the drop-down menu underneath your email, and then clicking Create Notebook, or from the landing page, the homepage of the Databricks environment. – [Presenter] And then, Chengyin, it looks like we have a question asking you to create an example using methods rather than functions, if you could show that. – [Chengyin] Where is that question? – [Presenter] It's in the Q&A. So for example, like string dot upper, string dot lower, showing the difference between a function and a method. – [Chengyin] Yeah, let me go back to this page.

Yeah, so for example, here, say that I want to remove an item from this breakfast list. I can choose to use... oh, this is another feature that I wanted to show you first. If I've already named a variable, say breakfast list, but I can't remember how to spell the rest of it, or I'm too lazy to type the rest, all you need to do is press Tab on your keyboard, and you can see that it suggests auto-completions for the variable or for different functions. Because I'm interested in breakfast list, I am going to pick breakfast list right here and press Enter, and you will see that the name of the variable is now auto-completed. So if I want to remove an item from the breakfast list, I can say remove, and then say that I want to remove X. Actually, before that, let me print out my breakfast list, and I want you to take a guess at what this will output. So right now it has pancakes, X, waffles, and milk. What items will remain in this breakfast list after I remove X?

So it should be everything except X, and you can see here that we don't have X anymore. So this is another method that you can also use. You can also, if you remember, there was…
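A minimal sketch of the remove demo, assuming the list holds the four items read out above.

```python
breakfast_list = ["pancakes", "X", "waffles", "milk"]
print(breakfast_list)  # ['pancakes', 'X', 'waffles', 'milk']

# list.remove() deletes the first item equal to its argument, in place
breakfast_list.remove("X")
print(breakfast_list)  # ['pancakes', 'waffles', 'milk']
```

Note that remove changes the list itself and returns None, and it raises a ValueError if the item is not in the list.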

Oh, sorry, that was with strings, yup. So this is another method: remove.
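To make the function-versus-method question from the Q&A concrete, here is a small sketch. The variable drink is hypothetical, and the example uses built-ins (len, str.upper, str.lower, list.append) rather than anything specific to the workshop notebook.

```python
breakfast_list = ["pancakes", "waffles", "milk"]
drink = "Milk"  # hypothetical string for the upper/lower examples

# A function is called on its own and receives the object as an argument
print(len(breakfast_list))  # 3

# A method is called on an object with a dot and belongs to that object's type
print(drink.upper())  # MILK
print(drink.lower())  # milk

# Some methods change the object in place (list.append);
# string methods return a new string and leave the original unchanged
breakfast_list.append("juice")
print(breakfast_list)  # ['pancakes', 'waffles', 'milk', 'juice']
print(drink)           # Milk
```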

What makes Python so great at data science and data engineering compared to other programming languages? Python is multipurpose, so it's very versatile: for example, you can use Python for website development, data analysis, or just building applications. Because it's multipurpose, it's easier to collaborate across teams while still staying in the same language. Can you explain the for loop break and continue features? Sure, I can show how you can break a loop.

So for example, right here, say that if food equals waffles, then I want to break this loop. Let me just make sure what breakfast list contains at this point. So yes: pancakes, waffles, and milk. This means that if I use break right here, it will not print out milk anymore. So if I run this with Shift + Enter... oops, sorry, I need to print.

Oh, so this is doing something different, because it's checking whether or not this is waffles: if it's waffles, it's gonna print it, and if it's not, it's gonna break. That's not very interesting, so I'm going to just print food first, and then say: if food equals waffles, break.

Yeah, so here you can see that, because I'm printing the food right here, we know where the sequence is at: it has already gone through pancakes, it has gone through waffles, and since the food is waffles, it breaks out of the loop and doesn't print anything else.
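A sketch of the break demo, plus continue, which was asked about but not shown in the session. The list contents follow the transcript; the helper lists eaten and skipped_waffles are hypothetical and only record what each loop visited.

```python
breakfast_list = ["pancakes", "waffles", "milk"]

# break: stop the whole loop as soon as food is waffles
eaten = []
for food in breakfast_list:
    eaten.append(food)
    print(food)
    if food == "waffles":
        break
# prints pancakes, then waffles; milk is never reached

# continue: skip only the current item, then keep looping
skipped_waffles = []
for food in breakfast_list:
    if food == "waffles":
        continue
    skipped_waffles.append(food)
    print(food)
# prints pancakes, then milk
```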

Yeah, so just so everybody saw how to create a new cell: if I hover over this notebook environment right there, you can see that there is a plus sign in the center, and you can insert a new cell like this. Or you can go to the top right-hand corner of the cell to add or remove cells. Here I can add a new cell above or add a cell below, I can also move cells, and I can also show a title. If I want to show a title, I can just click into this and say, this is a breakfast list.

You can also go to the Run Cell option, where you can choose to run this cell, run all above, or run all below. This is really handy when you know for sure that everything above is gonna work, and you don't wanna go cell by cell to execute it yourself.

Any specific advantage of using a Jupyter notebook over Python on our local machine? It really just depends on your personal preference. There are people who really like the notebook environment, and there are also people who really don't. The advantage of a notebook environment is that you can see the output right below the code you execute: for example, here one plus one equals two, and we can see the result right away. If you use a local Python file, you will not be able to see the output immediately, together with the code. Of course, there are now IDEs that help you render the output in the same console, and everybody uses Python differently, so I would say it boils down to personal preference.

How many notebooks can one cluster serve? This is a good question. It depends on how memory-intensive or how heavy the task in your notebook is; there's no fixed number of notebooks (mumbles) you can run on a specific cluster. Think of a cluster as a resource: if more people or more notebooks (mumbles) access the same resource, that resource has to work harder to execute your cells. So say I have one really heavy notebook that I know would already take like 10 hours on my local machine, and a small cluster with limited memory. If you were to run two notebooks on it, one very memory-intensive and the other one light, it's definitely going to slow down the execution of the code on the cluster. So the answer is: it depends.

Do we have a cheat sheet of all the functions (mumbles) provided anywhere?

There's not a Spark doc for this, but there is the Python documentation. If you go to this website, actually, let me just show you.


So you can see that this is the Python tutorial, which shows a bunch of syntax that you can look at over here. If you are interested in break and continue, you can also look at this. It's really an overview of what you can do in Python, so I recommend going through this tutorial if you are interested in learning more about Python, but you can also go to the documentation as well. You can see here in the top bar there is the language, English, then the Python version, and the documentation. A really easy way to check the syntax is to just search, for example, list append Python 3 syntax, and you can see that docs.python.org brings you to the Data Structures page, which shows what you can do with lists. You can see there are append, extend, insert, and remove, so this is how you can get more information about what you can do with Python.
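A quick sketch of the four list methods named above (append, extend, insert, remove), using a hypothetical breakfast list.

```python
breakfast_list = ["pancakes", "waffles"]

breakfast_list.append("milk")             # add one item at the end
breakfast_list.extend(["eggs", "toast"])  # add every item from another list
breakfast_list.insert(0, "coffee")        # insert an item at index 0, the front
breakfast_list.remove("toast")            # delete the first item equal to "toast"

print(breakfast_list)
# ['coffee', 'pancakes', 'waffles', 'milk', 'eggs']
```

The difference between append and extend is the usual stumbling block: append(["eggs", "toast"]) would add the whole list as a single nested item, while extend adds its elements one by one.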

Yeah, we have five more minutes left. Any more questions about Python fundamentals?

And yes, in future sessions we are going to talk about pandas, but not Spark, because the purpose of this workshop series is really to get you to understand how to use regular Python without any prerequisite knowledge of Spark or of interacting with Spark. So we're just gonna be using Python and pandas for the upcoming workshops.

So the number of collaborators on a workspace is three, including you. For example, I can go to this workspace and say that I want to invite more people. I can go to the Admin Console; because this is Community Edition, it allows three user accounts altogether. I can add a user, and, okay, I'm just gonna be cheating and put a plus one over here. Now it says that I can add this person, and the person should receive an email. Okay, I add a second person, and an email is sent. But if I try to add a third person, which means four in total, this button is sort of faded, and it says that your plan doesn't allow more than three user accounts. So for Community Edition you can only have up to three user accounts in one workspace.

The link for the next classes: let me put it this way, all of the resources will be posted on the same GitHub page. You can see here that there is a tech talk repo over here with folders prefixed by date. For example, you can see 2020 April 15, Introduction to Python, and you click into it. Then, using the same process of importing a link from GitHub, you can copy the link of a notebook from here and import it into your notebook environment. So it will be the same GitHub repo that you'll be interacting with, but the link you use for each notebook will be different, because it's a different file for each training session.

I'm not sure about that. Oh yeah, Karen, do you know? – [Karen] Yeah, so if you wanna RSVP to the two upcoming workshops, I just dropped the link in the chat, and I'll also include it in the follow-up email. – [Chengyin] Yeah, so here is the link that I just opened from Karen's message in the chat. Here there is the first part, which is today's, then the second part, which is next week, and then there is another tech talk about a different part of Databricks, which you can also attend if you like. The third part of the series will be on April 22nd. You can click RSVP here, and then you will receive an email on how to join the webinar.

We have two minutes left; I guess this is a good time to wrap up, Karen. – [Karen] Yeah, yeah, that's great. There are no more questions. Thank you so much, Chengyin, this was great. I think everybody really enjoyed the content; we had just over 300 folks attending, which is really awesome.