Sup nerds. Doing a project for my DS python class and need a lil help...

Hunter Smith

Sup nerds. Doing a project for my DS python class and need a lil help. I need to take a python data frame and for each row, need to have a column with the value of the number of times that 'ORIGIN_AIRPORT_ID' shows up. I'm only aware how to count in a compressed way with groupby. Any tips? Pic is an example of how the final DF should look.

Attached: Capture.jpg (675x124, 21K)

October 21, 2018 - 05:40

Jack Phillips

Ah, that's pretty easy desu. First step is to install Gentoo - you should be able to do the rest from there. Best of luck.

October 21, 2018 - 05:43

Ian Green

df.loc[df['ORIGIN_AIRPORT_ID'] == someID].count()

October 21, 2018 - 05:50

Kayden Torres

the 'someID' part is throwing me off. Do I assign something to that variable or use a for loop?

October 21, 2018 - 05:53

Charles Myers

I assume you can iterate through each row, set someID to ORIGIN_AIRPORT_ID from that row, and use that to calculate the OUTDEGREE for that row in the output. Disclaimer, though: I've never used pandas, this is just what I gathered from a couple mins of googling.

October 21, 2018 - 05:56

Nolan Russell

wew, not even sure how to do that. Programming is not my strong suit. I'm a stats nerd.

October 21, 2018 - 06:04

Jaxon Cox

Try something like this. Again, not a pandas expert so you might have to fiddle with it a bit. But I believe something like this would work:

# df is your dataframe containing all of the ORIGINAL input data

# create an array for containing the OUTDEGREE values
outdegrees = []

# iterate through the rows, calculate each outdegree
for row in df:
    outdegrees.append(df.loc[df['ORIGIN_AIRPORT_ID' == row['ORIGIN_AIRPORT_ID']].count())

#add this column to the df
df['OUTDEGREE'] = outdegrees

October 21, 2018 - 06:14

Adrian Gonzalez

Minor update.
# df is your dataframe containing all of the ORIGINAL input data

# create an array for containing the OUTDEGREE values
outdegrees = []

# iterate through the rows, calculate each outdegree. apparently you can do this by using the iterrows() function in pandas. this also gives you the index for each row, even though you don't need it here. don't be thrown off by that.
for index, row in df.iterrows():
    outdegrees.append(df.loc[df['ORIGIN_AIRPORT_ID' == row['ORIGIN_AIRPORT_ID']].count())

#add this column to the df
df['OUTDEGREE'] = outdegrees

October 21, 2018 - 06:19

Jace Fisher

I see the logic, but it's not quite working. I'm thinking maybe there's a way to iterate through each row and set row['ORIGIN_AIRPORT_ID'] = the count of that id, but googling isn't turning up the paydirt I would like.

October 21, 2018 - 06:26

Carson Gray

To elaborate further, df.loc[df['ORIGIN_AIRPORT_ID' == row['ORIGIN_AIRPORT_ID']] itself returns a dataframe, containing only rows where df['ORIGIN_AIRPORT_ID' == row['ORIGIN_AIRPORT_ID']. You can then simply count the number of rows in this on-the-fly generated dataframe using the .count() function.

What kind of error is it giving you?

Further reference:

Iterating over a dataframe
stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas

Selecting rows in a dataframe based on certain criteria
pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html

Counting rows in a dataframe
pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.count.html

Insert a new column to a pandas dataframe
w3resource.com/python-exercises/pandas/python-pandas-data-frame-exercise-20.php

October 21, 2018 - 06:34

Oliver Gomez

In the form you posted (had to fix a few unclosed brackets though) this is what it gives. Trying to tweak right now

Attached: Capture.jpg (1218x1128, 168K)

October 21, 2018 - 06:42

Oliver Carter

Ah, I see a problem here. One of the missing brackets was closed in the wrong place. Try fixing it and seeing if it helps at all.

Attached: FIX'D.png (1218x1128, 747K)

October 21, 2018 - 06:45

Tyler Hughes

Better!

Attached: Capture.jpg (1087x669, 106K)

October 21, 2018 - 06:47

Hudson Howard

the value is in there, now just gotta parse it out
regex? or is there a cleaner way you think?

October 21, 2018 - 06:48

Anthony Ramirez

Apparently I was being retarded about how .count() worked. Try replacing that whole line with this:

outdegrees.append(df.loc[df['ORIGIN_AIRPORT_ID' == row['ORIGIN_AIRPORT_ID']].shape[0])

Reference:
stackoverflow.com/questions/15943769/how-do-i-get-the-row-count-of-a-pandas-dataframe
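
For anyone following along, here is a rough sketch of why .count() looked off and .shape[0] does the job. The toy dataframe and the DEST_AIRPORT_ID column are made up for illustration; only ORIGIN_AIRPORT_ID comes from the thread.

```python
import pandas as pd

# Toy data, made up for illustration; only ORIGIN_AIRPORT_ID is from the thread.
df = pd.DataFrame({
    "ORIGIN_AIRPORT_ID": [10135, 10135, 10140, 10135],
    "DEST_AIRPORT_ID": [10397, 11433, 10397, 13930],
})

subset = df.loc[df["ORIGIN_AIRPORT_ID"] == 10135]

# .count() returns a Series: one non-null count per column, not a single number.
per_column_counts = subset.count()

# .shape is a (rows, columns) tuple, so .shape[0] is the plain row count.
n_rows = subset.shape[0]

print(per_column_counts)
print(n_rows)
```

Appending the .count() result puts a whole Series into the list for every row, which is probably why the number then looked like it had to be parsed out.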

October 21, 2018 - 06:53

Michael Cooper

I left the missing bracket in there so don't forget about that. Fuck I'm tired.
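
With the bracket closed in the right spot, the whole loop might look roughly like this. This is a sketch on made-up toy data, not the exact code from the screenshots:

```python
import pandas as pd

# Toy data, made up for illustration.
df = pd.DataFrame({"ORIGIN_AIRPORT_ID": [10135, 10135, 10140, 10135, 10140]})

outdegrees = []
for index, row in df.iterrows():
    # The comparison sits outside df[...]: df['col'] == value builds the row filter.
    matches = df.loc[df["ORIGIN_AIRPORT_ID"] == row["ORIGIN_AIRPORT_ID"]]
    outdegrees.append(matches.shape[0])

# Add the per-row counts as a new column.
df["OUTDEGREE"] = outdegrees
print(df)
```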

October 21, 2018 - 06:54

Jason Murphy

similar brackets problems, but that did it! I owe you a beer friend.

For my pleb nongrammer brain, can you break down the logic of your command?

October 21, 2018 - 06:56

Isaac Richardson

>tfw it just werks

So basically, what you're doing when you do df.loc[] is similar to a selection in a relational database. I'm not going to get into it too deeply right now, but the idea is that you're selecting a subset of the original dataframe based on more specific criteria. For each row, we basically generated a smaller dataframe containing *only* the items where the ORIGIN_AIRPORT_ID matched our current row. This subset is a valid dataframe in and of itself, and is subject to the same operations and same properties as the original.

Every dataframe in pandas has a .shape property, which is a tuple of (number of rows, number of columns). This also applies to the resulting dataframe from the selection. Since each of these dataframes also has a .shape property, just like the original, we just extracted the # of rows value from this. If you try printing segments.shape[0], you'll see the # of rows for the entire original dataframe. If you print segments.shape[1], it'll give you the number of columns.
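
A small sketch of the selection step on its own, in case it helps to see the pieces. The data and the DEST_AIRPORT_ID column are invented for the example:

```python
import pandas as pd

# Toy data, invented for the example.
df = pd.DataFrame({
    "ORIGIN_AIRPORT_ID": [10135, 10140, 10135],
    "DEST_AIRPORT_ID": [10397, 10397, 11433],
})

# The comparison yields one True/False per row.
mask = df["ORIGIN_AIRPORT_ID"] == 10135

# .loc with that mask keeps only the True rows; the result is still a DataFrame.
subset = df.loc[mask]

print(mask.tolist())          # [True, False, True]
print(type(subset).__name__)  # DataFrame
print(df.shape)               # (3, 2)
print(subset.shape)           # (2, 2)
```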

It seems like kind of a hacky way to do it but at least it works desu
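
If the loop gets slow on a big file, a less hacky route is probably the groupby you already know, just with transform so the count gets written back to every row instead of being collapsed. A minimal sketch on made-up data, untested against your actual dataset:

```python
import pandas as pd

# Toy data, made up for illustration.
df = pd.DataFrame({"ORIGIN_AIRPORT_ID": [10135, 10135, 10140, 10135, 10140]})

# Size of each ORIGIN_AIRPORT_ID group, broadcast back onto every row of that group.
df["OUTDEGREE"] = (
    df.groupby("ORIGIN_AIRPORT_ID")["ORIGIN_AIRPORT_ID"].transform("size")
)

# Same idea via a lookup table: count each ID once, then map the counts onto the rows.
counts = df["ORIGIN_AIRPORT_ID"].value_counts()
via_map = df["ORIGIN_AIRPORT_ID"].map(counts)

print(df)
```

Both versions avoid re-filtering the whole dataframe once per row, which is what makes the loop slow on large inputs.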

Attached: aw yea.gif (245x221, 1.39M)

October 21, 2018 - 07:08

Jaxson Perez

>hacky
Listen my man, data science is the definition of "good enough" programming. Stuff like that doesn't matter. Thanks bud.
If you're still up I'll have more questions but don't wanna impose.

October 21, 2018 - 07:16

Ayden Richardson

Let me hear 'em, man. I'm a drunk NEET and I have nothing better to do.

October 21, 2018 - 07:17

Wyatt King

My man, here's a temp email:
[email protected]

Send me an email with a screenshot of this (you) and let's connect off 4chin

No promises, but I'm a classic brogrammer. Kinda dumb compared to the spergs but I dudes get laid.

October 21, 2018 - 07:19

Dominic Harris

get dudes* lmao

October 21, 2018 - 07:20

Benjamin Foster

Not to be an ass or anything, but is there any way we can just do the Q&A on here? I swear to god I haven't even used e-mail in like 6 years. I blame autism lmao

October 21, 2018 - 07:21

Easton Howard

ugh, sure but I very rarely 4chin lately and have a lack of programmer connects.
Is there a less temporary but anonymous way I could hit you up in the future? I don't need to know who you are, I just went too far in my studies and no longer have people who know how to help lol

Attached: 1540094840211.jpg (567x425, 33K)

October 21, 2018 - 07:24

Daniel Hernandez

Current problem I'm working on though. I'm p drunk so my 'Intellect' stat has a hard -3 debuff desu lol

Attached: Capture.jpg (1142x721, 109K)

October 21, 2018 - 07:26

Eli Perez

Alright then, I'll go ahead and send you the email, but as a fair warning I'm also pretty drunk and there's a >50% chance I'll forget to ever check my inbox again. But I'm long since done with uni and don't mind helping people when it comes to the one field I don't suck shit at

At first I was freaking out because this looked fucked up: after all, how can there be an outdegree of 23 in a dataframe with only 10 rows? But then I saw that you only set it to display the first ten, and now I feel a bit better.

Is Exercise 6 the description of the problem you're currently trying to solve? That math language is a little outside of my expertise. What the fuck is even a tibble lmao

Attached: scaredy cat.gif (260x182, 441K)

October 21, 2018 - 07:34

Jayden Long

replied with my real

October 21, 2018 - 07:41

Gabriel Evans

Ah a mathlet, that's cool. Tibble is just a table with a certain way of sorting the data. For programmer's intents it's just a table desu

October 21, 2018 - 07:42
