Sup nerds. Doing a project for my DS python class and need a lil help. I need to take a pandas dataframe and, for each row, add a column holding the number of times that row's 'ORIGIN_AIRPORT_ID' shows up in the whole dataframe. I only know how to count in a compressed way with groupby (one row per ID). Any tips? Pic is an example of how the final DF should look.

Attached: Capture.jpg (675x124, 21K)

Ah, that's pretty easy desu. First step is to install Gentoo - you should be able to do the rest from there. Best of luck.

df.loc[df['ORIGIN_AIRPORT_ID'] == someID].count()

the 'someID' part is throwing me off. Do I assign something to that variable or use a for loop?

I assume you can iterate through each row, set someID to the ORIGIN_AIRPORT_ID from that row, and use that to calculate the OUTDEGREE for that row in the output. Disclaimer though: I've never used pandas, this is just what I gathered from a couple mins of googling.

wew, not even sure how to do that. Programming is not my strong suit. I'm a stats nerd.

Try something like this. Again, not a pandas expert so you might have to fiddle with it a bit. But I believe something like this would work:

# df is your dataframe containing all of the ORIGINAL input data

# create an array for containing the OUTDEGREE values
outdegrees = []

# iterate through the rows, calculate each outdegree
for row in df:
    outdegrees.append(df.loc[df['ORIGIN_AIRPORT_ID' == row['ORIGIN_AIRPORT_ID']].count())

#add this column to the df
df['OUTDEGREE'] = outdegrees

Minor update.
# df is your dataframe containing all of the ORIGINAL input data

# create an array for containing the OUTDEGREE values
outdegrees = []

# iterate through the rows, calculate each outdegree. apparently you can do this by using the iterrows() function in pandas. this also gives you the index for each row, even though you don't need it here. don't be thrown off by that.
for index, row in df.iterrows():
    outdegrees.append(df.loc[df['ORIGIN_AIRPORT_ID' == row['ORIGIN_AIRPORT_ID']].count())

#add this column to the df
df['OUTDEGREE'] = outdegrees

I see the logic, but it's not quite working. I'm thinking maybe there's a way to iterate through each row and set row['ORIGIN_AIRPORT_ID'] = the count of that id, but googling isn't turning up the paydirt I would like.

To elaborate further, df.loc[df['ORIGIN_AIRPORT_ID'] == row['ORIGIN_AIRPORT_ID']] itself returns a dataframe, containing only the rows where df['ORIGIN_AIRPORT_ID'] == row['ORIGIN_AIRPORT_ID']. You can then simply count the number of rows in this on-the-fly generated dataframe using the .count() function.

What kind of error is it giving you?

Further reference:


Iterating over a dataframe
stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas

Selecting rows in a dataframe based on certain criteria
pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html

Counting rows in a dataframe
pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.count.html


Insert a new column to a pandas dataframe
w3resource.com/python-exercises/pandas/python-pandas-data-frame-exercise-20.php

In the form you posted (had to fix a few unclosed brackets though) this is what it gives. Trying to tweak it right now.

Attached: Capture.jpg (1218x1128, 168K)

Ah, I see a problem here. One of the missing brackets was closed in the wrong place. Try fixing it and seeing if it helps at all.

Attached: FIX'D.png (1218x1128, 747K)

Better!

Attached: Capture.jpg (1087x669, 106K)

the value is in there, now just gotta parse it out
regex? or is there a cleaner way you think?

Apparently I was being dumb about how .count() works: it returns a count of non-null values for each column, not a single row count. Try replacing that whole line with this:

outdegrees.append(df.loc[df['ORIGIN_AIRPORT_ID' == row['ORIGIN_AIRPORT_ID']].shape[0])


Reference:
stackoverflow.com/questions/15943769/how-do-i-get-the-row-count-of-a-pandas-dataframe

I left the missing bracket in there so don't forget about that. Fuck I'm tired.
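For anyone following along, here is a rough sketch of the whole loop with the brackets balanced. The sample data is made up for illustration; only the column names ORIGIN_AIRPORT_ID and OUTDEGREE come from the thread, and same_origin is just a helper name picked here.

```python
import pandas as pd

# Made-up sample data; only the column name comes from the thread.
df = pd.DataFrame({"ORIGIN_AIRPORT_ID": [10, 20, 10, 30, 10, 20]})

outdegrees = []
for index, row in df.iterrows():
    # Boolean mask: True for every row that shares this row's airport ID.
    same_origin = df["ORIGIN_AIRPORT_ID"] == row["ORIGIN_AIRPORT_ID"]
    # .shape[0] is the number of rows in the filtered dataframe.
    outdegrees.append(df.loc[same_origin].shape[0])

# Attach the per-row counts as a new column.
df["OUTDEGREE"] = outdegrees
print(df)
```

Note where the closing bracket goes: right after 'ORIGIN_AIRPORT_ID', so the comparison happens between the column and the row's value rather than between two strings inside the index.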

similar brackets problems, but that did it! I owe you a beer friend.

For my pleb nongrammer brain, can you break down the logic of your command?

>tfw it just werks

So basically, what you're doing when you do df.loc[] is similar to a selection in a relational database. I'm not going to get into it too deeply right now, but the idea is that you're selecting a subset of the original dataframe based on more specific criteria. For each row, we basically generated a smaller dataframe containing *only* the rows where the ORIGIN_AIRPORT_ID matched our current row. This subset is a valid dataframe in and of itself, and is subject to the same operations and same properties as the original.
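A small sketch of that selection step on its own, assuming made-up data. DEST_AIRPORT_ID is a hypothetical second column added only so the subset has something to show, and mask and subset are names chosen here.

```python
import pandas as pd

# Made-up sample data; DEST_AIRPORT_ID is a hypothetical extra column.
df = pd.DataFrame({
    "ORIGIN_AIRPORT_ID": [10, 20, 10, 30],
    "DEST_AIRPORT_ID": [20, 10, 30, 10],
})

# Comparing a column to a value gives one True/False per row.
mask = df["ORIGIN_AIRPORT_ID"] == 10

# df.loc[mask] keeps only the True rows, roughly like
# SELECT * FROM df WHERE ORIGIN_AIRPORT_ID = 10 in SQL.
subset = df.loc[mask]

print(mask.tolist())
print(subset)
```

The subset keeps all the original columns and the original row labels, which is why the same operations work on it as on the full dataframe.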

Every dataframe in pandas has a .shape property, which is a tuple of (number of rows, number of columns). This also applies to the dataframe that results from the selection, so we just pulled the number of rows out of that. If you try printing segments.shape[0], you'll see the number of rows for the entire original dataframe. If you print segments.shape[1], it'll give you the number of columns.
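To make the difference between .shape and .count() concrete, here is a minimal sketch on made-up data. DEST_AIRPORT_ID and its missing value are assumptions added purely to show why the two can disagree.

```python
import pandas as pd

# Made-up sample data with one missing value in a hypothetical second column.
df = pd.DataFrame({
    "ORIGIN_AIRPORT_ID": [10, 20, 10, 30],
    "DEST_AIRPORT_ID": [20, None, 30, 10],
})

# .shape is a plain tuple: (rows, columns).
n_rows, n_cols = df.shape

# The same property exists on a filtered dataframe.
subset = df.loc[df["ORIGIN_AIRPORT_ID"] == 10]
n_matching = subset.shape[0]

# .count() returns one number per column (non-null values only),
# which is why it printed a whole table instead of a single count.
counts = df.count()

print(n_rows, n_cols, n_matching)
print(counts)
```

len(subset) gives the same row count as subset.shape[0], if that reads more naturally.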

It seems like kind of a hacky way to do it but at least it works desu

Attached: aw yea.gif (245x221, 1.39M)
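On the "hacky" point: the loop rescans the whole dataframe once per row, which can get slow on a big flights table. Below is a sketch of a loop-free alternative built on the groupby that was mentioned at the start of the thread. This is not what was used above, just one possible way to do it, shown on made-up data; OUTDEGREE_ALT is a column name invented here to compare two variants.

```python
import pandas as pd

# Made-up sample data; only the column name comes from the thread.
df = pd.DataFrame({"ORIGIN_AIRPORT_ID": [10, 20, 10, 30, 10, 20]})

# groupby(...).transform("size") computes each group's size and
# broadcasts it back onto every row of that group, so nothing is compressed.
df["OUTDEGREE"] = (
    df.groupby("ORIGIN_AIRPORT_ID")["ORIGIN_AIRPORT_ID"].transform("size")
)

# Variant: build a lookup table of counts per ID, then map each row's ID to it.
id_counts = df["ORIGIN_AIRPORT_ID"].value_counts()
df["OUTDEGREE_ALT"] = df["ORIGIN_AIRPORT_ID"].map(id_counts)

print(df)
```

Both variants should give the same numbers as the loop; which one reads better is mostly a matter of taste.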

>hacky
Listen my man, data science is the definition of "good enough" programming. Stuff like that doesn't matter. Thanks bud.
If you're still up I'll have more questions but don't wanna impose.

Let me hear 'em, man. I'm a drunk NEET and I have nothing better to do.

My man, here's a temp email:
[email protected]

Send me an email with a screenshot of this (you) and let's connect off 4chin

No promises, but I'm a classic brogrammer. Kinda dumb compared to the spergs but I dudes get laid.

get dudes* lmao

Not to be an ass or anything, but is there any way we can just do the Q&A on here? I swear to god I haven't even used e-mail in like 6 years. I blame autism lmao

ugh, sure but I very rarely 4chin lately and have a lack of programmer connects.
Is there a less temporary but anonymous way I could hit you up in the future? I don't need to know who you are, I just went too far in my studies and no longer have people who know how to help lol

Attached: 1540094840211.jpg (567x425, 33K)

Current problem I'm working on though. I'm p drunk so my 'Intellect' stat has a hard -3 debuff desu lol

Attached: Capture.jpg (1142x721, 109K)

Alright then, I'll go ahead and send you the email, but as a fair warning I'm also pretty drunk and there's a >50% chance I'll forget to ever check my inbox again. But I'm long since done with uni and don't mind helping people when it comes to the one field I don't suck shit at

At first I was freaking out because this looked fucked up: after all, how can there be an outdegree of 23 in a dataframe with only 10 rows? But then I saw that you only set it to display the first ten, and now I feel a bit better.

Is Exercise 6 the description of the problem you're currently trying to solve? That math language is a little outside of my expertise. What the fuck is even a tibble lmao

Attached: scaredy cat.gif (260x182, 441K)

replied with my real

Ah a mathlet, that's cool. Tibble is just a table with a certain way of sorting the data. For programmer's intents it's just a table desu