Merging two dataframes, removing duplicates and aggregation in R
I have two data frames in R, named house and candidates.
house

          house       region military_strength
    1     stark        north             20000
    2 targaryen slaver's bay            110000
    3 lannister  westerlands             60000
    4 baratheon   stormlands             40000
    5    tyrell        reach             30000

candidates

          house               name  region
    1 lannister    jamie lannister westros
    2     stark         robb stark   north
    3     stark         arya stark westros
    4 lannister    cersi lannister westros
    5 targaryen daenerys targaryen mereene
    6 baratheon   robert baratheon westros
    7   mormont      jorah mormont mereene
I want to merge the two data frames on the basis of house. I have done:

    merge(candidates, house, by="house", sort=FALSE)

The output:
          house               name region.x     region.y military_strength
    1 lannister    jamie lannister  westros  westerlands             60000
    2 lannister    cersi lannister  westros  westerlands             60000
    3     stark         robb stark    north        north             20000
    4     stark         arya stark  westros        north             20000
    5 targaryen daenerys targaryen  mereene slaver's bay            110000
    6 baratheon   robert baratheon  westros   stormlands             40000
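For anyone who wants to reproduce the merge, here is a minimal sketch that rebuilds both data frames from the tables above. The column types (plain character and numeric, via stringsAsFactors = FALSE) and the result name merged are assumptions, since the question does not show how the data was created.

```r
# Rebuild the two data frames from the tables in the question.
# Column types are assumed; the question only shows printed output.
house <- data.frame(
  house = c("stark", "targaryen", "lannister", "baratheon", "tyrell"),
  region = c("north", "slaver's bay", "westerlands", "stormlands", "reach"),
  military_strength = c(20000, 110000, 60000, 40000, 30000),
  stringsAsFactors = FALSE
)
candidates <- data.frame(
  house = c("lannister", "stark", "stark", "lannister",
            "targaryen", "baratheon", "mormont"),
  name = c("jamie lannister", "robb stark", "arya stark", "cersi lannister",
           "daenerys targaryen", "robert baratheon", "jorah mormont"),
  region = c("westros", "north", "westros", "westros",
             "mereene", "westros", "mereene"),
  stringsAsFactors = FALSE
)

# merge() does an inner join by default: only houses present in both
# data frames survive. The shared non-key column "region" is
# disambiguated with the default suffixes .x (candidates) and .y (house).
merged <- merge(candidates, house, by = "house", sort = FALSE)
print(merged)
```

Because this is an inner join, mormont (only in candidates) and tyrell (only in house) do not appear in the result. If unmatched candidates should be kept with NA values, merge() also accepts all.x = TRUE.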
I want to remove the second candidate of every house (if any), and that candidate's military_strength should be added to the first candidate of the same house.

For example, the row

    4     stark         arya stark  westros        north             20000

would be removed, but its 20000 would be added to the military_strength of row 3 (robb stark). How can I do this in an appropriate way?
Starting from the data.frame df1 obtained after merge(), one can proceed with:
    # Replace each row's military_strength with the total for its house
    df1$military_strength <- with(df1, ave(military_strength, house, FUN = sum))
    # Keep only the first row of each house
    df1[!duplicated(df1$house), ]
    #      house               name region.x     region.y military_strength
    #1 lannister    jamie lannister  westros  westerlands            120000
    #3     stark         robb stark    north        north             40000
    #5 targaryen daenerys targaryen  mereene slaver's bay            110000
    #6 baratheon   robert baratheon  westros   stormlands             40000
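To see why those two steps work, it may help to look at ave() and duplicated() on their own. The following is a small sketch on toy values; the vectors strength and grp are hypothetical and not taken from the question.

```r
# Toy data (hypothetical values, not from the question).
strength <- c(10, 10, 5, 7)
grp <- c("a", "a", "b", "c")

# ave() returns a vector as long as its input, with every element
# replaced by FUN applied to that element's group. The FUN argument
# must be named in upper case, because it comes after `...`.
group_total <- ave(strength, grp, FUN = sum)
print(group_total)

# duplicated() flags every repeat of a value after its first
# occurrence, so negating it keeps one row per group (the first).
keep <- !duplicated(grp)
print(keep)
```

Since ave() keeps the original length, the group total lands on every row of the house, including the first one, and dropping the later rows with !duplicated() then leaves one row per house carrying the summed strength.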
Data used in the example:

    df1 <- structure(list(
        house = structure(c(2L, 2L, 3L, 3L, 4L, 1L),
            .Label = c("baratheon", "lannister", "stark", "targaryen"),
            class = "factor"),
        name = structure(c(4L, 2L, 5L, 1L, 3L, 6L),
            .Label = c("arya stark", "cersi lannister", "daenerys targaryen",
                       "jamie lannister", "robb stark", "robert baratheon"),
            class = "factor"),
        region.x = structure(c(3L, 3L, 2L, 3L, 1L, 3L),
            .Label = c("mereene", "north", "westros"),
            class = "factor"),
        region.y = structure(c(4L, 4L, 2L, 2L, 1L, 3L),
            .Label = c("slaver's bay", "the north", "the stormlands",
                       "the westerlands"),
            class = "factor"),
        military_strength = c(60000L, 60000L, 20000L, 20000L, 110000L, 40000L)),
        .Names = c("house", "name", "region.x", "region.y", "military_strength"),
        class = "data.frame",
        row.names = c("1", "2", "3", "4", "5", "6"))