numpy: Calculating the time between bus stops with Python

vsdwdz23  posted 5 months ago  in Python

Below is an example of a .csv file. I have thousands of different bus lines.

Table:

| trip_id | arrival_time | departure_time | stop_id | stop_sequence | stop_headsign |
| -- | -- | -- | -- | -- | -- |
| 107_1_D_1 | 6:40:00 | 6:40:00 | AREI2 | 1 | |
| 107_1_D_1 | 6:40:32 | 6:40:32 | JD4 | 2 | |
| 107_1_D_1 | 6:41:27 | 6:41:27 | PNG4 | 3 | |

Raw data:

trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign
107_1_D_1,6:40:00,6:40:00,AREI2,1,
107_1_D_1,6:40:32,6:40:32,JD4,2,
107_1_D_1,6:41:27,6:41:27,PNG4,3,

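I want to create a table or DataFrame with one row for each road segment, and calculate the time between consecutive arrival times.
Expected result:
(image of the expected result table)
Some other trip_ids may share the same RoadSegment.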


kmb7vmvb1#

I think you can use shift in this case. Here is an example:

import pandas as pd

df = pd.read_csv('...')
result = pd.DataFrame()

for _, trip_df in df.groupby('trip_id', sort=False):  # type: str, pd.DataFrame
    trip_df = trip_df.sort_values('stop_sequence')
    trip_df['arrival_time'] = pd.to_timedelta(trip_df['arrival_time'])
    trip_df['departure_time'] = pd.to_timedelta(trip_df['departure_time'])

    trip_df['prev_arrival_time'] = trip_df['arrival_time'].shift()
    trip_df['prev_stop_id'] = trip_df['stop_id'].shift()

    trip_df['RoadSegment'] = trip_df['prev_stop_id'].str.cat(trip_df['stop_id'], sep='-')
    trip_df['planned_duration'] = trip_df['departure_time'] - trip_df['prev_arrival_time']

    # Drop the first stop of the trip (no previous stop); copy to avoid SettingWithCopyWarning
    trip_df = trip_df.dropna(subset=['planned_duration']).copy()
    trip_df['planned_duration'] = (
        trip_df['planned_duration']
        .dt.total_seconds()
        .astype(int)
    )

    result = pd.concat(
        [result, trip_df[['RoadSegment', 'trip_id', 'planned_duration']]],
        sort=False,
        ignore_index=True,
    )

print(result)
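
The loop above can probably be avoided: a minimal sketch, assuming the same column names as in the question, that applies `shift` per group through `groupby` instead of concatenating one DataFrame per trip. The sample rows are copied from the question and read through `io.StringIO` only so the snippet is self-contained; a real file would go through `pd.read_csv` with its path. The duration is computed the same way as in the loop (departure time minus the previous stop's arrival time).

```python
import io

import pandas as pd

# Sample rows copied from the question (stand-in for the real CSV file).
raw = """trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign
107_1_D_1,6:40:00,6:40:00,AREI2,1,
107_1_D_1,6:40:32,6:40:32,JD4,2,
107_1_D_1,6:41:27,6:41:27,PNG4,3,
"""
df = pd.read_csv(io.StringIO(raw))

# Order the stops within each trip and parse the times as timedeltas.
df = df.sort_values(['trip_id', 'stop_sequence'])
df['arrival_time'] = pd.to_timedelta(df['arrival_time'])
df['departure_time'] = pd.to_timedelta(df['departure_time'])

# Shift within each trip so the first stop of a trip gets NaN / NaT
# instead of a value leaking in from the previous trip.
grouped = df.groupby('trip_id', sort=False)
prev_stop = grouped['stop_id'].shift()
prev_arrival = grouped['arrival_time'].shift()

df['RoadSegment'] = prev_stop.str.cat(df['stop_id'], sep='-')
df['planned_duration'] = (df['departure_time'] - prev_arrival).dt.total_seconds()

# Remove the first stop of every trip, then convert seconds to integers.
result = (
    df.dropna(subset=['planned_duration'])
      .astype({'planned_duration': int})
      [['RoadSegment', 'trip_id', 'planned_duration']]
      .reset_index(drop=True)
)
print(result)
```

Because the shift happens inside each `trip_id` group, the result should match the loop version while touching the data only once, which may matter with thousands of lines.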



ulydmbyx2#

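See my solution below: it converts the input data into a structured class form, performs the calculation, and then writes the result back out as CSV.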

from dataclasses import dataclass
from datetime import datetime
import csv

@dataclass
class Stop:
    stop_id: str
    trip_id: str
    departure_time: datetime
    arrival_time: datetime

@dataclass
class Segment:
    RoadSegment: str
    trip_id: str
    planned_duration: float

def get_time(s: str) -> datetime:
    """
    Used to convert the object to datetime form
    """
    return datetime.strptime(s, '%H:%M:%S')

def get_segment(stop1: Stop, stop2: Stop) -> Segment:
    """
    Calculates the segment between two stops
    """
    assert stop1.trip_id == stop2.trip_id
    return Segment(
        RoadSegment=stop1.stop_id + '-' + stop2.stop_id,
        trip_id=stop1.trip_id,
        planned_duration=(stop2.arrival_time - stop1.departure_time).total_seconds()
    )

# Converts the stops into Stop dataclasses
stops: list[Stop] = []
with open('input.csv') as f:
    r = csv.reader(f)
    first_row = next(r)
    for row in r:
        stops.append(Stop(
            stop_id=row[3],
            trip_id =  row[0],
            arrival_time = get_time(row[1]),
            departure_time = get_time(row[2]),
        ))

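# Calculates the segments, pairing only consecutive stops of the same trip
# (assumes the rows are already ordered by trip_id and stop_sequence)
segments: list[Segment] = []
for previous_stop, next_stop in zip(stops, stops[1:]):
    if previous_stop.trip_id == next_stop.trip_id:
        segments.append(get_segment(previous_stop, next_stop))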

# Outputs to csv
new_csv_list: list[list[str]] = [['RoadSegment', 'trip_id', 'planned_duration']]
for seg in segments:
    new_csv_list.append([seg.RoadSegment, seg.trip_id, str(seg.planned_duration)])

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(new_csv_list)
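
One caveat with `datetime.strptime(s, '%H:%M:%S')`: if the file follows GTFS conventions, times for trips running past midnight can be written with hours above 23 (for example 24:00:10), which `%H` rejects. A minimal standard-library sketch that works in plain seconds and groups the rows by `trip_id` could look like the following. The helper names `to_seconds` and `segments_from_rows`, and the trip `X_1` with stops `A` and `B`, are made up for illustration and are not part of the question's data.

```python
import csv
import io
from itertools import groupby


def to_seconds(hms: str) -> int:
    """Convert 'H:MM:SS' to seconds; the hour part may exceed 23."""
    hours, minutes, seconds = (int(part) for part in hms.split(':'))
    return hours * 3600 + minutes * 60 + seconds


def segments_from_rows(rows):
    """Yield (RoadSegment, trip_id, planned_duration) per pair of consecutive stops."""
    # Sort so that groupby sees each trip as one contiguous, ordered run.
    rows = sorted(rows, key=lambda r: (r['trip_id'], int(r['stop_sequence'])))
    for trip_id, trip_rows in groupby(rows, key=lambda r: r['trip_id']):
        trip_rows = list(trip_rows)
        for prev, nxt in zip(trip_rows, trip_rows[1:]):
            duration = to_seconds(nxt['arrival_time']) - to_seconds(prev['departure_time'])
            yield (f"{prev['stop_id']}-{nxt['stop_id']}", trip_id, duration)


# First three rows are from the question; the X_1 rows are hypothetical
# and only demonstrate a trip that crosses midnight.
sample = """trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign
107_1_D_1,6:40:00,6:40:00,AREI2,1,
107_1_D_1,6:40:32,6:40:32,JD4,2,
107_1_D_1,6:41:27,6:41:27,PNG4,3,
X_1,23:59:30,23:59:30,A,1,
X_1,24:00:10,24:00:10,B,2,
"""
segments = list(segments_from_rows(csv.DictReader(io.StringIO(sample))))
print(segments)
```

Grouping by `trip_id` also means no segment is ever built between the last stop of one trip and the first stop of the next.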



velaa5lx3#

import pandas as pd

def find_result(trip_df):
    # First sort it chronologically using the column of your choice.
    # Assuming the stop_sequence column is suitable for this.
    trip_df = trip_df.sort_values(by='stop_sequence').copy()
    # Shift the stop_id column upward (-1) and store it as a column called next_stop.
    trip_df['next_stop'] = trip_df['stop_id'].shift(-1)
    # Do the same for the departure time so each row also holds the next stop's time.
    trip_df['actual_departure_time'] = trip_df['departure_time'].shift(-1)
    # The last row has nothing below it, so the shifted columns are null there.
    # We can just remove it, because the last stop already appears together with
    # the second-to-last stop in the second-to-last row.
    trip_df = trip_df[~trip_df['actual_departure_time'].isna()].copy()
    # Concatenate the columns to get the 'RoadSegment'.
    trip_df['RoadSegment'] = trip_df['stop_id'] + '-' + trip_df['next_stop']

    # Ignore the conversions below if the columns are already datetime type.
    trip_df['actual_departure_time'] = pd.to_datetime(trip_df['actual_departure_time'], format='%H:%M:%S')
    trip_df['arrival_time'] = pd.to_datetime(trip_df['arrival_time'], format='%H:%M:%S')
    trip_df['planned_duration'] = trip_df['actual_departure_time'] - trip_df['arrival_time']

    # The resulting planned_duration column is of Timedelta type (the difference of two datetimes).
    # Convert it to seconds using the total_seconds method of the Timedelta.
    trip_df['planned_duration'] = trip_df['planned_duration'].apply(lambda x: int(x.total_seconds()))

    return trip_df[['RoadSegment', 'planned_duration']]



# Assuming the different trip ids are for different buses,
# we first group by trip_id and apply the function to each group.
# The result is a concatenated DataFrame covering all the trip ids.
# `data` is the DataFrame loaded from the CSV file (e.g. with pd.read_csv).
result_df = data.groupby('trip_id').apply(find_result).reset_index(level=0)
result_df

