csv: How to check whether a specific file exists in an S3 bucket

tyky79it  posted 5 months ago  in Other

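I am new to AWS, and I want to check whether a specific CSV file exists in a folder in S3. If it exists, I want to read it; if it does not, I want to create a df and upload it to S3.
What I have done so far: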

list_of_files = []
# 'Contents' is missing from the response when no object matches the prefix
for obj in s3_client.list_objects(Bucket='abc', Prefix='folder/').get('Contents', []):
    list_of_files.append(obj['Key'])


# 'file' is the full key being looked for, e.g. 'folder/test.csv'
if file in list_of_files:
    df = read_from_s3(file)
else:
    df = pd.DataFrame()

gupuwyp2  #1

Use s3_client.get_object: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.get_object
If the object does not exist, it raises the exception S3.Client.exceptions.NoSuchKey.
See the following sample:

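def object_exists(s3_client, bucket, key):
    """Return True if the key exists in the bucket, False otherwise."""
    try:
        s3_client.get_object(
            Bucket=bucket,
            Key=key,
        )
        return True
    except s3_client.exceptions.NoSuchKey:
        return False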
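
If only existence matters, head_object may be a lighter alternative to get_object, since it requests the object's metadata without the body. The following is a minimal sketch, not part of the original answer: the function name object_exists_head is made up for illustration, and s3_client is assumed to be a boto3 S3 client such as boto3.client("s3"). Unlike get_object, head_object reports a missing key as a generic ClientError with code "404".

```python
def object_exists_head(s3_client, bucket, key):
    """Return True if s3://bucket/key exists, False if S3 answers "not found".

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    try:
        # HEAD retrieves only the object's metadata, not its contents.
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except s3_client.exceptions.ClientError as err:
        # A missing key surfaces as a ClientError whose error code is "404".
        if err.response["Error"]["Code"] in ("404", "NoSuchKey"):
            return False
        # Anything else (for example "403") is a different problem; re-raise it.
        raise
```

Re-raising other errors keeps a permissions problem from being mistaken for a missing file.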


h79rfbju  #2

Besides the boto3 client, you can use the s3fs package. Install it with:

pip install s3fs

S3_URI looks like this: "s3://your_bucket_name/folder/test.csv"
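
An existence check with s3fs could look roughly like the sketch below. It relies on S3FileSystem.exists from s3fs; the helper name csv_exists and its optional fs parameter are made up for illustration, and credentials are assumed to come from the usual AWS configuration.

```python
def csv_exists(s3_uri, fs=None):
    """Return True if an object exists at s3_uri, e.g. "s3://your_bucket_name/folder/test.csv"."""
    if fs is None:
        # Imported lazily so the helper also accepts any filesystem object
        # that offers an exists() method.
        import s3fs
        fs = s3fs.S3FileSystem()  # uses the default AWS credential lookup
    return fs.exists(s3_uri)
```

Calling csv_exists(S3_URI) with no second argument creates the S3FileSystem on the fly; passing an existing filesystem object avoids creating a new one on every call.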
To read a CSV file from S3 or write one to it, you can use AWS Wrangler (awswrangler).

pip install awswrangler

import awswrangler as wr
import boto3

# AWS S3 configuration (placeholder values)
s3_bucket_name = 'your-s3-bucket-name'
s3_object_key = 'path/to/your/csv-file.csv'
aws_access_key_id = 'your-access-key-id'
aws_secret_access_key = 'your-secret-access-key'
aws_region = 'your-aws-region'

# awswrangler takes a regular boto3 session
session = boto3.Session(
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=aws_region,
)

# Read the CSV file from S3
df = wr.s3.read_csv(
    path=f's3://{s3_bucket_name}/{s3_object_key}',
    boto3_session=session,
)

# Now you can work with the DataFrame 'df' as needed
print(df.head())
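
The full flow from the question (read the CSV if it exists, otherwise create a DataFrame and upload it) could be sketched with awswrangler as follows. This is an illustration under assumptions, not a tested recipe: it uses wr.s3.does_object_exist, wr.s3.read_csv and wr.s3.to_csv, while the function name read_or_create_csv and the make_default callable are hypothetical. The awswrangler.s3 module is passed in as an argument so that the helper itself has no import-time dependencies.

```python
def read_or_create_csv(path, s3, make_default):
    """Read the CSV at path if it exists; otherwise create a default frame and upload it.

    path         -- full URI such as "s3://your-s3-bucket-name/path/to/your/csv-file.csv"
    s3           -- assumed to be the awswrangler.s3 module (pass wr.s3)
    make_default -- hypothetical zero-argument callable returning the DataFrame to create
    """
    if s3.does_object_exist(path):
        return s3.read_csv(path)
    df = make_default()
    # index=False keeps the pandas index out of the uploaded file.
    s3.to_csv(df, path, index=False)
    return df
```

With awswrangler and pandas installed, a call might look like read_or_create_csv(path, wr.s3, lambda: pd.DataFrame(columns=["a", "b"])), where the column names are placeholders.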
