Apache Parquet

@ApacheParquet

Language-agnostic, open-source columnar file format for analytics

Joined April 2013

Tweets


  1. Retweeted
    6 Nov 2018

    PSA: If you use the page-level statistics, please chime in on JIRA:

  2. Retweeted
    25 Jul 2018

    Last speaker in the scientific room before lunch is Peter Hoffmann, talking about #Pandas and how to work with large datasets.

  3. Retweeted
    30 Jul 2018

    Have a look at the bucketing sink rework for the upcoming release and the Parquet writer ;)

  4. Retweeted
    18 Jun 2018

    Can someone answer this: why is the format faster than other columnar storage like HBase, Kudu, etc.?

  5. Retweeted
    2 Jul 2018

    My talk from the DMBI 2018 Conference about our journey to analytics is available online. Thanks, everyone, for attending!

  6. Retweeted
    19 Apr 2018

    How big is big data? Well, after filtering the collisions, they generate 12.3 PB in a month, in the special ROOT format.

  7. Retweeted
    23 Apr 2018

    A month from now I'll be speaking on our big data journey at the Conference in London. If you're there, drop by!

  9. Retweeted
    27 Mar 2018
  10. Retweeted
    26 Mar 2018

    Great benchmark. In short, Kudu is faster than Parquet for random-access queries like CRUD operations, but slower for analytics queries.

  11. Retweeted
    5 Mar 2018

    If you’re a company using open source projects and not sure how to contribute, a release engineer would be a tremendous help. It’s hard to do this properly part time. I have a specific project in mind, if you need a hint.

  12. Retweeted
    28 Feb 2018

    You do not need Spark to create Parquet files; you can use plain Java, and it can even fit in AWS Lambda for a serverless solution:

  13. Retweeted
    5 Mar 2017
  14. Retweeted
    1 Feb 2018

    Is there a way to go from MSSQL to Parquet directly?

  15. Retweeted
    11 Jan 2018

    I'll be speaking at the Conference this May in London, sharing our journey in one of our many adventures. You're all invited!

  16. Retweeted
    4 Jan 2018
  17. Retweeted
    4 Jan 2018

    Also, the file size went down from 10 GB to 3 GB without any compression.

  18. Retweeted
    4 Jan 2018

    Working with 10 GB of CSV data. Pandas read_csv took 16 min to load the CSV into memory. Converted to Parquet with pyarrow. It took 30 s to read into a pyarrow table and 16 s to convert to a pandas DataFrame. 16 min => 46 s!

  19. Retweeted
    7 Dec 2017
  20. Retweeted
    8 Dec 2017

    Presenting our work today on managing data.


