Hive - dynamic partitions: Long loading times with a lot of partitions when updating table
During this slow phase, Hive takes the files it built for each partition and moves them from a temporary directory to the permanent directory. You can see this step in the output of EXPLAIN EXTENDED, where it appears as a Move Operator. So each partition costs one move plus one update to the metastore. I don't use EMR, but I presume moving files to S3 has high latency for each file that needs to be moved. What's not clear from what you wrote is whether you're doing a full load each time you run. For example, why do you have a 2013-03-05 partition? Are you getting new log data that contains this old date? If this data is already in your logs table, you should modify your insert statement like this: SELECT fields FROM new_logs WHERE dt > 'date of last run'; This way you'll only get a few buckets and only a few files

Categories : Amazon

Creating more partitions than reducers
(a) No. You can have any number of reducers based on your needs. Partitioning just decides which set of key/value pairs goes to which reducer; it doesn't decide how many reducers are created. If you want to set the number of reducers yourself, you can do that through the Job: job.setNumReduceTasks(2); (b) This is actually what happens. Based on the availability of slots, a set of reducers is started, and they process all the input fed to them. If all the reducers have finished and some data is still left unprocessed, a second batch of reducers will start and finish the rest of the data. All of your data will eventually get processed irrespective of the number of partitions and reducers. Please make sure your partition logic is correct. P.S.

Categories : Hadoop

Iterator over all partitions into k groups?
This works, although it is probably super inefficient (I sort them all to avoid double-counting):

    def clusters(l, K):
        if l:
            prev = None
            for t in clusters(l[1:], K):
                tup = sorted(t)
                if tup != prev:
                    prev = tup
                    for i in xrange(K):
                        yield tup[:i] + [[l[0]] + tup[i],] + tup[i+1:]
        else:
            yield [[] for _ in xrange(K)]

It also returns empty clusters, so you would probably want to wrap this in order to get only the non-empty ones:

    def neclusters(l, K):
        for c in clusters(l, K):
            if all(x for x in c):
                yield c

Counting just to check:

    def kamongn(n, k):
        res = 1
        for x in xrange(n-k, n):
            res *= x + 1
        for x in xrange(k):
            res /= x + 1
        return res

    def St
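For readers on Python 3, here is a minimal alternative sketch of the same idea. It is not the answer's algorithm: instead of sorting to remove duplicates, it places each item either into an already opened group or into a new one, so every split into k non-empty groups is produced once. The name k_partitions is made up for this illustration.

```python
def k_partitions(items, k):
    """Yield each way to split items into exactly k non-empty, unordered groups."""
    items = list(items)

    def place(index, groups):
        if index == len(items):
            if len(groups) == k:
                # Copy the groups, because the lists are reused while backtracking.
                yield [list(group) for group in groups]
            return
        # Prune: not enough items left to open the groups still missing.
        if len(groups) + (len(items) - index) < k:
            return
        item = items[index]
        # Option 1: put the item into one of the groups opened so far.
        for position in range(len(groups)):
            groups[position].append(item)
            yield from place(index + 1, groups)
            groups[position].pop()
        # Option 2: open a new group with this item, if fewer than k are open.
        if len(groups) < k:
            groups.append([item])
            yield from place(index + 1, groups)
            groups.pop()

    yield from place(0, [])
```

The number of results should match the Stirling numbers of the second kind, which is the same kind of sanity check the answer performs with its counting helpers.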

Categories : Python

Sum across partitions with window functions
SELECT ts, a, b, c
     , COALESCE(max(a) OVER (PARTITION BY grp_a), 0)
     + COALESCE(max(b) OVER (PARTITION BY grp_b), 0)
     + COALESCE(max(c) OVER (PARTITION BY grp_c), 0) AS special_sum
FROM  (
   SELECT *
         ,count(a) OVER w AS grp_a
         ,count(b) OVER w AS grp_b
         ,count(c) OVER w AS grp_c
   FROM   t
   WINDOW w AS (ORDER BY ts)
   ) sub
ORDER  BY ts;

First, put actual values and following NULL values in a group with the aggregate window function count(): it does not increment with NULL values. Then take max() from every group, arriving at what you are looking for. At this point you could just as well use min() or sum(), since there is only one non-null value per group. COALESCE() catches NULL values if the overall first value in time is NULL. Note ho
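To make the grouping trick easier to follow, here is a rough Python sketch of what the query computes, assuming each row is a dict with keys ts, a, b and c, that None stands for NULL, and that ts values are unique. The function name special_sums is invented for this illustration.

```python
def special_sums(rows):
    """Return (ts, special_sum) pairs, carrying the last non-NULL a, b, c forward."""
    # Starting at 0 plays the role of COALESCE(..., 0) for leading NULLs.
    last_seen = {"a": 0, "b": 0, "c": 0}
    result = []
    for row in sorted(rows, key=lambda r: r["ts"]):
        for column in last_seen:
            # A non-NULL value starts a new "group"; NULLs keep the previous value.
            if row[column] is not None:
                last_seen[column] = row[column]
        result.append((row["ts"], sum(last_seen.values())))
    return result
```

Each non-NULL value opens a new group in the SQL version (count() increments), and the NULL rows that follow it inherit that value, which is what the running last_seen dictionary mimics here.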

Categories : SQL

Generate numeric partitions
Use the "partitions" package:

install.packages("partitions")
library(partitions)
parts(4)
#
# [1,] 4 3 2 2 1
# [2,] 0 1 2 1 1
# [3,] 0 0 0 1 1
# [4,] 0 0 0 0 1

Categories : R

Advanced partitions query
I believe you want this:

SELECT infopath_form_id
     , DATEDIFF(MINUTE, MIN(event_timestamp), MAX(event_timestamp)) / CAST(COUNT(*) - 1 AS FLOAT)
FROM Table
GROUP BY infopath_form_id

That will give you the average number of minutes between consecutive entries for each InfoPath_form_id. Explanation of functions used:

MIN() returns the earliest date
MAX() returns the latest date
DATEDIFF() returns the difference between two dates in a given unit (minutes in this example)
COUNT() returns the number of rows per grouping item (i.e. InfoPath_form_id)

So simply divide the total minutes elapsed between the first and last entry by one less than the number of records, giving you the average number of minutes between events.
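As a quick check of the arithmetic, here is a small Python sketch of the same calculation for one form's events. The helper name average_gap_minutes is made up, and returning None for fewer than two events is a choice made only for this sketch.

```python
from datetime import datetime


def average_gap_minutes(timestamps):
    """Average minutes between consecutive events: (latest - earliest) / (count - 1)."""
    if len(timestamps) < 2:
        return None  # no gap to average with fewer than two events
    ordered = sorted(timestamps)
    total_minutes = (ordered[-1] - ordered[0]).total_seconds() / 60
    return total_minutes / (len(ordered) - 1)


events = [
    datetime(2013, 3, 5, 10, 0),
    datetime(2013, 3, 5, 10, 10),
    datetime(2013, 3, 5, 10, 40),
]
average = average_gap_minutes(events)
```

The individual gaps here are 10 and 30 minutes, and the total span of 40 minutes divided by 2 gives the same average, because the gaps between consecutive events always add up to the span between the first and last one.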

Categories : SQL

wrong partitions with matlab's cvpartition
Are you using the stratified form of cross-validation that cvpartition provides? Use the second syntax described in the documentation page, i.e. c = cvpartition(group,'kfold',k) rather than c = cvpartition(n,'kfold',k). Here group is a vector (or categorical array, cell array of strings etc) of class labels, and will stratify the selection of observations into folds rather than just splitting everything randomly into groups.

Categories : Matlab

What is the "number of partitions" and "range" of an array?
A partition in a sort is basically a section of the list based upon a pivot point. For example, using the quick sort algorithm to sort the following:

Input        First Pass    Second Pass
3            3             1
8            1             3
5 <- Pivot   5 ---------   5
1            8             7
7            7             8

In the first pass, there are two partitions, based on the numbers that are less than or greater than 5. The range is the difference between the largest and smallest values, so in this example that is 7 (8 - 1). So the line you are questioning works as follows (the log here is base 10):

(2 * log(7)) > 2 == Use HeapSort
1.690 > 2
false
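The same numbers can be reproduced with a short Python sketch. It assumes the logarithm is base 10 (which is what makes 2 * log(7) come out near 1.69) and that the 2 on the right-hand side is the number of partitions; the partition helper is only an illustration, not the code from the question.

```python
import math


def partition(values, pivot):
    """Split values into those smaller and those larger than the pivot."""
    smaller = [v for v in values if v < pivot]
    larger = [v for v in values if v > pivot]
    return smaller, larger


values = [3, 8, 5, 1, 7]
smaller, larger = partition(values, 5)   # the two partitions of the first pass
value_range = max(values) - min(values)  # 8 - 1
partition_count = 2
use_heapsort = 2 * math.log10(value_range) > partition_count
```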

Categories : Arrays

Difference between / and /mnt/upgrade if mounted on different partitions
No. /mnt/upgrade is NOT part of mtdblock03. / and /mnt/upgrade are both virtual points in a virtual filesystem, which is only a virtual map to the underlying physical media (NAND flash in your case). Look at it this way:

1. When the system boots: initially, using the kernel bootargs rootfs=, the entire filesystem / is mounted. At this point in time, mtdblock03 (pointed to by ubi0) is mounted at /. Anything written anywhere under / ends up in mtdblock03.
2. Later: either manually or using init scripts, mtdblock06 (pointed to by ubi1) is mounted at /mnt/upgrade. Now anything written under / EXCEPT under /mnt/upgrade ends up in mtdblock03, and anything written under /mnt/upgrade ends up in mtdblock06.

As long as the second mount is not unmounted (using umount), all

Categories : Linux

Can we have partitions within partition in a Hive table?
Hive supports multiple levels of partitioning. But keep in mind that having more than a single level of partitioning in Hive is almost never a good idea. HDFS is really optimized for manipulating large files, ~100MB and larger. Each partition of a Hive table is an HDFS directory, and there are normally multiple files in each of these directories. You really should be closing in on a petabyte of data before multiple levels of partitioning in a Hive table make sense. What problem are you trying to solve? I'm sure we can find a sensible solution for it.

Categories : Misc

Why do partitions require nested selects?
It seems to be the same "rule" as for any query: column aliases aren't visible to the WHERE clause. This will also fail: SELECT id AS newid FROM test WHERE newid=1; -- must use "id" in WHERE clause

Categories : SQL

Python Integer Partitioning with given k partitions
def part(n, k):
    def _part(n, k, pre):
        if n <= 0:
            return []
        if k == 1:
            if n <= pre:
                return [[n]]
            return []
        ret = []
        for i in range(min(pre, n), 0, -1):
            ret += [[i] + sub for sub in _part(n-i, k-1, i)]
        return ret
    return _part(n, k, n)

Example:

>>> part(5, 1)
[[5]]
>>> part(5, 2)
[[4, 1], [3, 2]]
>>> part(5, 3)
[[3, 1, 1], [2, 2, 1]]
>>> part(5, 4)
[[2, 1, 1, 1]]
>>> part(5, 5)
[[1, 1, 1, 1, 1]]
>>> part(6, 3)
[[4, 1, 1], [3, 2, 1], [2, 2, 2]]

UPDATE Using memoization:

def part(n, k):
    def memoize(f):
        cache = [[[None] * n for j in xrange(k)] for i in xrange(n)]
        def wrapper(n, k, pre):
            i
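Since the memoized version is cut off above, here is one possible Python 3 sketch of the same idea. It swaps the hand-written cache for functools.lru_cache and uses tuples internally so that cached results cannot be modified by accident; it is not the answer's original code, and the name part_memo is invented here.

```python
from functools import lru_cache


def part_memo(n, k):
    """Partitions of n into exactly k positive parts, in non-increasing order."""

    @lru_cache(maxsize=None)
    def _part(n, k, pre):
        # pre is the largest value the next part may take.
        if n <= 0:
            return ()
        if k == 1:
            return ((n,),) if n <= pre else ()
        result = []
        for i in range(min(pre, n), 0, -1):
            for sub in _part(n - i, k - 1, i):
                result.append((i,) + sub)
        return tuple(result)

    return [list(p) for p in _part(n, k, n)]
```

The results should match the examples listed above, for instance part_memo(5, 2) and part_memo(6, 3).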

Categories : Python

Wrong calculation for daily partitions
Ok, I created the table, inserted some data and ran some of your queries, and you've got something wrong with your substring:

SQL> CREATE TABLE "MO_USAGEDATA" (
  2    "REQUESTDTS" TIMESTAMP (9) NOT NULL ENABLE
  3  )
  4  partition by range ("REQUESTDTS") INTERVAL(NUMTODSINTERVAL(1,'DAY'))
  5  (partition PART_MINVALUE values less than(TIMESTAMP '2012-06-18 00:00:00'));

Table created

SQL> INSERT INTO MO_USAGEDATA
  2  (SELECT SYSDATE + ROWNUM FROM dual CONNECT BY LEVEL <= 30);

30 rows inserted

SQL> SELECT high_value, INTERVAL
  2  FROM all_tab_partitions
  3  WHERE table_name = 'MO_USAGEDATA'
  4  AND table_owner = USER
  5  ORDER BY PARTITION_POSITION;

HIGH_VALUE                           INTERVAL
------------------------------------ ---------
[...]
TIMESTAMP' 2

Categories : Oracle

[Qt][Linux] List drive or partitions
You need to use platform-specific code. And, please, read the docs! The documentation for QDir::drives() says: Returns a list of the root directories on this system. On Windows this returns a list of QFileInfo objects containing "C:/", "D:/", etc. On other operating systems, it returns a list containing just one root directory (i.e. "/").

Categories : C++

Data Modeling with Kafka? Topics and Partitions
When structuring your data for Kafka, it really depends on how it's meant to be consumed. In my mind, a topic is a grouping of messages of a similar type that will be consumed by the same type of consumer, so in the example above I would just have a single topic, and if you decide to push some other kind of data through Kafka, you can add a new topic for that later. Topics are registered in ZooKeeper, which means that you might run into issues if you try to add too many of them, e.g. the case where you have a million users and have decided to create a topic per user. Partitions, on the other hand, are a way to parallelize the consumption of the messages, and the total number of partitions in a broker cluster needs to be at least the same as the number of consumers in a consumer group to mak
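To illustrate why the partition count limits parallelism, here is a toy Python sketch. It relies on the fact that, within one consumer group, a partition is read by at most one consumer. The round-robin rule and the function name assign_partitions are simplifications invented for this example and are not Kafka's actual assignment code.

```python
def assign_partitions(partition_count, consumers):
    """Hand out partition numbers to the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(partition_count):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Two partitions but three consumers: one consumer has nothing to read.
idle_example = assign_partitions(2, ["c1", "c2", "c3"])
# Six partitions and three consumers: every consumer gets a share.
busy_example = assign_partitions(6, ["c1", "c2", "c3"])
```

With fewer partitions than consumers, the extra consumers sit idle, which is the situation the paragraph above warns about.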

Categories : Apache

Hive : Insert overwrite multiple partitions
Hive supports dynamic partitioning, so you can build a query where the partition is just one of the source fields:

INSERT OVERWRITE TABLE dst PARTITION (dt)
SELECT col0, col1, ... coln, dt FROM src WHERE ...

The WHERE clause can specify which values of dt you want to overwrite. Just include the partition field (dt in this case) last in the list from the source; you can even do SELECT *, dt if the dt field is already part of the source, or even SELECT *, my_udf(dt) AS dt, etc. By default, Hive wants at least one of the partitions specified to be static, but you can allow it to be nonstrict; so for the above query, you can set the following before running it:

set hive.exec.dynamic.partition.mode=nonstrict;

Categories : Hadoop

What are horizontal and vertical partitions in database and what is the difference?
Not a complete answer to the question but it answers what is asked in the question title. So the general meaning of horizontal and vertical database partitioning is: Horizontal partitioning involves putting different rows into different tables. Perhaps customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a view with a union might be created over both of them to provide a complete view of all customers. Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical par
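A small Python sketch can make the two kinds of split concrete. The rows, column names and helper names below are invented for illustration; only the ZIP code boundary of 50000 comes from the example above.

```python
def horizontal_split(rows, boundary=50000):
    """Same columns in both results; the rows are divided by ZIP code."""
    east = [row for row in rows if row["zip"] < boundary]
    west = [row for row in rows if row["zip"] >= boundary]
    return east, west


def vertical_split(rows, key, core_columns):
    """Same rows in both results; the columns are divided, and both keep the key."""
    core = [{col: row[col] for col in [key, *core_columns]} for row in rows]
    extra = [
        {col: val for col, val in row.items() if col == key or col not in core_columns}
        for row in rows
    ]
    return core, extra


customers = [
    {"id": 1, "name": "Ann", "zip": 10001, "bio": "x"},
    {"id": 2, "name": "Bo", "zip": 90210, "bio": "y"},
]
customers_east, customers_west = horizontal_split(customers)
core, extra = vertical_split(customers, "id", ["name", "zip"])
```

Concatenating customers_east and customers_west plays the role of the union view, while joining core and extra on id would rebuild the original wide rows.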

Categories : Database

How can i partition a MySql table for use with 90 day rotating partitions?
Actually, the problem is that you can't define a PRIMARY or UNIQUE key on a partitioned table unless the key includes every column used in the partitioning function. One possible "fix" would be to remove the "PRIMARY" keyword from the KEY definition. The problem is that MySQL has to enforce uniqueness when you declare a key to be UNIQUE or PRIMARY. And in order to enforce that, MySQL needs to be able to check whether the key value already exists. Instead of checking every partition, MySQL uses the partitioning function to determine the partition where a particular key would be found.

Categories : Mysql

Installing M2Crypto inside of virtualenv without installing swig to the system
So in the end I got this to work by letting buildout handle downloading and installing swig and M2Crypto and then just moving the built M2Crypto and EGG-INFO directories from where buildout put them to where virtualenv wanted them...this might not be the optimal solution, but hey, it worked.

Categories : Python

Installing SQL objects in C# - issue when installing CLR assembly and function scripts
I have now fixed this problem. The issue turned out to be that I was reusing the same SQLCommand later on to execute a stored procedure (I had given the command a SQLTransaction so I can roll back if it fails) and had forgotten to remove the parameters that had been added before running the next text command. Therefore it was running the first script file and failing afterwards.

Categories : C#

Installing setuptools prior to installing python dependencies on Mac
Your command attempts to install files into system directories. You must execute the command as root or with sudo: sudo sh setuptools-0.6c11-py2.7.egg You can see this from the error: [Errno 13] Permission denied Edit: If you don't have permissions, you can try installing into your user's Library folder, or look at virtualenv. (See the part about using locally from source.)

Categories : Python

Processing performance hit in SSAS with 2000+ partitions in 2008 R2
Partitions are generally used to increase performance, not to decrease it, but you're right that if you have too many, you will take a performance hit. It looks like you want to know how to find out how many partitions is too many. I'm going to assume that the processing time you are talking about is the time to process the cube, not the time to query the cube. The general idea of partitions is that you have to process only a small subset of the partitions when you are reprocessing the cube. This makes it a huge performance enhancement. If you are processing a large number of partitions, then the overhead of processing an individual partition becomes non-negligible. The point at which this happens can depend on a number of factors. The factors that scale with partitions in

Categories : Sql Server

Removing partitions from the cube in SQL Server Management Studio
To view the Partitions Manager dialog box, in SQL Server Data Tools, click the Table menu, and then click Partitions. To delete a partition: In Partition Manager, in the Table listbox, verify or select the table that contains the partition you want to delete. In the Partitions list, select the partition you want to delete and then click Delete. Source: http://msdn.microsoft.com/en-us/library/hh230810.aspx Hope this helps

Categories : SQL

Batch File to detect active Partitions/Drives
Give this a go, and if it works then parse out the "Fixed disks":

@echo off
for /f "tokens=1,*" %%a in ('fsutil fsinfo drives ^| find ":" ') do (
   for %%c in (%%b) do fsutil fsinfo drivetype %%c
)
pause

Categories : Batch File

IIS and COM+ Partitions: Failed to create ASP Application XXX due to invalid or missing COM Partition ID
After a lot of research on this subject I have found the solution: do not use the PartitionId from IIS and do not enable partitions from IIS either. Leave them at their default values. The solution is the following: each partition should be assigned as the default partition for one user, and each IIS application (and its app pool) should run under the user whose default partition it is meant to use. So, say you have two IIS applications named web1 and web2, two app pools app1 and app2, two users user1 and user2, and two partitions part1 and part2. web1 should run under user1, and app1 should also run under user1 (app1 is the application pool for web1). Then in Component Services user1 should have part1 as its default partition. Then when web1 searches for a COM+ component it will

Categories : Iis

What is the difference between installing an app via homebrew or installing it "normal"?
homebrew (like MacPorts) is a package manager. It allows you to manage packages (update, delete etc.). Most importantly, homebrew will compile the application on your platform. That's especially important for ports, e.g. from Linux. homebrew will give you greater and more fine-grained control over what you install, where, what compilation attributes you want to use etc. But this comes at the cost of a bit more complexity and the need to know your way around the command line. Downloading a binary and putting it in the Applications folder is easier by far and usually works fine. If you're not a developer and don't need to manage many different tools, then I'd recommend sticking with binary downloads. If you're a developer, however, you will most likely not get around a package manager if you n

Categories : Osx

Checking and Installing .net 4.0 before while installing my forms app in SharpSetup WIX
You can check if the .NET Framework is installed by linking to the NetFxExtension with light. Just add a PropertyRef to the one you want. You can find a list of those properties here. Say you want to make sure .NET Framework 4.0 Full is present before installing your software; you'd add this somewhere in your source code:

<PropertyRef Id="NETFRAMEWORK40FULL" />
<Condition Message=".NET Framework 4.0 Full is not installed.">
  NETFRAMEWORK40FULL
</Condition>

When running the MSI, the LaunchConditions action will run and check if the NETFRAMEWORK40FULL property is set. If it is, the installation continues; if not, the installation fails. However, if you wanted to install the .NET Framework beforehand, you'll need two WiX projects. One for your basic MSI, and one for

Categories : Dotnet

Any way to compute statistics on a hive table for all partitions with a single analyze command?
According to the Hive manual (https://cwiki.apache.org/confluence/display/Hive/StatsDev), if you do not specify partition specs, statistics are gathered for the entire table: When the user issues that command, he may or may not specify the partition specs. If the user doesn't specify any partition specs, statistics are gathered for the table as well as all the partitions (if any).

Categories : Hadoop

FileStream will not open Win32 devices such as disk partitions and tape drives. (DotNetZip)
Usage:

MergeDirectories("Sample 1.zip", "Sample 2.zip", "Merged.zip");

Code:

private void MergeDirectories(string filePath1, string filePath2, string mergedName)
{
    string workspace = Environment.CurrentDirectory;
    filePath1 = Path.Combine(workspace, filePath1);
    filePath2 = Path.Combine(workspace, filePath2);
    mergedName = Path.Combine(workspace, mergedName);
    if (File.Exists(mergedName))
    {
        File.Delete(mergedName);
    }
    DirectoryInfo zip1 = OpenAndExtract(filePath1);
    DirectoryInfo zip2 = OpenAndExtract(filePath2);
    string merged = Path.GetTempFileName();
    using (ZipFile z = new ZipFile())
    {
        z.AddDirectory(zip1.FullName);
        z.AddDirectory(zip2.FullN

Categories : C#

Dividing an array into partitions NOT evenly sized, given the points where each partition should start or end, in python
If I understand you, you need something like this:

>>> a = range(20)
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> i = [[1, 5], [5, 8], [8, 20]]
>>> [a[x:y] for x, y in i]
[[1, 2, 3, 4], [5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]

or, as Jon Clements suggested in comments:

>>> [a[slice(*s)] for s in i]
[[1, 2, 3, 4], [5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]
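If the boundaries arrive as a flat list of cut points rather than as start/end pairs, the pairs can be built with zip. This is a small sketch under that assumption; the name split_at is made up, and the input is wrapped in list() because range() no longer returns a list in Python 3.

```python
def split_at(values, boundaries):
    """Slice values between each pair of neighbouring boundary indices."""
    return [values[start:stop] for start, stop in zip(boundaries, boundaries[1:])]


a = list(range(20))
chunks = split_at(a, [1, 5, 8, 20])
```

Here zip(boundaries, boundaries[1:]) yields (1, 5), (5, 8) and (8, 20), which are the same pairs as the list i in the answer.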

Categories : Python

Rails installing mysql - Error installing mysql2: ERROR: Failed to build gem native extension
I've been so annoyed by the same problem, and finally succeeded in installing mysql2. Kudos to odiszapc@github. It appears that no solution I've found via Google other than the one below works for me. Copied and pasted from here, so no credit to me.

gem uninstall mysql2
Download the latest MySQL connector from http://cdn.mysql.com/Downloads/Connector-C/mysql-connector-c-noinstall-6.0.2-win32.zip
Extract it to C:\connector-6.0.2
gem install mysql2 --platform=ruby -- '--with-mysql-lib="C:\connector-6.0.2\lib" --with-mysql-include="C:\connector-6.0.2\include" --with-mysql-dir="C:\connector-6.0.2"'

Additional info on mine: ruby 1.9.3p392 (2013-02-22) [i386-mingw32], Rails 3.2.13, MySQL Server 5.6, mysql2-0.3.13. P.S. Even if you successfully installed mysql2, you may still need some work (e.g. mysql2

Categories : Mysql

Installation of ubuntu 12.04: win7 not detected, partitions not detected
You need to reinstall the Ubuntu GRUB bootloader; follow the instructions on the Ubuntu website: https://help.ubuntu.com/community/Grub2/Installing#Reinstalling_GRUB_2 When I installed Ubuntu, I lost my partitions on the drive and had to recover the drive with partition recovery software (TestDisk). In the end I reverted to having two drives, one with Windows 7 and the other with Ubuntu. Skipping the quick scan saves time, as you will probably want to do a full scan. http://www.cgsecurity.org/wiki/TestDisk_Step_By_Step

Categories : Ubuntu

Print all unique integer partitions given an integer as input
I would approach it this way: First, generalize the problem. You can define a function printPartitions(int target, int maxValue, string suffix) with the specification: Print all integer partitions of target, followed by suffix, such that each value in the partition is at most maxValue. Note that there is always at least 1 solution (provided both target and maxValue are positive), which is all 1s. You can use this method recursively. So let's first think about the base case: printPartitions(0, maxValue, suffix) should simply print suffix. If target is not 0, you have two options: either use maxValue or not (if maxValue > target there is only one option: don't use it). If you don't use it, you should lower maxValue by 1. That is: if (maxValue <= target) printParti
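The recursion described above can be sketched in Python as follows. This is one reading of the specification, written as a generator so the results can be collected instead of printed; the stop when max_value reaches 0 is an added guard that the truncated text does not show.

```python
def partitions(target, max_value, suffix=()):
    """Yield every partition of target with parts <= max_value, followed by suffix."""
    if target == 0:
        # Base case: nothing left to split, so the suffix is a finished partition.
        yield suffix
        return
    if max_value == 0:
        # Added guard: a positive target cannot be built from parts of size 0.
        return
    if max_value <= target:
        # Option 1: use max_value once and keep it available for further use.
        yield from partitions(target - max_value, max_value, (max_value,) + suffix)
    # Option 2: do not use max_value any more, so lower the cap by 1.
    yield from partitions(target, max_value - 1, suffix)


all_partitions_of_4 = list(partitions(4, 4))
```

Starting with max_value equal to target gives every unique partition, each one written in ascending order because smaller parts are added in front of the suffix.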

Categories : Algorithm

Installing/using SDL for Qt
Compiling with SDL and g++ cannot find -lSDLmain etc Undefined reference to WinMain@16 when using SDL I am sure one of those is also applicable to your question.

Categories : Qt

Installing pip on Mac OS X
You can install it through Homebrew on OS X. Why would you install Python with Homebrew? The version of Python that ships with OS X is great for learning but it’s not good for development. The version shipped with OS X may be out of date from the official current Python release, which is considered the stable production version. (source) Homebrew is something of a package manager for OS X. Find more details on the Homebrew page. Once Homebrew is installed, run the following to install the latest Python, Pip & Setuptools: brew install python

Categories : Python

Installing MSI vb.net with MsiSetExternalUI
Take a look at Windows Installer XML's (Wix) Deployment Tools Framework (DTF) MSI interop library (Microsoft.Deployment.WindowsInstaller.dll ) It has all the pieces needed to invoke an installation and provide an external UI handler to receive the ProgressBar update messages that you can then route to your VB.Net UI. See the following topic and subtopics for more information: Monitoring an Installation Using MsiSetExternalUI The examples are in C++ using MSI Win32 functions and the DTF interop library encapsulates all of this with classes. The DTF help file tells you which classes and methods map to which Win32 functions.

Categories : Misc

Installing m2e plugin - RAD 7.5.5
As far as I can tell from here, RAD 7.5.5 is based on an old version of Eclipse. m2e is unlikely to work, but its predecessor m2eclipse might. Hopefully you will be able to find it as explained here. However m2e evolved a lot in the last few years and I'd suggest that you switch to a more recent version of Eclipse, if you can. The latest versions actually have m2e directly integrated.

Categories : Maven

Error installing PIL on Mac OS 10.8.4
I somewhat remember this exact problem. Have you installed the Xcode command line tools? That cured my headaches. You can find Xcode here: https://developer.apple.com/xcode/ From Xcode's Preferences menu, install the Command Line Tools (Downloads/Components tab). Supporting references: gcc-4.2 failed with exit status 1; I can't install 'pip install pil' in Osx

Categories : Python

Why isn't Nokogiri installing?
You need to specify nokogiri to use the system libraries instead, so it doesn't try to build them itself. NOKOGIRI_USE_SYSTEM_LIBRARIES=1 bundle install Answer found here: Error installing nokogiri 1.6.0 on mac (libxml2).

Categories : Ruby

Installing pywin32
Instead of using easy_install, you can try with the Windows binaries for Pywin32 that are available on Christoph Gohlke's website.

Categories : Python



© Copyright 2017 w3hello.com Publishing Limited. All rights reserved.